Features


bsed: A Stream Editor for Binary Files

Steven G. Isaacson

Here's a quick (and dirty) way to patch binary files with a simple software tool.


There are many ways to substitute one string of text for another in a text file, particularly in a Unix development environment. But what if we wanted to do the same thing in a binary file? Recently we had to do just that. We had a dozen programs, compiled on 18 different platforms (SCO, HP, Sun, etc.), and we needed to change the banner page text.

One solution would have been to correct the source code and then recompile every program on every platform. But this was not practical, and in some cases no longer possible. We could have used a binary editor, since binary editors handle binary files with ease; the problem was the likelihood of introducing human error. Each of the dozen programs had slightly different requirements. Some needed changes in only one place, while others required several changes in several locations (for example, error and help text in addition to the banner page). Using a binary editor meant loading the file into the editor, searching for the target string and its variants, replacing the string, searching for the next occurrence, and so on.

Multiply all of those operations by a dozen programs and then by 18 platforms, and it was clear there had to be a better way. If we could automate the task with an awk- or sed-like program that did the job the same way every time, then we could change 18 or 1,800 programs and know each one was done without error. To that end I wrote bsed, a simple stream editor that works on binary as well as ASCII files.

How It Works

bsed (Listing 1) employs a simple algorithm:

First, bsed reads stdin in BUFSIZE chunks to fill the buffer. It then scans the buffer with the memchr library function to see whether the first character of the search string is present. If the first character is not in the buffer, the complete search string cannot be there either, so bsed writes the buffer's contents to stdout and flushes (empties) the buffer.

This situation represents the simplest case. As long as no matches are found, bsed just repeatedly fills the buffer from stdin, writes it to stdout, and flushes the buffer.
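The simple case can be sketched in C as follows. This is not Listing 1; it is a minimal illustration that assumes a 4,096-byte buffer (the hypothetical SKETCH_BUFSIZE stands in for BUFSIZE) and a hypothetical helper named pass_through. In this sketch, "flushing" the buffer simply means letting the next fread overwrite it.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical size; stands in for the BUFSIZE used by Listing 1. */
#define SKETCH_BUFSIZE 4096

/* The simple case only: read `in` in fixed-size chunks and copy each
 * chunk to `out` as long as memchr() finds no occurrence of `first`,
 * the search string's first byte. As soon as a chunk does contain it,
 * return 1 and leave that chunk in buf (its length in *len).
 * Return 0 once `in` is exhausted and every byte has been written. */
int pass_through(FILE *in, FILE *out, unsigned char first,
                 unsigned char *buf, size_t *len)
{
    size_t n;

    assert(in != NULL && out != NULL && buf != NULL && len != NULL);
    while ((n = fread(buf, 1, SKETCH_BUFSIZE, in)) > 0) {
        if (memchr(buf, first, n) != NULL) {
            *len = n;               /* possible match: hand the chunk back */
            return 1;
        }
        fwrite(buf, 1, n, out);     /* no first byte, so no match here;   */
    }                               /* the next fread overwrites buf      */
    *len = 0;
    return 0;
}
```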

The work begins when a possible match shows up in the buffer. Suppose bsed reads in 100 characters and a possible match shows up at position 50. The first order of business is to write out those first 50 non-matching characters to stdout. (Those characters are no longer of any interest.) When that's done, bsed shifts the remaining 50 characters to the left, leaving the buffer half-empty. It then reads from stdin again to fill up the buffer.
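Writing out the non-matching prefix and sliding the candidate to the front of the buffer might look like the sketch below. The function name emit_and_refill and the buffer size are hypothetical. memmove is used rather than memcpy because the source and destination ranges can overlap.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical capacity; stands in for the BUFSIZE used by Listing 1. */
#define SKETCH_BUFSIZE 4096

/* buf holds `len` valid bytes and a possible match starts at `offset`.
 * Write the bytes before it, shift the rest to the front of buf,
 * then top the buffer up from `in`. Returns the new byte count. */
size_t emit_and_refill(FILE *in, FILE *out, unsigned char *buf,
                       size_t len, size_t offset)
{
    assert(offset <= len && len <= SKETCH_BUFSIZE);
    fwrite(buf, 1, offset, out);              /* non-matching prefix   */
    memmove(buf, buf + offset, len - offset); /* candidate now at [0]  */
    len -= offset;
    len += fread(buf + len, 1, SKETCH_BUFSIZE - len, in);
    return len;
}
```

With the 100-character example above, emit_and_refill(in, out, buf, 100, 50) writes 50 bytes, leaves the candidate at buf[0], and then asks stdin for up to 4,046 more bytes.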

Once the buffer is full again (or stdin has been exhausted), bsed turns its attention back to the first character of the potentially matching substring. This character was originally found at location 50, was shifted 50 places left, and now resides at the start of the buffer. Now bsed must determine if the substring starting with this character matches the search string. If the substring matches, bsed writes out the replacement string to stdout, throws away the matching substring in the buffer, and shifts the contents of the buffer all the way to the left again, so it can continue looking. If the search string does not match what's in the buffer, bsed continues to loop through the buffer, looking for additional first-character matches with the search string.
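Putting the pieces together, here is a minimal sketch of the whole loop in C. It illustrates the algorithm described above and is not the author's Listing 1: bsed_stream and SKETCH_BUFSIZE are hypothetical names, and the search and replacement strings are passed with explicit lengths so that they may contain NUL bytes. Rather than shifting every candidate to the front, this version compares candidates in place and carries over only a candidate that sits too close to the end of the buffer to be compared, which is the same shift-left-and-refill step applied where it is actually needed.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_BUFSIZE 4096  /* hypothetical stand-in for BUFSIZE */

/* Copy `in` to `out`, replacing each occurrence of pat[0..plen) with
 * rep[0..rlen). memchr/memcmp treat every byte alike, so NULs and other
 * non-text bytes pass through untouched. */
void bsed_stream(FILE *in, FILE *out,
                 const unsigned char *pat, size_t plen,
                 const unsigned char *rep, size_t rlen)
{
    unsigned char buf[SKETCH_BUFSIZE];
    size_t len = 0;                         /* bytes currently in buf */

    assert(plen > 0 && plen <= SKETCH_BUFSIZE);
    for (;;) {
        /* Carry-over is shorter than plen, so this asks for >= 1 byte. */
        size_t got = fread(buf + len, 1, SKETCH_BUFSIZE - len, in);
        int eof = (got == 0);               /* end of input (or error) */
        size_t start = 0;                   /* first byte not yet written */
        size_t scan = 0;                    /* where the next memchr starts */
        size_t keep;                        /* bytes from here carry over */

        len += got;
        keep = len;
        while (scan < len) {
            unsigned char *hit = memchr(buf + scan, pat[0], len - scan);
            if (hit == NULL)
                break;                      /* no candidate in the rest */
            size_t off = (size_t)(hit - buf);
            if (len - off < plen) {
                if (!eof)
                    keep = off;             /* finish comparing after refill */
                break;                      /* at EOF it can never match */
            }
            if (memcmp(hit, pat, plen) == 0) {
                fwrite(buf + start, 1, off - start, out);
                fwrite(rep, 1, rlen, out);  /* emit the replacement */
                start = scan = off + plen;  /* throw away the matched bytes */
            } else {
                scan = off + 1;             /* keep looking past it */
            }
        }
        fwrite(buf + start, 1, keep - start, out);
        if (keep < len)
            memmove(buf, buf + keep, len - keep);  /* shift left */
        len -= keep;
        if (eof)
            break;
    }
}
```

Because the carried-over fragment is always shorter than the search string, each fread requests at least one byte, so a zero return reliably means end of input (or a read error).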

Eventually bsed comes to the end of stdin, and then it's finished.

Using bsed

bsed works like a traditional Unix utility. It reads its input from stdin and writes its output to stdout. The string to be replaced and its replacement are given as command-line arguments.

Here are some examples of how bsed can be used:

    bsed "advanced solutions" "Advanced Solutions" < oldprog > newprog

    cat oldprog | bsed Change_This To_This > newprog

    echo "Now is the tmie" | bsed tmie time

Of course, editing binary files is not to be taken lightly. It's okay to replace a string of text in a binary file if you replace it with the same number of characters, and if that text serves no purpose other than display (a banner page, for example). But if you start mucking around with other parts of the code, or if you replace one string of text with a string of a different length, then your executable program will quickly turn to mush: a longer or shorter replacement shifts every byte that follows it, so addresses and offsets elsewhere in the file no longer point where they should.
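A cautious front end could enforce the same-length rule before touching a file. The function below is a hypothetical addition, not part of bsed as described here; it only compares byte counts, so it cannot tell whether the text being replaced is really display-only.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical guard: return 1 if `from` and `to` have the same length
 * (safe to patch in place), otherwise print a warning to `err` and
 * return 0. */
int same_length_or_warn(const char *from, const char *to, FILE *err)
{
    size_t a, b;

    assert(from != NULL && to != NULL && err != NULL);
    a = strlen(from);
    b = strlen(to);
    if (a == b)
        return 1;
    fprintf(err, "bsed: \"%s\" is %zu bytes but \"%s\" is %zu bytes; "
                 "patching a binary would shift every byte after it\n",
            from, a, to, b);
    return 0;
}
```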

Other Uses

bsed could be made faster and more general-purpose (for example, by allowing more than one search-replace pattern), but that usually means linking in a regular expression package (such as regcmp(3C)), which increases the code size, complexity, and porting requirements. In its present form, bsed is simple enough to extend easily. For example, we currently use a variant of bsed in CGI programs to convert hexadecimal strings to their ASCII equivalents.
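The hex-to-ASCII variant is not described in detail here. As an illustration only, the sketch below assumes the %XX escapes found in URL-encoded CGI form data; hex_unescape and hexval are hypothetical names, and the form-encoding rule that turns '+' into a space is deliberately left out. It works on a string rather than a stream to keep the example short.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Value of one hex digit; the caller guarantees isxdigit(c). */
static int hexval(int c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    return tolower(c) - 'a' + 10;
}

/* Decode %XX escapes from `src` into `dst`. dst may equal src, because
 * the output is never longer than the input. Malformed escapes (such as
 * a trailing '%') are copied through unchanged. Returns the decoded
 * length, not counting the terminating NUL. */
size_t hex_unescape(char *dst, const char *src)
{
    char *start = dst;

    assert(dst != NULL && src != NULL);
    while (*src != '\0') {
        if (src[0] == '%' && isxdigit((unsigned char)src[1])
                          && isxdigit((unsigned char)src[2])) {
            *dst++ = (char)(hexval((unsigned char)src[1]) * 16
                            + hexval((unsigned char)src[2]));
            src += 3;
        } else {
            *dst++ = *src++;
        }
    }
    *dst = '\0';
    return (size_t)(dst - start);
}
```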

bsed may also be useful in other stream-based applications, wherever a simple, easy-to-understand stream editor is called for.

Steven G. Isaacson currently works with the porting group at Endura Software Corporation (http://www.enduracorp.com). He may be reached via email at steven.isaacson@enduracorp.com.