Issue 15, Fall 1999
Version Control with makepatch
Johan Vromans
"When you have two copies of a piece of information, at
least one of them is wrong."
This theorem is often used in information technology to
emphasize that you should avoid copying information, because when
you do, you have to spend effort keeping all the copies up to
date.
However, copying often cannot be avoided; programs, data
files, documents, and web pages are frequently copied all over
the world. This article describes a technique to help keep
copies of sets of documents consistent and up to date. Although
the technique is most widely used in software development, it is
applicable to virtually any type of data.
diff and patch
When people collaborate on a program, how can they keep all
the programs consistent, so that the changes from different
people don't conflict? One solution is to ship the latest version
to everyone after every change, but that's not feasible if the
program is large. Another solution is to publish changes as
individual update files in the format generated by the Unix
diff program (also available for Win32; see Listing 1, "How diff
and patch Work," and "diff and patch for Win32 and the Mac").
With diff and patch, you
can apply someone else's changes to your own program (or
document, or data file, or web page).
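To make the idea of an update file concrete, here is a minimal sketch that produces a unified diff with Python's standard difflib module instead of the diff program itself. The file name README and the helper make_update are made up for the illustration; the output is in the same family of formats that patch reads.

```python
import difflib

# Two versions of the same small file, as lists of lines.
old = ["Hello, world\n", "This is version 1.6\n", "Goodbye\n"]
new = ["Hello, world\n", "This is version 1.7\n", "Goodbye\n"]

def make_update(old_lines, new_lines, name):
    """Return a unified diff that turns old_lines into new_lines.

    'make_update' is a hypothetical helper for this sketch, not part
    of diff, patch, or makepatch.
    """
    return "".join(difflib.unified_diff(
        old_lines, new_lines,
        fromfile="old/" + name, tofile="new/" + name))

update = make_update(old, new, "README")
print(update)
```

Only the changed line and a little surrounding context end up in the update, which is why shipping diffs is so much cheaper than shipping whole files.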
The patch program, which automates the process of
integrating the update files into a set of source files, was
written by none other than Larry Wall, author of Perl.
patch solved only part of the problem, however.
Synchronizing two versions of a source tree (or web site)
requires more than just changing individual source files.
Sometimes new files need to be created, and obsolete files need
to be removed. diff and patch don't do these
things well.
In the rest of this article, I'll assume that we're talking
about a program with multiple files of source code, although the
techniques apply to any collection of files.
The Problem
To properly update a source tree, we need to
worry about a few things:
- Verify that the update file was not damaged during
transport over the Internet.
- Apply the changes generated by the diff program to
each source file. This is what the patch program
does.
- Create any new files required. patch can do this,
but some versions can only create files in existing
directories.
- Create any new directories required. patch can do
this, but some versions can only create directories if new
files are being created there simultaneously.
- Remove obsolete files. Some versions of patch can
handle this.
- Remove obsolete directories.
- Adjust access (read, write, and execute permissions) of
files and directories.
- Adjust the file dates (time stamps) of the modified files.
Some versions of patch can handle this under certain
circumstances.
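The list above amounts to comparing two trees and sorting the differences into categories. The following is a rough sketch of that bookkeeping, not makepatch's actual code; the function names list_tree, plan_update, and write are invented for the example, and the two tiny trees are built in a temporary directory.

```python
import filecmp
import os
import tempfile

def list_tree(root):
    """Return (dirs, files) as sets of paths relative to root."""
    dirs, files = set(), set()
    for current, subdirs, names in os.walk(root):
        rel = os.path.relpath(current, root)
        for d in subdirs:
            dirs.add(os.path.normpath(os.path.join(rel, d)))
        for n in names:
            files.add(os.path.normpath(os.path.join(rel, n)))
    return dirs, files

def plan_update(old_root, new_root):
    """Classify what must happen to turn old_root into new_root."""
    old_dirs, old_files = list_tree(old_root)
    new_dirs, new_files = list_tree(new_root)
    # Files present in both trees need a patch only if their contents differ.
    changed = {f for f in old_files & new_files
               if not filecmp.cmp(os.path.join(old_root, f),
                                  os.path.join(new_root, f), shallow=False)}
    return {
        "patch": sorted(changed),
        "create_files": sorted(new_files - old_files),
        "remove_files": sorted(old_files - new_files),
        "create_dirs": sorted(new_dirs - old_dirs),
        "remove_dirs": sorted(old_dirs - new_dirs),
    }

def write(root, rel, text):
    """Create a file (and its parent directories) for the demo."""
    path = os.path.join(root, rel)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as fh:
        fh.write(text)

with tempfile.TemporaryDirectory() as tmp:
    old, new = os.path.join(tmp, "old"), os.path.join(tmp, "new")
    write(old, "README", "1.6\n")
    write(old, os.path.join("src", "gone.c"), "x\n")
    write(new, "README", "1.7\n")
    write(new, os.path.join("doc", "new.txt"), "y\n")
    plan = plan_update(old, new)

for task, names in plan.items():
    print(task, names)
```

Access modes and time stamps are left out of the sketch; they would be two more attributes to compare for each file the trees have in common.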
The makepatch Package
The makepatch package
performs all of the tasks that diff and patch don't. It
contains two Perl programs: makepatch and
applypatch. makepatch builds a patch kit that
can be applied reliably; applypatch integrates the patch
kit on the receiving end.
This article describes version 2.00a of the makepatch
package.
Generating The Patch Kit
makepatch generates a
patch kit from two source trees: the original, and the new tree.
Here's how it does that:
- It traverses the tree directories and runs the
diff program on each pair of corresponding files,
accumulating the output into a patch kit.
- It knows about certain conventions for patch kits. For
example, it knows that the list of files is usually specified
in a file called MANIFEST. If a file named
patchlevel.h exists, it is handled first so
patch can verify the version of the source tree.
- To deal with the imperfect versions of patch out
there, it supplies Index: and Prereq: lines
so that patch can unambiguously locate the files to
patch and verify them if possible.
- Last but not least, it relocates the patch to the
current directory to avoid problems when patch needs to create
new files.
The generated patch kit is valid input for the patch
program, making use of patch's feature of ignoring
everything it does not understand.
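As an illustration of the accumulation step, the sketch below concatenates per-file diffs and puts an Index: line in front of each one. It uses Python's difflib rather than the diff program, it omits Prereq: lines and the MANIFEST handling, and build_kit and write are names made up for the example, so treat it as an approximation of the idea rather than of makepatch's output.

```python
import difflib
import os
import tempfile

def build_kit(old_root, new_root, files):
    """Concatenate unified diffs for the given relative file names.

    Both sides of each diff carry the same relative name, so the
    result can be applied from the top of the tree being updated.
    """
    kit = []
    for rel in files:
        with open(os.path.join(old_root, rel)) as fh:
            old_lines = fh.readlines()
        with open(os.path.join(new_root, rel)) as fh:
            new_lines = fh.readlines()
        diff = list(difflib.unified_diff(old_lines, new_lines,
                                         fromfile=rel, tofile=rel))
        if diff:  # unchanged files contribute nothing to the kit
            kit.append("Index: %s\n" % rel)
            kit.extend(diff)
    return "".join(kit)

def write(root, rel, text):
    """Create a demo file under root."""
    os.makedirs(root, exist_ok=True)
    with open(os.path.join(root, rel), "w") as fh:
        fh.write(text)

with tempfile.TemporaryDirectory() as tmp:
    old, new = os.path.join(tmp, "old"), os.path.join(tmp, "new")
    write(old, "README", "version 1.6\n")
    write(new, "README", "version 1.7\n")
    write(old, "COPYING", "same text\n")
    write(new, "COPYING", "same text\n")
    kit = build_kit(old, new, ["README", "COPYING"])

print(kit)
```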
As a special service, makepatch prepends a small
shell script to the patch kit that, when fed to a standard Bourne
shell, creates the necessary directories and files and removes
obsolete ones. Of course, this requires that the receiving
platform supports both the shell and Unix filename conventions,
so the shell script is pretty much useful only for Unix. These
limitations can be overcome by using the applypatch
utility instead.
Applying The Patch Kit
applypatch takes care of
everything that patch doesn't:
- applypatch verifies that the patch kit is complete
and has not been corrupted during transfer.
- It applies some heuristics to verify that the directory in
which the patch is going to be applied really does
contain a source tree.
- It creates new directories and files as necessary.
- It applies the patch by running the patch program
for you.
- Upon completion, obsolete files, directories, and
patch backup files (.orig files) are removed.
- The access modes of new files are set, and the timestamps of
all the modified files are adjusted.
To allow applypatch to do its job, makepatch
appends additional information (like checksums) to the patch
kit.
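The general idea of sealing a kit with a checksum and checking it on arrival can be sketched as follows. The trailer line and the use of SHA-256 are assumptions chosen for the illustration; they are not the format or the algorithm that makepatch and applypatch actually use, and seal and verify are hypothetical names.

```python
import hashlib

# Hypothetical marker for this sketch only; not makepatch's real trailer.
TRAILER = "#### sketch-checksum "

def seal(body):
    """Append a trailer line carrying a digest of everything before it."""
    digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
    return body + TRAILER + digest + "\n"

def verify(kit):
    """Return True only if the trailer is present and matches the body."""
    body, marker, tail = kit.rpartition(TRAILER)
    if not marker:
        return False  # trailer missing: the kit was probably truncated
    return hashlib.sha256(body.encode("utf-8")).hexdigest() == tail.strip()

sealed = seal("Index: README\n-version 1.6\n+version 1.7\n")
print(verify(sealed))                        # intact kit
print(verify(sealed.replace("1.7", "1.8")))  # body altered in transit
print(verify(sealed[:20]))                   # kit cut short
```

Because the digest sits on a line that patch does not recognize, a kit sealed this way would still be usable as plain patch input.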
applypatch only requires Perl and patch; no
other operating system support is necessary. This makes it
possible to apply patches on any operating system that supports
these two programs.
General Usage
Suppose you have an archive
pkg-1.6.tar.gz, containing the sources for the
pkg package version 1.6. You also have a directory tree
pkg-1.7 containing the sources for version 1.7. The
following command generates a patch kit that updates the 1.6
sources into their 1.7 versions:
makepatch pkg-1.6.tar.gz pkg-1.7 > pkg-1.6-1.7.patch
By default, makepatch provides a few lines of
progress information:
Extracting pkg-1.6.tar.gz to /tmp/mp21575.d/old ...
Manifest MANIFEST for pkg-1.6 contains 1083 files.
Manifest MANIFEST for pkg-1.7 contains 1292 files.
Processing the filelists ...
Collecting patches ...
266 files need to be patched.
216 files and 8 directories need to be created.
7 files need to be removed.
To apply the generated patch kit, go to the directory
containing the 1.6 sources and feed the kit to applypatch:
cd old/pkg-1.6
applypatch pkg-1.6-1.7.patch
applypatch verifies that it is executing in the right
place and makes all necessary updates. The program provides no
feedback information by default.
Over the last couple of years, makepatch has been
used extensively by several developers and teams all over the
Internet, including the Perl 5.6 development team. The program
has evolved from a simple wrapper around the diff
program to a tool that provides a lot of interesting features for
everyone involved in maintaining source documents. I'll mention
just a few of these.
Fetching Source Files From Archives
The set of sources
makepatch operates on need not be explicitly present on
disk. makepatch can process files that are archived in
any of several popular archive formats (.tar,
.tar.gz, .tgz, .tar.bz2 and
.zip). Other archive formats can be easily added without
changing the program.
Selecting The Source Files
The list of files constituting
the source tree can be specified in a MANIFEST file, but
it can also be generated on the fly by recursively traversing the
source tree. File names can be excluded using shell style
wildcards and Perl regular expression patterns. There are
predefined patterns to exclude the version control files
generated by revision control systems, and they can be activated
with a single command line option.
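Here is a small sketch of the two kinds of exclusion filters, written with Python's fnmatch and re modules. Whether makepatch matches patterns against the full relative path or only the base name is not stated here, so the sketch simply assumes the full path; select_files and the sample names are invented for the example.

```python
import fnmatch
import re

def select_files(names, exclude=(), exclude_regex=()):
    """Drop names matching any shell-style pattern or any regular expression.

    Assumption for this sketch: patterns are tested against the whole
    relative path, not just the last component.
    """
    regexes = [re.compile(p) for p in exclude_regex]
    kept = []
    for name in names:
        if any(fnmatch.fnmatchcase(name, pat) for pat in exclude):
            continue
        if any(rx.search(name) for rx in regexes):
            continue
        kept.append(name)
    return kept

names = ["README", "src/main.c", "src/main.c~", "src/RCS/main.c,v", "core"]
selected = select_files(names,
                        exclude=["*~"],               # editor backup files
                        exclude_regex=[r"(^|/)RCS/"])  # anything under an RCS directory
print(selected)  # ['README', 'src/main.c', 'core']
```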
A Word About Manifest Files
A manifest file lists the files comprising a package. Manifest
files are traditionally called MANIFEST and reside in
the top level directory of the package. Although there is no
formal standard for the contents of manifest files,
makepatch uses the following rules:
- If the second line of the manifest file looks like a
separator line (for example, it's empty, or contains only
dashes), it is discarded and so is the first line.
- Empty lines and lines that start with a # are ignored.
- If there are multiple space-separated "words" on a line,
the first word is considered to be the filename.
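The three rules above translate almost directly into code. This is a sketch of the rules as stated, not a copy of makepatch's parser; parse_manifest and the sample manifest are made up for the example.

```python
import re

def parse_manifest(text):
    """Return the file names listed in a manifest, following the three rules."""
    lines = text.splitlines()
    # Rule 1: a separator on line two (empty or only dashes) means the
    # first two lines are a heading and are both discarded.
    if len(lines) >= 2 and re.fullmatch(r"\s*-*\s*", lines[1]):
        lines = lines[2:]
    files = []
    for line in lines:
        # Rule 2: skip empty lines and lines that start with '#'.
        if not line.strip() or line.startswith("#"):
            continue
        # Rule 3: the first whitespace-separated word is the file name.
        files.append(line.split()[0])
    return files

sample = """File            Description
------------------------------
# build files
Makefile.PL     Perl makefile generator

lib/Pkg.pm      the module itself
README
"""

manifest_files = parse_manifest(sample)
print(manifest_files)  # ['Makefile.PL', 'lib/Pkg.pm', 'README']
```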
makepatch Options
makepatch accepts lots of
options. Full detail is available in the documentation provided
with the package, but here are brief descriptions:
- -description text provides descriptive text for
the patch.
- -diff command uses command to generate
the differences between the two versions of the files.
- -patchlevel pfile specifies an alternate file to
be used in lieu of patchlevel.h.
- -automanifest mfile specifies an alternate
manifest file.
- -nomanifest says not to use a manifest file.
- -manifest mfile indicates the name of the current
manifest file.
- -oldmanifest omfile indicates the name of the
manifest file for the old source tree. It's meant to be used in
conjunction with -newmanifest.
- -newmanifest nmfile indicates the name of the
manifest file for the new source tree.
- -[no]recurse controls recursion into subdirectories;
-norecurse prevents recursion beyond the initial directories.
- -[no]follow traverses symbolic links to
directories as if they were real directories.
- -infocmd command adds the output of
command before each patch chunk.
- -exclude pattern excludes files that match the
given shell pattern.
- -exclude-regex pattern excludes files that match
the given Perl regular expression pattern.
- -[no]exclude-vc excludes files and directories
that belong to the CVS, RCS, and SCCS revision control
systems.
- -extract pattern=command defines additional
extraction rules for archives.
- -[file]list instructs makepatch to read a
manifest file and output the list of files included in the
manifest.
- -prefix string prefixes every entry in the
manifest file with string.
- -nosort retains the order of filenames from the
manifest file.
- -[no]ident reports the program name and
version.
- -[no]verbose displays information about
makepatch activity to STDERR.
- -[no]quiet is the opposite of
-verbose.
- -[no]help displays a short help message and
exits.
These options needn't be specified on the command line.
makepatch looks for options in the following order:
- The environment variable MAKEPATCHINIT. When this
environment variable is set, its contents are considered to be
command line options that are processed upon startup. All
normal options are allowed, plus one: -rcfile
filename.
- On startup, makepatch first tries to process a
file named /etc/makepatchrc, if it exists.
- Next, makepatch processes a file named
.makepatchrc in the user's home directory, if it
exists.
- After processing this file, makepatch processes a
.makepatchrc in the current directory, if it exists.
An alternative name for this file can be specified with the
-rcfile option in the MAKEPATCHINIT
environment variable.
In all option files, empty lines and lines starting with ; or
# are ignored. All other lines are considered to contain options
exactly as if they had been supplied on the command line.
- Finally, makepatch looks for options on the
command line.
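The lookup order can be sketched as a function that collects option words from each source in turn. This is only an approximation of the behavior described above: gather_options, rc_options, and the read_file callback are hypothetical names, the sample file contents are invented, and the sketch says nothing about how makepatch resolves options that are given more than once.

```python
import os
import shlex

def rc_options(text):
    """Split an option file into words, skipping blank and comment lines."""
    words = []
    for line in text.splitlines():
        if not line.strip() or line.startswith((";", "#")):
            continue
        words.extend(shlex.split(line))
    return words

def gather_options(argv, environ, read_file, home):
    """Collect option words in processing order.

    read_file(path) is a hypothetical callback that returns the text of
    an option file, or None if the file does not exist.
    """
    local_rc = ".makepatchrc"
    words = []
    # 1. MAKEPATCHINIT, which may also rename the per-directory file.
    init = iter(shlex.split(environ.get("MAKEPATCHINIT", "")))
    for word in init:
        if word == "-rcfile":
            local_rc = next(init, local_rc)
        else:
            words.append(word)
    # 2-4. System-wide, per-user, and per-directory option files.
    for path in ("/etc/makepatchrc",
                 os.path.join(home, ".makepatchrc"),
                 local_rc):
        text = read_file(path)
        if text is not None:
            words.extend(rc_options(text))
    # 5. The command line comes last.
    words.extend(argv)
    return words

# Invented file contents for the demo; nothing is read from disk.
files = {
    "/etc/makepatchrc": "# site defaults\n-exclude-vc\n",
    "proj.rc": "; project settings\n-verbose\n",
}
environ = {"MAKEPATCHINIT": "-quiet -rcfile proj.rc"}
options = gather_options(["-description", "fix"], environ, files.get, "/home/u")
print(options)
```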
For an extensive list of the possible options, see the
makepatch documentation.
Current Status And Future Directions
The current version
of the makepatch package is 2.00a, available as
authors/id/JV/makepatch-2.00a.tar.gz on the CPAN. It
requires Perl 5, and a suitable version of the diff and
patch programs.
The next version of applypatch will apply its own
patches, eliminating the need for the patch program.
Also, a future version of makepatch might be able to
generate the patch information, eliminating the need for the
diff program on the source platform. This will be
especially interesting for users on platforms like Windows, where
these programs are not available by default.
__END__
Johan Vromans (jvromans@squirrel.nl) has been
engaged in software engineering since 1975. He has been a Perl
user since version 2 and participated actively in its
development. Besides writing makepatch, he also
wrote Getopt::Long, the Perl5 Pocket Reference, and
co-authored The Webmaster's Handbook. He offers Perl
consulting and courses with the Squirrel Consultancy (http://www.squirrel.nl).