Unison: A File Synchronization Tool
Mihalis Tsoukalos
Unison is an open source file synchronization tool for both text
and binary files. Although Unison also has a GUI, this article covers
only the command-line version. Unison is convenient when you are
working with two or more computers and you want your files kept synchronized.
It can be securely used through the SSH service, but it can also
be used through rsh (which is not recommended for security reasons)
and works equally well on both Unix (Linux, Solaris, etc.) and Windows
(98, 2000, XP) systems.
Unison was inspired by the rsync utility. Unison differs
from rsync in that rsync is a mirroring tool, whereas Unison is
a synchronization tool that identifies the files that have been
changed since the last synchronization process and decides how the
changes are going to be propagated.
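To make the distinction concrete, the following shell sketch shows the kind of decision a two-way synchronizer has to make for each file: both copies are compared against the state recorded at the last synchronization. The function name decide and the use of plain strings as file fingerprints are illustrative assumptions; this is a simplified model of the idea, not Unison's actual implementation.

```shell
#!/bin/sh
# decide ARCHIVED LOCAL REMOTE
# Each argument is a fingerprint (for example, a checksum) of one file:
#   ARCHIVED = state recorded at the last synchronization
#   LOCAL    = current state on the local machine
#   REMOTE   = current state on the remote machine
decide() {
    archived=$1 local_now=$2 remote_now=$3
    if [ "$local_now" = "$remote_now" ]; then
        echo "nothing to do"
    elif [ "$local_now" = "$archived" ]; then
        # Only the remote copy changed since the last run.
        echo "copy remote to local"
    elif [ "$remote_now" = "$archived" ]; then
        # Only the local copy changed since the last run.
        echo "copy local to remote"
    else
        # Both copies changed in different ways.
        echo "conflict: ask the user"
    fi
}

decide aaa aaa bbb
decide aaa ccc bbb
```

A mirroring tool, by contrast, would copy in one fixed direction regardless of which side changed.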
Installing Unison
Most Linux distributions have Unison as a package ready for installation
so that you do not have to compile it. On my Apple Power Mac G5,
which is running Mac OS X Tiger (version 10.4), I had to compile it.
However, the compilation was straightforward. The current stable
version of Unison is 2.10.2, but this article uses Unison version
2.9.1. Every machine that is part of the synchronization process
must have a copy of the command-line version of Unison installed.
Additionally, this copy of Unison must be in the default path. I
put mine in /usr/bin instead of changing the default PATH shell
variable.
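A quick way to verify the requirement above is to ask the shell whether it can find the unison binary. The sketch below uses a small helper, in_path, which is a name made up for this example; unison -version prints the installed version.

```shell
#!/bin/sh
# in_path CMD: succeed if CMD can be found through the current PATH.
in_path() {
    command -v "$1" >/dev/null 2>&1
}

if in_path unison; then
    unison -version
else
    echo "unison is not in PATH: $PATH"
fi
```

The same check should be repeated on every machine that takes part in the synchronization. For a remote machine, running ssh racoon unison -version (racoon being the remote host used later in this article) also confirms that the binary is found in the PATH of a non-interactive SSH session.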
How to Set Up SSH
The biggest time saver is to set up SSH so that you do not
need to enter your password every time you synchronize your
data. The procedure is easy and involves the following two steps:
1. On your local machine, run ssh-keygen -t dsa -f ~/.ssh/id_dsa -C "username@remote_machine".
You will have to enter a passphrase twice (please do remember the
passphrase!). Two files are created: ~/.ssh/id_dsa and ~/.ssh/id_dsa.pub.
2. Append the contents of the file ~/.ssh/id_dsa.pub on your local
machine to the file ~/.ssh/authorized_keys on the remote server.
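Step 2 can be scripted. The following sketch defines a helper, add_key (a hypothetical name), that appends a public key to an authorized_keys file only if the key is not already there, so it is safe to run more than once. It assumes the public key file holds a single line, which is what ssh-keygen produces.

```shell
#!/bin/sh
# add_key PUBKEY_FILE AUTH_FILE
# Append the public key in PUBKEY_FILE to AUTH_FILE unless an identical
# line is already present. Run this on the remote server after copying
# the .pub file there.
add_key() {
    pubkey_file=$1 auth_file=$2
    key=$(cat "$pubkey_file")
    mkdir -p "$(dirname "$auth_file")"
    touch "$auth_file"
    # -x: match whole lines only, -F: treat the key as a fixed string
    if ! grep -qxF -e "$key" "$auth_file"; then
        printf '%s\n' "$key" >> "$auth_file"
    fi
    # sshd usually refuses key files that other users can write to
    chmod 600 "$auth_file"
}

# Commonly used one-line alternative, run from the local machine:
#   cat ~/.ssh/id_dsa.pub | ssh username@remote_machine 'cat >> ~/.ssh/authorized_keys'
```

On the remote server, the call would look like add_key id_dsa.pub ~/.ssh/authorized_keys.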
When you try to connect to the remote machine, you will get something
like the following output:
dialup25:~ mtsouk$ ssh mtsouk@pluto.REMOTE.DOMAIN.gr
Enter passphrase for key '/Users/mtsouk/.ssh/id_dsa':
Last login: Sat Jul 16 11:00:05 2005 from dialup25.chi.sch.gr
[mtsouk@pluto mtsouk]$
As you can see, typing the passphrase from step 1 instead of the
real password does not save any time. The following two steps let
you avoid this:
1. Run eval `ssh-agent` (for the bash shell).
2. Run ssh-add ~/.ssh/id_dsa; you will be asked to type
the passphrase one last time for this particular bash shell.
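The two steps above can be wrapped so that an agent is started only when the current shell does not already have one. The function names agent_available and ensure_agent are made up for this sketch; ssh-agent -s asks the agent to print Bourne-style shell commands, which is what eval expects in bash.

```shell
#!/bin/sh
# agent_available: succeed if this shell knows about an ssh-agent, that is,
# SSH_AUTH_SOCK is set and points to an existing socket.
agent_available() {
    [ -n "${SSH_AUTH_SOCK:-}" ] && [ -S "$SSH_AUTH_SOCK" ]
}

# ensure_agent: start an agent for this shell only if none is available.
# The agent's "Agent pid ..." message is discarded.
ensure_agent() {
    agent_available || eval "$(ssh-agent -s)" >/dev/null
}
```

A typical use would be ensure_agent followed by ssh-add ~/.ssh/id_dsa, after which Unison can connect through SSH without prompting for the rest of the session.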
A Basic Unison Setup File
Unison can run from the command line without using configuration
files, but having a configuration file available greatly simplifies
its use. In this article, I will not deal much with the command-line
options of Unison. In the unusual case that you have trouble working
with Unison, you may run it using the -debug all command-line
option so that you can better trace and resolve errors. The following
is a simple Unison configuration file named SysAdmin.prf, which
is located inside the .unison directory, the directory in which
Unison does its housekeeping:
big:~ mtsouk$ cat .unison/SysAdmin.prf
# Saturday 25 June 2005
root = /Users/mtsouk
root = ssh://racoon//Users/mtsouk
# Paths to synchronize
path = docs/DSMS
path = docs/article
path = docs/SysAdmin
path = docs/PIK
path = Desktop/Eugenia
path = Sites/MacLand
Lines starting with a # are comments and are ignored.
Lines starting with root = declare the machines that are going
to participate in the synchronization process as well as the directories
that are considered root directories for each machine. After those
important declarations, the directories to be synchronized are listed.
In this particular example, we have six directories. The full
path of the first one is /Users/mtsouk/docs/DSMS for the local
machine (the machine whose declaration does not begin with ssh://)
and /Users/mtsouk/docs/DSMS for the machine called racoon. The
declaration of each remote machine starts with root = ssh://. The
command that runs Unison using the SysAdmin.prf configuration
file is "unison SysAdmin", provided that SysAdmin.prf is located
inside the .unison directory.
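Since a profile is just a list of root = and path = lines, it is easy to generate one from a script. The sketch below does so with a helper called write_profile, a name invented for this example; it only produces the two kinds of lines discussed above.

```shell
#!/bin/sh
# write_profile FILE LOCAL_ROOT REMOTE_ROOT PATH...
# Write a minimal Unison profile with two roots and one "path =" line
# for every remaining argument.
write_profile() {
    file=$1 local_root=$2 remote_root=$3
    shift 3
    {
        echo "root = $local_root"
        echo "root = $remote_root"
        echo "# Paths to synchronize"
        for p in "$@"; do
            echo "path = $p"
        done
    } > "$file"
}
```

For instance, write_profile ~/.unison/SysAdmin.prf /Users/mtsouk ssh://racoon//Users/mtsouk docs/DSMS docs/article would recreate the first lines of the profile shown earlier, after which unison SysAdmin uses it.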
Unison Examples
Unison may be slow the first time you run it, especially if you
have many files to synchronize. This happens only once, so subsequent
synchronizations will be much faster.
The following configuration file is used as a simple, complete
example of Unison:
big:~ mtsouk$ cat .unison/PLUTO.prf
# Saturday 25 June 2005
root = /Users/mtsouk
root = ssh://pluto....gr//home/mtsouk
# Paths to synchronize
path = Sites/PHP
# Log file
logfile = /Users/mtsouk/.unison/unison.log
# Backup files
backup = Name *
big:~ mtsouk$
If you have never run the command unison PLUTO before, you
will see output similar to that of Figure 1. Note
that the directory /home/mtsouk/Sites must already exist on the remote
server or the synchronization will fail.
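The directory that has to exist beforehand is simply the parent of the synchronized path, taken relative to the remote root. The following sketch computes it; parent_on_root is a hypothetical helper name.

```shell
#!/bin/sh
# parent_on_root ROOT REL_PATH
# Print the directory that must already exist under ROOT before
# REL_PATH can be synchronized into it.
parent_on_root() {
    root=$1 rel=$2
    parent=$(dirname "$rel")
    if [ "$parent" = "." ]; then
        # REL_PATH sits directly under the root.
        echo "$root"
    else
        echo "$root/$parent"
    fi
}

parent_on_root /home/mtsouk Sites/PHP   # prints /home/mtsouk/Sites
```

The result can then be created in advance with something like ssh REMOTE_HOST mkdir -p /home/mtsouk/Sites, where REMOTE_HOST is a placeholder for the remote machine's host name.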
Figure 2 shows another example of running Unison using a configuration
file called DSMS.prf. In this example, Unison is instructed to:
1. Use /Users/mtsouk/.unison/unison.log as its log file.
2. Take backup copies of all the files.
3. Ignore, during the synchronization process, files with names
ending in .DS_Store.
The contents of DSMS.prf are:
big:~ mtsouk$ cat .unison/DSMS.prf
# Thursday 14 August 2003
root = /Users/mtsouk
root = ssh://racoon//Users/mtsouk
# Paths to synchronize
path = docs/DSMS
path = docs/article
path = Desktop/docs.var
path = Sites/PHP
# Log file
logfile = /Users/mtsouk/.unison/unison.log
# Backup files
backup = Name *
ignore = Name *.DS_Store
ignore = Name .DS_Store
big:~ mtsouk$
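The profile lists .DS_Store twice, once with a wildcard and once without. The sketch below illustrates a likely reason, under an assumption taken from the pattern description in the Unison manual: a Name pattern is tested against the last component of a path, and a leading * or ? does not match a name that begins with a dot. The function name_matches is invented for this illustration and uses ordinary shell globbing as an approximation; it does not cover every pattern form that Unison accepts.

```shell
#!/bin/sh
# name_matches PATTERN PATH
# Approximate "ignore = Name PATTERN": test only the last component of PATH.
# Assumption: a leading * or ? never matches a name starting with a dot.
name_matches() {
    pattern=$1
    name=${2##*/}          # strip everything up to the last slash
    case $pattern in
        [*?]*) case $name in .*) return 1 ;; esac ;;
    esac
    case $name in
        $pattern) return 0 ;;
        *) return 1 ;;
    esac
}

for p in '*.DS_Store' '.DS_Store'; do
    if name_matches "$p" docs/DSMS/.DS_Store; then
        echo "$p matches docs/DSMS/.DS_Store"
    else
        echo "$p does not match docs/DSMS/.DS_Store"
    fi
done
```

Under that assumption, only the second pattern catches the plain .DS_Store files that the Mac OS X Finder creates, while the first one catches names such as foo.DS_Store.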
The keyword backupversions in a configuration file specifies
how many previous versions of each file are stored. If the
backupversions keyword is not defined, a default value
of 2 is used, which means that the two most recent backup
versions of each file are kept inside the ~/.unison/backup
directory. Please note that if you are synchronizing very large
files, a backupversions option with a value of 4 means
that each file, including its backup copies, may exist five times
and occupy five times its space.
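The arithmetic behind that warning is simple enough to check in the shell. The helper worst_case_mb is a made-up name, and the sketch assumes that every backup copy is about as large as the current file.

```shell
#!/bin/sh
# worst_case_mb SIZE_MB VERSIONS
# Upper bound on the space one file can occupy on one machine:
# the current copy plus VERSIONS backup copies of the same size.
worst_case_mb() {
    echo $(( $1 * ($2 + 1) ))
}

worst_case_mb 700 4   # prints 3500
```

In other words, a 700 MB file with four backup versions may need up to 3500 MB.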
For the documentation that is built into Unison, including a
tutorial, type unison -doc topics at the command line of your
terminal to list the available topics.
There are rare occasions -- usually due to user error -- when
Unison will not be able to determine whether a file or directory
has changed on the local or the remote server. In such situations,
Unison asks for our help so that it will not mistakenly proceed
using the wrong file or directory. Figure 3 shows this situation
as well as another error situation where a file or directory
has changed during the synchronization process. Unison outputs
the following error message:
The file /Users/mtsouk/docs/article/unison.SysAdmin/article.txt
has been modified during synchronization: transfer aborted
and does not update that particular file in order to avoid further
problems. Figure 4 shows the contents of the .unison directory on
my local machine.
Unison can also utilize external programs to perform merging
on conflicting versions of a file. The keyword merge
defines how the merging process is going to happen. Please use
this option only if you know exactly what you are doing.
Unison Development
The Unison project was led by Benjamin C. Pierce at the University
of Pennsylvania. Unison began as a research project but it is
no longer one. Benjamin C. Pierce is now leading the Harmony
Project, which is also related to file synchronization. Nevertheless,
Harmony is still in its early stages.
The people interested in Unison maintain the following three
mailing lists:
1. unison-announce -- New Unison release announcements.
2. unison-users -- General discussion of Unison.
3. unison-hackers -- Informal discussion for developers and
experts.
Conclusions
This article described some of the uses of Unison. There are
many more things to do with Unison: run it as a nightly cron
job, synchronize Web servers, keep backups of configuration
files (note that Unison cannot replace backup procedures), etc.
For non-critical data files, you may run Unison once a day,
but for critical data you should run it more often.
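For the cron scenario, Unison has to run without asking questions, which is what its -batch option is for. The wrapper below is a minimal sketch: the function name run_sync and the log format are assumptions, and it presumes that the SSH login works without a prompt in the environment in which cron starts the job.

```shell
#!/bin/sh
# run_sync PROFILE LOGFILE
# Run Unison non-interactively with the given profile and append a
# one-line result, including Unison's exit status, to LOGFILE.
run_sync() {
    profile=$1 log=$2
    status=0
    unison "$profile" -batch >/dev/null 2>&1 || status=$?
    echo "$(date) $profile: unison exit status $status" >> "$log"
    return "$status"
}
```

A script that calls run_sync SysAdmin "$HOME/.unison/cron.log" could then be scheduled with a crontab entry such as 0 2 * * * /path/to/that/script, where both the log file name and the script path are placeholders.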
Acknowledgments
I thank Nikos Platis for letting me use his machine for the
purposes of this article.
References
Unison home page: http://www.cis.upenn.edu/~bcpierce/unison/
Unison manual: http://www.cis.upenn.edu/~bcpierce/unison/download/stable/latest/unison-manual.html
Harmony Project: http://www.cis.upenn.edu/~bcpierce/harmony/index.html
OpenSSH key management: http://www-128.ibm.com/developerworks/library/l-keyc.html
Mihalis Tsoukalos lives in Greece with his wife, Eugenia,
and works as a high school teacher. He holds a B.Sc. in Mathematics
and an M.Sc. in IT from University College London. Before teaching,
he worked as a Unix systems administrator and an Oracle DBA.
Mihalis can be reached at: tsoukalos@sch.gr.