Portage
Bill Longman
One of the biggest hurdles to overcome with any
operating system is the coordination of software dependencies over the
lifetime of a given system. Several package management devices have been
developed to help extricate administrators from the ever-increasing
interdependencies that invariably develop as a system ages. One well-known
tool is called Portage. It provides a simple user interface, is readily
extended, and can be deployed on several platforms.
Portage from 35,000 Feet
Portage, in itself, is merely a set of Python and bash
scripts to manage software packages. It provides a mechanism for defining a
system's overall configuration, a framework for managing the
dependencies of packages, and a robust means to administer software
packages. Its database consists of the current set of "ebuilds"
that fall into many categories, from app-admin and app-arch, through
dev-java, media-gfx, net-dns, sci-astronomy, and sys-power all the way to
x11-wm. In fact, I've found that looking for software usually begins
by browsing the Portage tree since some of the most useful software has
already made its way there.
Setup is done through a simple configuration file. If
you are familiar with setting environment variables, you can configure
Portage. The part that might throw you is how to define your USE flags (see
"Waive the USE Flags" sidebar).
Although the lion's share of ebuilds are
gcc-based source code, Portage can manage binary components just as easily.
For instance, some 3-D graphics card drivers are distributed as binary-only
packages. Another example is one of the virtual machine applications
distributed in binary form. Also, some of the Java code requires a separate
download.
The Velocity of Money
I was surprised to learn, one day at the annual
company meeting many years ago, that money has a "velocity". If
you can stock a warehouse for as short a time as possible, other things
being equal, your profits rise. This very oversimplified view of enterprise
resource planning has a parallel in the software world. Software tends to
change, no? The struggle we face is how to manage this
"velocity of change" adequately. One can and must accept a certain
software configuration upon delivery of the OS and its attendant
applications. The part that is up for debate is the rate of software change that is
acceptable over the lifetime of that system. That depends on cost, security
requirements, technical expertise, time, application availability, etc.
It is the management of this "velocity of
change" at which Portage excels. Linux, being the relatively new OS
that it is, is at present going through rapid change. Just as a weather
system is able to cross an entire continent with inexorable progress, so a
worldwide consortium of developers provides enormous resources to advance
all manner of software. With Portage, you can determine your acceptable
rate of change, as well as which components your system will use.
With Portage driving the configuration, the current
state of Portage defines the running "version" of your system.
There really is no numbered version of the OS on my laptop or my server.
The system merely reflects the current state of the Portage
programmers' efforts in wrapping applications in an ebuild. This is
as liberating a concept as virtual machines. For example, if my software
development system needs to run for 2 years, I can define a Portage state
that updates all packages weekly or limit updates to "system"
packages only on a quarterly basis. If an ERP system's lifetime is
expected to be long, the "system" and "world"
packages can be used to keep the underlying OS moving along at a
predictable pace as well as providing acceptable updates to application
programs, respectively.
"emerge -av newbie"
The user interface to Portage is provided by several
command-line utilities. The "emerge" script is the main one,
though. With it, you can query the database for software titles and
descriptions. After that, you typically test how an application would be
installed and then, well, install it.
Learning to Crawl
To find out which packages are available in Portage,
you would use the search function. Let's look for one of my favorite
astronomy packages, XEphem:
$ emerge -s xephem
Searching...
[ Results for search key : xephem ]
[ Applications found : 1 ]
* sci-astronomy/xephem
Latest version available: 3.6.4
Latest version installed: 3.6.4
Size of files: 9,787 kB
Homepage: http://www.clearskyinstitute.com/xephem
Description: XEphem is the X Windows Ephemeris, and
provides a scientific-grade solar system
model, star charts, sky views, plus a whole lot more.
License: as-is
These results show several important facts. First, the
package xephem was found in the sci-astronomy category. Second, the latest
stable version in Portage is 3.6.4, and the currently installed version on
this system is also 3.6.4. Third, it shows the URL to the package home page
as well as a short description of the package. Two other niceties are the
size of the package's files and also its licensing terms.
So, that's all well and good. I've been
able to find a suitable astronomy package for the new year's Messier
Marathon. Now I need to install it. But the single biggest point to take
away when learning about Portage is that you should look before you leap.
In terms of emerge, you should always use the -pv or -av flags. (See also "Emerging Flags" sidebar.)
Here's what happens when I pretend to install
XEphem on my existing server:
$ emerge -pv xephem
These are the packages that would be merged, in order:
Calculating dependencies... done!
[ebuild R ] sci-astronomy/xephem-3.6.4 0 kB
Total size of downloads: 0 kB
The ebuild R means that this is a re-installation. If it had been a
new install, it would look something like this:
$ emerge -pv orsa
These are the packages that would be merged, in order:
Calculating dependencies... done!
[ebuild N ] sci-libs/gsl-1.4 2,159 kB
[ebuild N ] sci-astronomy/orsa-0.6.1 \
USE="opengl qt3 -cln -fftw -ginac -gsl -mpi" 745 kB
Total size of downloads: 2,905 kB
This instance shows me that the gsl library will be
installed before orsa gets installed. It also shows me the USE flags that
are active for the set of packages. (This information is the biggest reason
you want to look before you leap.) Here I can see that orsa uses OpenGL and
Qt3, but it will not use several other options. If these options were not
what I wanted, I could simply stop here and reconfigure the packages so
that they would be built according to my requirements. It looks like XEphem
is ready to install, and I don't really need orsa this time around.
So, here's how I'd install XEphem:
# emerge -av xephem
These are the packages that would be merged, in order:
Calculating dependencies... done!
[ebuild R ] sci-astronomy/xephem-3.6.4 0 kB
Total size of downloads: 0 kB
Would you like to merge these packages? [Yes/No] y
<lots of compile output snipped>
USE flags are the Portage interface to all those
options you can use when you configure a source code package. (If you have ever used ./configure --help=short, you'll know what I mean.) If your package has options for different
libraries, there's typically a parallel USE flag for those options in
that application's ebuild. (See Figure 1.)
Here's another example. I need to tune my
guitar, and k3guitune is a nifty little gizmo that will help me do that.
Here's what my default installation of k3guitune will do if I emerge
it on my server:
$ emerge -pv k3guitune
These are the packages that would be merged, in order:
Calculating dependencies... done!
[ebuild N ] media-sound/k3guitune-0.4.1 \
USE="alsa arts -debug -oss -xinerama" 304 kB
Total size of downloads: 304 kB
Wait a minute! I don't want it to use aRts
because I'm hardly ever in KDE while I'm strumming the
six-string. Here's what the uninitiated would do to fix this:
$ USE="-arts" emerge -pv k3guitune
These are the packages that would be merged, in order:
Calculating dependencies... done!
[ebuild N ] media-sound/k3guitune-0.4.1 \
USE="alsa -arts -debug -oss -xinerama" 304 kB
Total size of downloads: 304 kB
However, the preferred way to provide specific
settings to individual applications is through the package.use file in
/etc/portage. In this file, you simply define the category and application
in the first column and its appropriate USE flags after that.
$ cat /etc/portage/package.use
media-sound/k3guitune -arts
sys-libs/glibc userlocales
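As a minimal sketch of keeping such entries tidy, the helper below writes one line per package and replaces an existing line rather than duplicating it. The function name set_pkg_use and the PKG_USE_FILE variable are made up for this example and are not part of Portage. It works on a scratch file by default; point PKG_USE_FILE at /etc/portage/package.use (as root) to use it for real.

```shell
#!/bin/sh
# Scratch file by default; set PKG_USE_FILE=/etc/portage/package.use for real use.
PKG_USE_FILE="${PKG_USE_FILE:-$(mktemp)}"

# set_pkg_use CATEGORY/PACKAGE FLAGS...  (hypothetical helper, not a Portage tool)
set_pkg_use() {
    pkg="$1"; shift
    tmp="$(mktemp)"
    # Keep every line whose first field is not this package.
    awk -v p="$pkg" '$1 != p' "$PKG_USE_FILE" > "$tmp"
    # Append the new entry: package in column one, flags after it.
    printf '%s %s\n' "$pkg" "$*" >> "$tmp"
    cat "$tmp" > "$PKG_USE_FILE"
    rm -f "$tmp"
}

set_pkg_use media-sound/k3guitune -arts
set_pkg_use sys-libs/glibc userlocales
set_pkg_use media-sound/k3guitune -arts -oss   # replaces the earlier line
cat "$PKG_USE_FILE"
```

With the entry in place, a plain emerge -pv k3guitune should report -arts without any USE setting on the command line.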
Trees
The Portage tree provides a structure of applications
and contains the ebuilds for the individual sources (see Figure 2). In
/usr/portage, we find the overall category directories. Within those we see
the directories for each application. The sci-astronomy category directory
contains this:
celestia maestro-data orsa pyephem stellarium xephem
maestro metadata.xml predict setiathome wcstools
All but metadata.xml are directories that hold the
specific ebuilds for each application. (The metadata.xml file holds a
description of the category.)
You can also see that the specific application
directory contains several ebuilds for the application and some management
files (see Figure 1). Depending upon the history of a particular
application, there can be several ebuilds for different versions of a given
package. Often, these are different only in their stability on a given
platform, such as x86, ppc, or mips.
That's pretty much the way Portage takes care
of describing an application -- an ebuild enumerates how to compile the package and which USE flags it employs, and that
ebuild lives in a directory in the Portage tree. Of course, the contents of
the ebuild file are rather extensive in many applications and quite simple
in others. Many packages with simple build options are short ebuilds, and
those with greater dependencies are more involved.
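To make that concrete, here is a rough sketch of what a small ebuild looks like. The package (example-tuner), its URLs, and its alsa option are hypothetical. The variable names and the helpers econf, emake, use_enable, and die are the ones Portage supplies when it runs an ebuild. An ebuild is just bash, so the file can be sourced harmlessly outside Portage, but the functions only do anything useful inside it.

```shell
# example-tuner-0.1.ebuild -- hypothetical package, for illustration only
DESCRIPTION="A made-up instrument tuner"
HOMEPAGE="http://www.example.org/example-tuner"
SRC_URI="http://www.example.org/dist/${P}.tar.gz"   # P is name-version, set by Portage
LICENSE="GPL-2"
SLOT="0"
KEYWORDS="~x86"            # ~ marks the ebuild as testing on x86
IUSE="alsa"                # USE flags this ebuild understands
DEPEND="alsa? ( media-libs/alsa-lib )"   # needed only when USE=alsa

src_compile() {
    # use_enable turns the USE flag into --enable-alsa or --disable-alsa
    econf $(use_enable alsa) || die "configure failed"
    emake || die "make failed"
}

src_install() {
    # D is the temporary image directory Portage merges from
    make DESTDIR="${D}" install || die "install failed"
}
```

The IUSE line is what makes the flag show up in the USE="..." column of emerge -pv output.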
Everyday People
In normal practice, Portage boils down to keeping the
Portage tree up to date and applying the latest updates. As we discussed
above, it's then a matter of cost that determines the frequency of these steps. In my environment, I have a laptop, a
development server, and a production server. I also have a server at home
that I use for my file storage and general purpose computing. My
development server is a high-frequency update machine, while my server at
home is the lowest of the bunch. Here's a typical day in the life of
managing the Portage environment among these:
1. Production server syncs Portage: emerge --sync, daily
2. Development server syncs to production: emerge --sync, daily
3. Development server update: emerge -avuD system/world, daily or weekly
4. Laptop sync and update, weekly
5. Home server sync and update, monthly
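One way to encode that schedule is a small wrapper that each machine could run from cron. This is a sketch of how one might script it, not something Portage provides: the role names and the function update_plan are made up, and the function only prints the emerge commands for a role rather than running them. It prints -p (pretend) where the list above uses -a (ask), on the assumption that an unattended job has nobody to answer the prompt.

```shell
#!/bin/sh
# update_plan ROLE -- print the Portage commands for a machine role.
# Role names are hypothetical labels for the machines described above.
update_plan() {
    case "$1" in
        production)
            # daily: refresh the tree only
            echo "emerge --sync"
            ;;
        development)
            # daily sync from the production server, then updates
            echo "emerge --sync"
            echo "emerge -pvuD system"
            echo "emerge -pvuD world"
            ;;
        laptop|home)
            # weekly (laptop) or monthly (home server)
            echo "emerge --sync"
            echo "emerge -pvuD world"
            ;;
        *)
            echo "unknown role: $1" >&2
            return 1
            ;;
    esac
}

update_plan development
```

Swapping echo for the real commands, and choosing the cron interval per machine, is what sets each system's rate of change.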
Changing the World
There are several concepts one should understand about
Portage. One of them is that it keeps track of all the packages that
you've installed. This information is tracked in the
"world" file, and it's such an important concept that
there's even a meta-package named "world". Portage
scrutinizes this collective list whenever you need to update your system.
Additionally, there is an auxiliary inferred meta-package named
"system" that is also extremely useful when updating the base
components of your machine. Let's see how these two ideas work
together to make it easy to manage the software configuration on a typical
server machine:
axon# emerge -avuDN --nospinner system
These are the packages that would be merged, in order:
Calculating system dependencies ... done!
[ebuild U ] sys-devel/binutils-2.16.1-r3 [2.16.1-r2] \
USE="-multislot -multitarget -nls -test -vanilla" 0 kB
[ebuild U ] sys-devel/gcc-config-1.3.13-r3 [1.3.13-r2] 0 kB
[ebuild U ] sys-devel/autoconf-wrapper-3.2 [3-r1] 0 kB
[ebuild U ] sys-devel/automake-1.9.6-r2 [1.9.6-r1] 0 kB
[ebuild U ] sys-apps/sysvinit-2.86-r5 [2.86-r3] \
USE="-bootstrap -build -static" 0 kB
[ebuild U ] sys-apps/baselayout-1.11.15-r3 [1.11.14-r8] \
USE="unicode -bootstrap -build -static" 0 kB
[ebuild NS ] sys-kernel/gentoo-sources-2.6.16-r11 \
USE="-build -doc -symlink" 0 kB
[ebuild U ] sys-apps/file-4.17-r1 [4.13] USE="python \
-build" 0 kB
[ebuild U ] x11-terms/xterm-215 [212-r3] USE="truetype \
unicode -Xaw3d -toolbar" 0 kB
[ebuild U ] sys-process/psmisc-22.2 [22.1] USE="X -ipv6 \
-nls" 0 kB
[ebuild U ] sys-apps/gawk-3.1.5-r1 [3.1.5] USE="-build \
-nls" 0 kB
Total size of downloads: 0 kB
Here we see the "internals side" of the
configuration on this server. These packages in "system" are a
subset of the "world" packages, and they differ from the
"world" packages by the fact that these packages were often,
but not always, implied by the installation of the packages in
"world".
In contrast, we see user applications built when we
"emerge world". Again, since "system" is a subset
of "world", the set of packages built when updating
"world" would include the "system" packages. In
everyday use, it's practical to emerge "system" before
emerging "world" so that the libraries are updated and their
configuration files can be adjusted before the applications.
In my experience, it's been fruitful to update
"system" more frequently than "world" because
libraries tend to be more stable than applications. Additionally, the
configuration of libraries relies more upon the lower level components of a
machine and so does not change with the rapidity that is found for a
re-factored UI application. With application packages, developers will opt
for new configuration files to enable new features so in those cases, it
takes more time to understand and merge the original configuration file
with an updated version thereof.
The Portage tree has another (large) component --
source code. In /usr/portage/distfiles live the hundreds of tar, zip, and
tgz files we all know and love. Unlike the ebuilds, the source code is not
updated when you sync Portage. Only when an application is emerged does the
source code get downloaded from the mirrors and saved into
/usr/portage/distfiles. And doesn't this make sense? Why would I want
the source code for Fortran development tools if I could only write in
Pascal? (See Figure 3.)
Saving the World
Saving the world's bandwidth is always a good
thing. Most companies can easily set up their own Portage cache and
synchronize their local machine with their own cache. There are two
concepts to understand in this respect. One is the Portage tree itself, and
the second is the method for accessing this tree. Running emerge --sync merely
synchronizes (via rsync) the database of available packages. It's not
the packages themselves, just the index thereof. The second means of saving
bandwidth is to proxy the requests for the actual source once the packages
are emerged.
As mentioned above, the distfiles directory is a good
candidate for "sharing the wealth" since this is a sizable pool
of code after even a short amount of time. If you run even a small shop,
consider the http-replicator package to proxy source code distribution at
your site. As if that weren't simple enough, it is even easier to set
up a local Portage tree at your site by deploying an rsync server. Then
your group gets local network speed access to the Portage tree without undue
strain on the little wire that connects your site to the 'net.
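On the client side, both savings come down to a couple of lines in /etc/make.conf. The fragment below is a sketch under assumptions: the host name is a placeholder, gentoo-portage is the conventional rsync module name but yours may differ, and the proxy port is whatever your caching proxy is configured to listen on.

```shell
# Client-side /etc/make.conf fragment (host name, module, and port are placeholders)

# Pull the Portage tree from the site's own rsync server, not a public mirror.
SYNC="rsync://portage.example.lan/gentoo-portage"

# Send source downloads through the site's caching proxy (for example, http-replicator).
http_proxy="http://portage.example.lan:8080"
```

After this, emerge --sync and every source download stay on the local network whenever the cache already holds what is needed.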
Profiling
Up to this point, I've looked at specific
applications in the context of Portage. Here, I'd like to describe
how the overall system and any similarly configured systems can be combined
into a "profile". This is truly the crux of Portage's
potential. With different profiles, you can define for yourself different
sets of Portage behavior depending upon your needs.
An easy way to provide standard setups resides in the
system profile. This provides the default Portage settings for the system.
With a specific profile, you can define a set of USE variables and FEATURES
specific for that environment. (See "Waive the USE Flags"
sidebar.)
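On a Gentoo system of this vintage, the active profile is selected by the /etc/make.profile symlink, which points into the profiles directory of the Portage tree. The sketch below builds that link inside a scratch directory so it can be tried without touching a live system. ROOT_DIR is a stand-in introduced for this example; the profile path is the standard x86 one.

```shell
#!/bin/sh
# Work under a scratch root; on a real machine the paths start at /.
ROOT_DIR="${ROOT_DIR:-$(mktemp -d)}"
profile="usr/portage/profiles/default-linux/x86/2006.0"

# Stand-ins for the synced tree and /etc.
mkdir -p "$ROOT_DIR/$profile" "$ROOT_DIR/etc"

# Point make.profile at the chosen profile (relative link, replacing any old one).
ln -snf "../$profile" "$ROOT_DIR/etc/make.profile"

# Show where the link points.
readlink "$ROOT_DIR/etc/make.profile"
```

Repointing the link at a different profile directory is all it takes to give a machine a different set of defaults.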
Profiles aren't related to system or world for
the most part. A profile merely describes the minimum packages for the
system, and so is useful for rollouts of different system types. What you
install on the machine itself defines world and, therefore, system by
implication.
Here, for instance, are the USE flags from the
standard x86 profile, default-linux/x86/2006.0/make.defaults:
USE="alsa apache2 apm arts avi cups eds emboss encode esd \
foomaticdb gdbm gif gnome gpm gstreamer gtk gtk2 imlib jpeg \
kde libg++ libwww mad mikmod motif mp3 mpeg nptl ogg opengl \
oss pdflib png qt qt3 qt4 quicktime sdl spell truetype udev \
vorbis X xml xmms xv"
And here are the package requirements for this profile:
>=sys-apps/baselayout-1.11.12-r4
>=sys-devel/binutils-2.15.90.0.3-r4
>=sys-devel/gcc-3.3.4-r1
>=sys-libs/glibc-2.3.3.20040420-r1
Portage -- Now Featuring FEATURES
The process of managing software dependencies and
compiling source code has lots of variables within its inner workings. The
various options one can apply to so large a system are easily changed with
the FEATURES variable. As with the other Portage variables, like USE and
MAKEOPTS, the FEATURES variable is merely an environment variable that can
be set to a default in the /etc/make.conf file (see sidebar "Making a
make.conf File") and overridden at the command line. One of the top
favorite FEATURES for anyone with multiple machines at their disposal is
"distcc". This option enables the distcc compiler to spread out
compilation across hosts on your network. A simple way to provide compiled
code to the slow machines on your network is to use the
"getbinpkg" FEATURE on the slow
machines and use the "buildpkg" FEATURE on a central build
machine. If your architectures are conducive to this, it's an
effective means of saving CPU time on some clients.
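The settings involved are short. As a sketch, the script below writes an example make.conf fragment for each role into a scratch directory so they can be compared side by side. The file names, the OUT_DIR variable, and the host name are placeholders, and the binary packages are assumed to be served over HTTP from the build machine by a web server you set up yourself.

```shell
#!/bin/sh
# Write example make.conf fragments into a scratch directory.
# Merge the lines you want into /etc/make.conf by hand.
out="${OUT_DIR:-$(mktemp -d)}"

# Central build machine: keep a binary package of everything it compiles.
cat > "$out/make.conf.buildhost" <<'EOF'
FEATURES="buildpkg"
EOF

# Slow client: fetch those binary packages instead of compiling.
cat > "$out/make.conf.client" <<'EOF'
FEATURES="getbinpkg"
PORTAGE_BINHOST="http://build.example.lan/packages"
EOF

# Machine with helpers on the network: spread compilation out with distcc.
cat > "$out/make.conf.distcc" <<'EOF'
FEATURES="distcc"
MAKEOPTS="-j5"
EOF

ls "$out"
```

The MAKEOPTS job count is only an example; it is usually sized to the number of CPUs available across the participating hosts.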
Transparent Overlays
One of the most useful aspects of Portage is its
extensibility. The Portage designers understood that local variations
needed to live within the existing framework and so provided a method of
integrating third-party packages into Portage while allowing those new
packages to be managed under the umbrella of all those great tools provided
by Portage. The key to this subsystem is the idea of
"overlays". Quite simply, overlays are "mini"
Portage trees that house the build frameworks for your local packages.
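An overlay uses the same category/package layout as the main tree and is announced to Portage through the PORTDIR_OVERLAY variable in /etc/make.conf. The sketch below lays one out in a scratch directory. The package name example-tool is hypothetical, and /usr/local/portage is the location conventionally used on a real system.

```shell
#!/bin/sh
# Lay out a minimal local overlay. Uses a scratch directory by default;
# set OVERLAY=/usr/local/portage to build the real thing.
OVERLAY="${OVERLAY:-$(mktemp -d)/portage}"

# Same shape as the main tree: category/package/package-version.ebuild
pkgdir="$OVERLAY/app-misc/example-tool"      # hypothetical local package
mkdir -p "$pkgdir"
: > "$pkgdir/example-tool-1.0.ebuild"        # empty placeholder for the real ebuild

# The line to add to /etc/make.conf so Portage searches the overlay too.
echo "PORTDIR_OVERLAY=\"$OVERLAY\""

# Show what was created.
find "$OVERLAY" -type f
```

Once a real ebuild is in place, the package shows up in emerge -s and emerge -pv like any other.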
Sticky Wickets
Sometimes, your packages may misbehave because a
library version is hard-coded into them. Or perhaps the trouble isn't in a
package's own dependencies but in a reverse dependency, direct or indirect:
some other package that was built against a library you have since updated.
A critical tool,
revdep-rebuild, is a savior of the Portage sys admin. It walks the
dependency tree of your installed applications, uncovering those that need
to be rebuilt. There's also a great tool, module-rebuild, that
automates rebuilding packages that are dependent upon the current kernel.
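In keeping with the look-before-you-leap advice, revdep-rebuild can be run in pretend mode first. The snippet below is a small sketch that does so only if the tool is installed; the status variable is there purely for illustration.

```shell
#!/bin/sh
# Preview what revdep-rebuild would rebuild, without changing anything.
if command -v revdep-rebuild >/dev/null 2>&1; then
    revdep-rebuild -p        # -p: pretend, report what would be re-emerged
    status="ran"
else
    echo "revdep-rebuild not found (it ships in app-portage/gentoolkit)" >&2
    status="skipped"
fi
```

Dropping the -p runs the rebuild for real.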
Roundup
Portage grew out of the need to balance two things more
efficiently: keeping code up to date and coping with the asynchronous nature
of code development across interdependent packages, and to do so over the
lifespan of a given system. Additionally, Portage allows the systems
administrator to tune these settings, extend them, bring new applications
into the system, and manage it all through a single, consistent interface.
Links
Gentoo Handbook -- http://www.gentoo.org/doc/en/handbook/handbook-x86.xml
Portage Tips -- http://gentoo-wiki.com/Index:TIP#Portage
Official Overlays -- http://overlays.gentoo.org/
Other Overlays -- http://gentoo-wiki.com/Portage_Overlay_Listing
Portaris -- http://www.portaris.org/wiki/Main_Page
Bill Longman lives in the southwestern part of the
northwestern-most state in the Pacific Northwest. There, he provides
network and computing resources to Sharp Laboratories of America. He loves
his God, his family, his dogs, and his computers -- in that order.
Send your love to him at: longman@sharplabs.com.