
Portage

Bill Longman

One of the biggest hurdles to overcome with any operating system is the coordination of software dependencies over the lifetime of a given system. Several package management devices have been developed to help extricate administrators from the ever-increasing interdependencies that invariably develop as a system ages. One well-known tool is called Portage. It provides a simple user interface, is readily extended, and can be deployed on several platforms.

Portage from 35,000 Feet

Portage, in itself, is merely a set of Python and bash scripts to manage software packages. It provides a mechanism for defining a system's overall configuration, a framework for managing the dependencies of packages, and a robust means to administer software packages. Its database consists of the current set of "ebuilds" that fall into many categories, from app-admin and app-arch, through dev-java, media-gfx, net-dns, sci-astronomy, and sys-power all the way to x11-wm. In fact, I've found that looking for software usually begins by browsing the Portage tree since some of the most useful software has already made its way there.

Setup is done through a simple configuration file. If you are familiar with setting environment variables, you can configure Portage. The part that might throw you is how to define your USE flags (see "Waive the USE Flags" sidebar).
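To give a feel for how little is involved, here is a minimal sketch of such a configuration file. The variable names are standard Portage settings, but every value is an assumption picked for illustration (a single-CPU x86 machine) and should be adapted to your own hardware.

```shell
# /etc/make.conf -- illustrative values only, not a recommendation
CHOST="i686-pc-linux-gnu"        # target triplet (assumed: generic x86)
CFLAGS="-O2 -march=i686 -pipe"   # compiler flags (assumed)
CXXFLAGS="${CFLAGS}"             # reuse the C flags for C++
MAKEOPTS="-j2"                   # parallel make jobs (assumed: one CPU)
USE="alsa opengl X"              # global USE flags (assumed preferences)
```

Because the file is just a list of shell-style variable assignments, anything you already know about quoting and variable expansion carries over.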

Although the lion's share of ebuilds are gcc-based source code, Portage can manage binary components just as easily. For instance, some 3-D graphics card drivers are distributed as binary-only packages. Another example is one of the virtual machine applications distributed in binary form. Also, some of the Java code requires a separate download.

The Velocity of Money

I was surprised to learn, one day at the annual company meeting many years ago, that money has a "velocity". If you can stock a warehouse for as short a time as possible, other things being equal, your profits rise. This very oversimplified view of enterprise resource planning has a parallel in the software world. Software tends to change, no? The struggle we face is how to manage this "velocity of change" adequately. One can and must accept a certain software configuration upon delivery of the OS and its attendant applications. The part that is up for debate is the rate of software change that is acceptable over the lifetime of that system. That depends on cost, security requirements, technical expertise, time, application availability, etc.

It is the management of this "velocity of change" at which Portage excels. Linux, being the relatively new OS that it is, is at present going through rapid change. Just as a weather system is able to cross an entire continent with inexorable progress, so a worldwide consortium of developers provides enormous resources to advance all manner of software. With Portage, you can determine your acceptable rate of change, as well as which components your system will use.

With Portage driving the configuration, the current state of Portage defines the running "version" of your system. There really is no numbered version of the OS on my laptop or my server. The system merely reflects the current state of the Portage programmers' efforts in wrapping applications in an ebuild. This is as liberating a concept as virtual machines. For example, if my software development system needs to run for 2 years, I can define a Portage state that updates all packages weekly or limit updates to "system" packages only on a quarterly basis. If an ERP system's lifetime is expected to be long, the "system" and "world" packages can be used to keep the underlying OS moving along at a predictable pace as well as providing acceptable updates to application programs, respectively.

"emerge -av newbie"

The user interface to Portage is provided by several command-line utilities. The "emerge" script is the main one, though. With it, you can query the database for software titles and descriptions. After that, you typically test how an application would be installed and then, well, install it.

Learning to Crawl

To find out which packages are available in Portage, you would use the search function. Let's look for one of my favorite astronomy packages, XEphem:

$ emerge -s xephem 
Searching...   
[ Results for search key : xephem ] 
[ Applications found : 1 ] 

*  sci-astronomy/xephem 
      Latest version available: 3.6.4 
      Latest version installed: 3.6.4 
      Size of files: 9,787 kB 
      Homepage:      http://www.clearskyinstitute.com/xephem 
      Description:   XEphem is the X Windows Ephemeris, and          
                     provides a scientific-grade solar system 
                     model, star charts, sky views, plus a whole lot more. 
      License:       as-is

These results show several important facts. First, the package xephem was found in the sci-astronomy category. Second, the latest stable version in Portage is 3.6.4, and the currently installed version on this system is also 3.6.4. Third, it shows the URL to the package home page as well as a short description of the package. Two other niceties are the size of the package's files and also its licensing terms.

So, that's all well and good. I've been able to find a suitable astronomy package for the new year's Messier Marathon. Now I need to install it. But the single biggest point to take away when learning about Portage is that you should look before you leap. In terms of emerge, you should always use the -pv or -av flags. (See also "Emerging Flags" sidebar.)

Here's what happens when I pretend to install XEphem on my existing server:

$ emerge -pv xephem

These are the packages that would be merged, in order:

Calculating dependencies... done! 
[ebuild   R   ] sci-astronomy/xephem-3.6.4  0 kB  

Total size of downloads: 0 kB

The ebuild R means that this is a re-installation. If it had been a new install, it would look something like this:

$ emerge -pv orsa 

These are the packages that would be merged, in order: 
 
Calculating dependencies... done! 
[ebuild  N    ] sci-libs/gsl-1.4  2,159 kB 
[ebuild  N    ] sci-astronomy/orsa-0.6.1  \
  USE="opengl qt3 -cln -fftw -ginac -gsl -mpi" 745 kB  

Total size of downloads: 2,905 kB

This instance shows me that the gsl library will be installed before orsa gets installed. It also shows me the USE flags that are active for the set of packages. (This information is the biggest reason you want to look before you leap.) Here I can see that orsa uses OpenGL and Qt3, but it will not use several other options. If these options were not what I wanted, I could simply stop here and reconfigure the packages so that they would be built according to my requirements. It looks like XEphem is ready to install, and I don't really need orsa this time around. So, here's how I'd install XEphem:

# emerge -av xephem 

These are the packages that would be merged, in order:    

Calculating dependencies... done! 
[ebuild   R   ] sci-astronomy/xephem-3.6.4  0 kB  

Total size of downloads: 0 kB 

Would you like to merge these packages? [Yes/No] y 

<lots of compile output snipped>

USE flags are the Portage interface to all those options you can use when you configure a source code package. (If you have ever used ./configure --help=short, you'll know what I mean.) If your package has options for different libraries, there's typically a parallel USE flag for those options in that application's ebuild. (See Figure 1.)
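The USE strings in the pretend output follow a simple convention: a bare flag name is enabled, and a leading minus sign means the flag is disabled. The small sketch below makes that convention explicit. The function name split_use is hypothetical and not part of Portage.

```shell
# Split a USE string, as reported by "emerge -pv", into enabled and
# disabled flags, one per line. split_use is a hypothetical helper.
# Usage: split_use "opengl qt3 -cln -fftw"
split_use() {
    for flag in $1; do
        case $flag in
            -*) printf 'off %s\n' "${flag#-}" ;;  # leading "-": disabled
            *)  printf 'on  %s\n' "$flag" ;;      # bare name: enabled
        esac
    done
}
```

Feeding it the orsa string shown earlier would report opengl and qt3 as on and the remaining five flags as off.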

Here's another example. I need to tune my guitar, and k3guitune is a nifty little gizmo that will help me do that. Here's what my default installation of k3guitune will do if I emerge it on my server:

 
$ emerge -pv k3guitune 

These are the packages that would be merged, in order: 
    
Calculating dependencies... done! 
[ebuild  N    ] media-sound/k3guitune-0.4.1 \
  USE="alsa arts -debug -oss -xinerama" 304 kB 

Total size of downloads: 304 kB

Wait a minute! I don't want it to use aRts because I'm hardly ever in KDE while I'm strumming the six-string. Here's what the uninitiated would do to fix this:

$ USE="-arts" emerge -pv k3guitune 

These are the packages that would be merged, in order:    

Calculating dependencies... done! 
[ebuild  N    ] media-sound/k3guitune-0.4.1 \
  USE="alsa -arts -debug -oss -xinerama" 304 kB 

Total size of downloads: 304 kB

However, the preferred way to provide specific settings to individual applications is through the package.use file in /etc/portage. In this file, you simply define the category and application in the first column and its appropriate USE flags after that.

$ cat /etc/portage/package.use 
media-sound/k3guitune -arts 
sys-libs/glibc        userlocales
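If you maintain this file on several machines, a small helper keeps it free of duplicate entries. This is only a sketch: add_package_use is a hypothetical function, and it assumes the simple one-line-per-package layout shown above.

```shell
# Append a per-package USE entry unless the package already has a line.
# add_package_use is a hypothetical helper, not a Portage tool.
# Usage: add_package_use FILE CATEGORY/PACKAGE FLAGS...
add_package_use() {
    file=$1; atom=$2; shift 2
    # Compare the first column exactly, so "." or "+" in names is harmless.
    if [ -f "$file" ] &&
       awk -v a="$atom" '$1 == a { found = 1 } END { exit !found }' "$file"
    then
        return 0    # entry already present, leave the file alone
    fi
    printf '%s %s\n' "$atom" "$*" >> "$file"
}
```

As root, the k3guitune entry above could then be added with: add_package_use /etc/portage/package.use media-sound/k3guitune -arts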

Trees

The Portage tree provides a structure of applications and contains the ebuilds for the individual sources (see Figure 2). In /usr/portage, we find the overall category directories. Within those we see the directories for each application. The sci-astronomy category directory contains this:

celestia  maestro-data  orsa     pyephem     stellarium  xephem 
maestro   metadata.xml  predict  setiathome  wcstools

All but metadata.xml are directories that hold the specific ebuilds for each application. (The metadata.xml file holds a description of the category.)

You can also see that the specific application directory contains several ebuilds for the application and some management files (see Figure 1). Depending upon the history of a particular application, there can be several ebuilds for different versions of a given package. Often, these differ only in their stability on a given platform such as x86, ppc, or mips.

That's pretty much the way Portage takes care of describing an application -- an ebuild enumerates how to compile the package and which USE flags it employs, and that ebuild lives in a directory in the Portage tree. Of course, the contents of the ebuild file are rather extensive in many applications and quite simple in others. Many packages with simple build options are short ebuilds, and those with greater dependencies are more involved.
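Since the tree is nothing more than directories and files, ordinary shell tools can explore it. The sketch below lists the versions available for one package by looking at its ebuild file names. The function name list_versions is hypothetical, and the code assumes the category/package/package-version.ebuild naming described above.

```shell
# Print the versions for which a package has ebuilds in a Portage tree.
# list_versions is a hypothetical helper.
# Usage: list_versions TREE CATEGORY/PACKAGE
list_versions() {
    tree=$1; atom=$2
    pkg=${atom##*/}                   # package name, e.g. "xephem"
    for f in "$tree/$atom"/*.ebuild; do
        [ -e "$f" ] || continue       # the glob matched nothing
        name=${f##*/}                 # strip the directory part
        name=${name%.ebuild}          # strip the extension
        version=${name#"$pkg"-}       # strip the "package-" prefix
        printf '%s\n' "$version"
    done
}
```

For example, list_versions /usr/portage sci-astronomy/xephem prints one line per ebuild version in that directory.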

Everyday People

In normal practice, Portage boils down to keeping the Portage tree up to date and applying the latest updates. As we discussed above, it's then a matter of cost that determines the frequency of these steps. In my environment, I have a laptop, a development server, and a production server. I also have a server at home that I use for my file storage and general purpose computing. My development server is a high-frequency update machine, while my server at home is the lowest of the bunch. Here's a typical day in the life of managing the Portage environment among these:

1. Production server syncs Portage: emerge --sync, daily

2. Development server syncs to production: emerge --sync, daily

3. Development server update: emerge -avuD system/world, daily or weekly

4. Laptop sync and update, weekly

5. Home server sync and update, monthly
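A routine like this is easy to wrap in a small script and run by hand or from cron. The sketch below syncs the tree and then only previews the pending updates, in keeping with the look-before-you-leap advice. The function name and the EMERGE override variable are hypothetical. The override exists only so the script can be rehearsed without touching the system.

```shell
# Refresh the tree, then preview (not apply) the pending updates.
# sync_and_preview and the EMERGE variable are hypothetical; EMERGE lets
# you substitute another command to rehearse the script.
# Usage: sync_and_preview [system|world]
sync_and_preview() {
    target=${1:-world}
    emerge_cmd=${EMERGE:-emerge}
    "$emerge_cmd" --sync || return 1   # stop if the sync fails
    "$emerge_cmd" -pvuD "$target"      # pretend, verbose, update, deep
}
```

After reviewing the preview, the real update remains a deliberate, separate step such as emerge -avuD world.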

Changing the World

There are several concepts one should understand about Portage. One of them is that it keeps track of all the packages that you've installed. This information is tracked in the "world" file, and it's such an important concept that there's even a meta-package named "world". Portage scrutinizes this collective list whenever you need to update your system. Additionally, there is an auxiliary inferred meta-package named "system" that is also extremely useful when updating the base components of your machine. Let's see how these two ideas work together to make it easy to manage the software configuration on a typical server machine:

axon# emerge -avuDN --nospinner system 

These are the packages that would be merged, in order:
    
Calculating system dependencies ... done! 
[ebuild     U ] sys-devel/binutils-2.16.1-r3 [2.16.1-r2] \
  USE="-multislot -multitarget -nls -test -vanilla" 0 kB 
[ebuild     U ] sys-devel/gcc-config-1.3.13-r3 [1.3.13-r2] 0 kB 
[ebuild     U ] sys-devel/autoconf-wrapper-3.2 [3-r1] 0 kB 
[ebuild     U ] sys-devel/automake-1.9.6-r2 [1.9.6-r1] 0 kB 
[ebuild     U ] sys-apps/sysvinit-2.86-r5 [2.86-r3] \
  USE="-bootstrap -build -static" 0 kB 
[ebuild     U ] sys-apps/baselayout-1.11.15-r3 [1.11.14-r8] \
  USE="unicode -bootstrap -build -static" 0 kB 
[ebuild  NS   ] sys-kernel/gentoo-sources-2.6.16-r11 \
  USE="-build -doc -symlink" 0 kB 
[ebuild     U ] sys-apps/file-4.17-r1 [4.13] USE="python \
  -build" 0 kB 
[ebuild     U ] x11-terms/xterm-215 [212-r3] USE="truetype \
  unicode -Xaw3d -toolbar" 0 kB 
[ebuild     U ] sys-process/psmisc-22.2 [22.1] USE="X -ipv6 \
  -nls" 0 kB 
[ebuild     U ] sys-apps/gawk-3.1.5-r1 [3.1.5] USE="-build \  
  -nls" 0 kB 

Total size of downloads: 0 kB

Here we see the "internals side" of the configuration on this server. The packages in "system" are a subset of the "world" packages. They differ from the rest of "world" in that they were often, but not always, implied by the installation of other packages in "world".

In contrast, we see user applications built when we "emerge world". Again, since "system" is a subset of "world", the set of packages built when updating "world" would include the "system" packages. In everyday use, it's practical to emerge "system" before emerging "world" so that the libraries are updated and their configuration files can be adjusted before the applications.

In my experience, it's been fruitful to update "system" more frequently than "world" because libraries tend to be more stable than applications. Additionally, the configuration of libraries relies more upon the lower-level components of a machine and so does not change with the rapidity found in a re-factored UI application. With application packages, developers will opt for new configuration files to enable new features, so in those cases it takes more time to understand and merge the original configuration file with its updated version.

The Portage tree has another (large) component -- source code. In /usr/portage/distfiles live the hundreds of tar, zip, and tgz files we all know and love. Unlike the ebuilds, the source code is not updated when you sync Portage. Only when an application is emerged does the source code get downloaded from the mirrors and saved into /usr/portage/distfiles. And doesn't this make sense? Why would I want the source code for Fortran development tools if I could only write in Pascal? (See Figure 3.)

Saving the World

Saving the world's bandwidth is always a good thing. Most companies can easily set up their own Portage cache and synchronize their local machines with it. There are two concepts to understand in this respect. One is the Portage tree itself, and the second is the method for accessing this tree. Running emerge --sync merely synchronizes (via rsync) the database of available packages. It's not the packages themselves, just the index thereof. The second means of saving bandwidth is to proxy the requests for the actual source once the packages are emerged.

As mentioned above, the distfiles directory is a good candidate for "sharing the wealth" since this is a sizable pool of code after even a short amount of time. If you run even a small shop, consider the http-replicator package to proxy source code distribution at your site. As if that weren't simple enough, it is even easier to set up a local Portage tree at your site by deploying an rsync server. Then your group gets local network speed access to the Portage tree without undue strain on the little wire that connects your site to the 'net.
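On the client side, both ideas come down to a couple of lines in the configuration file. The fragment below is a sketch under stated assumptions: the host name is hypothetical, the rsync module name and the proxy port are assumed, and SYNC is the make.conf variable that Portage of this era reads for the rsync source.

```shell
# /etc/make.conf additions on each client (host name is hypothetical)
SYNC="rsync://portage.example.com/gentoo-portage"  # local mirror of the tree (module name assumed)
http_proxy="http://portage.example.com:8080"       # source downloads via the proxy (port assumed)
```

With these in place, emerge --sync and source downloads stay on the local network whenever the local server already holds what is being requested.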

Profiling

Up to this point, I've looked at specific applications in the context of Portage. Here, I'd like to describe how the overall system and any similarly configured systems can be combined into a "profile". This is truly the crux of Portage's potential. With different profiles, you can define for yourself different sets of Portage behavior depending upon your needs.

An easy way to provide standard setups resides in the system profile. This provides the default Portage settings for the system. With a specific profile, you can define a set of USE variables and FEATURES specific for that environment. (See "Waive the USE Flags" sidebar.)

Profiles aren't related to system or world for the most part. A profile merely describes the minimum packages for the system, and so is useful for rollouts of different system types. What you install on the machine itself defines world and, therefore, system by implication.

Here, for instance, are the USE flags from the standard x86 profile, default-linux/x86/2006.0/make.defaults:

USE="alsa apache2 apm arts avi cups eds emboss encode esd     \
  foomaticdb gdbm gif gnome gpm gstreamer gtk gtk2 imlib jpeg \
  kde libg++ libwww mad mikmod motif mp3 mpeg nptl ogg opengl \
  oss pdflib png qt qt3 qt4 quicktime sdl spell truetype udev \
  vorbis X xml xmms xv"

And here are the package requirements for this profile:

>=sys-apps/baselayout-1.11.12-r4 
>=sys-devel/binutils-2.15.90.0.3-r4 
>=sys-devel/gcc-3.3.4-r1 
>=sys-libs/glibc-2.3.3.20040420-r1
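As far as I know, Portage of this vintage finds the active profile through the /etc/make.profile symbolic link, which points at a profile directory inside the tree. Assuming that layout, the sketch below switches profiles and refuses to point the link at a directory that doesn't exist. The function name set_profile is hypothetical.

```shell
# Point the profile symlink at a profile directory inside the tree.
# set_profile is a hypothetical helper; the /etc/make.profile default
# reflects the layout assumed in the text above.
# Usage: set_profile TREE PROFILE [LINK]
set_profile() {
    tree=$1; profile=$2; link=${3:-/etc/make.profile}
    target="$tree/profiles/$profile"
    if [ ! -d "$target" ]; then
        echo "no such profile: $target" >&2
        return 1
    fi
    ln -snf "$target" "$link"    # replace any existing link in place
}
```

For the profile quoted above, that would be: set_profile /usr/portage default-linux/x86/2006.0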

Portage -- Now Featuring FEATURES

The process of managing software dependencies and compiling source code has lots of variables within its inner workings. The various options one can apply to so large a system are easily changed with the FEATURES variable. As with the other Portage variables, like USE and MAKEOPTS, the FEATURES variable is merely an environment variable that can be set to a default in the /etc/make.conf file (see sidebar "Making a make.conf File") and overridden at the command line. A favorite FEATURE for anyone with multiple machines at their disposal is "distcc". This option enables distcc to spread compilation across hosts on your network. A simple way to provide compiled code to the slow machines on your network is to use the "getbinpkg" FEATURE on the slow machines and the "buildpkg" FEATURE on a central build machine. If your architectures are conducive to this, it's an effective means of saving CPU time on some clients.
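As a sketch of the buildpkg/getbinpkg arrangement, the fragment below shows the setting for the build machine, with the client side in comments. The host name and URL path are hypothetical, and how the packages directory gets published (a web server, NFS, and so on) is left open.

```shell
# /etc/make.conf on the central build machine
FEATURES="buildpkg"    # keep a binary package of everything it builds

# /etc/make.conf on a slow client would instead carry something like
# the following (host name and path are hypothetical):
#   FEATURES="getbinpkg"
#   PORTAGE_BINHOST="http://buildhost.example.com/packages/All"
```

The binary packages are only useful if the clients match the build machine closely enough in architecture, compiler flags, and USE flags.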

Transparent Overlays

One of the most useful aspects of Portage is its extensibility. The Portage designers understood that local variations needed to live within the existing framework and so provided a method of integrating third-party packages into Portage while allowing those new packages to be managed under the umbrella of all those great tools provided by Portage. The key to this subsystem is the idea of "overlays". Quite simply, overlays are "mini" Portage trees that house the build frameworks for your local packages.
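An overlay uses the same category/package layout as the main tree, so starting one is mostly a matter of creating the right directory. Here's a minimal sketch. The function name make_overlay_dir, the overlay path, and the package name in the usage note are all hypothetical.

```shell
# Create the directory for a local package inside an overlay and print
# its path. make_overlay_dir is a hypothetical helper.
# Usage: make_overlay_dir OVERLAY CATEGORY/PACKAGE
make_overlay_dir() {
    overlay=$1; atom=$2
    mkdir -p "$overlay/$atom" || return 1
    printf '%s\n' "$overlay/$atom"
}
```

For instance, make_overlay_dir /usr/local/portage app-misc/mytool creates a home for your ebuild. Setting PORTDIR_OVERLAY="/usr/local/portage" in /etc/make.conf then tells Portage to search the overlay alongside the main tree.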

Sticky Wickets

Sometimes, your packages may misbehave because a library just has to be hard-coded. Or perhaps the dependency involved isn't a direct one but a reverse dependency, whether direct or indirect. A critical tool, revdep-rebuild, is a savior of the Portage sys admin. It walks the dependency tree of your installed applications, uncovering those that need to be rebuilt. There's also a great tool, module-rebuild, that automates rebuilding packages that are dependent upon the current kernel.
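The look-before-you-leap habit applies here, too. The sketch below runs revdep-rebuild in pretend mode by default and performs the rebuild only when asked. The function name, its --fix argument, and the REVDEP override variable are hypothetical. The override is there only so the wrapper can be rehearsed on a machine without the tool.

```shell
# Preview what revdep-rebuild would rebuild; rebuild only on request.
# check_reverse_deps, its --fix argument, and the REVDEP variable are
# hypothetical conveniences, not part of revdep-rebuild itself.
# Usage: check_reverse_deps [--fix]
check_reverse_deps() {
    revdep_cmd=${REVDEP:-revdep-rebuild}
    if [ "${1:-}" = "--fix" ]; then
        "$revdep_cmd"              # rebuild the affected packages
    else
        "$revdep_cmd" --pretend    # only report what would be rebuilt
    fi
}
```

Run it without arguments after a library update, read the list, and repeat with --fix once you're satisfied.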

Roundup

Portage grew out of the need to balance more efficiently the need to keep code up to date, the asynchronous nature of code development across interdependent packages, and the requirement to bring these two disparate functions together over the lifespan of a given system. Additionally, Portage allows the systems administrator to tune these settings, extend them, bring new applications into the system, and manage it all through a similar interface.

Links

Gentoo Handbook -- http://www.gentoo.org/doc/en/handbook/handbook-x86.xml

Portage Tips -- http://gentoo-wiki.com/Index:TIP#Portage

Official Overlays -- http://overlays.gentoo.org/

Other Overlays -- http://gentoo-wiki.com/Portage_Overlay_Listing

Portaris -- http://www.portaris.org/wiki/Main_Page

Bill Longman lives in the southwestern part of the northwestern-most state in the Pacific Northwest. There, he provides network and computing resources to Sharp Laboratories of America. He loves his God, his family, his dogs, and his computers -- in that order. Send your love to him at: longman@sharplabs.com.