
Solaris 10 Resource Management

Scott Cromar

Solaris 10 Resource Management represents a significant enhancement of the facilities in previous versions. Resources can be managed at a zone, project, task, or process level. This article focuses mainly on project, task, and process-level resource management. Zones are easily a topic all to themselves and will be left to another article.

The new resource management framework is a significant improvement over the traditional SVR4 model in a number of ways, including:

  • System parameters that were previously set in /etc/system and enabled on a reboot can now be enabled on the fly, without any interruption of services.
  • Workloads can be aggregated and separated on a very granular level, on the fly, and with only minimal setup required.
Projects and Tasks

Projects and tasks were introduced as a separate administrative framework to allow for resource management of collections of processes. (POSIX defines a similar entity in its concept of a session, but projects and tasks allow much more flexibility.)

Projects are collections of related processes. Each project has a unique project ID number. Users and groups may be associated with one or more projects. When processes are started, they inherit their parent's project membership, unless another project is specified.

Some commands (specifically login, cron, setproject, and su) may result in another project being assigned. Where the project is not explicitly set with setproject, the default project is assigned as described on the getdefaultproj() man page (see Figure 1).

Tasks are collections of processes contained within a project. A new task is started in a project when a new session is opened via any of the following: login, cron, newtask, setproject, and su. Each process belongs to only one task, which in turn belongs to only one project. As new tasks are created, they are dynamically assigned a numerical taskid. It is possible to have more than one policy in place for a particular object at a time. For example, a process may be assigned both a task and a process constraint. In this case, the smallest container's control is enforced first.

The Project Database

Projects are maintained via the /etc/project file (or by the associated NIS or LDAP mapping, if that is specified in the nsswitch.conf). The default version of the Solaris 10 /etc/project file contains the following projects:

  • system:0::::(all system processes and daemons)
  • user.root:1::::(all root processes)
  • noproject:2::::(IPQoS)
  • default:3::::(default project assigned to every non-administrative user)
  • group.staff:10::::(project used for all users in the "staff" group)
The fields in an /etc/project entry are:

  • projname -- Name of the project.
  • projid -- Unique numerical project identifier less than UID_MAX (2147483647).
  • comment -- Project description.
  • user-list -- Comma-separated list of usernames. (Wildcards are allowed.)
  • group-list -- Comma-separated list of groups. (Wildcards are allowed.)
  • attributes -- Semicolon-separated list of name-value pairs, such as resource controls, in a name[=value] format.
Resource constraints are set by adding them to the last field of the project entry:

example:101::::task.max-lwps=(privileged,100,deny)
Changes to /etc/project only take effect as new tasks are started in the project.
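For instance, extending the "example" project shown above (a sketch; the new limit value is arbitrary):

```shell
# Raise the LWP cap on the existing "example" project.
projmod -sK "task.max-lwps=(privileged,200,deny)" example
# Tasks already running under "example" keep the old cap of 100;
# a task launched now picks up the new value:
newtask -p example command
```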

Managing a Project

Table 4 at the end of the article contains a listing of the primary commands used to manage projects and their resource constraints. The following command creates a project named "example" with appropriate user and group memberships and several resource controls set:

projadd -p 111 -U username1,username2 -G groupname1,groupname2 \
  -c "Example Project" -K "rcap.max-rss=10GB" \
  -K "process.max-file-size=(priv,50MB,deny)" \
  -K "task.max-lwps=(priv,100,deny)" example
This command would produce the following entry in /etc/project:

example:111:Example Project:username1,username2:groupname1, \
  groupname2:process.max-file-size=(priv,52428800,deny); \
  rcap.max-rss=10737418240;task.max-lwps=(priv,100,deny)
We can start up a task under this project by running the following:

newtask -p example command
To verify the project governing the current shell, we would run:

id -p
All existing projects can be listed with:

projects -l
A process's project id can be displayed with:

ps -o projid -p PID
To match project or task ids for pgrep, pkill, or prstat commands, use the -T or -J options:

pgrep -J project-IDs
pkill -T task-IDs
prstat -J
A running process can be associated with a new task:

newtask -v -p project-name -c PID
Resource Controls

The Solaris 10 IPC resource management framework was designed to overcome several shortcomings of the older SVR4-based system. Several parameters are now dynamically resized, the defaults have been increased, the names are now more human-readable, resource controls are more granular (rather than being system-wide), and reboots are no longer required for many types of changes.

The Solaris 10 system allows changes to be associated with a project and monitored via prctl. Changes can be made on the fly and associated with separate projects and tasks on an individual basis.

For the purposes of IPC resource management, see Table 1 for the important parameters. (The other parameters are either obsolete or size themselves dynamically.)

Besides being renamed, several of these parameters have had their defaults increased. In many cases, the new defaults are large enough that old /etc/system settings are no longer needed; for example, several settings formerly required for Oracle 9i are now unnecessary. See Table 2 for a listing of parameters along with Oracle requirements and the new defaults.

It is likely that the max-shm-memory parameter will be the only one requiring adjustment. If there are multiple Oracle instances on a single system, it makes sense to set up a specific project for each. The projmod command can be used to set max-shm-memory for each project to the desired level (the default is one quarter of physical memory):

projmod -sK \
  "project.max-shm-memory=(privileged,gigabytes-sharedGB,deny)" \
  project-name
To ensure that each instance actually starts up in the proper project, the startup scripts will need to specify a new task with a command like the following:

newtask -p project-name
We can see how a project's IPC objects are allocated against existing limits by running:

ipcs -J

The new Solaris 10 resource controls include compatibility interfaces to the old rlimit-style resource controls. Existing applications using the old interfaces can continue to run unchanged.
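As a quick illustration of this compatibility, the shell's traditional ulimit interface still works and reflects the same limits that the new framework exposes (a sketch; the prctl line in the comment is Solaris-specific):

```shell
# The old rlimit interface: show the file-descriptor soft limit.
ulimit -n
# On Solaris 10, the same limit is also visible as a resource control:
#   prctl -n process.max-file-descriptor -i process $$
```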

Additional resource controls are listed in Table 3. A full description of the new resource controls is available on the resource_controls man page. To view resource constraints for a process, we would run something like the following:

prctl -n resource-name -i process PID
To check resource constraints for the current shell, we could run:

prctl $$
Or, to temporarily set resource constraints on a particular project, we could run something like:

prctl -n resource-name -t privilege-level \
  -v value -e action -i project project-name
rcapd

rcapd is a user-level daemon that caps physical memory usage within a project. Where zones are used, each rcapd instance can only manage projects within its own zone.

In each zone, rcapd can be enabled via rcapadm -E, which will start rcapd and enable it under SMF so that it will be restarted automatically. projmod can be used to set the memory cap for a project:

projmod -s -K rcap.max-rss=sizeMB \
  project-name
or the rcap.max-rss control can be set directly in /etc/project.

Note that rcapd does not account for shared memory in an intuitive way, and Sun is reported to be changing its calculation algorithm. To be safe, allow enough room for shared memory to be included under your cap, and do not depend solely on rcapd to keep process memory usage under control.
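rcapd activity can be watched with the rcapstat command; a brief sketch (the interval and count are arbitrary):

```shell
# Print 12 reports at 5-second intervals; compare the "rss" column
# (current resident set size) against the "cap" column (the limit).
rcapstat 5 12
```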

Fair Share Scheduler

The default user-level scheduling class in Solaris (TS, or "timesharing") attempts to give relatively equal CPU access to all regular processes running on the system. There is some limited ability to regulate process priority using the nice command. The Fair Share Scheduler (FSS) implemented in Solaris 9 provides a more structured way to manage process priorities.

Each project is allocated a certain number of CPU shares via the project.cpu-shares resource control. Each project is allocated CPU time based on a weighted average of the active running projects with a non-zero cpu-shares setting. The allocation formula for each project is its cpu-shares value divided by the sum of the cpu-shares values for all active projects. Note that this means that anything with a zero cpu-shares value will not be granted CPU time until all projects with non-zero cpu-shares are done with the CPU.
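As a worked example of the allocation formula (the project names and share counts here are hypothetical): if dbproj holds 20 shares while webproj and batchproj hold 10 each, and all three are busy, dbproj is entitled to half the CPU:

```shell
# Shares would be assigned with something like:
#   projmod -sK "project.cpu-shares=(privileged,20,none)" dbproj
# dbproj's entitlement while all three projects are active:
awk 'BEGIN { printf "%.0f%%\n", 100 * 20 / (20 + 10 + 10) }'   # prints 50%
```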

The maximum number of shares that can be assigned to any one project is 65535. The Fair Share Scheduler can also be assigned to processor sets, which allows finer-grained control of priorities on a server than processor sets provide on their own. The dispadmin command sets the system's default scheduling class, effective at the next boot, using a form like:

dispadmin -d FSS
To enable this change now, rather than after the next reboot, run a command like the following:

priocntl -s -c FSS -i all
The shares assigned to an active project can be dynamically altered with the prctl command. The command would have a form something like:

prctl -r -n project.cpu-shares -v number-shares -i project \
  project-name
The Fair Share Scheduler should not be combined with the TS, FX (fixed-priority) or IA (interactive) scheduling classes on the same CPU or processor set. All of these scheduling classes use priorities in the same range, so unexpected behavior can result from combining FSS with any of these. (There is no problem, however, with running TS and IA on the same processor set.)

Resource Control Parameter Attributes

The following attributes are defined for resource controls:

Logging

Global logging can be enabled by setting syslog=level with rctladm, where level is one of the usual syslog levels: debug, info, notice, warning, err, crit, alert, or emerg.

To activate logging on a global resource control facility, run something like:

rctladm -e syslog=level resource-name
Privilege Levels

Each resource control threshold needs to be associated with one of the following privilege levels:

  • basic -- Can be modified by the owner of the calling process.
  • privileged -- Can be modified only by privileged (superuser) users.
  • system -- Fixed for the duration of the operating system instance.
A given resource control can have an associated threshold for up to three privilege levels.

Actions

It is possible to use rctladm to specify one of the following actions on a process that violates the control:

  • none -- No action is taken (useful for monitoring).
  • deny -- Denies the request.
  • signal= -- Sends the specified signal to the offending process. See the rctladm man page for a list of allowed signals.
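Combining privilege levels and actions, a hypothetical /etc/project entry (the project name, ID, and threshold values are illustrative) might look like:

```shell
# At 80 LWPs the basic threshold sends a warning signal; at 100 the
# privileged threshold denies further LWP creation.
demo:102::::task.max-lwps=(basic,80,signal=SIGTERM),(privileged,100,deny)
```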
Additional Reading

Cromar, Scott. 2007. Solaris Troubleshooting and Performance Tuning at Princeton University -- http://www.princeton.edu/~unix/Solaris/troubleshoot/index.html

Galvin, Peter Baer. 2003. "Solaris Resource Management", Sys Admin 11(4):49-51.

System Administration Guide: Solaris Containers-Resource Management and Solaris Zones (May 2006). Sun Microsystems Inc.

Foxwell, Harry J., et al. 2006. The Sun Blueprints Guide to Solaris Containers. Sun Microsystems Inc.

McDougall, Richard and Mauro, Jim. 2006. Solaris Internals. Prentice Hall.

Scott Cromar has been working with Unix and Linux for longer than he cares to admit in public. Along the way, he created Princeton University's "Solaris Troubleshooting" Web site: http://www.princeton.edu/~unix/Solaris/troubleshoot/index.html. He can be contacted through that site's blog.