Learning DTrace -- Part 1
Chip Bennett
One of the most innovative features of Solaris 10 is a
facility that allows administrators to dynamically track down problems and
performance issues on an active production system with almost no impact on
that system. This facility is called DTrace, which stands for Dynamic
Tracing. This is the first of a series of articles designed to help you
understand how DTrace works and give you the basic knowledge necessary to
use DTrace to solve your own system problems.
If you're familiar with C programming and awk
programming, you already know half of what you need to know to run DTrace.
DTrace uses a scripting language called D, which has a similar syntax to C,
except that it doesn't support flow control (if, while, for, etc.). Because of the nature of system tracing, the DTrace
framework has its own flow, which is tailored using a pattern match syntax that is much like awk.
DTrace does its job by allowing you to insert probes
(also called "instrumenting" probes) at specific points in the
kernel or user program. When these probes are
reached, data is recorded that provides information regarding the event.
The information provided varies according to the type of probe. You tell
DTrace, through the D language, where you want probes instrumented and how you want the event data reported.
Listing Probes
Before actually writing scripts, the simplest way to start looking at DTrace is by using it from the
command line. The simplest command option in DTrace is the one that lists
all of the possible probes than can be instrumented on your system:
# dtrace -l
ID PROVIDER MODULE UNCTION NAME
1 dtrace BEGIN
2 dtrace END
3 dtrace ERROR
4 syscall nosys entry
5 syscall nosys return
6 syscall rexit entry
7 syscall rexit return
8 syscall forkall entry
9 syscall forkall return
...
38328 fbt zmod huft_build return
38329 fbt zmod _info entry
38330 fbt zmod _info return
38331 fbt zmod z_strerror entry
38332 fbt zmod z_strerror return
38333 fbt zmod z_uncompress entry
38334 fbt zmod z_uncompress return
Each probe is characterized by a four-part tuple:
provider, module, function, and name. The above list shows the four parts
of each probe, plus the unique probe identification number. Most of the
above list is removed for brevity, but if you look at the last line,
you'll see that my system has 38,334 available probes. Your mileage
may vary.
The probe provider is a kernel program that performs a
specific type of instrumentation. For example, the "syscall"
provider provides probes into the kernel system call interface (read,
write, fork, etc.). In this case, there are two possible probes for each
system call: one for entry into the system call, and one for exit from the
system call. Another provider, which instruments in a completely different
way, is the "profile" provider. This provider allows for probes
that record events at specific time intervals. You'll see more about
the "profile" provider later.
The other parts of the probe tuple further define the
location of the probe:
- The "module" where the probe
is located (if the probe corresponds to a program location).
- The "function" within the module.
- The "name" that generally
defines what event the probe is recording, such as "entry" or
"return".
Some probes don't have the "module"
part of the tuple, and some additionally don't have the
"function" part. However, all probes have at least a
"provider" and a "name".
We can further define which probes we want to deal
with by specifying a colon-delimited tuple. For example, the tuple
"lockstat:genunix:mutex_exit:adaptive-release" refers to a
probe named "adaptive-release", which is in the
"mutex_exit" function in the "genunix" module and
is provided by the "lockstat" provider. To list such a probe,
we could specify:
dtrace -ln lockstat:genunix:mutex_exit:adaptive-release
ID PROVIDER MODULE FUNCTION NAME
470 lockstat genunix mutex_exit adaptive-release
The -n flag allows us to specify a tuple by name, that is, DTrace
assumes that the last component of the tuple specified is the name
component.
There are three other flags that allow us to specify
the other components as the last one in the tuple:
# dtrace -lf lockstat:genunix:mutex_exit
ID PROVIDER MODULE FUNCTION NAME
470 lockstat genunix mutex_exit adaptive-release
# dtrace -lm lockstat:genunix
ID PROVIDER MODULE FUNCTION NAME
467 lockstat genunix mutex_enter adaptive-acquire
468 lockstat genunix mutex_enter adaptive-block
469 lockstat genunix mutex_enter adaptive-spin
470 lockstat genunix mutex_exit adaptive-release
471 lockstat genunix mutex_destroy adaptive-release
...
# dtrace -lP lockstat
ID PROVIDER MODULE FUNCTION NAME
467 lockstat genunix mutex_enter adaptive-acquire
468 lockstat genunix mutex_enter adaptive-block
469 lockstat genunix mutex_enter adaptive-spin
470 lockstat genunix mutex_exit adaptive-release
471 lockstat genunix mutex_destroy adaptive-release
...
When we only partially specify a tuple in this way, we
generally match more than one probe. Also, note that specifying all of the
lockstat probes is the same as specifying all of the lockstat:genunix
probes, because in the case of lockstat, there is only one module.
We can also match more than one probe by using the
colon as a placeholder and leaving out one or more parts of the tuple. For
example, the following are equivalent probe specifications and will match
the same set of probes:
dtrace -lP lockstat
dtrace -lm lockstat:
dtrace -lf lockstat::
dtrace -ln lockstat:::
These two dtrace commands will match all of the available probes:
dtrace -l
dtrace -ln :::
Instrumenting Probes
Although it's useful to be able to list the
probes in which we have interest, it's even more useful to actually
instrument them. To instrument the same probes that we have been listing,
simply remove the -l flag from the command line. For example, the following
commands all instrument the same set of probes -- all of the probes
provided by lockstat:
dtrace -P lockstat
dtrace -m lockstat:
dtrace -f lockstat::
dtrace -n lockstat:::
When probes are instrumented, the default behavior is
to write (to standard out) as each probe fires, the CPU# where the probe
fired, the ID# of the probe, the probe function, and the probe name:
# dtrace -P lockstat | head
dtrace: description 'lockstat' matched 22 probes
CPU ID FUNCTION:NAME
0 470 mutex_exit:adaptive-release
0 470 mutex_exit:adaptive-release
0 467 mutex_enter:adaptive-acquire
0 470 mutex_exit:adaptive-release
0 467 mutex_enter:adaptive-acquire
0 470 mutex_exit:adaptive-release
0 477 lock_try:spin-acquire
0 478 lock_clear:spin-release
0 473 lock_set:spin-acquire
This dtrace session instruments all of the probes provided by lockstat
(22 probes). As each probe fires, a line of output is generated. At this
point, the purpose of the lockstat provider should be pretty clear: provide
a way of tracing lock events (mutexes and spinlocks) in the kernel. The
trace above is listing each lock event as it occurs. (Note that if we had
not piped the dtrace command to head, it would have run forever and would have had to
be interrupted with a kill command or control-C.)
DTRACE HINT: When running your own DTrace commands and
scripts, it's important not to overload the system with too many
instrumented probes. You can actually instrument all of the probes on a
system and get away with it, but it's rather CPU intensive, plus you
end up with a lot of dropped events as the event queues fill up. (DTrace
tells you when it drops events.) Most of the time, if you are instrumenting
thousands of probes, you are probably collecting too much information. An
easy way to safeguard against this is to always run your dtrace command or script the first
time with the -l flag. This will only list the probes you are instrumenting, rather than actually instrument
them. If you want a count of the number of probes you would have
instrumented, pipe the output of the list form of your DTrace command to wc -l and subtract one for
the header.
Inside DTrace
At this point, you may be asking how Solaris does all
this without impacting the rest of the system [1]. The DTrace facility is a
collection of producers and consumers. The DTrace probes, when they fire,
produce events that are placed into buffers (one per processor per DTrace
consumer). The DTrace consumers (i.e., the dtrace command) pull events out
of the buffers and process them. Figure 1 shows the relationship beteen the
DTrace producers and consumers [2]. For additional information on the
internals of DTrace, see "Dynamic Instrumentation of Production
Systems" by Bryan M. Cantrill, Michael W. Shapiro, and Adam H.
Leventhal:
http://www.sun.com/bigadmin/content/dtrace/dtrace_usenix.pdf
Since you only instrument the probes of interest, the
amount of data being collected is kept at a minimum. Additionally, DTrace
limits the number of events that have to be post-processed, using
predicates, which are relational expressions based on data from the probes.
We're not limited to the default action of
writing the CPU, probe ID, function, and name. We can tailor the actions
that occur when a probe fires and perform different actions based on which
probes fire and the contents of the event data. These actions make use of
the D language, which I'll cover in part two of this series.
Summary
In part 1 of this series, we've only looked at
the preliminaries of using DTrace. In segments to come, I'll explain
the D programming language and show you how to use DTrace to solve system
problems. DTrace allows you to peer a lot deeper into a running system than
previous tools allowed. By the time we're finished with this series,
I think you'll agree that DTrace is indeed one of the most innovative
features of Solaris 10.
Endnotes
1. Being a fan of "Star Trek"(R), I enjoy
the correlation between "system monitoring impact" and
"transporter technology". Werner Heisenberg (1901-1976),
renowned physicist, proposed what is referred to as the Heisenberg
Uncertainty Principle. In the area of system monitoring, the principle
loosely defines how the monitor impacts the system. That is, the more in
depth and detailed you monitor and describe what is happening on your
system, the more the monitor itself changes and impacts the system.
The fun correlation to "Star Trek" is that
the principle is also used to describe the impact of transporter technology
on the thing being transported: the more you scan to copy an object, the
more you disrupt it (see "Quantum Teleportation": http://www.research.ibm.com/quantuminfo/teleportation/). Of course, because the
show was based on a certain amount of real science, the engineers in
"Star Trek" had at their disposal a device called the
"Heisenberg Compensator", which not only added a safety
mechanism to the transporter, but also gave the writers something accurate
sounding to toss around in the script (i.e., "we had to modify the
Heisenberg Compensator", or "the transporter failed because of
a problem with the Heisenberg Compensator"). ("Star Trek"
is a registered trademark of CBS Studios Inc.)
2. From "Solaris Dynamic Tracing Guide",
Sun Microsystems, Inc., Part No. 817-6223-10 -- http://docs.sun.com/app/docs/doc/817-6223/6mlkidlf1?a=view
Chip Bennett (cbennett@laurustech.com) is a systems engineer for Laurus Technologies (http://www.laurustech.com), a
Sun Microsystems partner. He has more than 20 years experience with Unix
systems, and holds a B.S. and M.S. in Computer Science. He consults with
Laurus and Sun customers on a variety of Solaris related topics, and he
co-chairs OS-GLUG, the OpenSolaris Great Lakes User Group (Chicago).
|