The SmlMC Model Checker

Dr. Dobb's Journal March 2003

Verifying that implementation meets the design specs

By Wouter Boeke

Wouter formerly worked in the R&D department of Lucent Technologies. He can be contacted at w.m.boeke@chello.nl.

Model checking is a method for verifying whether an implementation satisfies a design specification. To model check a system, you must first create a model from which you can derive the system states and the transitions between those states. Furthermore, you must also be able to describe some properties of the system. If the verification tool can show that the state/transition structure satisfies the properties, it is referred to as a "model checker."

Models and properties are usually expressed in special-purpose languages. The model checker parses the model description code, examines the semantics, and enumerates all states and transitions; likewise with the properties. However, SmlMC, the model checker I present here, takes a different approach—its model and properties are specified in the Standard ML (SML) programming language (http://www.diku.dk/users/tofte/publ/tips.ps). The advantage of this approach is that users have greater freedom in writing the model by using all kinds of data types, as well as access to structuring and debugging options provided by a real programming language. (On the downside, it's not always possible to implement state reduction techniques, which are often necessary because of the enormous number of states that can be reached with model checkers. If, for instance, three independent processes can each be in 100 states, then together they form a transition system with 1 million states!)

Features that functional languages such as SML provide—which procedural languages like C++ don't—include:

Variables that keep their value as long as they are reachable.
Undeceivable type checking at compile time, yet you do not have to specify types. They are inferred by the compiler.
Functional values that can be specified and used like any other value.
Polymorphism through pattern matching.
No crashes at run time.

SML implementations differ with respect to libraries and compilation strategy. For this project, I chose Moscow ML (http:// www.dina.kvl.dk/~sestoft/mosml.html) because, apart from being well designed, some components in its library are coded in C, facilitating the access of internal data structures. This yields a hash table that works for arbitrary data types. Without the polymorphic hash function, SmlMC users would have to write dedicated hash functions for each new data type that they use.

Event-Driven Process Synchronization

Suppose, as in Figure 1, there are three communicating processes. Further, suppose that several events by which the global state may be modified are pending. If an event fires, the global system state is evaluated and stored in a hash table if it is a new state. However, it is also possible that the events are known to occur at (or about) the same moment, although their relative order is unknown. In this case, all pending events are fired in some order, and at the end the new global state is evaluated. Next, a backtrack to the old state is performed, after which all pending events are fired again, but in a different order. This goes on until all permutations in the order of events have been exercised.

What happens when many events are pending, as the number of permutations will grow with a factorial rate? You could say that the state explosion in other model checkers is replaced by a computation explosion. SmlMC addresses this by making a limited number of random choices between all possible permutations. The threshold where this occurs is user specified.

What happens between the events? In all likelihood, some calculations are done. With most model checkers, after each atomic calculation the global state is recalculated because, for example, interaction via global variables can occur. Hence, impressive numbers of states may be reported. With SmlMC, I assume that calculations are always atomic and performed in zero time between events.

To illustrate the reduction of the number of states that can occur, consider the Leader Election Protocol, a well-known benchmark for model checkers. In the case of five interconnected nodes, the (reduced) state space, as reported by the model checker Spin (see "The Spin Model Checker" by Gerald J. Holzmann, DDJ, October 1997), is 3008 states (without property checking). With SmlMC, the number of states is 1445 if all states are stored individually, and 8 if events are supposed to be concurrent and applied with all permutations as previously described. The choice between the two modes of operation is controlled by the Boolean value with_timing (Listing One).

The Algorithm

What makes one model checker different from another is the way processes communicate. Communication is usually accomplished via FIFO queues with zero, bounded, or infinite length. However, this approach has the disadvantage that mutual exclusion constructs and buffer overflow must often be modeled—both of which are considered too low level for a decent model checker. The way processes communicate in SmlMC is borrowed from Lotos (an ISO standard language), where two or more processes synchronize via common "gates" and possibly instantaneously interchange data in any direction. Consequently, users need to organize their model as follows:

A number of components that contain data, but cannot manipulate these data themselves.
A number of events that can fire if they are activated. An event is stateless and has access to the data of all components. It can replace this data with new content. It can reactivate itself or other events, if needed, with a delay.

Conceptually, the components are the blocks in Figure 1, whereas the events are the connections between them.

Listing One is an SmlMC file that models a buffer that is filled and emptied. (The complete SML source code for SmlMC is available electronically; see "Resource Center," page 5.) The line starting with val buffer, where a value of type Component is declared, is a reference that can be reassigned. Two events (e1 and e2) are declared. These are also references, initialized to a no-operation functional value. Later, the events are reassigned by function init_ev and placed in vector events.

In the body of the functional value assigned to e1, there is the line if n < 3 then buffer := Buffer{cnt=n+1}. In SML, a record always has to be fully specified—a partial assignment is impossible. The line act e1 reactivates event e1 after it has been fired. Running the program only starts if at least one event has been activated; this is done in val _ = (act e1;act e2). The last line in Listing One specifies how the generated states have to be displayed on the screen.

Listing One is the user-defined part. Also required are two common files: SmlMC_head.sml, which contains definitions, and SmlMC_gen.sml, where states, transitions, and other output are generated. The three files are compiled with make, controlled by a standard makefile. An executable is generated that, when run, writes the output as in Listing Two.

As you can see, all states (sta:) and transitions (tra:) are listed in a single output. This is also true for other generated information. The idea is to perform postprocessing on the output by piping it through grep, sed, or awk in case simple inspection isn't enough.

Nondeterministic choice is a standard feature of model checkers. In this case, it is performed by specifying an event differently: Normally, for instance, you write ev ev1 fv1 where ev1 is an event and fv1 a functional value. The code:

ev ev1 fv1,

ev_choice ev2 [fv20,fv21,fv22],

ev ev3 fv3

is expanded internally into three sequences (each that will be permuted):

fv1 - fv20 - fv3

fv1 - fv21 - fv3

fv1 - fv22 - fv3

Different functionalities can be specified in fv20, fv21, and fv22.

Normally, an event is activated by something like act e1. It can also be activated after a delay (for instance, act_timed e1 3), then the event is placed in a timing queue. At run time, the head of this timing queue is used as the new list of activated events. Of course, no real delays are involved; they're only symbolic.

Backtracing is another standard model-checking feature. Backtracing makes it possible to list the traversed states from the current state back to the start state, and is activated by calling function backtrace. This is usually done if some test yields an interesting result. In SmlMC, states are searched breadth first, so the shortest path will always be reported.

The global state consists of:

The values of the passive components.
A list of the activated events.
A list of future events in the timing queue.

These parts are taken together and stored as a key in a hash table. A hash table from the SML library doesn't make copies of keys, which implies that keys must be of an immutable data type. So your components may contain vectors and lists, but no references or arrays.

Value perm_max controls the maximum number of permutations between concurrent events. For example, if perm_max is set to 4, then at most 4!=24 random choices between all possible permutations will be performed.

Model Checking

Two dominant logics exist to express properties to be checked: LTL and CTL, both of which specify paths through the states. Formulas in these logics are transformed to state/transition structures and their intersections with the state/transition structure of the model is calculated. In this project, it is done differently: Properties are described using components and events just like for the model.

To check the property of a path through the state space, one special event (path_test) is activated at the end of each event firing cycle. By default, path_test does nothing and should be reassigned. Arbitrary path properties can be checked, like "Value X first will be 2, then 4, then keep that value forever."

Final states can also be checked. In Figure 2, which depicts a hypothetical state/transition structure, different cases can be distinguished:

State 0 is the start state.
State 2 has no successor, it will be reported as a deadlock.
States 4, 5, and 6 can always be reached from each other. In graph theory these are called strongly connected components.
States 7, 8, 9, and 10 are also strongly connected, and at that they cannot be left. This is an example of a live lock; I call these states final.
State 3 is a single final state.

Eventually, a system always cycles through one of the groups of final states. So if users are interested in the "steady state" of their model, then they should have the ability to traverse the groups of final states. This is provided by the reach_test function that, by default, does nothing and should be reassigned. Then properties can be checked like, "The system will always halt after an alarm" or "Value Y will stay constant after the initialization phase."

This functionality is only provided if variable model_check has been set to true. This is because users often want to see only a listing of the states and transitions.

An Example: Checking the SSM Protocol

The SSM telecom protocol ensures that interconnected network elements are clocked by one single clock source—the highest quality clock available within the network. Since cable breaks are anticipated, the elements are connected into rings. The widely used SSM protocol is interesting here because it is flawed: It is possible that a network element thinks it receives a high-quality clock—thus synchronizes its internal clock to this external one—whereas it actually is originating from its own clock. Consequently, the clock frequency will start to slowly drift away.

Figure 3 shows a ring of four network elements. Listing Three is the complete model together with the properties to be checked. The quality of the SSM signal ranges from PRC to undef. To enforce a default synchronization direction, the ports of a network element are provisioned with a priority, ranging from pr_1 to dis. Activation of events is arranged like this: If event w1 has fired, then component ne1 probably will have been modified. So event w1 should activate w2 and w11 because these are dependent on ne1's data.

In this example, two disturbances are modeled. As a first case, the connection between network elements ne1 and ne2 can break, after which the system should stabilize toward a different configuration. In the second one, the clock source connected to ne0 might fail. In this case, the internal clock of one of the network elements should take control (quality-level SEC). Now model checking reveals "Clocking loop found!" if the cable breaks and the clock fails in a certain sequence. This is a simplified model for a simple network. It is also possible that several rings of network elements are interconnected.

DDJ

Listing One

open SmlMC_head

val with_timing = false
val model_check = false

datatype Component =
  Buffer of { cnt:int }
val buffer = ref(Buffer{cnt=0})
val components = #[ buffer ]
val (e1,e2) = (nop(),nop())
val get_b = fn ref(Buffer d) => d
val events = init_ev
[ ev e1
  ( fn() =>
    ( let val n = #cnt (get_b buffer)
      in
        if n < 3 then buffer := Buffer{cnt=n+1} else ()
      end;
      act e1
    )
  ),
  ev e2
  ( fn() =>
    ( let val n = #cnt (get_b buffer)
      in
        if n > 0 then buffer := Buffer{cnt=n-1} else ()
      end;
      act e2
    )
  )
]
val _ = ( act e1; act e2 )
val print_comp = fn
  Buffer{cnt=n} => pr.printq `buffer(^(pr.d n))`

Back to Article

Listing Two

start: buffer(0)
  sta: 0 - buffer(0) ev:2
  sta: 1 - buffer(0) ev:1
  tra: 0 1
  sta: 2 - buffer(1) ev:1
  tra: 0 2
  sta: 3 - buffer(2) ev:1
  tra: 2 3
  tra: 1 1
  sta: 4 - buffer(3) ev:1
  tra: 3 4
  tra: 4 4
5 states

Back to Article

Listing Three

structure dut = struct
local
  open SmlMC_head
in
val with_timing = true
val model_check = true
val _ = perm_max := 3
val (PRC,SEC,DNU,undef) = (4,3,2,1)
val (pr_1,pr_2,pr_3,dis) = (4,3,2,1)

type Ssm = { ql:int }

datatype Component =
  Clock of Ssm
| Ne of string * int * Ssm * Ssm * Ssm * Ssm
        (* id,q,in1,in2,out1,out2 *)
| Con of bool

structure ne = struct
  val get = fn ref(Ne t) => t | _ => err "Ne"
  fun set this f = this := Ne(f(get this))
  fun id   t = #1 (get t)
  fun q    t = #2 (get t)
  fun in1  t = #3 (get t)
  fun in2  t = #4 (get t)
  fun out1 t = #5 (get t)
  fun out2 t = #6 (get t)
end

datatype Var_Fix =
  Ne_vf of { var:Component ref, fix:{ prov:int Vector.vector, 
                                                    i3:Component ref } }
val get_con = fn ref(Con on) => on | _ => err "get_con"
val get_ck = fn ref(Clock d) => d | _ => err "get_ck"
val [ck,w1,w2,w3,w4,w11,w12,w13,w14,con] = List.tabulate(10,fn _ => nop())
val dummy = { ql=undef }
val clock = { ql=PRC }
val no_clock = ref(Clock dummy)
val okay = ref(Con true)
val ck1=ref(Clock clock)
val con1=ref(Con true)

fun init id = (id,0,dummy,dummy,dummy,dummy)
val (ne0,ne1,ne2,ne3) =
( Ne_vf{ var=ref(Ne(init "ne0")), 
                            fix={ prov= #[0,dis,pr_2,pr_1], i3=ck1 } },
  Ne_vf{ var=ref(Ne(init "ne1")), 
                            fix={ prov= #[0,pr_1,pr_2,dis], i3=no_clock } },
  Ne_vf{ var=ref(Ne(init "ne2")), 
                            fix={ prov= #[0,pr_1,pr_2,dis], i3=no_clock } },
  Ne_vf{ var=ref(Ne(init "ne3")), 
                            fix={ prov= #[0,pr_1,pr_2,dis], i3=no_clock } }
)
val components =
  let fun get (Ne_vf{ var=v, ...}) = v
  in #[ ck1,get ne0,get ne1,get ne2,get ne3,con1 ]
  end
val \ = fn(a,c) => Vector.sub(a,c)
infix 9 \
fun for(a,b,f) = if a>b then () else (f(a); for(a+1,b,f))

datatype Sig_flow = o1_i1 | o2_i2

fun ne_update (Ne_vf snd) (Ne_vf rcv) sig_flow con =
  let
  fun sel_alg({ql=ql1},{ql=ql2},{ql=ql3},q) =
    let
      val ql = #[undef,ql1,ql2,ql3]
      val q = ref q
      val prov = #prov(#fix rcv)
    in
      if prov\(!q) = dis orelse ql\(!q) <= DNU then q := 0 else ();
      for (1,3,fn p =>
        if prov\p <> dis andalso ql\p > DNU andalso
           ( ql\p > ql\(!q)
             orelse
             ql\p = ql\(!q) andalso prov\p > prov\(!q)
           )

        then q := p
        else ()
      );
      (ql\(!q),!q)
    end
  val (i1,i2) = if sig_flow = o1_i1
    then (if (get_con con) then ne.out1(#var snd) else dummy,ne.in2(#var rcv))
    else (ne.in1(#var rcv),if (get_con con) then ne.out2(#var snd) else dummy)
  val (ql_in,q) = sel_alg(i1,i2,get_ck(#i3(#fix rcv)),ne.q(#var rcv))
  val (ql_out,q) = (if ql_in < SEC then (SEC,0) else (ql_in,q))
  val (out1,out2) =
    case q of
        3 => ({ ql=ql_out }, { ql=ql_out })
      | 2 => ({ ql=DNU },    { ql=ql_out })
      | 1 => ({ ql=ql_out }, { ql=DNU })
      | 0 => ({ ql=ql_out }, { ql=ql_out })
      | _ => err "q"
  in
    ne.set (#var rcv) (fn(id,_,_,_,_,_)=>(id,q,i1,i2,out1,out2))
  end

fun evn(snd,rcv,sflow,act1,act2,on) =
  fn() =>
  ( ne_update snd rcv sflow on;
    act act1; act act2
  )
val events = init_ev
[ ev_choice ck [ fn() => (ck1 := Clock dummy; act ck),
                 fn() => (ck1 := Clock clock; act ck)
               ],
  ev w1  (evn(ne0,ne1,o1_i1,w2,w11,okay)),
  ev w11 (evn(ne1,ne0,o2_i2,w1,w14,okay)),
  ev w2  (evn(ne1,ne2,o1_i1,w3,w12,con1)),
  ev w12 (evn(ne2,ne1,o2_i2,w2,w11,con1)),
  ev w3  (evn(ne2,ne3,o1_i1,w4,w13,okay)),
  ev w13 (evn(ne3,ne2,o2_i2,w3,w12,okay)),
  ev w4  (evn(ne3,ne0,o1_i1,w1,w14,okay)),
  ev w14 (evn(ne0,ne3,o2_i2,w4,w13,okay)),
  ev_choice con [ fn() => (con1 := Con false; act con),
                  fn() => (con1 := Con true; act con)
                ]
]
val _ = (
  act w1; act con; act ck;
  reach_test :=
  ( fn init =>
    let
      val getq = fn Ne_vf ne => ne.q(#var ne)
      val q0 = getq ne0
      val rec all_eq = fn
        [] => false | [x] => x = q0 | x::xs => x = q0 andalso all_eq xs
    in
      if all_eq [getq ne1,getq ne2,getq ne3]
      then
      ( print "Clock loop detected!\n";
        backtrace();
        BasicIO.exit 0
      ) else ()
    end
  )
)
val print_comp = fn
  Clock{ql=n} => pr.printq `ck1(ql=^(pr.d n)) `
| Ne d =>
  let val r=ref(Ne d) in
    pr.printq `^(ne.id r)(q=^(pr.d(ne.q r)) `;
    pr.printq `o1=^(pr.d(#ql(ne.out1 r))) o2=^(pr.d(#ql(ne.out2 r)))) `
  end
| Con on => pr.printq `con1(^(if on then "on" else "off")) `

end end

Back to Article