The Tool Sycraft


Content


Introduction

Distributed processing brings a high level of computing power as well as robustness to computer systems. However, distribution of resources is often subject to unanticipated events called faults (e.g., message loss, processor crash, etc.) imposed by the environment that the program is running in. Since identifying the set of all possible faults that a distributed system is subject to is almost unfeasible at design time, it is desirable for designers of fault-tolerant distributed programs to have access to synthesis methods that transform existing fault-intolerant distributed programs to a fault-tolerant version. 

Sycraft (SYmboliC synthesizeR and Adder of Fault-Tolerance) is a tool for transforming distributed fault-intolerant programs to distributed fault-tolerant programs. In Sycraft, a distributed fault-intolerant program is specified in terms of a set of processes and an invariant. Each process is specified as a set of actions in a guarded command language, a set of variables that the process can read, and a set of variables that the process can write. The issue of distribution is addressed in shared-memory model where processes are constrained by their ability in reading and writing program variables. Given a set of fault actions and a safety specification, the tool synthesizes a fault-tolerant program via a symbolic implementation of respective synthesis algorithms [BK07]. Sycraft has successfully been used to synthesize some of the classic protocols in the literature of fault-tolerant computing in distributed systems (e.g., Byzantine agreement, token ring, etc.). 

The performance of the symbolic algorithms that SYCRAFT implements is discussed in [BK07] using case studies including the Byzantine agreement and token ring problems with various number of processes. By applying more advanced techniques than those introduced in [BK07] (e.g., exploiting symmetry in distributed processes and state space partitioning), we have been able to synthesize Byzantine agreement with 3 processes in 0.7s and up to 40 processes (reachable states of size 1045) in less than 100 minutes.

The tool has been tested using various case studies. These case studies include problems from the literature of fault-tolerant computing in distributed systems and problems from research and industrial institutes: Byzantine agreement, Byzantine agreement with fail-stop faults, Byzantine agreement with constrained speciation, token ring, triple modular redundancy, alternating bit protocol, an altitude switch controller, and a data dissemination protocol in sensor networks. The tool has been tested on Sun Solaris, Mac OS, and various distributions of Linux (e.g., Ubuntu, and Fedora). The tool Sycraft is written in C++ and the symbolic algorithms are implemented using the Glu/CUDD 2.1 package [Som]. The code of Sycraft is available freely and can be downloaded from here.

To use Sycraft, you need to be familiar with the formal framework for reasoning about fault-tolerant systems proposed by Arora and Gouda [AG93]. Some definitions and concepts in [AG93] (including the formal definition of masking fault-tolerance) have been revised in [KA00].


Theoretical background

The theory of foundations of fault-tolerant computing was first introduced by Arora and Gouda in [AG93] based on closure and convergence constraints for programs. They formally introduce the notion of levels of fault-tolerance, namely, failsafe, nonmasking and masking, based on satisfaction of safety and liveness properties of programs in the presence of faults. The concepts of safety and liveness are based on the ones proposed by Alpern and Schneider [AS85]. 

Based on the framework in [AG93], Kulkarni and Arora [KA00] provide solutions for automatic addition of fault-tolerance to fault-intolerant programs. Intuitively, given an existing program, say P, a set F of faults, a safety condition, and a reachability constraint, their solution synthesizes a fault-tolerant program, say P', such that (1) in the absence of faults, the set of computations of P' is a subset of the set of computations of P, and (2) in the presence of faults, P' never violates its safety condition, and, starting from any state reachable by program and fault transitions, P' satisfies its reachability condition in a finite number of steps. In particular, they show that while there exist sound and complete polynomial-time solutions to the synthesis problem for all levels of fault-tolerance with respect to centralized programs, the synthesis problem in the context of distributed programs is NP-complete.

In order to overcome this complexity, Kulkarni, Arora, and Chippada [KAC01] propose a set of polynomial-time enumerative (explicit-state) synthesis heuristics. However, those heuristics turned out to be ineffective due to the well-known state explosion problem. In order to remedy the state space problem Bonakdarpour and Kulkarni [BK07] develop a set of symbolic heuristics for synthesizing fault-tolerant moderate-size distributed programs. Sycraft implements these symbolic heuristics, i.e., the output of Sycraft is a masking fault-tolerant version of the input fault-intolerant program.


Example

In a token ring program, the processes 0::N are organized in a ring and the token is circulated along the ring in a fixed direction. Each process, say j, maintains a variable x with the domain {0, 1, 2}, where the value 2 denotes a corrupted value.  Process j, j <> 0, has the token iff x.j differs from its successor x.:(j + 1) and process N has the token iff x.N is the same as its successor x:0. Each process can only write its local variable (i.e., x:j). Moreover, a process can only read its own local variable and the variable of its predecessor. The following program in Sycraft models the token ring problem in its guarded command language. Specifically, process q which ranges over 1..N and process p models process number N. The [] operator denotes non-deterministic choice, i.e., if the guard of several guarded commands is true then one is chosen for execution non-deterministically.

program BinaryTokenRing;

const N = 3;
var int x.0..N: domain 0..2;

process q: 1..N
    x.q != x.(q - 1)     -->     x.q := x.(q - 1);

    prohibited (x.0 == 2) & (x.N == x.(0)`)

    read: x.q, x.(q - 1);
    write: x.q;
endprocess

process p
        (x.0 == 0) & (x.N == 0)     -->      x.0 := 1;
    []
        (x.0 == 1) & (x.N == 1)     -->      x.0 := 0;
    read: x.0, x.N;
    write: x.0;
endprocess

 

Faults can restart at most N-1 processes. Thus, the fault action in Sycraft is as follows:

fault TokenCorruption
    exists i, j in 0..N : (i != j) :: ((x.i != 2) & (x.j != 2))   -->   (x.0 := 2) [] (x.1 := 2) [] (x.2 := 2) [] (x.3 := 2);
endfault

 

In token ring processes whose state is uncorrupted are not allowed to copy the value of a corrupted process. Note that in token ring, since this constraint has to be met only by execution of program actions (not fault actions), we specify it under the prohibited section in Sycraft. Notice that primed variables denote the value of a variable in the target state of a transition:

prohibited  

    exists i in 0..N :: ((x.i != 2) & (x.(i)` == 2));

 

Finally, the invariant predicate of the token ring problem is determined as follows. Consider a state where a process  j has the token. In this state, since no other process has a token, the x value of all processes 0..j is identical and the x value of all processes (j+1)..N is identical. Letting X denote the string of binary values x.0, x.1,..., x:N, we have that X satisfies the regular expression (0l 1(N+1-l)  U  1l 0 (N+1-l) ), which denotes a sequence of length N+1 consisting of zeros followed by ones or ones followed by zeros. Thus, an invariant of the program BinaryTokenRing in Sycraft is as follows:

invariant
    ((x.0 == 1) & (x.1 == 0) & (x.2 == 0) & (x.3 == 0)) |
    ((x.0 == 1) & (x.1 == 1) & (x.2 == 0) & (x.3 == 0)) |
    ((x.0 == 1) & (x.1 == 1) & (x.2 == 1) & (x.3 == 0)) |
    ((x.0 == 1) & (x.1 == 1) & (x.2 == 1) & (x.3 == 1)) |
    ((x.0 == 0) & (x.1 == 0) & (x.2 == 0) & (x.3 == 0)) |
    ((x.0 == 0) & (x.1 == 0) & (x.2 == 0) & (x.3 == 1)) |
    ((x.0 == 0) & (x.1 == 0) & (x.2 == 1) & (x.3 == 1)) |
    ((x.0 == 0) & (x.1 == 1) & (x.2 == 1) & (x.3 == 1)) ;

 

An alert reader observes that the above program works correctly in the absence of faults. However, in the presence of faults, it deadlocks due to existence of several tokens or no tokens in the ring. Sycraft organizes actions of the output program in three categories. Unchanged actions entirely exist in the input program. Revised actions exist in the input program, but their guard or statement have been strengthened. Sycraft adds Recovery actions to enable programs to safely converge to their invariant predicate. Notice that the strengthened actions generally prohibit the program to reach a state from safety may be violated in the presence of faults. They also prohibits the program to reach deadlock states from where safe recovery is not possible. The output of Sycraft after adding fault-tolerance to BinaryTokenRing is as follows. Intuitively, a q-process in the synthesized program is allowed to copy the value of its predecessor, if this value in not corrupted. Note that the actions of the p-process in the synthesized program stipulate recovery actions that start from outside program invariant as well:

For process q = 1:

----------------------------------------------------
UNCHANGED ACTIONS:
----------------------------------------------------
----------------------------------------------------
REVISED ACTIONS:
----------------------------------------------------
1- (x0 == 1) & (x1 == 2)    -->    (x1 := 1)
2- (x0 == 0) & (x1 == 2)    -->    (x1 := 0)
3- (x0 == 1) & (x1 == 0)    -->    (x1 := 1)
4- (x0 == 0) & (x1 == 1)    -->    (x1 := 0)
----------------------------------------------------
NEW RECOVERY ACTIONS:
----------------------------------------------------
----------------------------------------------------


For process p:

---------------------------------------------------
UNCHANGED ACTIONS:
---------------------------------------------------
1- ((x0 == 0) & (x3 == 0))   -->   (x0 := 1)
2- ((x0 == 1) & (x3 == 1))   -->   (x0 := 0)
---------------------------------------------------
REVISED ACTIONS:
---------------------------------------------------
---------------------------------------------------
NEW RECOVERY ACTIONS:
---------------------------------------------------
1- (x0 == 2) & (x3 == 0)    -->    (x0 := 1)
2- (x0 == 2) & (x3 == 1)    -->    (x0 := 0)
---------------------------------------------------

More examples are available with the Sycraft code release.


Installation and content

  1. To download the latest stable version of Sycraft (latest version is 1.01) please fill out the below boxes and click on the download button 

    Name*           Email address*    Affiliation* 

    I would you like to be informed about future versions and developments on Sycraft.

  2. Move to where you would like the Sycraft source to reside and unpack the distribution:

    % cd /home                                                  # for example
    % gzip -dc /tmp/sycraft-1.01.tar.gz | tar xf -
  3. Move into the sycraft-1.01 directory and build Sycraft by running the make utility:

    % cd sycraft-1.01
    % make

    This should build Sycraft as well as the GLU package automatically. You may however, need to make some adjustments in the GLU package which in the directory glu-2.1 depending upon your operating system and architecture. For more information on how to build the GLU package, please refer to the README file in the glu-2.1 directory. 

For more information, please read the README file.


Documentation and resources


Future extensions


Contact

For more information, questions, bug report, suggestions, and comments please contact Borzoo Bonakdarpour at borzoo@cse.msu.edu.


People

People who are currently working on theory, design, and development of Sycraft include:


References

 


Warning: gethostbyaddr() [function.gethostbyaddr]: Address is not a valid IPv4 or IPv6 address in /user/borzoo/web/sycraft/index.php on line 557