Runtime verification the supervisor behaviour in Erlang

Published: Last Edited:

This essay has been submitted by a student. This is not an example of the work written by our professional essay writers.

Runtime verification is considered to be a lightweight verification technique which complements traditional verification techniques such as theorem proving, model checking and testing. In essence, runtime verification deals with the detection of violations by observing an execution of a system under scrutiny.

An essential part of runtime verification is the notion of a monitor. A monitor is a device that reads a finite trace and yields a certain verdict. Basically, a monitor determines whether the current execution satisfies a given property. Monitors are generated from a high-level specification such as LTL (a type of linear temporal logic).

JInterface is a library that offers a set of tools which enable Erlang processes to communicate with other platforms' processes (mainly Java). The java implemented node impersonates an Erlang node which can speak the wire protocol thus it's able to do all Erlang node's operations like message passing, linking and unlinking etc…

JInterface can be used to integrate a runtime verification engine, implemented in java, to Erlang. The advantage of this approach is that we can use the runtime verification engine without doing any alternations to it. But at the same time, we must keep in mind that JInterface has some weaknesses, which are:

Process creation is expensive since OS-level threads are used in contrast to an Erlang process.

Context switching is also expensive in contrast to Erlang.

JInterface development has stopped and is not being updated.

After considering all options, we decided not to go for the JInterface approach and instead we agreed to re-implement the LARVA verification engine in Erlang. This approach offers a cleaner system and allows us to take full benefit of the advantages that Erlang offers without any restrictions that JInterface might include.

Offline and online runtime verification

In online runtime verification, monitoring is done in synchronization with the system. This approach offers the possibility to take some actions when some unwanted behaviour is detected. Although the possibility to take some actions when an unwanted behaviour is detected is often desirable, online monitoring can introduce interference with system under observation and possibly slowdown the system.

On the other hand, offline runtime verification is done by monitoring a log of events, implying that recovery actions cannot be taken at the time an unwanted behaviour was detected. In contrast with online monitoring, offline monitoring does not interfere with the system apart from the logging of events which in general is done in any case.

Erlang provides a set of tracing tools which give full visibility of the inner workings of a system. The low-level tracing mechanism in Erlang is enabled and disabled through the built-in function erlang:trace/3. The tracing mechanism is able to monitor concurrency, code execution and memory usage. This tracing mechanism offers the possibility to generate events without any instrumentation.

Therefore we can use the Erlang tracing mechanism to adopt an offline monitoring approach. On the other hand, we can integrate the Erlang tracing mechanism with instrumentation to obtain an online monitoring approach.

The trade-off is whether adopting offline monitoring which will yield a cleaner approach because the system will remain intact without any modifications, which imply no interference, or adopting online monitoring which can gives us the possibility to take some actions when an unwanted behaviour is detected, but might interfere with the system.

Prototype - LARVA bank example converted in Erlang:

LARVA Script



%% Event_Name = {M, F, A}

AddUser = {Bank, add_user, 1}

DeleteUser = {Bank, delete_user, 1}




%% [{name, value}, {name2, value2}]





toomany {}

baddelete {}



ok {}



start {io:format("Started").}




ok -> toomany [AddUser\userCnt >= 5]

ok -> ok [AddUser\\userCnt++]

ok -> start [DeleteUser\userCnt == 1\userCnt--]

ok -> ok [DeleteUser\\userCnt--]

start -> ok [AddUser\\userCnt++]

start -> baddelete [DeleteUser\\]




The property users, in the above script, states that in the following two scenarios the system reaches a bad state:

either 5 or more users are added to the system

one tries to delete a user when there are no users added to the system


The system is implemented as a gen_server behaviour. The server waits for client requests which can, add a user, delete a user or check whether a particular user exists.


start_link() ->

gen_server:start_link({local, ?SERVER}, ?MODULE, [], []).

The start_link/0 function initiates the server by calling the gen_server:start_link/4 function which in turn calls the function init/1 found in the bank module. The init/1 function creates a dictionary that is used as the state of the server.

add_user(Name) ->

gen_server:call(?SERVER, {add, Name}).

The add_user/1 function takes the name of a user as input and calls the gen_server:call/2 function which in turn calls the callback handle_call/3. At this stage, the name is added to the dictionary.

delete_user(Name) ->

gen_server:call(?SERVER, {delete, Name}).

The delete_user/1 function removes the entry from the dictionary where the input Name matches a key in the dictionary.

exists_user(Name) ->

gen_server:call(?SERVER, {exists, Name}).

The exists_user/1 function checks whether the name of a user passed as input matches any key in the dictionary.


The task of the supervisor is to start the system and control the system tracing. The system is started through the function start_link/1:

start_link(ChildSpec) ->

supervisor:start_link(?MODULE, ChildSpec).

The start_link/1 function takes a child specification as input. The following is an example of a child specification:

[{bank, {bank, start_link, []}, permanent, brutal_kill, worker, [bank]}]

The tracing process is initiated by the init_trace/0 function:

init_trace() ->

spawn_link(?MODULE, trace, []).

A process is spawned and linked to the supervisor and computes the trace/0 function:

trace() ->

erlang:trace(all, true, [call, timestamp]),

%% Filter calls from EVENTS section in the larva script

erlang:trace_pattern({bank, add_user, 1}, true, [local]),

erlang:trace_pattern({bank, delete_user, 1}, true, [local]),


This function initiates tracing by calling the erlang:trace/3 and then filters the tracing process by passing the events in the LARVA script as arguments to the function erlang:trace_pattern/3. Afterwards, the process calls the trace_loop/0 function:

trace_loop() ->

receive {trace_ts, From, Event, {_M, _F, _A}, Timestamp} ->

gen_fsm:send_event(bank_users_fsm, {_M, _F, length(_A)}),



This function receives trace events which are sent in the form of Erlang messages from the virtual machine. When a message is received an event is sent to the FSM by calling gen_fsm:send_event/2.


The FSM is implemented as a gen_fsm behaviour. We can deduce from the above LARVA script that the FSM will contain 4 states. Thus we have to implement 4 functions, one for each state:


start(Event, State) ->

case Event of

{bank, add_user, 1} ->

{_Key, Value} = lists:nth(1, ets:lookup(State, userCnt)),

ets:insert(State,{userCnt, (Value + 1)}),

io:format("Moved to state: ok~n"),

{next_state, ok, State};

{bank, delete_user, 1} ->

io:format("Moved to bad state: baddelete~n"),

%% terminate child

supervisor:terminate_child(bank_supervisor, bank),

{next_state, baddelete, State};

Unknown_event ->

io:format("Unknown event~n"),

{next_state, start, State}


From the LARVA script we can see that there exist 2 transitions that can be made from the start state. This behaviour is implemented by a case construct. If the event received is add_user the variable userCnt is incremented by 1 and a message, which indicates that the state is changed to ok, is outputted. If the event received is delete_user, the state is changed to baddelete and the system is terminated by calling the terminate_child/2 function.


ok(Event, State) ->

case Event of

{bank, add_user, 1} ->

{_Key, Value} = lists:nth(1, ets:lookup(State, userCnt)),


Value >= 5 ->

io:format("Moved to bad state: toomany~n"),

%% terminate child

supervisor:terminate_child(bank_supervisor, bank),

{next_state, toomany, State};

true ->

ets:insert(State,{userCnt, (Value + 1)}),

io:format("Remained in state: ok~n"),

{next_state, ok, State}


{bank, delete_user, 1} ->

{_Key, Value} = lists:nth(1, ets:lookup(State, userCnt)),


Value == 1 ->

ets:insert(State,{userCnt, (Value - 1)}),

io:format("Moved to state: start~n"),

{next_state, start, State};

true ->

ets:insert(State,{userCnt, (Value - 1)}),

io:format("Remained in state: ok~n"),

{next_state, ok, State}


Unknown_event ->

io:format("Unknown event~n"),

{next_state, ok, State}


The ok state can compute 4 transitions. The case construct catches the event while the if constructs verify whether a certain condition holds. If the event received is add_user the variable userCnt is incremented by 1 only if userCnt is less than 5, otherwise the state changes to toomany and a message is outputted. If the event received is delete_user the variable userCnt is decremented by 1 while if userCnt is equal to 1, the state changes to start.


toomany(Event, State) ->

io:format("Bad state - no action is taken~n"),

{next_state, toomany, State}.

This function outputs a message to indicate that the FSM is in a bad state.


baddelete(Event, State) ->

io:format("Bad state - no action is taken~n"),

{next_state, baddelete, State}.

This function has the same behaviour as the toomany/2 function.

Custom behaviour

The custom behaviour links all the above components together. The custom behaviour name is gen_monitor and it is implemented as a gen_server behaviour.




behaviour_info(callbacks) ->

[{init, 1}];

behaviour_info(_Other) ->


It requires the module applying it to implement the callback init/1 which purpose is to pass the LARVA script path and the child specification (which will be used to start the system) to the gen_monitor behaviour. The gen_monitor behaviour is started through the start_link/4 function:

start_link(Name, Mod, Args, Options) ->

gen_server:start_link(Name, ?MODULE, {Mod, Args}, Options).

This function calls the gen_server's start_link/4 function which in turn calls the init/1 function found in the gen_monitor module:

init({Mod, Args}) ->

case catch Mod:init(Args) of

{ok, ScriptPath, ChildSpec} ->

State = loop(ScriptPath, ChildSpec),

{ok, State};

{stop, Reason} ->


ignore ->


{'EXIT', Reason} ->


Else ->

Error = {bad_return_value, Else},



init(_) ->


This function calls the init/1 function of the module passed as argument. If the return value matches the tuple {ok, ScriptPath, ChildSpec} function loop/2 is called:

loop(ScriptPath, ChildSpec) ->

%% Read File with path ScriptPath - LARVA Script

{ok, File} = read_file(ScriptPath),

%% Tokenize Script

{ok, Tokens} = tokenize(File),

%% Build FSM

{ok, FSM} = build_fsm(Tokens),

%% Build Supervisor

{ok, SUP} = build_sup(Tokens),

%% Compile FSM and Supervisor

compile(FSM, SUP),

%% Run FSM and Supervisor

run(FSM, SUP, ChildSpec),


The loop/2 function:

reads the LARVA script file

tokenizes the file

builds the FSM

builds the Supervisor

compiles the generated modules

starts the FSM and Supervisor (the system is started implicitly by the Supervisor)

System run

First run tries to delete a user when there are no users added to the system which violates the property users

Command number 3 starts the gen_monitor behaviour. Command 4 shows us that the system named bank is up and running since we received a valid PID from the whereis/1 function. In command number 5 we try to delete a user while the FSM is in the start state. The FSM outputs a message to indicate that the state is changed to baddelete. In command 6 we call again the whereis/1 function to see whether the system is still running and we receive undefined which shows that the system has terminated.

Second run tries to add more than 5 users which violates the property users

Command number 2 starts the gen_monitor behaviour as in the previous run. Command 3 adds a user to the system and the FSM changes the state to ok. Till command 7 we continue adding users to the system and the FSM state remains ok. On the other hand in command 8 we try to add the sixth user and the FSM outputs that it reaches the bad state toomany. In command 9 we verify that the system was terminated.