This essay has been submitted by a student. This is not an example of the work written by our professional essay writers.
The dynamics of how the constituent components of a natural system interact defines the spatiotemporal response of the system to stimuli. Modeling the kinetics of the processes that represent a biophysical system has long been pursued with the aim of improving our understanding of the studied system. Due to the unique properties of biological systems, in addition to the usual difficulties faced in modeling the dynamics of physical or chemical systems, biological simulations encounter difficulties that result from intrinsic multi scale and stochastic nature of the biological processes.
This chapter discusses the implications for simulation of models involving interacting species with very low copy numbers, which often occur in biological systems and give rise to significant relative fluctuations. The conditions necessitating the use of stochastic kinetic simulation methods and the mathematical foundations of the stochastic simulation algorithms are presented. How the well organized structural hierarchies often seen in biological systems can lead to multi scale problems, and possible ways to address the encountered computational difficulties are discussed. We present
the details of the existing kinetic simulation methods, and discuss their strengths and shortcomings. A list of the publicly available kinetic simulation tools and our reflections for future prospects are also provided.
BACKGROUND AND INTRODUCTION
Reactions occurring among a set of reactants define a kinetic reaction network. Traditionally, the set of reactions is described by nonlinear ordinary or partial differential equations, solutions of which provide insights into the dynamics of the studied processes. The size of such networks can vary considerably. In some instances, hundreds of thousands of reactions may be needed to describe the complex, coupled phenomena. Although kinetic modeling and the underlying fundamental mathematics are used in vastly different fields of science and engineering, here
we will concentrate on biological kinetic networks which model cellular activities as a set of reaction processes. Biological networks are intrinsically very rich in character; feedback loops are common, and bifurcations and oscillations can lead to interesting dynamical behavior. Although there are many commonalities, two aspects, in particular, distinguish biological networks from networks found in other fields: Firstly, in biological systems, copy numbers of many species are very low, which can give rise to significant relative fluctuations. Secondly, biological systems tend to have well-organized structural hierarchies. The traditional way of modeling the time evolution of the molecular
populations in a reacting system is to use a set of coupled, first order, ordinary differential equations (ODEs) called the reaction rate equations (RRE). The RRE are usually inferred from phenomenological arguments under the assumption that the system is well-stirred or spatially homogeneous; that is, no persistent correlations develop among the positions of the molecules. The RRE is nonlinear in the presence of bimolecular reactions, which is almost always the case. Owing to the possibility of widely differing reaction rates, the RRE can often exhibit vastly different
time scales, a condition known as dynamical stiffness. Sometimes, under certain simplifying assumptions such as quasi steady-state or partial equilibrium, the RRE reduce to a system of differential-algebraic equations (DAEs).
However, a deterministic approach alone may not be sufficient for cellular systems when the underlying problem is inherently stochastic. Fluctuations in the concentrations of biomolecules can be quite large and this may require special treatment in modeling the biological networks . This is particularly true for molecular species involved in the transcriptional regulation of gene expression. In certain such cases, as the small copy number limit is reached, differential equation-based models break down since the assumption of continuous molecular concentrations is no longer valid. In such cases a more general approach, based on stochastic methods in which natural fluctuations are treated in a direct and explicit manner, has been advocated for the quantitative analysis of biological networks . A simple argument demonstrates why fluctuations can be considerable in cellular systems. The typical diameter of a microbial cell is a few micrometers (Î¼m), which corresponds to a cellular volume on the order of ~10 femtoliter (fl). In this volume, the concentration of 1 molecule is roughly equal to 160 picomolar (pM). This discrete nature is, of course, more severe when the molecules are confined to even smaller substructures, which is typically the case in eukaryotic cells. Thus, given that typical binding affinities of the biomolecules are often in the pM to Î¼M
range, and ignoring whether the thermodynamics are still valid at this limit, even very small fluctuations in the molecule copy numbers can cause a regime change in biomolecular interactions. Therefore, fluctuations resulting from a few discrete reactions can significantly affect the dynamical patterns in cellular systems. In a series of papers Gillespie showed the equivalency of deterministic and stochastic formalisms in the limit as the populations of all constituent chemical species tend to infinity, and discussed the importance of stochastic fluctuations when the system sizes are small (1,2). We note that stochastic differential equations are often used in physics, but they are usually obtained by starting with an assumed ordinary differential equation and then adding a "noise" term that has been carefully crafted to give a preconceived outcome. Although such approaches are computationally less expensive, they do not capture the true nature of the intrinsic fluctuations. Therefore, the use of discrete stochastic
approaches is often more suitable in studying biological systems. Although they provide a more realistic modeling frame, discrete stochastic simulations are
often too slow and computationally expensive.
Kinetic simulation of biochemical systems involves the following basic steps:
Identification of the kinetic problem: This step involves the determination of the input and output variables as well as the intermediates. Key species and physical properties to be simulated are determined and tabulated.
2. Model formulation: Although that does not have to be the case, most studies employ the reaction rate equations (RRE) to model biochemical systems. In the RRE, one simply defines the changes in the concentrations (or equivalently the number of
molecules) as a function of time and location. We note that constraints can be
embedded in RRE using a Lagrange undetermined multipliers type formalism.
3. Choosing a method: At this step a decision needs to be made as to whether to adopt a deterministic or stochastic formulation. Although it is still relatively uncommon, a
hybrid approach which moves between deterministic and stochastic regimes is
another possibility. In the deterministic approach, a reasonable time step is chosen
adaptively, based on the estimated local error according to a differential equation
model. In contrast, discrete stochastic simulations model each individual reaction
event. Using the probabilities of the reactions (called the reaction propensities), one
determines the occurrence statistics of the involved reactions. This is generally done
in one of two ways: In next reaction type methods, the sequence of the reactions are
chosen according to the reaction propensities and the reaction times are computed to
determine the elapsed time. In lumped reaction approaches, one can choose the size
of the elapsed time step and compute how many times the involved reactions occur
according to their probabilities.
4. Simulation: Integrating the RRE deterministically involves choosing an appropriate
algorithm from a wide selection of sophisticated numerical methods for systems of
ODEs and DAEs. Highly efficient and reliable software is readily available, although
one must usually determine whether or not the problem is stiff and then choose the
software accordingly. Roughly speaking, an ODE or DAE system is stiff if it involves
a wide range of time scales and the fastest of those scales correspond to stable
processes. In chemically reacting systems, the wide range of time scales can come
from some reaction rates being several orders of magnitude or more greater than
others. A solver which is designed for non-stiff systems will run very slowly (because
it needs to choose time steps on the scale of the fastest process in the system to maintain stability for its explicit formulas), if it is applied to a stiff problem. The cure for
stiffness is to approximate the differential equation using implicit methods. The
software that is available for the accurate solution of stiff ODE and DAE systems
always makes use of implicit methods. For more information on ODE and DAE
methods and software, see Ascher and Petzold .
Stochastic simulation algorithms are not as advanced as the deterministic integrators
and they do not scale well with the model size and complexity. Even though recent
algorithmic improvements have led to impressive gains, stochastic simulations are
still too slow for most realistic systems. However, as discussed in the previous section,
fluctuations can be large in biological systems and, more importantly, stochastic
variations may have implications for biological functions and responses. Therefore,
when feasible, stochastic simulation methods should be preferred. This is particularly
true for processes that involve small number of regulatory molecules with large
variations in their expression levels.
5. Trajectory analysis: Time courses obtained during the simulations are catalogued and combined to derive the statistical distributions of the desired output/prediction
quantities, such as the concentrations of experimentally monitored species and the
resulting material/mass flow in the system or the probability distribution of the
reactions. Similarly, from simulation results for different model parameter sets, one
can study the sensitivity to parameter variations. The following simple example illustrates steps (1-3). Assume that we are studying a receptor (R) system that binds a ligand L. The ligand bound receptors (RL) binds an effector/adaptor/ scaffold molecule E that helps with signal transduction. Receptors that are in complex with the
effector molecules are labeled as RLE. For this case, the kinetic system consists of the two reversible reactions, L+R RL and RL+E RLE. Forward and reverse reactions for these two reactions are second and first order reactions, respectively. For simplicity, let us use a mass action formalism where the rates for the first (Aâ†’C) and second (A+Bâ†’C) order reaction are represented as k[A] and k[A][B], respectively, where [X] stands for the concentration (or the number of molecules with the proper volume conversion) of species X. In this example, the species list would be L, R, RL, E, and RLE. The reaction list will contain,
four reactions (i) L+Râ†’RL, (ii) RLâ†’L+R, (iii) RL+Eâ†’RLE, and (iv) RLEâ†’RL+E. If RLE
is the species of interest (signal transducing receptors), the output of the simulations would be [RLE], the concentration of the species RLE. This identification completes the step (1).
Development of the RRE follows a simple rule: Reactions that decrease or increase the
concentration of a species appear as a source term in the rate equation for that species. For example, the reaction RL+Eâ†’RLE decreases [RL] and [E], and increases [RLE]. So the term k3[RL][E] is included as a negative contribution in the rate equations for d[RL]/dt and d[E]/ dt, and as a positive contribution in the d[RLE]/dt rate equation. Using this logic the RRE equations for the above example would be
Once the model is defined as above, preparation for the simulation is slightly different
depending on the method to be used. If a deterministic approach is chosen, then the RRE is investigated in terms of the stiffness of the equations, and an appropriate integrator is chosen. We note that for more general formalisms, singularity of the Jacobian is another point of concern but that subject is outside the scope of this chapter. If a stochastic simulation approach is chosen, then the RRE is converted into form where propensities are associated with each reaction.
In the stochastic approach, the stoichiometry matrix defines the change in the species
concentrations when the corresponding reaction occurs. Entries for the reactants and the products appear as negative and positive, respectively. With this information at hand, at every step of the stochastic simulation one can update the system configuration x(t) according to the
occurrence of reactions as xâ†’x+Î½Â·Ï‰ using the information about how many times the reactions occur Ï‰. Step (4), execution of the simulations, is the key step in the kinetic modeling studies. The numerical solution of ODEs or DAEs is a well established area with widely available software. Thus, we will focus on stochastic simulation methods in the remainder of this chapter.
Stochastic Simulation Algorithms - Fundamentals
The Stochastic Simulation Algorithm (SSA) developed by Gillespie and its variants are the basis for stochastic kinetic simulation of biological systems. In SSA it is assumed that successive reactive molecular collisions in the system are separated by many non-reactive collisions, which effectively wipe out all positional correlations among
Defining the state of the system by the vector x(t) whose components xi(t) (i=1 to N for N species) donate the number of molecules of species Si in the system at time t, one can establish from kinetic theory the following result: For each elemental chemical reaction channel Rj (j =1,...,M), there exists a function aj such that, if the system is in state x=(x1,â€¦, xN) at time t,then the probability that reaction Rj will occur somewhere inside the system in the next infinitesimal time dt is aj(x). This can be regarded as the fundamental premise of stochastic chemical kinetics. The function aj is called the propensity function of reaction channel Rj. Eachreaction channel Rj is completely characterized by its propensity function together with itsstate change vector Î½j which represents the change in the population of the molecular species caused by one Rj event, i.e., the stoichiometric coefficients of the reactions. Thus, if the system
is in state x and one Rj reaction occurs, the system jumps to state x+Î½j. This is mathematically known as a jump Markov process and a great deal is known about such processes (8,41).The stochastic evolution of the state vector x(t) is specified by the function P(x,t|x0,t0), defined as the probability that x(t) will equal x given that x(t0)= x0 for t0 â‰¤ t. It can be proven that this function obeys the following time-evolution equation (4,44):
This elegant equation is known as the chemical master equation (CME). Unfortunately, it is of such a large dimension (it includes a differential equation corresponding to every possible state of the system) that its solution is almost always intractable, both analytically and numerically. An alternate way of analyzing the behavior of x(t) is to construct its unbiased realizations using the SSA (1,2). The steps of SSA basically are:
i. Initialize the system's state x=x0 at time t=t0;
ii. Evaluate the reaction propensities aj(x) and their sum a0(x);
iii. Draw two random numbers r1 and r2 from a uniform distribution in the unit-interval,
and compute the time interval between the reactions Ï„= ln(r1)/a0(x) and determine
which reaction type Rj occurs next by finding the smallest integer j that satisfies
iv. Record and update the changes in the system tâ†’t+Ï„ and xâ†’x+Î½j; and
v. Return to step (ii) o r end the simulation.
The SSA advances the system in time from one reaction event to the next, and it does so in a way that is rigorously equivalent to the CME. The SSA is therefore an exact method for simulating the time evolution of a spatially homogeneous chemical system. Because SSA processes reaction events serially, i.e., one after another, it is often too slow to be practical when applied to real systems of interest. An extremely large molecular population of even onereactant species, and/or a very fast reaction, will usually cause the time increments Ï„ to be extremely small.
Acceleration of Discrete Stochastic Simulation
Several lines of research have been pursued in attempts to improve upon the exact SSA. These include more efficient implementations of the SSA and the development of approximate SSAs with improved numerical efficiencies.
The efficiency of the algorithms can strongly depend on the formulation and numerical
implementation and these are key aspects to consider when developing simulation software. Gibson and Bruck used an efficient implementation strategy for the exact SSA (10) to remove a key limitation, which restricted the practical use of the first reaction method of Gillespie (1), by not requiring the generation of Nr new random uniform deviates following a reaction event where Nr is the number of active reactions. Rather than a single waiting time between events, the method stores the absolute firing time for each reaction in an indexed priority binary heap queue with times increasing with increasing depth in the heap. Assuming a static reaction network topology, a reaction dependency graph is created initially. Following an event, which is identified by the element at the top of the heap, a single random uniform deviate is generated,
reaction propensities are efficiently updated using the dependency graph, and any absolute firing time requiring updating is adjusted through a linear transformation. Lastly, the heap queue is reordered. Gibson and Bruck have shown that the algorithm has a time complexity of order O(Nr+NE log Nr) where NE is the number of events and generally requires only a single random uniform deviate per event . While the next reaction method can be efficient for large, very sparse networks, significant overhead may incur in maintaining the indexed priority queue. This has led Cao et al. to conclude that, for most kinetic systems, the most efficient formulation of the SSA is an optimized direct Gillespie method . One way to speed up the next reaction method type algorithms is to allow multiple reactions to take place in longer time steps, which is the underlying idea behind the probability-weighted dynamic Monte Carlo (PW-DMC) method and the tau-leaping based algorithms . The basic premise behind the PW-DMC method is that reactions with large propensities
dominate the stochastic simulation. By attaching weights that are updated as the simulation progresses, the PW-DMC approach attempts to equalize the propensities of the reactions constituting the network model. In this process, the large propensities are scaled down with an associated factor, which corresponds to how many times the involved reaction occurs when selected. It was shown that the PW-DMC algorithm solves the multiple time scale problem to a large extent . Efficient statistical sampling obtained with the PW-DMC algorithm made the simulation of an integrated stochastic model of EGFR signaling network that contains tens of thousands of reactions possible using desktop computers in reasonable wall clock periods
. The success of PW-DMC results from its effectiveness in bundling the occurring
reactions, and in this aspect, it is conceptually similar to the approach pursued in tau-leaping methods. The main differences between the PW-DMC and tau-leaping methods are that in the PW-DMC the bundling of the reactions is done for each reaction type (i.e., some reactions can be selected to be left out if desired) and the integration time does not have to be set in advance, but the tolerance in the fluctuation levels for the species need to be predefined. These features make PW-DMC suitable to deal with the reaction networks that can be partitioned into subclasses according to their spatio-temporal properties where the allowed fluctuation levels can be dictated on a per reaction basis. It was shown that, while the fluctuation variations may be affected, the PW-DMC conserves the mean quantities. In this respect, PW-DMC is a
weak order (i.e., not a strong order) stochastic method. The fluctuations in the computed quantities depend on the achieved speed-up, which can be several orders of magnitude without any significant effect on the computed mean quantities .
In the tau-leaping (Ï„L) methods , advances in the system configuration are determined based on a preselected time step Ï„ during which more than one reaction event may occur. Tauleaping is based on the same premises that underlie the CME and the SSA: If Ï„ is chosen small enough to satisfy the leap condition, which says that none of the propensity functions may change noticeably during Ï„, then given x(t)=x, the system state at time t+Ï„ can be approximately computed as
The Pj are statistically independent Poisson random variables; they physically represent the number of times each reaction channel fires in time Ï„. The Poisson random variable P(a,Ï„) is in fact defined as the number of events that will occur in time Ï„, given that a dt is the probability of an event occurring in the next infinitesimal time dt. We emphasize that Ï„ must be chosen small enough that none of the propensity functions change significantly during the leap. Algorithms have been developed for estimating the largest value of Ï„ that will satisfy that condition for a prescribed level of simulation accuracy .
While the Ï„L methods show significant gains in efficiency, the unbounded Poisson distribution can lead to nonphysical system configurations in which negative species populations arise when the expected number of events is large enough to consume the reactant populations. Cao et al. have shown how to avoid negative populations in explicit Poisson Ï„L (PÏ„L) methods. The non-negative PÏ„L algorithm is based on the fact that negative populations typically arise from multiple firings of reactions that are only a few firings away from consuming all the molecules of one of their reactants. To focus on those reaction channels, the modified PÏ„L algorithm introduces a control parameter nc, a positive integer that is usually set somewhere between 10 and 20. Any reaction channel that is currently within nc firings of exhausting one of its reactants is then classified as a critical reaction. The algorithm chooses Ï„ in such a way
that no more than one firing of all the critical reactions can occur during the leap. Essentially, the algorithm simulates the critical reactions using SSA, and the remaining non-critical reactions using Ï„-leaping. An alternative approach is based on choosing the reaction numbers using binomially distributed random variables that have the appropriate finite sampling ranges. If a short time interval Ï„Î£ is considered, for each reaction Rj, the Poisson distribution is replaced with a binomial
distribution B(pj, Ï‰j*) with the same mean ajÏ„. Here Ï‰j* is an upper bound (or limit) othe
reaction number and pj is the probability that Rj will fire in the interval Ï„Î£/Ï‰j*. In the resulting binomial tau-leaping (BÏ„L) method, the leap size is then chosen as the smaller of the times tj computed from the constraints pj=ajtj/Ï‰j* â‰¤ 1, j=1:M and the upper bound Ï„Îµ. Once the tau step has been determined, each network reaction number Ï‰j is generated by sampling the corresponding binomial distribution. Although the binomial distribution has the desirable property of having a finite sampling range,
this in itself does not completely resolve the non-negativity problem. In networks with multiple channel reactant dependencies, the interdependence arising from species participating in multiple reactions may still generate negative species populations. Tian and Burrage have shown how the BÏ„L completely resolves the non-negativity problem when there is at most a two-reactant dependence in a reaction network . In the BÏ„L method of Chatterjee et al. the issue of multiple-channel reactant dependencies is dealt with by assigning to each reaction a binomial random variable for the number of reaction events during a step and by modifying reaction number limits through tracking currently available population sizes as reactions fire during that step. Yet, as critically noted by Chatterjee et al., in the latter approach selected reaction numbers may depend on the order in which reactions are processed . The multinomial tau-leaping (MÏ„L) method of Pettigrew and Resat efficiently extends the BÏ„L method to networks with arbitrary multiple-channel reactant dependencies .Improvements were achieved by a combination of three factors: First, tau-leaping steps are determined simply and efficiently using a-priori information and Poisson distribution based estimates of expectation values for reaction numbers. Second, networks are partitioned into
closed groups of reactions and corresponding reactants in which no group reactant set is foundin any other group. Third, product formation is factored into upper bound estimation of the number of times a particular reaction occurs. Together, these features allow for larger time leaping steps while the numbers of reactions occurring simultaneously in a multi-channelv manner are estimated accurately using a multinomial distribution. Partitioning of the reaction network into groups with closure can be done with a graph theoretic algorithm based on the Dulmage-Mendelsohn decomposition of the stoichiometric matrix representing the network. Two methods for selecting an upper limit Î©* on the system reaction number was advocated, the rate limiting reactant and the rejection methods . The former is a conservative approach that guarantees that the chosen reactions do not lead to negative populations after the system's configuration is updated according to the occurring reactions. In contrast, the selection process in the rejection method only guarantees that the system configuration after the tau-leaping step is non-negative. It however does not ensure that all
possible multi reaction paths that take the system to that final configuration involve only
physically feasible intermediate stages. It was argued that, since one can only detect the relationship between the starting and end points of a move during the simulation, ensuring the state of the system to always correspond to physically feasible configurations at the observation points (i.e., starting and end points of the moves in a simulation run) is the most crucial aspect in obtaining accurate results. Therefore, allowing a simulation trajectory to temporarily cross the solution space boundary would introduce insignificant inaccuracies because statistics for intermediate states have only an indirect impact. In the MÏ„L method, tau-leaping step is then chosen as the smaller of the times t computed from the constraint p = tÎ£/Î©* â‰¤ 1 and the upper bound Ï„Î£ where Î£ is the system propensity. The actual system reaction number Î© is then sampled from a binomial distribution as discussed above, and the reaction numbers Ï‰j, j=1:M
âˆ’1, are drawn from a multinomial distribution with corresponding probabilities Î±j= aj/Î£ and Ï‰M=Î©âˆ’Ï‰1âˆ’Ï‰2âˆ’ â€¦âˆ’Ï‰M-1. Implemented advances over the BÏ„L method were shown to lead
to significant speed ups in the simulations of biologically relevant models (32). It was also
shown that MÏ„L is less sensitive to the error parameter of the Ï„L methods, which is a desirable
aspect of the kinetic simulation algorithms.
It is well known that the Poisson random variable P(a,Ï„), which has mean and variance aÏ„, can
be approximated by a normal random variable with the same mean and variance when aÏ„â‰«1.
Using this fact, it can be shown that if Ï„satisfies not only the leap condition but also the
conditions aj(x)Ï„â‰«1 for all j, which physically means that during time Ï„ every reaction channel
can be expected to fire many more times than once, then the basic tau-leaping Eq. (2) further
approximates to (49)
Here the Nj(0,1) are statistically independent normal random variables with mean 0 and
variance 1. This modified update formula is computationally faster than the tau-leaping formula
(2) which it approximates, because samples of the normal random variable are much easier to
generate than samples of the Poisson random variable. Also, since by hypothesis every reaction
channel fires many more times than once during each time step Ï„, simulations using the updating
Eq. (3) will be very much faster than the SSA.
If the arguments leading to Eq. (3) are repeated with Ï„ replaced by a macroscopic infinitesimal
dt, which by definition is small enough that no propensity function changes significantly during
dt yet large enough that every reaction channel fires many more times than once during dt,
then one can deduce the equation
where the gammas are statistically independent Gaussian white noise processes (49-51).This
equation is called the chemical Langevin equation (CLE), and it is an example of what is known
to mathematicians as a stochastic differential equation. It is entirely equivalent to the Langevin
We note that stochastic differential equations are often used in physics, but they are usually
obtained by starting with an assumed ordinary differential equation and then adding a "noise"
term that has been carefully crafted to give some preconceived outcome. So it is noteworthy
that the noise term in Eq. (4), namely the second summation there, has actually been derived
from the same premises that underlie the CME and the SSA using physically transparent
assumptions and approximations. Use of the Langevin formulas (3) and (4) is typically justified
whenever all reactant species are present in sufficiently large numbers.
At the thermodynamic limit, molecular populations of all the reactants and the system volume
become infinitely large, while the concentrations of all species stay constant. It can be proven
that, in this limit, all propensity functions scale linearly with the system size. Thus, the
deterministic terms on the right sides of Eqs. (3) and (4) grow in proportion to the system size,
while the fluctuating terms grow in proportion to the square root of the system size. This is
another way of showing that for large systems "relative fluctuations in the species populations
scale as the inverse square root of the system size", a central rule of thermodynamics. In the
full thermodynamic limit, the foregoing scaling behavior implies that the fluctuating terms in
Eqs. (3) and (4) become negligibly small compared to the deterministic terms, and the CLE
This is none other than the traditional reaction rate equation (RRE), which is more commonly
used in terms of the concentration vector. Note that the RRE has not been assumed here; instead,
it has been derived from the same premises that underlie the CME and the SSA, using a series
of physically understandable approximations . Thus, the discrete, stochastic CME/SSA
and the continuous, deterministic RRE are actually connected across the end of the
representation spectrum. The challenge, of course, is to develop both the software infrastructure
and the theory necessary to make reliable adaptive decisions so that we can accurately and
efficiently simulate complex reaction systems based on detecting a discrete or continuous
regime and using an optimal blend of stochastic and deterministic methods.
Scaling Properties With Respect to the Network Size
In addition to the multiscale nature of the involved kinetic processes, computational efficiency
of stochastic simulation algorithms also depend on several measures of the size of the
investigated network model, such as the overall complex population, the total number of
reactions, and the average number of nodal interactions or connectivity in a network.
Understanding how algorithms scale with size and nodal connectivity is an important
consideration in the design of numerical experiments for dynamic biological systems. As the
complexity of the systems studied in biological systems increases, such scaling issues become
a serious point of concern.
To study how stochastic kinetic simulation algorithms scale with network size, construction
of scalable reaction models that can be used in benchmark studies has been advocated (42).
Pettigrew and Resat used two scalable reaction models to compare the efficiency of the exact
Gillespie algorithm as implemented using the popular Gibson-Bruck method (GGB) (10) with
the inexact random substrate method of Firth and Bray . Foundations of these two
algorithms are quite distinct from each other and the latter algorithm has been stated to be an
efficient method when the total complex population is small and the number of multistate
The first investigated scalable model, the Size Scalable Model (SSM), is a four majorcompartment
model that is typical for a signal transduction network in eukaryotic cells .
The SSM involves 2 unimolecular and 5 bimolecular reversible reactions with mass transfer
between compartments for 8 complex types including ligands, adapter complexes, and
receptors . Upon phosphorylation the receptor can become active in signal transduction.
A sub-compartmentalization process, which is equivalent to forming smaller locally
homogeneous regions, allows for the creation of refined models where the size of the model
increases in proportion to the number of created subcompartments while maintaining the
connectivity (i.e., the topology) between the species. For SSM, it was found that the GGB
algorithm, whose numerical efficiency scales with the logarithm of the total number of active
reactions, performs significantly better in all cases including those characterized by a very
small number (~100) of complex types, each with a very small population (42).
The second scalable model, the Variable Connectivity Model (VCM), is a single compartment
signal transduction network involving 2 unimolecular and 2 bimolecular reversible reactions
for 3 complex types, ligand, receptor, and ligand receptor complexes (42). The receptors in
this model have multiple phosphorylation sites. The size and the node connectivity between
the species increase in proportion to the number of phosphorylation sites N. For the VCM, it
was shown that the FB random substrate method, whose efficiency scales with the square of
the total complex population, can outperform the GGB algorithm when the node connectivity
is sufficiently high and the total complex population sufficiently low. Performance of the FB
and GGB algorithms were determined as a function of the number of phosphorylation sites for
two distinct population sets. In the first set the total complex population was initially set at 192
complexes (128 ligands and 64 receptors), while for the second set the total complex population
was 10 times higher with the 2:1 ratio of ligands to receptors unchanged. It was found that, for
the smaller population set the FB algorithm becomes more efficient than the GGB when the
number of phosphorylation sites N reaches 6, whereas in the larger population set the GGB is
clearly superior for small N up to a crossover point that occur around N~11. Thus, it was
concluded that with the exception of special cases, GGB is a more efficient method than the
This simple study using two sample realistic networks clearly illustrated that the most efficient
algorithm can depend on the problem type and the used simulation method should be chosen
after a careful study of the topology and multiscale properties of the investigated network.
FUTURE ROSPECTS OF STOCHASTIC SIMULATION
In recent years there has been rapid progress in the analysis and development of algorithms for
the accelerated discrete stochastic simulation of chemically reacting systems. Although these
developments provide a good foundation, existing stochastic simulation methods are far from
being computationally efficient to simulate detailed biological models. This is particularly true
for the models that involve multiple time and spatial scales while still requiring high resolution
for accurate representation. Thus, the development of efficient and adaptive multiscale
algorithms and software for stochastic-deterministic simulations are urgently needed. This is
particularly true for the simulation of spatially resolved models because immense recent
progress in imaging is beginning to make the data about the inhomogeneous distribution of
molecular species available. As the expense associated with spatially resolved models makes
fully stochastic simulations nearly infeasible, efforts to develop hybrid approaches will become
very important. Hybrid methods have recently been proposed to simulate multiscale
chemically reacting systems. These methods combine the traditional deterministic reaction rate
equation, or alternatively the chemical Langevin equation, with the SSA. The idea is to split
the system into two regimes: the continuous regime and the discrete regime. The RRE is used
to simulate the fast reactions between species with large populations and the SSA is used for
slow reactions or species with small populations. The conditions for the continuous regime are
1) the number of instances of each molecular species in a reaction in the continuous regime
must be large relative to one, and 2) the number of reaction events of each reaction occurring
within one time step of the numerical solver must be large relative to one . If either condition
is not satisfied for a reaction channel, that reaction channel must be handled in the discrete
The hybrid methods efficiently use the multiscale properties of the problem but there are still
some fundamental unsolved problems. The first is the automatic partitioning of systems. This
needs to be done very carefully, because both accuracy and efficiency depends on a proper
partitioning. It will require a precise awareness of the assumptions under which algorithms and
approximation available in the multiscale framework are valid, and the corresponding tests on
the system to determine the partitioning. This is one reason why a proper theoretical foundation
for each algorithm is so critical - to ensure that we are using it only to make approximations
for which the corresponding assumptions are valid. Concerns about automatic step size
selection continue to exist. Another difficult problem arises in the case when a reaction is fast
but one of the corresponding reactants has a small population thus failing to satisfy the second
requirement above for the deterministic regime. Such a reaction channel will still be handled
by the SSA, resulting in a very slow simulation. New approaches and approximations, such as
the slow-scale SSA have the potential to overcome this limitation.
Another promising area of research is the application of tau-leaping based stochastic algorithms
to problems involving reaction-diffusion processes. When the SSA algorithm is applied to
reaction-diffusion processes the standard approach is to subdivide the heterogeneous system
volume into a sufficiently large number of small-volume elements V0~O(h3) so that the
assumption of a homogeneous, well-stirred system within each element is recovered. With
diffusion considered as a first-order reaction in which molecules of a species are exchanged
between neighboring elements at a rate rD âˆV0D/h2 , kinetic reactions within an element occur
at a rate rCâˆV0Î£ . Here D and Î£ are, respectively, the diffusion constant and the total system
propensity. By choosing elements sufficiently small so that rC â‰ª rD is true for all elements,
the SSA may be applied although under these circumstances it is not usual to be confronted
with a reaction network where the total number of processes and species is now several orders
of magnitude larger than that found in a single element. Moreover, it has been observed in
stochastic simulations with the SSA that reaction-diffusion processes are strongly dominated
by diffusion events. The increase in network size and diffusion dominated events both
contribute to a severe reduction in efficiency and it is highly likely that a number of more recent
stochastic algorithms for reaction-diffusion suffer from the same problem. It is possible that
Ï„-leaping methods may be well suited to overcoming this issue with the stochastic simulation
of reaction-diffusion problems.
As stochastic simulation is used more and more often in studying the dynamics of biological
systems, and as biological models become larger and more complicated, there is a need for the
development and efficient implementation of simulation algorithms that can handle the
increasing complexity. Thus, the issue of how an algorithm scales with the system size is crucial
in assessing the limitations of the algorithm. Scaling concerns expose the serious need for
establishing realistic scalable models that can be used in benchmark studies for algorithm
efficiency comparison. Pettigrew and Resat have introduced two such models. It was
observed that the scaling qualities of the algorithms investigated break down significantly
when the network reaches a certain size. We suspect that this is a common occurrence for all
of the existing algorithms, which unfortunately can impose severe limitations on the utility of
the stochastic kinetic approaches. Roughly estimating, existing stochastic simulation
algorithms can be used to investigate reaction networks containing ~105 reactions using a
desktop machine. Utilization of high-performance resources may push this limit to 108.
Considering that even the smallest microbial community or physiologically relevant eukaryotic
network can easily contain >1012 reactions, it is clear that the scaling properties of the stochastic
simulation algorithms need significant improvements before they can be widely used for larger
problems. As discussed above, hybrid methods that reduce the computational complexity by
treating the non-critical parts of the network using deterministic models may offer the most
suitable solution to the scaling bottleneck in the near future.
We conclude by observing that while recent improvements in algorithm development,
computational implementation, and hybrid approaches are indications of a promising future
for the stochastic kinetic simulation of biological systems, there is still plenty of room for
achieving significant advances.