# Kinetics Of Biological Systems Biology Essay

Published:

This essay has been submitted by a student. This is not an example of the work written by our professional essay writers.

The dynamics of how the constituent components of a natural system interact defines the spatiotemporal response of the system to stimuli. Modeling the kinetics of the processes that represent a biophysical system has long been pursued with the aim of improving our understanding of the studied system. Due to the unique properties of biological systems, in addition to the usual difficulties faced in modeling the dynamics of physical or chemical systems, biological simulations encounter difficulties that result from intrinsic multi scale and stochastic nature of the biological processes.

This chapter discusses the implications for simulation of models involving interacting species with very low copy numbers, which often occur in biological systems and give rise to significant relative fluctuations. The conditions necessitating the use of stochastic kinetic simulation methods and the mathematical foundations of the stochastic simulation algorithms are presented. How the well organized structural hierarchies often seen in biological systems can lead to multi scale problems, and possible ways to address the encountered computational difficulties are discussed. We present

the details of the existing kinetic simulation methods, and discuss their strengths and shortcomings. A list of the publicly available kinetic simulation tools and our reflections for future prospects are also provided.

BACKGROUND AND INTRODUCTION

Reactions occurring among a set of reactants define a kinetic reaction network. Traditionally, the set of reactions is described by nonlinear ordinary or partial differential equations, solutions of which provide insights into the dynamics of the studied processes. The size of such networks can vary considerably. In some instances, hundreds of thousands of reactions may be needed to describe the complex, coupled phenomena. Although kinetic modeling and the underlying fundamental mathematics are used in vastly different fields of science and engineering, here

we will concentrate on biological kinetic networks which model cellular activities as a set of reaction processes. Biological networks are intrinsically very rich in character; feedback loops are common, and bifurcations and oscillations can lead to interesting dynamical behavior. Although there are many commonalities, two aspects, in particular, distinguish biological networks from networks found in other fields: Firstly, in biological systems, copy numbers of many species are very low, which can give rise to significant relative fluctuations. Secondly, biological systems tend to have well-organized structural hierarchies. The traditional way of modeling the time evolution of the molecular

populations in a reacting system is to use a set of coupled, first order, ordinary differential equations (ODEs) called the reaction rate equations (RRE). The RRE are usually inferred from phenomenological arguments under the assumption that the system is well-stirred or spatially homogeneous; that is, no persistent correlations develop among the positions of the molecules. The RRE is nonlinear in the presence of bimolecular reactions, which is almost always the case. Owing to the possibility of widely differing reaction rates, the RRE can often exhibit vastly different

time scales, a condition known as dynamical stiffness. Sometimes, under certain simplifying assumptions such as quasi steady-state or partial equilibrium, the RRE reduce to a system of differential-algebraic equations (DAEs).

However, a deterministic approach alone may not be sufficient for cellular systems when the underlying problem is inherently stochastic. Fluctuations in the concentrations of biomolecules can be quite large and this may require special treatment in modeling the biological networks . This is particularly true for molecular species involved in the transcriptional regulation of gene expression. In certain such cases, as the small copy number limit is reached, differential equation-based models break down since the assumption of continuous molecular concentrations is no longer valid. In such cases a more general approach, based on stochastic methods in which natural fluctuations are treated in a direct and explicit manner, has been advocated for the quantitative analysis of biological networks . A simple argument demonstrates why fluctuations can be considerable in cellular systems. The typical diameter of a microbial cell is a few micrometers (Î¼m), which corresponds to a cellular volume on the order of ~10 femtoliter (fl). In this volume, the concentration of 1 molecule is roughly equal to 160 picomolar (pM). This discrete nature is, of course, more severe when the molecules are confined to even smaller substructures, which is typically the case in eukaryotic cells. Thus, given that typical binding affinities of the biomolecules are often in the pM to Î¼M

range, and ignoring whether the thermodynamics are still valid at this limit, even very small fluctuations in the molecule copy numbers can cause a regime change in biomolecular interactions. Therefore, fluctuations resulting from a few discrete reactions can significantly affect the dynamical patterns in cellular systems. In a series of papers Gillespie showed the equivalency of deterministic and stochastic formalisms in the limit as the populations of all constituent chemical species tend to infinity, and discussed the importance of stochastic fluctuations when the system sizes are small (1,2). We note that stochastic differential equations are often used in physics, but they are usually obtained by starting with an assumed ordinary differential equation and then adding a "noise" term that has been carefully crafted to give a preconceived outcome. Although such approaches are computationally less expensive, they do not capture the true nature of the intrinsic fluctuations. Therefore, the use of discrete stochastic

approaches is often more suitable in studying biological systems. Although they provide a more realistic modeling frame, discrete stochastic simulations are

often too slow and computationally expensive.

## SIMULATION METHODS

## General Description

Kinetic simulation of biochemical systems involves the following basic steps:

Identification of the kinetic problem: This step involves the determination of the input and output variables as well as the intermediates. Key species and physical properties to be simulated are determined and tabulated.

2. Model formulation: Although that does not have to be the case, most studies employ the reaction rate equations (RRE) to model biochemical systems. In the RRE, one simply defines the changes in the concentrations (or equivalently the number of

molecules) as a function of time and location. We note that constraints can be

embedded in RRE using a Lagrange undetermined multipliers type formalism.

3. Choosing a method: At this step a decision needs to be made as to whether to adopt a deterministic or stochastic formulation. Although it is still relatively uncommon, a

hybrid approach which moves between deterministic and stochastic regimes is

another possibility. In the deterministic approach, a reasonable time step is chosen

adaptively, based on the estimated local error according to a differential equation

model. In contrast, discrete stochastic simulations model each individual reaction

event. Using the probabilities of the reactions (called the reaction propensities), one

determines the occurrence statistics of the involved reactions. This is generally done

in one of two ways: In next reaction type methods, the sequence of the reactions are

chosen according to the reaction propensities and the reaction times are computed to

determine the elapsed time. In lumped reaction approaches, one can choose the size

of the elapsed time step and compute how many times the involved reactions occur

according to their probabilities.

4. Simulation: Integrating the RRE deterministically involves choosing an appropriate

algorithm from a wide selection of sophisticated numerical methods for systems of

ODEs and DAEs. Highly efficient and reliable software is readily available, although

one must usually determine whether or not the problem is stiff and then choose the

software accordingly. Roughly speaking, an ODE or DAE system is stiff if it involves

a wide range of time scales and the fastest of those scales correspond to stable

processes. In chemically reacting systems, the wide range of time scales can come

from some reaction rates being several orders of magnitude or more greater than

others. A solver which is designed for non-stiff systems will run very slowly (because

it needs to choose time steps on the scale of the fastest process in the system to maintain stability for its explicit formulas), if it is applied to a stiff problem. The cure for

stiffness is to approximate the differential equation using implicit methods. The

software that is available for the accurate solution of stiff ODE and DAE systems

always makes use of implicit methods. For more information on ODE and DAE

methods and software, see Ascher and Petzold .

Stochastic simulation algorithms are not as advanced as the deterministic integrators

and they do not scale well with the model size and complexity. Even though recent

algorithmic improvements have led to impressive gains, stochastic simulations are

still too slow for most realistic systems. However, as discussed in the previous section,

fluctuations can be large in biological systems and, more importantly, stochastic

variations may have implications for biological functions and responses. Therefore,

when feasible, stochastic simulation methods should be preferred. This is particularly

true for processes that involve small number of regulatory molecules with large

variations in their expression levels.

5. Trajectory analysis: Time courses obtained during the simulations are catalogued and combined to derive the statistical distributions of the desired output/prediction

quantities, such as the concentrations of experimentally monitored species and the

resulting material/mass flow in the system or the probability distribution of the

reactions. Similarly, from simulation results for different model parameter sets, one

can study the sensitivity to parameter variations. The following simple example illustrates steps (1-3). Assume that we are studying a receptor (R) system that binds a ligand L. The ligand bound receptors (RL) binds an effector/adaptor/ scaffold molecule E that helps with signal transduction. Receptors that are in complex with the

effector molecules are labeled as RLE. For this case, the kinetic system consists of the two reversible reactions, L+R RL and RL+E RLE. Forward and reverse reactions for these two reactions are second and first order reactions, respectively. For simplicity, let us use a mass action formalism where the rates for the first (Aâ†’C) and second (A+Bâ†’C) order reaction are represented as k[A] and k[A][B], respectively, where [X] stands for the concentration (or the number of molecules with the proper volume conversion) of species X. In this example, the species list would be L, R, RL, E, and RLE. The reaction list will contain,

four reactions (i) L+Râ†’RL, (ii) RLâ†’L+R, (iii) RL+Eâ†’RLE, and (iv) RLEâ†’RL+E. If RLE

is the species of interest (signal transducing receptors), the output of the simulations would be [RLE], the concentration of the species RLE. This identification completes the step (1).

Development of the RRE follows a simple rule: Reactions that decrease or increase the

concentration of a species appear as a source term in the rate equation for that species. For example, the reaction RL+Eâ†’RLE decreases [RL] and [E], and increases [RLE]. So the term k3[RL][E] is included as a negative contribution in the rate equations for d[RL]/dt and d[E]/ dt, and as a positive contribution in the d[RLE]/dt rate equation. Using this logic the RRE equations for the above example would be

Once the model is defined as above, preparation for the simulation is slightly different

depending on the method to be used. If a deterministic approach is chosen, then the RRE is investigated in terms of the stiffness of the equations, and an appropriate integrator is chosen. We note that for more general formalisms, singularity of the Jacobian is another point of concern but that subject is outside the scope of this chapter. If a stochastic simulation approach is chosen, then the RRE is converted into form where propensities are associated with each reaction.

In the stochastic approach, the stoichiometry matrix defines the change in the species

concentrations when the corresponding reaction occurs. Entries for the reactants and the products appear as negative and positive, respectively. With this information at hand, at every step of the stochastic simulation one can update the system configuration x(t) according to the

occurrence of reactions as xâ†’x+Î½Â·Ï‰ using the information about how many times the reactions occur Ï‰. Step (4), execution of the simulations, is the key step in the kinetic modeling studies. The numerical solution of ODEs or DAEs is a well established area with widely available software. Thus, we will focus on stochastic simulation methods in the remainder of this chapter.

## Stochastic Simulation Algorithms - Fundamentals

The Stochastic Simulation Algorithm (SSA) developed by Gillespie and its variants are the basis for stochastic kinetic simulation of biological systems. In SSA it is assumed that successive reactive molecular collisions in the system are separated by many non-reactive collisions, which effectively wipe out all positional correlations among

Defining the state of the system by the vector x(t) whose components xi(t) (i=1 to N for N species) donate the number of molecules of species Si in the system at time t, one can establish from kinetic theory the following result: For each elemental chemical reaction channel Rj (j =1,...,M), there exists a function aj such that, if the system is in state x=(x1,â€¦, xN) at time t,then the probability that reaction Rj will occur somewhere inside the system in the next infinitesimal time dt is aj(x). This can be regarded as the fundamental premise of stochastic chemical kinetics. The function aj is called the propensity function of reaction channel Rj. Eachreaction channel Rj is completely characterized by its propensity function together with itsstate change vector Î½j which represents the change in the population of the molecular species caused by one Rj event, i.e., the stoichiometric coefficients of the reactions. Thus, if the system

is in state x and one Rj reaction occurs, the system jumps to state x+Î½j. This is mathematically known as a jump Markov process and a great deal is known about such processes (8,41).The stochastic evolution of the state vector x(t) is specified by the function P(x,t|x0,t0), defined as the probability that x(t) will equal x given that x(t0)= x0 for t0 â‰¤ t. It can be proven that this function obeys the following time-evolution equation (4,44):

This elegant equation is known as the chemical master equation (CME). Unfortunately, it is of such a large dimension (it includes a differential equation corresponding to every possible state of the system) that its solution is almost always intractable, both analytically and numerically. An alternate way of analyzing the behavior of x(t) is to construct its unbiased realizations using the SSA (1,2). The steps of SSA basically are:

i. Initialize the system's state x=x0 at time t=t0;

ii. Evaluate the reaction propensities aj(x) and their sum a0(x);

iii. Draw two random numbers r1 and r2 from a uniform distribution in the unit-interval,

and compute the time interval between the reactions Ï„= ln(r1)/a0(x) and determine

which reaction type Rj occurs next by finding the smallest integer j that satisfies

## ;

iv. Record and update the changes in the system tâ†’t+Ï„ and xâ†’x+Î½j; and

v. Return to step (ii) o r end the simulation.

The SSA advances the system in time from one reaction event to the next, and it does so in a way that is rigorously equivalent to the CME. The SSA is therefore an exact method for simulating the time evolution of a spatially homogeneous chemical system. Because SSA processes reaction events serially, i.e., one after another, it is often too slow to be practical when applied to real systems of interest. An extremely large molecular population of even onereactant species, and/or a very fast reaction, will usually cause the time increments Ï„ to be extremely small.

## Acceleration of Discrete Stochastic Simulation

Several lines of research have been pursued in attempts to improve upon the exact SSA. These include more efficient implementations of the SSA and the development of approximate SSAs with improved numerical efficiencies.

The efficiency of the algorithms can strongly depend on the formulation and numerical

implementation and these are key aspects to consider when developing simulation software. Gibson and Bruck used an efficient implementation strategy for the exact SSA (10) to remove a key limitation, which restricted the practical use of the first reaction method of Gillespie (1), by not requiring the generation of Nr new random uniform deviates following a reaction event where Nr is the number of active reactions. Rather than a single waiting time between events, the method stores the absolute firing time for each reaction in an indexed priority binary heap queue with times increasing with increasing depth in the heap. Assuming a static reaction network topology, a reaction dependency graph is created initially. Following an event, which is identified by the element at the top of the heap, a single random uniform deviate is generated,

reaction propensities are efficiently updated using the dependency graph, and any absolute firing time requiring updating is adjusted through a linear transformation. Lastly, the heap queue is reordered. Gibson and Bruck have shown that the algorithm has a time complexity of order O(Nr+NE log Nr) where NE is the number of events and generally requires only a single random uniform deviate per event . While the next reaction method can be efficient for large, very sparse networks, significant overhead may incur in maintaining the indexed priority queue. This has led Cao et al. to conclude that, for most kinetic systems, the most efficient formulation of the SSA is an optimized direct Gillespie method . One way to speed up the next reaction method type algorithms is to allow multiple reactions to take place in longer time steps, which is the underlying idea behind the probability-weighted dynamic Monte Carlo (PW-DMC) method and the tau-leaping based algorithms . The basic premise behind the PW-DMC method is that reactions with large propensities

dominate the stochastic simulation. By attaching weights that are updated as the simulation progresses, the PW-DMC approach attempts to equalize the propensities of the reactions constituting the network model. In this process, the large propensities are scaled down with an associated factor, which corresponds to how many times the involved reaction occurs when selected. It was shown that the PW-DMC algorithm solves the multiple time scale problem to a large extent . Efficient statistical sampling obtained with the PW-DMC algorithm made the simulation of an integrated stochastic model of EGFR signaling network that contains tens of thousands of reactions possible using desktop computers in reasonable wall clock periods

. The success of PW-DMC results from its effectiveness in bundling the occurring

reactions, and in this aspect, it is conceptually similar to the approach pursued in tau-leaping methods. The main differences between the PW-DMC and tau-leaping methods are that in the PW-DMC the bundling of the reactions is done for each reaction type (i.e., some reactions can be selected to be left out if desired) and the integration time does not have to be set in advance, but the tolerance in the fluctuation levels for the species need to be predefined. These features make PW-DMC suitable to deal with the reaction networks that can be partitioned into subclasses according to their spatio-temporal properties where the allowed fluctuation levels can be dictated on a per reaction basis. It was shown that, while the fluctuation variations may be affected, the PW-DMC conserves the mean quantities. In this respect, PW-DMC is a

weak order (i.e., not a strong order) stochastic method. The fluctuations in the computed quantities depend on the achieved speed-up, which can be several orders of magnitude without any significant effect on the computed mean quantities .

In the tau-leaping (Ï„L) methods , advances in the system configuration are determined based on a preselected time step Ï„ during which more than one reaction event may occur. Tauleaping is based on the same premises that underlie the CME and the SSA: If Ï„ is chosen small enough to satisfy the leap condition, which says that none of the propensity functions may change noticeably during Ï„, then given x(t)=x, the system state at time t+Ï„ can be approximately computed as

(2)

The Pj are statistically independent Poisson random variables; they physically represent the number of times each reaction channel fires in time Ï„. The Poisson random variable P(a,Ï„) is in fact defined as the number of events that will occur in time Ï„, given that a dt is the probability of an event occurring in the next infinitesimal time dt. We emphasize that Ï„ must be chosen small enough that none of the propensity functions change significantly during the leap. Algorithms have been developed for estimating the largest value of Ï„ that will satisfy that condition for a prescribed level of simulation accuracy .

While the Ï„L methods show significant gains in efficiency, the unbounded Poisson distribution can lead to nonphysical system configurations in which negative species populations arise when the expected number of events is large enough to consume the reactant populations. Cao et al. have shown how to avoid negative populations in explicit Poisson Ï„L (PÏ„L) methods. The non-negative PÏ„L algorithm is based on the fact that negative populations typically arise from multiple firings of reactions that are only a few firings away from consuming all the molecules of one of their reactants. To focus on those reaction channels, the modified PÏ„L algorithm introduces a control parameter nc, a positive integer that is usually set somewhere between 10 and 20. Any reaction channel that is currently within nc firings of exhausting one of its reactants is then classified as a critical reaction. The algorithm chooses Ï„ in such a way

that no more than one firing of all the critical reactions can occur during the leap. Essentially, the algorithm simulates the critical reactions using SSA, and the remaining non-critical reactions using Ï„-leaping. An alternative approach is based on choosing the reaction numbers using binomially distributed random variables that have the appropriate finite sampling ranges. If a short time interval Ï„Î£ is considered, for each reaction Rj, the Poisson distribution is replaced with a binomial

distribution B(pj, Ï‰j*) with the same mean ajÏ„. Here Ï‰j* is an upper bound (or limit) othe

reaction number and pj is the probability that Rj will fire in the interval Ï„Î£/Ï‰j*. In the resulting binomial tau-leaping (BÏ„L) method, the leap size is then chosen as the smaller of the times tj computed from the constraints pj=ajtj/Ï‰j* â‰¤ 1, j=1:M and the upper bound Ï„Îµ. Once the tau step has been determined, each network reaction number Ï‰j is generated by sampling the corresponding binomial distribution. Although the binomial distribution has the desirable property of having a finite sampling range,

this in itself does not completely resolve the non-negativity problem. In networks with multiple channel reactant dependencies, the interdependence arising from species participating in multiple reactions may still generate negative species populations. Tian and Burrage have shown how the BÏ„L completely resolves the non-negativity problem when there is at most a two-reactant dependence in a reaction network . In the BÏ„L method of Chatterjee et al. the issue of multiple-channel reactant dependencies is dealt with by assigning to each reaction a binomial random variable for the number of reaction events during a step and by modifying reaction number limits through tracking currently available population sizes as reactions fire during that step. Yet, as critically noted by Chatterjee et al., in the latter approach selected reaction numbers may depend on the order in which reactions are processed . The multinomial tau-leaping (MÏ„L) method of Pettigrew and Resat efficiently extends the BÏ„L method to networks with arbitrary multiple-channel reactant dependencies .Improvements were achieved by a combination of three factors: First, tau-leaping steps are determined simply and efficiently using a-priori information and Poisson distribution based estimates of expectation values for reaction numbers. Second, networks are partitioned into

closed groups of reactions and corresponding reactants in which no group reactant set is foundin any other group. Third, product formation is factored into upper bound estimation of the number of times a particular reaction occurs. Together, these features allow for larger time leaping steps while the numbers of reactions occurring simultaneously in a multi-channelv manner are estimated accurately using a multinomial distribution. Partitioning of the reaction network into groups with closure can be done with a graph theoretic algorithm based on the Dulmage-Mendelsohn decomposition of the stoichiometric matrix representing the network. Two methods for selecting an upper limit Î©* on the system reaction number was advocated, the rate limiting reactant and the rejection methods . The former is a conservative approach that guarantees that the chosen reactions do not lead to negative populations after the system's configuration is updated according to the occurring reactions. In contrast, the selection process in the rejection method only guarantees that the system configuration after the tau-leaping step is non-negative. It however does not ensure that all

possible multi reaction paths that take the system to that final configuration involve only

physically feasible intermediate stages. It was argued that, since one can only detect the relationship between the starting and end points of a move during the simulation, ensuring the state of the system to always correspond to physically feasible configurations at the observation points (i.e., starting and end points of the moves in a simulation run) is the most crucial aspect in obtaining accurate results. Therefore, allowing a simulation trajectory to temporarily cross the solution space boundary would introduce insignificant inaccuracies because statistics for intermediate states have only an indirect impact. In the MÏ„L method, tau-leaping step is then chosen as the smaller of the times t computed from the constraint p = tÎ£/Î©* â‰¤ 1 and the upper bound Ï„Î£ where Î£ is the system propensity. The actual system reaction number Î© is then sampled from a binomial distribution as discussed above, and the reaction numbers Ï‰j, j=1:M

âˆ’1, are drawn from a multinomial distribution with corresponding probabilities Î±j= aj/Î£ and Ï‰M=Î©âˆ’Ï‰1âˆ’Ï‰2âˆ’ â€¦âˆ’Ï‰M-1. Implemented advances over the BÏ„L method were shown to lead

to significant speed ups in the simulations of biologically relevant models (32). It was also

shown that MÏ„L is less sensitive to the error parameter of the Ï„L methods, which is a desirable

aspect of the kinetic simulation algorithms.

## Thermodynamic Limit

It is well known that the Poisson random variable P(a,Ï„), which has mean and variance aÏ„, can

be approximated by a normal random variable with the same mean and variance when aÏ„â‰«1.

Using this fact, it can be shown that if Ï„satisfies not only the leap condition but also the

conditions aj(x)Ï„â‰«1 for all j, which physically means that during time Ï„ every reaction channel

can be expected to fire many more times than once, then the basic tau-leaping Eq. (2) further

approximates to (49)

(3)

Here the Nj(0,1) are statistically independent normal random variables with mean 0 and

variance 1. This modified update formula is computationally faster than the tau-leaping formula

(2) which it approximates, because samples of the normal random variable are much easier to

generate than samples of the Poisson random variable. Also, since by hypothesis every reaction

channel fires many more times than once during each time step Ï„, simulations using the updating

Eq. (3) will be very much faster than the SSA.

If the arguments leading to Eq. (3) are repeated with Ï„ replaced by a macroscopic infinitesimal

dt, which by definition is small enough that no propensity function changes significantly during

dt yet large enough that every reaction channel fires many more times than once during dt,

then one can deduce the equation

(4)

where the gammas are statistically independent Gaussian white noise processes (49-51).This

equation is called the chemical Langevin equation (CLE), and it is an example of what is known

to mathematicians as a stochastic differential equation. It is entirely equivalent to the Langevin

leaping formula

We note that stochastic differential equations are often used in physics, but they are usually

obtained by starting with an assumed ordinary differential equation and then adding a "noise"

term that has been carefully crafted to give some preconceived outcome. So it is noteworthy

that the noise term in Eq. (4), namely the second summation there, has actually been derived

from the same premises that underlie the CME and the SSA using physically transparent

assumptions and approximations. Use of the Langevin formulas (3) and (4) is typically justified

whenever all reactant species are present in sufficiently large numbers.

At the thermodynamic limit, molecular populations of all the reactants and the system volume

become infinitely large, while the concentrations of all species stay constant. It can be proven

that, in this limit, all propensity functions scale linearly with the system size. Thus, the

deterministic terms on the right sides of Eqs. (3) and (4) grow in proportion to the system size,

while the fluctuating terms grow in proportion to the square root of the system size. This is

another way of showing that for large systems "relative fluctuations in the species populations

scale as the inverse square root of the system size", a central rule of thermodynamics. In the

full thermodynamic limit, the foregoing scaling behavior implies that the fluctuating terms in

Eqs. (3) and (4) become negligibly small compared to the deterministic terms, and the CLE

reduces to

(5)

This is none other than the traditional reaction rate equation (RRE), which is more commonly

used in terms of the concentration vector. Note that the RRE has not been assumed here; instead,

it has been derived from the same premises that underlie the CME and the SSA, using a series

of physically understandable approximations . Thus, the discrete, stochastic CME/SSA

and the continuous, deterministic RRE are actually connected across the end of the

representation spectrum. The challenge, of course, is to develop both the software infrastructure

and the theory necessary to make reliable adaptive decisions so that we can accurately and

efficiently simulate complex reaction systems based on detecting a discrete or continuous

regime and using an optimal blend of stochastic and deterministic methods.

## Scaling Properties With Respect to the Network Size

In addition to the multiscale nature of the involved kinetic processes, computational efficiency

of stochastic simulation algorithms also depend on several measures of the size of the

investigated network model, such as the overall complex population, the total number of

reactions, and the average number of nodal interactions or connectivity in a network.

Understanding how algorithms scale with size and nodal connectivity is an important

consideration in the design of numerical experiments for dynamic biological systems. As the

complexity of the systems studied in biological systems increases, such scaling issues become

a serious point of concern.

To study how stochastic kinetic simulation algorithms scale with network size, construction

of scalable reaction models that can be used in benchmark studies has been advocated (42).

Pettigrew and Resat used two scalable reaction models to compare the efficiency of the exact

Gillespie algorithm as implemented using the popular Gibson-Bruck method (GGB) (10) with

the inexact random substrate method of Firth and Bray . Foundations of these two

algorithms are quite distinct from each other and the latter algorithm has been stated to be an

efficient method when the total complex population is small and the number of multistate

interactions large.

The first investigated scalable model, the Size Scalable Model (SSM), is a four majorcompartment

model that is typical for a signal transduction network in eukaryotic cells .

The SSM involves 2 unimolecular and 5 bimolecular reversible reactions with mass transfer

between compartments for 8 complex types including ligands, adapter complexes, and

receptors . Upon phosphorylation the receptor can become active in signal transduction.

A sub-compartmentalization process, which is equivalent to forming smaller locally

homogeneous regions, allows for the creation of refined models where the size of the model

increases in proportion to the number of created subcompartments while maintaining the

connectivity (i.e., the topology) between the species. For SSM, it was found that the GGB

algorithm, whose numerical efficiency scales with the logarithm of the total number of active

reactions, performs significantly better in all cases including those characterized by a very

small number (~100) of complex types, each with a very small population (42).

The second scalable model, the Variable Connectivity Model (VCM), is a single compartment

signal transduction network involving 2 unimolecular and 2 bimolecular reversible reactions

for 3 complex types, ligand, receptor, and ligand receptor complexes (42). The receptors in

this model have multiple phosphorylation sites. The size and the node connectivity between

the species increase in proportion to the number of phosphorylation sites N. For the VCM, it

was shown that the FB random substrate method, whose efficiency scales with the square of

the total complex population, can outperform the GGB algorithm when the node connectivity

is sufficiently high and the total complex population sufficiently low. Performance of the FB

and GGB algorithms were determined as a function of the number of phosphorylation sites for

two distinct population sets. In the first set the total complex population was initially set at 192

complexes (128 ligands and 64 receptors), while for the second set the total complex population

was 10 times higher with the 2:1 ratio of ligands to receptors unchanged. It was found that, for

the smaller population set the FB algorithm becomes more efficient than the GGB when the

number of phosphorylation sites N reaches 6, whereas in the larger population set the GGB is

clearly superior for small N up to a crossover point that occur around N~11. Thus, it was

concluded that with the exception of special cases, GGB is a more efficient method than the

FB (42).

This simple study using two sample realistic networks clearly illustrated that the most efficient

algorithm can depend on the problem type and the used simulation method should be chosen

after a careful study of the topology and multiscale properties of the investigated network.

## FUTURE ROSPECTS OF STOCHASTIC SIMULATION

In recent years there has been rapid progress in the analysis and development of algorithms for

the accelerated discrete stochastic simulation of chemically reacting systems. Although these

developments provide a good foundation, existing stochastic simulation methods are far from

being computationally efficient to simulate detailed biological models. This is particularly true

for the models that involve multiple time and spatial scales while still requiring high resolution

for accurate representation. Thus, the development of efficient and adaptive multiscale

algorithms and software for stochastic-deterministic simulations are urgently needed. This is

particularly true for the simulation of spatially resolved models because immense recent

progress in imaging is beginning to make the data about the inhomogeneous distribution of

molecular species available. As the expense associated with spatially resolved models makes

fully stochastic simulations nearly infeasible, efforts to develop hybrid approaches will become

very important. Hybrid methods have recently been proposed to simulate multiscale

chemically reacting systems. These methods combine the traditional deterministic reaction rate

equation, or alternatively the chemical Langevin equation, with the SSA. The idea is to split

the system into two regimes: the continuous regime and the discrete regime. The RRE is used

to simulate the fast reactions between species with large populations and the SSA is used for

slow reactions or species with small populations. The conditions for the continuous regime are

1) the number of instances of each molecular species in a reaction in the continuous regime

must be large relative to one, and 2) the number of reaction events of each reaction occurring

within one time step of the numerical solver must be large relative to one . If either condition

is not satisfied for a reaction channel, that reaction channel must be handled in the discrete

regime.

The hybrid methods efficiently use the multiscale properties of the problem but there are still

some fundamental unsolved problems. The first is the automatic partitioning of systems. This

needs to be done very carefully, because both accuracy and efficiency depends on a proper

partitioning. It will require a precise awareness of the assumptions under which algorithms and

approximation available in the multiscale framework are valid, and the corresponding tests on

the system to determine the partitioning. This is one reason why a proper theoretical foundation

for each algorithm is so critical - to ensure that we are using it only to make approximations

for which the corresponding assumptions are valid. Concerns about automatic step size

selection continue to exist. Another difficult problem arises in the case when a reaction is fast

but one of the corresponding reactants has a small population thus failing to satisfy the second

requirement above for the deterministic regime. Such a reaction channel will still be handled

by the SSA, resulting in a very slow simulation. New approaches and approximations, such as

the slow-scale SSA have the potential to overcome this limitation.

Another promising area of research is the application of tau-leaping based stochastic algorithms

to problems involving reaction-diffusion processes. When the SSA algorithm is applied to

reaction-diffusion processes the standard approach is to subdivide the heterogeneous system

volume into a sufficiently large number of small-volume elements V0~O(h3) so that the

assumption of a homogeneous, well-stirred system within each element is recovered. With

diffusion considered as a first-order reaction in which molecules of a species are exchanged

between neighboring elements at a rate rD âˆV0D/h2 , kinetic reactions within an element occur

at a rate rCâˆV0Î£ . Here D and Î£ are, respectively, the diffusion constant and the total system

propensity. By choosing elements sufficiently small so that rC â‰ª rD is true for all elements,

the SSA may be applied although under these circumstances it is not usual to be confronted

with a reaction network where the total number of processes and species is now several orders

of magnitude larger than that found in a single element. Moreover, it has been observed in

stochastic simulations with the SSA that reaction-diffusion processes are strongly dominated

by diffusion events. The increase in network size and diffusion dominated events both

contribute to a severe reduction in efficiency and it is highly likely that a number of more recent

stochastic algorithms for reaction-diffusion suffer from the same problem. It is possible that

Ï„-leaping methods may be well suited to overcoming this issue with the stochastic simulation

of reaction-diffusion problems.

As stochastic simulation is used more and more often in studying the dynamics of biological

systems, and as biological models become larger and more complicated, there is a need for the

development and efficient implementation of simulation algorithms that can handle the

increasing complexity. Thus, the issue of how an algorithm scales with the system size is crucial

in assessing the limitations of the algorithm. Scaling concerns expose the serious need for

establishing realistic scalable models that can be used in benchmark studies for algorithm

efficiency comparison. Pettigrew and Resat have introduced two such models. It was

observed that the scaling qualities of the algorithms investigated break down significantly

when the network reaches a certain size. We suspect that this is a common occurrence for all

of the existing algorithms, which unfortunately can impose severe limitations on the utility of

the stochastic kinetic approaches. Roughly estimating, existing stochastic simulation

algorithms can be used to investigate reaction networks containing ~105 reactions using a

desktop machine. Utilization of high-performance resources may push this limit to 108.

Considering that even the smallest microbial community or physiologically relevant eukaryotic

network can easily contain >1012 reactions, it is clear that the scaling properties of the stochastic

simulation algorithms need significant improvements before they can be widely used for larger

problems. As discussed above, hybrid methods that reduce the computational complexity by

treating the non-critical parts of the network using deterministic models may offer the most

suitable solution to the scaling bottleneck in the near future.

We conclude by observing that while recent improvements in algorithm development,

computational implementation, and hybrid approaches are indications of a promising future

for the stochastic kinetic simulation of biological systems, there is still plenty of room for

achieving significant advances.