# Network Inference With Dynamic Bayesian Network Computer Science Essay

Published: Last Edited:

This essay has been submitted by a student. This is not an example of the work written by our professional essay writers.

Dynamic Bayesian networks have been highly popular recently on the inference of gene regulatory network form gene expression data. This is due to their capability of modeling temporal characteristics of gene expression data and the ability to model causal interaction among genes. This chapter focuses on the inference of gene regulatory networks from time series of expression data using Markov chain Monte Carlo simulation of dynamic Bayesian networks. Section 2.1 provides in an overview of Bayesian networks. Section 2.2 presents Dynamic Bayesian networks and structure learning. Section 2.3 describes the experiment design and Section 2.4 shows the results obtained with dynamic Bayesian network. Finally Section 2.5 concludes the chapter with a discussion on the obtained results.

## 2.1 Bayesian networks

Bayesian networks are a combination of two well-developed mathematical concepts: Probability Theory and Graph Theory. Probability theory is the study about random variables, events and uncertainty. Graph theory is the study about complex interaction and structures using 'graph' like structures. Since Bayesian network is derived from these two mathematical concepts, it is also defined as a Graphical Probabilistic Model. A Bayesian network provides probabilistic framework for robust inference of interactions in the presence of noise and consists of a set of nodes and a set of directed edges or arcs. All the edges are directed and there are no directed cycles. Therefore, a Bayesian network forms a Directed Acyclic Graph (DAG).

Bayesian networks are defined by a graphical structure S, a conditional probability distribution and their parameters. The graphical model S of a Bayesian network consists of a set of nodes and a set of directed edges: S= (,). The nodes represent random variables, while the edges indicate conditional dependence relations. Thus, if there is an edge from a node A to a node B, B is conditionally dependent on A where A is called a parent of B and B is called a child of A. The random variables are drawn from conditional probability distributions. Where the random variable is a node in the directed acyclic graph S and is the set of parents of node.

Bayesian network assume the Markov assumption that given its parents, each variable is independent of its non-descendants. With this Markov assumption, Bayesian networks uniquely specify a decomposition of the joint distribution over all variables down to the conditional distributions of the nodes. So we can use factorization rule to expand the joint probability in terms of simpler conditional probabilities as follows:

Gene regulatory networks provide a natural example for Bayesian network. When gene regulatory network is modeled by a Bayesian network, genes are represented at the vertices or the nodes and regulatory interactions are parameterized over the connection by conditional probabilities of gene expressions. Figure 2.1 shows the example of a gene regulatory network that is modeled by a Bayesian network. It consists of a set of nodes and a set of edges . The Gene G1 does not have any regulators. But G1 regulates G2 and G3. Gene G2 and G3 both regulate G4. The Gene G3 regulates G3. The probability distribution for the expression levels of each gene is modeled by the Bayesian network's parameters. Probability distribution for a gene depends only on its parents (regulators) in the network. The expression level of G4 and G5 are related only because they have a common regulator G3. But expression level of G4 and G5 are conditionally independent if G3 given. We can factorize the full joint probability distribution into component conditional distribution by applying (2.1) in to the Bayesian network in the figure 2.1:

(2.2)

Figure 2.1 Modeling of gene regulatory network using Bayesian network

Bayesian network models the likelihood of a set of random variables by considering their conditional probabilities. Let's take that expression of number of genes indexed by set are given. If denotes the structure of the Bayesian network and the parameters of the network, the likelihood of gene expressions is given by

Where denotes the conditional probability of gene expression of gene, given the set of gene expression of its parents. And where denotes the parameters of the conditional probability. The Bayesian network decomposes the likelihood of gene expressions into a product of conditional probabilities conditioned on the expressions of parent genes.

## 2.1.1 Strengths of Bayesian Network

Like many methods in machine learning, Bayesian Networks have been applied for different applications. Bayesian networks are suitable for infer gene regulatory networks from gene expression data because of its characteristic strengths. They are given bellow,

Bayesian networks are capable of learning about causal relationships between random variables. This is very important because it allows making prediction in the presence of interventions and causal relationships and provides a better understanding about overall problem domain.

Bayesian networks can learn from incomplete data sets. In real world many of the data sets have missing values. So it is essential to handle missing values in learning techniques. Bayesian networks offer a natural way to handle missing observations.

Bayesian networks can avoid overfitting of data to the model. A common problem in machine learning is overfitting, where the model poorly generalizes to unseen data. The use of Bayesian probability theory in Bayesian networks provides mechanisms for describing uncertainty and for adapting the number of parameters to the size of the data set.

Bayesian networks readily facilitate the use of prior knowledge in the inference process. Encoding of causal prior knowledge is relatively straightforward by constructing "causal" edges between any two factors that are believed to be correlated. Also, Bayesian networks encode the strength of causal relationships with probabilities.

Even though Bayesian network has lot of advantages of modeling the data, it has some drawbacks also. Bayesian network cannot model temporal or time series data. It only handles static data. Also, Bayesian network does not allow cycles in the network structure. But most biological systems have feedback loops. Therefore we need to extend the Bayesian network to handle the "dynamics" or "temporal" characteristics of the data and the feedback loops. Most of the systems that we observed in biology such as gene interaction, protein interaction are not detected based on a particular point in time. However they can be described through a multiple sates of observation over time. Bayesian networks do not provide direct mechanism to model the temporal dependencies. When Bayesian networks are used to model time series data and feedback loops, the random variables are index by time and replicated in the Bayesian network. Such networks are known as Dynamic Bayesian networks [1, 2].

## 2.2 Dynamic Bayesian Network

Dynamic Bayesian networks (DBN) extends the Bayesian network framework to capture the dynamic properties of data and represent feedback loops. We can categories dynamic Bayesian network as special case of hidden Markov models. Feedback loops or recurrent networks such as the one depicted in figure 2.2(a), cannot be modeled by Bayesian network because of the acyclicity constraint. However feedback mechanism is a highly important feature in the most of biological systems. Consider the figure 2.2(a) which shows a simple gene regulatory network consists of three genes. It has a feedback loop from gene G1 to gene G2, from gene G2 to gene G3 and back to gene G1 from gene G3. However, interactions between genes are usually not static, they change over time. For example, when first gene G1 is transcribed and translated into protein, which then regulate the transcription of the gene G2 and products of gene G2 regulate the transcription of the gene G3. Then proteins of gene G3 back again has some influence on transcription of gene G1. This indicates that final effect occurs with a time delay after its cause. This kind of transition of a gene network from time t=to to time t=to + d (delay) can be modeled by having two copies of the same Bayesian network, with additional edges between the two. Therefore we can similarly unfold the feedback network in the figure 2.2(a), in time to model cycles. Figure 2.2(b) shows the unfolded gene network that having directed acyclic graph property which corresponds to a dynamic Bayesian network.

G3

G3

G2

G1

G2

G1

G3

G2

G1

G1

G2

G3

Dynamic Bayesian Network

t = 1

t = 3

t = 2

Cycles are not allowed in

Bayesian network

## (b)

Figure 2.2 Modeling of a gene network with dynamic Bayesian network. Figure (a) shows gene network consisting of 3 genes with feedback and (b) shows the equivalent dynamic Bayesian network obtained by unfolding of gene network in time

The graphical structure of dynamic Bayesian network represents regulations from genes across the consecutive time points. Note that the time-course of gene expression of gene is given by where denotes the expression of gene at time t and T is the total number of time points at which the expressions of genes are gathered. We assume that time points are equally spaced. Let's take each gene have number of states. So parent gene expression will have number of states where ||denotes the number of parents of gene . Let . Then the likelihood of gene expression given by (2.3), can written as bellow using the property of decomposability.

Where, denotes the total number of counts of parents' state being when the gene takes the value in the preceding time point. In above equation, conditional probabilities are given by the conditional probability distribution table:. The process of learning Bayesian networks from the data consists two important steps. They are,

Parameter learning - calculate the best conditional probabilities for each node, given a structure and the data

Structure learning - find the best structure which explain the relationships between the variables, given the data

## 2.2.1 Parameter learning

In parameter learning, we are trying to find the parameter set that maximizes the likelihood of data, given the model. Marginal likelihood of the network structure can obtain by marginalizing (2.4) over the parameters:

Where denotes the prior probabilities of the parameters. Then can be writing as a product of conditional probabilities assuming that conditional probabilities are independent:

By substituting (2.6) and (2.4) in (2.5), we can obtain:

Different types of prior distribution are used to represent the conditional probabilities. Dirichlet distribution, geometric distribution and gamma distribution are some of them. The Dirichlet prior have the capability to model the complex interactions that is present in the gene regulatory network [3]. Therefore by assuming dirichlet prior for conditional probabilities:

Where denotes the hyperparameters corresponding to the parameters of dirichlet prior of and denotes the gamma function. By substituting (2.8) in (2.7), the likelihood is given by

Where and . Hyperparameters given by where denotes the scale parameter [4]. Using equation (2.9) and Lagrange theory, the maximum likelihood estimates of the parameters can be derived:

## 2.2.2 Structure learning

Inference of network structure is a very interesting problem in computational biology. The optimal GRN structure obtained by the maximum a posterior distribution:

The posterior distribution of network structure is proportional to the product of marginal likelihood and prior probability according to Bayes' rule:

Where is the prior probability, denotes the marginal likelihood of the network structure and is denotes the normalization factor. The optimal network structure can obtain by maximizing posterior probability of a Bayesian network by using equation 2.11. But it is infeasible because of following reasons.

Microarray expression data is sparse. This happen because number of genes in a data set is very large than the number of time points it has. Therefore, posterior probability tends to be diffused due to expression data being sparse. As a result, huge numbers of different structures are plausible as shown in figure 2.3. This implies that it is more appropriate to take the average of networks which have high posterior probability rather than taking the network which has the highest posterior probability.

The direct approach to obtain the posterior distribution is impractical due to the denominator () in 2.11. The number of different network structures increases super exponentially with the number of nodes, as shown in Table 2.1. As a result calculating denominator is intractable for large number of nodes.

(a)

Large data set:

Best network structure well defined

(b)

Small data set :

Figure 2.3 Expression data and uncertainty of structure inference [5]. (a) Shows when the data is sufficiently large and informative, optimal structure is well defined by the posterior probability distribution. (b) Shows When there is small amount of data which is the real biological situation, there is intrinsic uncertainty for finding optimal network structure

Table 2.1 Change of number of structures with the number of nodes. From [6]

Number of nodes

Number of structures

2

3

4

543

6

3.7Ã-106

8

7.8Ã-1011

10

4.2Ã-1018

Because of above problems, it is hard to obtain straightforward solution to equation 2.11. Therefore we need to sample networks from the posterior distribution and apply the Markov chain Monte Carlo (MCMC) [7] simulation.

## 2.2.3 Markov Chain Monte Carlo (MCMC)

In order to find the optimal network structure, Markov chain of structures is formed by sampling network from the posterior distribution:

Where, T denotes a matrix of transition probabilities, with which denoting the probability of transition from structure to. And , denotes the probability distribution in the and steps of the Markov chain. Above Markov chain converges in distribution to the true posterior probability of (2.12):

The Metropolis-Hastings [9] method of MCMC is implemented, which associates an acceptance mechanism when new sample structures are drawn. Given a network structure, a new network structure is proposed, with proposal probability of . This new network structure is accepted based on the acceptance ratio given by:

Where Metropolis-Hasting acceptance ratio is:

Where is obtained from the (2.12) and the most important advantage of the Metropolis-Hasting acceptance ratio is it cancels out the intractable denominator of the since it used the ratio. Initial structure obtained by using mutual information among the gene expression data and a new network structure is proposed by applying one of the elementary operations such as deleting, reversing, or adding an edge, and then discarding those structures that violate the acyclic condition as shown in figure 2.4.

Delete

Reverse

Delete

Reverse

Neighborhood

Neighborhood

Delete

Reverse

Delete

Reverse

Proposal probability = 1/6

Proposal probability = 1/5

Figure 2.4 MCMC elementary proposal moves and proposal probability.

The second term of the Metropolis-Hasting acceptance ratio is called Hastings ratio and it is obtained by:

Where N denotes the size of the neighborhood obtained by elementary operations on structure s and counting of the valid structures. According to the figure 2.4 and . Therefore, Hasting ratio is . Detailed Metropolis-Hasting algorithm for structure learning is given bellow,

Algorithm 1 Metropolis-Hasting algorithm

Initialize starting structure

for i = 0 to N-1

Generate by elementary operation

Sample

## end for

Discard structures in burn-in period and compute the average network using remain structures

## 2.3 Experiment Design

Multiple time-series datasets of gene expressions were generated for a given network topology. Most of GRN inference studies were done using scale-free networks which were generated using Barabasi-Albert model [10]. But in this study, we used real biological networks that are extracted from global Escherichia coli (Ecloi). Ecoli global interaction network is used as a template for extracting realistic gene network structures. Ecoli transcriptional regulatory network contains 1502 nodes and there are 3587 interaction between those nodes. There are two steps in the gene expression data generation.

Extract realistic gene network structures from the Global interaction networks

Generate gene expression data for a given network structure

## 2.3.1 Sub network extraction

Scale-free random graph models and manual design of small benchmark networks are the most dominant ways of generating synthetic gene networks. But these methods are not able to capture important properties such as modularity or the occurrence of network motifs [8]. Module extraction is used by the GeneNetWeaver (GNW) [8] to extract sub networks from the large interaction networks. Module extraction starts from a seed node that is selected randomly among the nodes of the source network. Then nodes are added to the subnetwork based on modularity score. Modularity score is defined as below,

Where m denotes the total number of edges in the network, s is the index vector defining the subnetwork ( = 1 if node i is part of the subnetwork, = âˆ’1 if not), and B denotes modularity matrix with elements = âˆ’. is the actual number of edges between node i and j and = /2m is the expected number of edges in a randomized network ( being the degree of node i). Detailed module extraction method is given below,

Algorithm 2.2 Module Extraction

Select the seed node (randomly or user defined)

while (current subnetwork size < desired size)

Select all neighbors of the current subnetwork

Compute for each neighbor the modularity Q of the subnetwork after adding only this neighbor to the subnetwork

If (Neighbor selection == Greedy)

Select the neighbor that provide highest modularity Q and add it to the current subnetwork

## end if

If (Neighbor selection == Random among top (p%))

One of the top p% neighbors with the highest modularity is chosen randomly

## end while

Subnetworks extracted by module extraction have a biologically plausible connectivity because they preserve functional and structural properties of gene networks such as degree distribution and network motifs when compared with randomly extracted network [8]. All the gene networks that are used in this study were extracted using module extraction procedure from the Ecoli global interaction network. GRNs of having 10, 20, 30, 40, 50 and 100 genes were extracted by module extraction. Figure 1 shows a GRN obtained using the module extraction method. It consists of 20 genes.

Ecoli_20G.jpg

Figure 2.5 GRN which having 20 nodes obtained using the module extraction

## 2.3.2 Generation of gene expression time-series data

Let vector represent expression of a gene network having M number of genes at time t. According to multivariate vector autoregressive (MVAR) model [11], gene expression values of all the genes at time obtained by,

Where denotes the connectivity matrix of the gene network after multiplied by the random weights and denotes the added Gaussian random noise to the gene expression at time t. Random weights were obtained by drawing samples from a uniform distribution on the interval [0.5, 1] . The initial gene expression vector was constructed by using the sample drawn by the uniform distribution on the interval [0, 1]. Algorithm runs fixed number of iteration (10,000) to gain stability. Then, required number of time points is extracted. Detailed algorithm is given below:

Algorithm 2.3 Sampling gene expression data

Input: Network structure - S, number of time points - T

M = size(S)

weights = 0.5 + 0.5*rand(M)

for t = 1 to

Return: dataset

## 2.3.3 Data discretization

All gene expression values were represented as continuous data in generated data set. Therefore these data were converted to the discrete values before using them in the learning algorithm. We assume that genes can have three states {-1, 0, +1} representing down-, no- and up-regulation of genes.

## 2.3.4 Equal width interval binning

Range of the Continuous gene expression data is divided in to a predetermined number of intervals in the Equal width interval binning. This method is very popular because of its simplicity. Data range was divided in to three intervals because gene can have three states. Discretization steps are given below.

Sort the continues expression values and find the maximum value () and minimum value ()

compute the width of the interval () as follows,

Then state (down, no and up regulation) of the gene was determined by as follows,

State =

## 2.3.5 Initialization of the structure using mutual information

The starting structure is initialized by using mutual information among the genes because it provides a measure about dependence between two genes [12, 13]. According to Information Theory, Entropy is a measure of the uncertainty present in the data. Entropy of the ith gene computes from the time series gene expression data as below,

Here T is the total number of time points and k is the number of states that a gene can have. Mutual information (MI) between gene i and j is given by,

Here, the first two terms are individual entropies of gene i and j. last term is the joint entropy. Joint entropy obtained using the following formula.

After calculating mutual information for all gene pairs, connectivity matrix (C) of the structure obtain as follows,

## 2.4 Experiments Results of DBN

Gene expression values for each GRN sub network were obtained by changing the number of time points (10, 50, 100 and 200 time points) using the algorithm 2.3. Each data set consists of 20 time series. Structure learning was done using MCMC simulation. Performance measures such as precision, recall, accuracy and F-measure were calculated by using true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN). Table 2.2 shows the performance of MCMC inference procedure. Following equations are used to calculate the performance measures.

Precision = TP/ (TP+FP)

Recall = TP/ (TP+FN)

Accuracy = (TP+TN)/ (TP+TN+FP+FN)

F-measure = 2*

Table 2.2 Performance measures for Dynamic Bayesian network

#Genes

#Samples

Precision

Recall

Accuracy

F-measure

10

10

50

100

200

0.190.11

0.380.11

0.710.12

0.830.10

0.170.09

0.450.09

0.530.09

0.620.06

0.790.03

0.830.03

0.910.02

0.940.01

0.170.09

0.410.08

0.600.08

0.710.06

20

10

50

100

200

0.140.10

0.250.05

0.460.10

0.680.08

0.080.06

0.370.06

0.480.05

0.550.05

0.910.01

0.880.02

0.920.01

0.950.01

0.100.08

0.300.06

0.470.06

0.600.05

30

10

50

100

200

0.120.08

0.190.04

0.370.04

0.580.06

0.050.03

0.320.05

0.480.05

0.540.04

0.940.01

0.910.01

0.940.01

0.960.00

0.070.05

0.240.04

0.410.04

0.560.04

40

10

50

100

200

0.120.08

0.180.04

0.310.04

0.500.05

0.040.02

0.280.04

0.410.03

0.470.03

0.950.00

0.920.01

0.940.01

0.960.00

0.060.04

0.220.03

0.350.03

0.480.03

50

10

50

100

200

0.070.06

0.160.03

0.270.02

0.450.03

0.020.02

0.220.04

0.300.02

0.350.02

0.950.00

0.920.00

0.940.00

0.960.00

0.030.02

0.190.03

0.280.02

0.390.02

100

10

50

100

200

0.040.03

0.070.01

0.110.01

0.200.01

0.010.01

0.100.01

0.150.01

0.190.01

0.970.00

0.950.00

0.960.00

0.970.00

0.010.01

0.080.01

0.130.01

0.200.02

## 2.5 Discussion

Many applications in bioinformatics have taken the advantages of Bayesian networks. They are gene expression analysis, protein modeling and DNA sequence analysis. However, applying Bayesian networks to infer gene networks became more popular because Bayesian networks are capable of expressing causal relationships, learn from incomplete data and combining prior knowledge. The Bayesian network models the likelihood of gene expressions by considering their conditional independences. Dynamic Bayesian networks extend the Bayesian network framework to capture the dynamic characteristics of gene expression by assuming a first order stationary Markov chain. Table 2.2 shows the results of gene regulatory network inference with dynamic Bayesian network. According to the results, precision, recall and F-measure values are high for the networks which have smaller number of genes. Performance parameters of large networks are adversely affected by the presence of large number of false positives and false negatives when compared with small networks. Also we can see that the overall performance is improved with the increase of number of time points. Heatmap of figure 2.6 shows the effect of number of time points to the F-measure for different networks.

DBN_map.jpg

Figure 2.6 Illustration of the effect of number of time points on accuracy (F-measure) of DBN method of inferring GRN structure of varying number of genes