Systems Biology Ireland, Conway Institute of Biomolecular & Biomedical Research, University College Dublin, Belfield, Dublin 4, Ireland

Conway Institute of Biomolecular & Biomedical Research, University College Dublin, Belfield, Dublin 4, Ireland

School of Medicine and Medical Science, University College Dublin, Belfield, Dublin 4, Ireland

Abstract

Background

Recent advancements in genetics and proteomics have led to the acquisition of large quantitative data sets. However, the use of these data to reverse engineer biochemical networks has remained a challenging problem. Many methods have been proposed to infer biochemical network topologies from different types of biological data. Here, we focus on unraveling network topologies from steady state responses of biochemical networks to successive experimental perturbations.

Results

We propose a computational algorithm which combines a deterministic network inference method termed Modular Response Analysis (MRA) and a statistical model selection algorithm called Bayesian Variable Selection, to infer functional interactions in cellular signaling pathways and gene regulatory networks. It can be used to identify interactions among individual molecules involved in a biochemical pathway or reveal how different functional modules of a biological network interact with each other to exchange information. In cases where not all network components are known, our method reveals functional interactions which are not direct but correspond to the interaction routes through unknown elements. Using computer simulated perturbation responses of signaling pathways and gene regulatory networks from the DREAM challenge, we demonstrate that the proposed method is robust against noise and scalable to large networks. We also show that our method can infer network topologies using incomplete perturbation datasets. Consequently, we have used this algorithm to explore the ERBB regulated G1/S transition pathway in certain breast cancer cells to understand the molecular mechanisms which cause these cells to become drug resistant. The algorithm successfully inferred many well characterized interactions of this pathway by analyzing experimentally obtained perturbation data. Additionally, it identified some molecular interactions which promote drug resistance in breast cancer cells.

Conclusions

The proposed algorithm provides a robust, scalable and cost effective solution for inferring network topologies from biological data. It can potentially be applied to explore novel pathways which play important roles in life-threatening diseases such as cancer.

Background

We are faced with a fundamental challenge of understanding how a cell’s behavior arises from protein and gene interactions. Yet, the exact map of dynamic interactions between cellular network components is largely unknown for key cellular networks. Even for perturbations confined to single network nodes, mapping the dynamic topology of protein and gene network interactions is not straightforward. In fact, a local perturbation that is initially confined to a node rapidly propagates through the entire network, causing widespread, global changes that mask direct connections between nodes. Thus, the “reverse engineering” approaches where the connection architectures are inferred from the perturbation response data are becoming increasingly appreciated. Although reverse engineering methods such as Boolean networks

We previously developed a method to infer network interaction maps based on steady-state responses to systematic perturbations

Here, we propose a computationally efficient method which integrates the theoretical framework of MRA with a Bayesian Variable Selection Algorithm to infer functional interactions in signaling and gene networks based on noisy and incomplete perturbation response data.

Results

Fundamentals of the inference framework

Motivation

In general, network interactions can be quantified by analyzing the direct effect of a small change in one node on the activity of another node, while keeping the remaining nodes unchanged to prevent the spread of the perturbation r_{ij}) of this local response is the ratio of the immediate fractional change in the activity (x_{i}) of node _{k},

On the other hand, the global changes (R_{ij}) in node

where _{k} respectively. Let us select node _{k} does not directly influence node **r**

Eq. 2 presents a precise solution to the problem of inferring the network topology (determined by connection coefficients r_{ij}) from the steady-state perturbation responses **R** must have rank
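As a minimal sketch of the classical, noise-free MRA step, assume that every node is perturbed exactly once, that each perturbation directly affects only its own node, and that the global response matrix is square and invertible. Under these assumptions the connection coefficients follow from the inverse of the global response matrix, scaled so that every diagonal entry equals -1. The function name `local_from_global` and the toy three-node network are illustrative and are not taken from the published MATLAB code.

```python
import numpy as np

def local_from_global(R):
    """Classical MRA: r = -diag(R^-1)^-1 @ R^-1, so that r[i, i] == -1."""
    R_inv = np.linalg.inv(R)
    # Divide row i of R^-1 by its own diagonal entry and flip the sign.
    return -R_inv / np.diag(R_inv)[:, None]

# Hypothetical 3-node loop: node 1 -> node 2 -> node 3 -| node 1
r_true = np.array([[-1.0,  0.0, -0.5],
                   [ 0.8, -1.0,  0.0],
                   [ 0.0,  0.6, -1.0]])

# Noise-free global responses to one perturbation per node; the perturbation
# strengths are arbitrary and cancel out in the reconstruction.
strengths = np.array([0.3, 0.5, 0.2])
R = -np.linalg.inv(r_true) * strengths  # column j is scaled by strengths[j]

r_est = local_from_global(R)
```

With noise-free responses `r_est` reproduces `r_true`; with noisy or incomplete data this direct inversion becomes unreliable, which motivates the statistical formulations discussed in the text.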

In a previous effort, total least square regression (TLSR) was exploited as a method for estimating the connection coefficients r_{ij} from noisy perturbation responses _{ij} to draw reliable inference about the nature of the corresponding interactions. Therefore, a Monte Carlo method for estimating the probability distributions of r_{ij} was proposed and successfully used to find out connection coefficients for a three-level extracellular signal-regulated kinase (ERK) cascade in a subsequent study ^{6} sets of random realizations of the perturbation responses were drawn from normal distributions with means and standard deviations equal to those of the experimentally measured values r_{ij} calculated in this manner were used to estimate its probability distribution
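The Monte Carlo idea described above can be sketched as follows. This is an illustration under stated assumptions rather than the published procedure: it replaces TLSR with a direct matrix inversion (valid only for a square, invertible response matrix), it uses far fewer draws than the original study, and the response means and standard deviations are made-up numbers. The names `mra` and `monte_carlo_mra` are hypothetical.

```python
import numpy as np

def mra(R):
    """Direct-inversion MRA (a stand-in for TLSR, assumed for this sketch)."""
    R_inv = np.linalg.inv(R)
    return -R_inv / np.diag(R_inv)[:, None]

def monte_carlo_mra(R_mean, R_std, n_draws=2000, seed=0):
    """Draw noisy response matrices and return one coefficient matrix per draw."""
    rng = np.random.default_rng(seed)
    draws = rng.normal(R_mean, R_std, size=(n_draws,) + R_mean.shape)
    return np.array([mra(R) for R in draws])

# Hypothetical measured means and standard deviations of the global responses
R_mean = np.array([[1.0, 0.1, -0.3],
                   [0.7, 1.0, -0.2],
                   [0.4, 0.6,  1.0]])
R_std = 0.02 * np.ones_like(R_mean)

samples = monte_carlo_mra(R_mean, R_std)
r_mean = samples.mean(axis=0)  # point estimate of each connection coefficient
r_sd = samples.std(axis=0)     # spread of each coefficient across the draws
```

The cost grows linearly with the number of draws, which is the computational burden the authors set out to avoid.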

Objective

To speed up the computation process, we refrained from inferring the distributions of the connection coefficients r_{ij}. Instead, we chose to infer whether node r_{ij}≠0 represents an edge from node r_{ij} = 0 indicates that there is no edge from node r_{ij} to determine whether the mean value of r_{ij} is significantly different from zero. However, this requires estimating the probability distribution of r_{ij} which is computationally expensive. To avoid the process of estimating the distributions of r_{ij}, we modified the original MRA equation (Eq. 2) by introducing a new set of binary variables (A_{ij}) which explicitly represent presence (A_{ij} = 1) or absence (A_{ij} = 0) of direct interaction between node

For noisy global responses (R_{ij}), the above equality does not hold exactly. If we account for the difference between the left and right hand sides of Eq. 3 caused by measurement noise, then the above equation (Eq. 3) becomes,

Here, ε_{ik} is the difference between the left and the right hand side of Eq. 3 and A_{ij} = 1)), without having to estimate the probability distributions of the connection coefficients (r_{ij}). Additionally, in the new formulation, we relax the restrictions of required number of perturbation experiments (i.e.,

**Details of Bayesian formulation.** In this file we have described the mathematical details of the Bayesian formulation presented in the paper.


The proposed algorithm

Eq. 4 represents a mathematical relationship between the network topology (A_{ij}), the strength of each interaction (the connection coefficient r_{ij}) and the measured noisy perturbation responses (R_{ij}) of the network components. Here, the network topology (A_{ij}), the interaction strengths (connection coefficients r_{ij}) and the error (ε_{ik}) caused by measurement noise are unknown variables and can be estimated from the perturbation responses (R_{ij}) using statistical inference algorithms. To simplify the estimation process, we first conceptually divided a network of A_{ij}, r_{ij} and ε_{ik}) are expressed in terms of their respective probability distributions. Prior to any experimental observation, these distributions are estimated based solely on our subjective assessments (assuming that little is known about the network topology a priori) and are referred to as prior distributions (see the ‘Methods’ section for a detailed description of the prior distributions of the unknown variables A_{ij}, r_{ij}, ε_{ik}). The prior distributions were then updated based on experimentally observed data using the Bayes theorem (see Additional file A_{ij} (see Methods), which represents the posterior probability of the presence (A_{ij} = 1) or absence (A_{ij} = 0) of a direct network connection from node j to node i. However, it was not possible to analytically calculate the posterior distribution of A_{ij}, since it involves a normalization constant which requires calculating a very large integration (see Methods). Therefore, the posterior distributions of A_{ij} were approximated using Markov Chain Monte Carlo (MCMC) sampling (as detailed in Methods). Finally, the topology of the network was inferred by thresholding the approximate posterior distributions of A_{ij}, i.e. if the posterior probability of A_{ij} = 1 is higher than a threshold value (P_{th}), then we assumed that node
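The final thresholding step can be illustrated with a short sketch. The probability matrix below is invented for illustration, entry `P[i, j]` is taken to be the posterior probability of an edge from node j to node i, and the default threshold of 0.5 corresponds to the Median Probability Model mentioned later in the paper. The function name `threshold_network` is hypothetical.

```python
import numpy as np

def threshold_network(P, p_th=0.5):
    """Keep an edge j -> i whenever its posterior probability exceeds p_th."""
    A = (P > p_th).astype(int)
    np.fill_diagonal(A, 0)  # self-interactions are not inferred
    return A

# Hypothetical posterior edge probabilities for a three-node network
P = np.array([[0.00, 0.90, 0.10],
              [0.70, 0.00, 0.40],
              [0.20, 0.55, 0.00]])

A_median = threshold_network(P)       # Median Probability Model (p_th = 0.5)
A_strict = threshold_network(P, 0.6)  # a stricter, arbitrarily chosen threshold
```

Raising the threshold trades sensitivity for specificity: the edge with probability 0.55 is kept by the first call and dropped by the second.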


**Work flow of the Bayesian framework.** Here, we have illustrated the steps necessary for the reconstruction of a hypothetical toy network using our Bayesian framework. The toy network consists of three nodes _{1}, _{2} and _{3}. The experimentally measured global responses of _{1}, _{2} and _{3} to external perturbations are denoted by _{i}, we first developed a set of modified MRA equations by introducing the binary variables _{ij} (which indicates whether _{j},_{i}) into the MRA equations. Then we used the BVSA algorithm to infer the probability of _{ij} = 1|

**source-code.** This file contains the MATLAB source code for the BVSA algorithm which is described in this paper.


Performance of the proposed algorithm for simulated and real biological networks

We studied the performance of BVSA in reconstructing both simulated and real biological networks. For simulation, we considered the Mitogen Activated Protein Kinase (MAPK) Pathway and two gene regulatory networks (GRNs) consisting of 10 and 100 genes respectively. For real biological networks we chose the ERBB signaling pathway that regulates the G1/S transition in the cell cycle of human breast cancer cells

The above datasets were used not only to evaluate the performance of BVSA, but also to compare its performance with many other algorithms, e.g. stochastic MRA

Simulation study: Mitogen Activated Protein Kinase (MAPK) Pathway

MAPK pathways encompass central mechanisms of signal processing in many different eukaryotic species and participate in the regulation of a large number of important physiological processes, such as differentiation, proliferation, cell cycle and apoptosis


**Reconstructing the interactions among different modules of the MAPK pathway.** Panel **(a)** shows a simplified representation of the EGF pathway which activates the RAF/MEK/ERK MAPK cascade when stimulated by EGF. The conceptual modules of this pathway are highlighted in gray. The communicating intermediates of each module are shown in yellow boxes. Panel **(b)** shows the true interactions among different modules of the MAPK pathway. In this figure, the arrowheads represent activation and the blunt arrowheads represent inhibition. Panel **(c)** shows the AUROC and AUPR values calculated from the networks which were inferred by BVSA, MRA and SBRA respectively. The error bars depict the standard deviations of the corresponding AUROCs.

The MAPK pathway was conceptually divided into six modules, where each module is a functional unit which consists of several biochemical interactions and performs one or more identifiable tasks

1. the receptor module which consists of the interactions leading to receptor (EGFR) activation upon ligand stimulation

2. the adapter module which consists of the phosphorylation of Shc and its complex formation with Grb2SOS

3. the initiator module which consists of the activation and deactivation of RasGDP

4. the MAP3K module which consists of the activation and deactivation of Raf

5. the MAP2K module which consists of the activation and deactivation of MEK

6. the MAPK module which consists of the activation and deactivation of ERK.

Only a single entity of each module serves as its output (referred to as “communicating intermediates” in

We developed a mathematical model to simulate the responses of different modules of the MAPK pathway to a series of experimental perturbations.

Computational simulation of the MAPK pathway

Our mathematical model consists of a set of ordinary differential equations (ODE) which describe the biochemical reactions of the MAPK pathway (see Additional file

**MAPK model.** In this file, we have provided the details of the ODE and the SDE models which were created to simulate the noise free and noisy perturbation response of the MAPK pathway respectively.


Network reconstruction from simulated response of the MAPK pathway

For network reconstruction, we calculated the global responses of each module to different perturbations using Eq. 1. These responses form the global response matrix **R**. The rows of this matrix represent the network modules and the columns represent the perturbations performed on the MAPK pathway.

**Figure S1.** In this figure, we have illustrated the convergence of the Gibbs samplers which were created to reconstruct the MAPK pathway from noise free simulation data.


Evaluating the performance of BVSA

BVSA produces a probability matrix **P** with the elements

Since BVSA uses an MCMC method to approximate the posterior distribution of the network structure, its accuracy depends on the approximation error. Hence, it is necessary to evaluate the robustness of BVSA against MCMC related approximation errors. This was done by executing BVSA 10000 times on the same dataset. This resulted in 10000 different probability matrices, from each of which we calculated the AUROC and AUPR. Then we calculated the means and standard deviations of the AUROCs and AUPRs. The mean AUROC and AUPR represent the average performance of BVSA, and the standard deviations represent the uncertainty surrounding the performance estimates. For BVSA to be robust, the standard deviations of AUROC and AUPR must be much smaller than the corresponding means. The mean AUROC and AUPR were found to be ≈0.98 and ≈0.88 and the corresponding standard deviations were ≈0.02 and ≈0.016 respectively, suggesting near perfect and highly robust performance of BVSA on the simulated data.
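To make the evaluation concrete, the sketch below scores a probability matrix against a known topology using only NumPy. It assumes that self-interactions are excluded from scoring, computes AUROC as the fraction of (true edge, non-edge) pairs ranked correctly, and approximates AUPR by average precision; the matrices are invented for illustration and the helper names are hypothetical.

```python
import numpy as np

def off_diagonal(M):
    """Flatten a square matrix while skipping the diagonal (self-interactions)."""
    return M[~np.eye(M.shape[0], dtype=bool)]

def auroc(scores, labels):
    """Fraction of (edge, non-edge) pairs ranked correctly; ties count one half."""
    pos, neg = scores[labels == 1], scores[labels == 0]
    diff = pos[:, None] - neg[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (pos.size * neg.size)

def average_precision(scores, labels):
    """Step-wise area under the precision-recall curve."""
    hits = labels[np.argsort(-scores, kind="stable")]
    precision = np.cumsum(hits) / np.arange(1, hits.size + 1)
    return (precision * hits).sum() / hits.sum()

# Hypothetical true topology and inferred edge probabilities
A_true = np.array([[0, 1, 0],
                   [1, 0, 0],
                   [0, 1, 0]])
P = np.array([[0.0, 0.9, 0.1],
              [0.7, 0.0, 0.4],
              [0.2, 0.3, 0.0]])

labels, scores = off_diagonal(A_true), off_diagonal(P)
auc = auroc(scores, labels)
aupr = average_precision(scores, labels)
```

Repeating this calculation over the probability matrices from many independent sampler runs, and then taking the mean and standard deviation, gives the robustness summary described above.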

We compared the performance of BVSA with that of stochastic MRA **R** using TLSR

Similar to MRA and LMML, SBRA infers the interaction strengths in the form of a weight matrix **W**

Network reconstruction from noisy datasets

The perturbation responses simulated by the ODE model are noise free. Real biological datasets are usually contaminated with biological noise and measurement errors. We introduced biological noise and measurement errors in the MAPK pathway simulations and used the resulting noisy datasets for network reconstruction. Biological noise is caused by many factors, such as random thermal fluctuations, Brownian motion of the biochemical molecules, genetic variability within a cell population, etc. We developed a stochastic differential equation (SDE) model to simulate the effects of some of these factors

Furthermore, we added measurement errors to the stochastically simulated responses. Measurement errors in biological datasets depend on many factors ranging from inherent biological variability to sample preparation and consistent equipment accuracy

Here, σ_{b} is the signal independent or background noise, σ_{s} is signal dependent noise and σ_{b} and the signal dependent noise σ_{s} vary among different measurement systems. However, in most high throughput proteomic experiments σ_{b}<0.1 and σ_{s}<1 σ_{s}) and independent (σ_{b}) measurement errors. We started with σ_{b} = 0.01, σ_{s} = 0.1 and generated 10000 datasets by repeating the stochastic simulations of the perturbation experiments and then introducing random measurement errors. A network was inferred from each of these datasets using BVSA. Similar to the noise free data, we used five parallel Gibbs samplers for each module. In this case we used 500 iterations since noisy data may slow down convergence. To see whether all parallel samplers converge to the same distribution we plotted (Additional file **A**
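The measurement error step can be sketched as follows. The error equation itself did not survive in this text, so the sketch assumes a common additive plus proportional Gaussian model, in which the observed value equals the true value plus a background term with standard deviation `sigma_b` and a signal dependent term with standard deviation `sigma_s` times the true value. This form, the function name and the steady-state values are assumptions for illustration only.

```python
import numpy as np

def add_measurement_error(x, sigma_b=0.01, sigma_s=0.1, rng=None):
    """Assumed model: x_obs = x + sigma_b * e_b + sigma_s * x * e_s, with e ~ N(0, 1)."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    background = sigma_b * rng.standard_normal(x.shape)        # signal independent
    proportional = sigma_s * x * rng.standard_normal(x.shape)  # signal dependent
    return x + background + proportional

rng = np.random.default_rng(1)
steady_state = np.array([0.2, 1.0, 5.0])  # hypothetical simulated activities
replicates = np.array([add_measurement_error(steady_state, rng=rng)
                       for _ in range(10000)])
```

Under this assumed model the spread of each measurement grows with its magnitude, so strongly activated modules carry larger absolute errors than weakly activated ones.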

**Figure S2.** In this figure, we have illustrated the convergence of the Gibbs samplers which were created to reconstruct the MAPK pathway from noisy simulation data.


**Supplementary Table S1.** In this table we have shown the AUROCs and their standard deviations calculated from the MAPK pathway topologies reconstructed by BVSA, stochastic MRA, SBRA and LMML at different levels of signal dependent and independent noises.


**Supplementary Table S2.** In this table we have shown the AUPRs and their standard deviations calculated from the MAPK pathway topologies reconstructed by BVSA, stochastic MRA, SBRA and LMML at different levels of signal dependent and independent noises.


As in the case of BVSA, the performances of stochastic MRA, SBRA and LMML were also evaluated by generating 10000 datasets for each noise level (i.e. for each combination of σ_{b} and σ_{s} values) and executing these algorithms on each of these data sets. The resulting connection coefficient matrices (in case of stochastic MRA and LMML) and weight matrices (in case of SBRA) were then used to calculate the corresponding AUROC and AUPR values. The resulting AUROC and AUPR values (see Additional file ^{1}) at each noise level (Figure


**Best performers in reconstructing the MAPK pathway at different levels of signal dependent and independent noises.** The performances were evaluated for different levels of signal dependent (σ_{s}, σ_{b}, σ_{s} and σ_{b} were gradually increased from a minimum of 0.1 up to 1 and from 0.01 up to 0.1 respectively. The best performers in terms of maximum average AUROC(**(a)** and **(b)** respectively.

Network reconstruction from incomplete sets of perturbations

For real biological networks, it is often impossible to perturb each network module, separately or in combination. Accordingly, the resulting datasets usually do not contain complete information for a full reconstruction of the underlying network. Here we demonstrate that even in such cases BVSA can reveal salient features of network structures with better accuracy than its counterparts.

Firstly, we simulated steady state responses of the MAPK pathway after perturbing only five out of six modules (adapter, initiator, MAP3K, MAP2K and MAPK) by knocking down Shc, Ras, Raf, MEK and ERK one at a time. We assumed that the knockdowns were performed with 80% efficiency. The simulations were performed stochastically to account for biological noise. Additionally, simulated measurement errors (σ_{b} = 0.01, σ_{s} = 0.1) were added to the perturbation responses. No repetitions of the knockdown experiments were performed. This yielded noisy steady state responses of the MAPK modules to five different perturbations. Classical MRA


**Network reconstruction from incomplete perturbation data.** **(a)** and AUPR in panel **(b)**. The error bars indicate the standard deviations of the corresponding AUROCs in panel **(a)** and AUPRs in panel **(b)**. The results suggest that the networks reconstructed by BVSA are significantly more accurate than both LMML and random guesses even when only half of the MAPK modules were perturbed and no repetitions of the experiments were available.

In the simulation study of the MAPK pathway we established that BVSA can accurately infer network structures from perturbation data and it is robust against biological noises, measurement errors, and insufficient perturbation experiments. However, the above study does not demonstrate the scalability of BVSA, i.e. whether BVSA can be efficiently implemented to infer larger networks, e.g. GRNs consisting of hundreds or even thousands of genes. Below, we address this issue by using simulated perturbation responses of a 10 gene and a 100 gene GRN and compare its performance with that of (stochastic) MRA, SBRA and LMML.

Simulation study: in-silico GRNs

For this study we chose two in-silico gene regulatory networks which were previously provided as a part of the fourth network inference challenge of the DREAM consortium (http://wiki.c2b2.columbia.edu/dream/index.php/Challenges). The chosen networks are indexed as network 1 in the 10 gene and 100 gene categories, respectively, in the DREAM-4 data repository (

We used BVSA, stochastic MRA ^{2} on the same datasets and calculated: (a) the average AUROC and the corresponding standard deviation, (b) the average AUPR and the corresponding standard deviation, (c) the average time taken to finish execution for each of the four algorithms. The results of this analysis, along with the performances of the winning algorithms (

| **Algorithm** | **AUROC (10 gene network)** | **AUPR (10 gene network)** | **Time in secs (10 gene network)** | **AUROC (100 gene network)** | **AUPR (100 gene network)** | **Time in secs (100 gene network)** |
|---|---|---|---|---|---|---|
| BVSA | 0.9323 ± 0.0121 | 0.7311 ± 0.011 | 6.023 ± 0.119 | 0.85 ± 0.0101 | 0.14 ± 0.0108 | 1384.92 ± 12.8 |
| stochastic MRA | 0.9231 | 0.7133 | 0.0008 | 0.709 | 0.037 | 0.68 |
| SBRA | 0.7572 ± 0.019 | 0.58 ± 0.02 | 0.11 ± 0.02 | 0.65 ± 0.003 | 0.075 ± 0.01 | 1520 ± 3.319 |
| LMML | 0.8035 ± 0.06 | 0.66 ± 0.07 | 27.32 ± 1.73 | 0.644 ± 0.02 | 0.04 ± 0.001 | 41562 ± 3722.2 |
| Kuffner et al. | 0.972 | 0.916 | NA | NA | NA | NA |
| Pinna et al. | 0.764 | 0.590 | NA | 0.914 | 0.536 | NA |

The results are shown in mean ± std format. The information regarding the performance of Kuffner et al.’s algorithm on the 100 gene dataset is not available since they did not participate in the 100 gene category of the DREAM4 challenge. The execution times of Pinna et al.’s and Kuffner et al.’s algorithms were not published and are therefore not available. Unavailable information is shown by ‘NA’ in the table.

The results (Table

In the 100 gene category, BVSA outperformed most of the other algorithms (stochastic MRA, SBRA and LMML) except that proposed by Pinna et al. (

Encouraged by the above results we used BVSA to infer the topology of the ERBB regulated G1-S transition pathway in breast cancer cells from real experimental data.

Real datasets: ERBB regulated G1/S transition in human breast cancer cells

ERBB receptors are a family of four structurally related receptor tyrosine kinases (RTK) which form homodimers, heterodimers, and possibly higher-order oligomers upon activation by growth factors such as EGF, TGF-

At the end of the G1 phase of the cell cycle, when the cells reach their final stage of growth, they decide whether to divide, delay division or enter a resting stage. The decision making process involves phosphorylation of the retinoblastoma protein pRB by different Cyclin/CDK complexes. Unphosphorylated pRB proteins bind to the E2F family of transcription factors and inhibit their activity. Upon phosphorylation, pRB proteins dissociate from E2F, resulting in its activation. A eukaryotic cell commits to divide and initiates DNA replication (i.e. enters S phase) when active E2F triggers transcription of the necessary genes. The ERBB regulated signaling pathways influence this mechanism by releasing Cyclin/CDK complexes from their inhibitor proteins (Cyclin Dependent Kinase inhibitors) p21 and p27. In 20-30% of breast cancers, ERBB2, a member of the ERBB family of receptors, is overexpressed, resulting in a malfunction of control points in the cell division process and unrestricted growth. These cancers are usually treated with Trastuzumab, a recombinant antibody designed to block the ERBB2 activity. However, about two thirds of the ERBB2 overexpressing breast cancer patients are found to be Trastuzumab resistant ab initio

In a notable effort, Sahin et al. systematically perturbed key components of ERBB mediated signaling pathways and the G1/S transition mechanisms in Trastuzumab resistant breast cancer cells to understand how the former influence the latter and vice versa ^{3} using BVSA, (stochastic) MRA, SBRA and LMML to unravel the interactions among the above proteins. To estimate the accuracy of each of these algorithms, we first developed a literature based reference pathway (Figure


**A network of ERBB regulated G1/S transition constructed from the literature.** The numbers associated with the network edges point to corresponding references listed in the Additional file

**References for ERBB pathway.** In this file we have provided the references for different interactions of the ERBB pathway.


In case of BVSA, we used five parallel Gibbs samplers to search for the potential regulators of each protein. Each sampler was allowed to sample for 2000 iterations. The entire simulation took ≈3 minutes to complete on an Intel Core i7-820m processor based laptop computer with 12 gigabytes of RAM. To see whether all parallel samplers converge to the same distribution we plotted (Additional file **A**

**Figure S3.** In this figure, we have illustrated the convergence of the Gibbs samplers which were created to reconstruct the ERBB-G1/S transition pathway from experimentally obtained perturbation data.


The inferred network (Figure


**Reconstruction of the ERBB regulated G1/S transition mechanism by BVSA (panel a), stochastic MRA (panel b), SBRA (panel c) and LMML (panel d).**

Similarly, ERK was also found to directly regulate the activity of pRB1 (Figure

We also identified a number of potential feedback mechanisms. For instance, pRB was found to feedback into its upstream kinases CDK2, CDK4 and even further upstream, into the kinases AKT and ERK (Figure

Some of the feedback mechanisms identified in this analysis can potentially explain the observed Trastuzumab resistance in HCC1954 cells. In fact, our reconstructed model identified two feedback mechanisms which were experimentally proved by other researchers to cause Trastuzumab resistance in ERBB2 overexpressing breast cancer cells. These feedback loops involve AKT and ERK mediated regulation of ERBB receptors (Figure

Additionally, the ERK-ERBB feedback loop which was also inferred by BVSA as a direct feedback from ERK to ERBB is in fact mediated by EGR1

We found credible evidence in the literature to support all but two interactions inferred by BVSA. The literature references regarding the inferred interactions are provided in the SI. At the same time, a few known mechanisms involving ERBB regulated signaling pathways and the G1/S checkpoints were not identified by BVSA. In Figure

We also used the Median Probability Model, i.e. P_{th} = 0.5 **P** which was inferred by BVSA. The resulting network is shown in Additional file

**Figure S4.** In this figure, we have shown the topology of the ERBB-G1/S transition network as reconstructed by the Median Probability Model.


For further comparisons, we employed MRA, SBRA ^{6} random realizations of the steady state perturbation responses were drawn from Gaussian distributions with means and standard deviations obtained from experimental data ^{6} realizations of each connection coefficient r_{ij} were used to infer the structure of the ERBB regulated G1/S transition mechanism. In most cases, a few realizations (typically <1 r_{ij} had very different values from the bulk of its values. These “outliers” were discarded by rejecting 1 r_{ij}. The connection coefficients which had high variances even after rejecting the outliers were assumed to be unidentifiable and were discarded from the analysis. The values of the remaining connection coefficients were then subjected to a Z-test which calculates a p-value to determine whether its mean is close enough to 0. If the p-value is less than 0.05 then the mean of the r_{ij} is significantly different from 0, i.e. in this case, r_{ij} represents a true network connection. We then used the Benjamini-Hochberg r_{ij} represents an activating or inhibitory interaction we first calculated the histogram of each r_{ij}. The histograms are shown in Additional file r_{ij} is larger than the fraction of positive realizations then r_{ij} is assumed to represent an inhibitory interaction. Otherwise, it represents an activating interaction. The above procedure took approximately 3 hours and 27 minutes to complete on the same computer which was used to implement BVSA on the ERBB2 dataset. The network which was reconstructed this way is shown in Figure
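The testing steps of this stochastic MRA procedure can be sketched in a few lines. Several details are assumptions made for illustration: the Z statistic is taken to be the mean of the Monte Carlo realizations divided by their standard deviation, the Benjamini-Hochberg step is applied at a false discovery rate of 0.05, and the three sets of realizations are synthetic. All function and variable names are hypothetical.

```python
import math
import numpy as np

def z_test_pvalue(samples):
    """Two-sided p-value for H0: mean == 0, with z = mean / standard deviation."""
    z = samples.mean() / samples.std(ddof=1)
    return math.erfc(abs(z) / math.sqrt(2.0))

def benjamini_hochberg(pvals, alpha=0.05):
    """Boolean mask of the hypotheses rejected at false discovery rate alpha."""
    pvals = np.asarray(pvals)
    m = pvals.size
    order = np.argsort(pvals)
    passed = pvals[order] <= alpha * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if passed.any():
        k = np.nonzero(passed)[0].max()  # largest rank that meets its threshold
        reject[order[:k + 1]] = True
    return reject

def interaction_sign(samples):
    """-1 (inhibition) if negative realizations outnumber positive ones, else +1."""
    return -1 if (samples < 0).mean() > (samples > 0).mean() else 1

# Synthetic realizations of three connection coefficients
rng = np.random.default_rng(0)
realizations = {
    "r_12": rng.normal(1.0, 0.2, 5000),
    "r_13": rng.normal(0.05, 0.5, 5000),
    "r_23": rng.normal(-0.8, 0.1, 5000),
}
pvals = [z_test_pvalue(s) for s in realizations.values()]
edges = benjamini_hochberg(pvals)
signs = [interaction_sign(s) for s in realizations.values()]
```

In this toy example the first and third coefficients are declared edges, activating and inhibitory respectively, while the second one is indistinguishable from zero.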

**Figure S5.** In this figure, we have shown the histograms of the connection coefficients of the ERBB regulated G1/S transition pathway as calculated by the stochastic MRA algorithm.


Furthermore, we reconstructed the same pathway using SBRA. SBRA does not infer connection coefficients. Instead, it infers a weight matrix **W** which represents the strength of the interactions. The sign of the elements of

Finally, we reconstructed the ERBB pathway using LMML

The above analysis suggests that BVSA provides an overall faster and more accurate solution to the network reconstruction problem when compared to other network inference algorithms such as MRA, SBRA and LMML. However, our comparison of accuracy depends on the reference ERBB pathway which was constructed from literature. We selected only highly cited experimental results to construct the reference pathway. However, not all of these experiments were performed on the same cell line as the one used by Sahin and colleagues

Discussion and conclusion

In this paper, we propose a network inference algorithm which combines modular response analysis with Bayesian variable selection techniques. This algorithm is capable of reconstructing network topologies from noisy perturbation responses of biochemical systems. It is more accurate than two previously proposed stochastic formulations of MRA, one based on TLS regression _{ij}, _{ij}), thereby minimizing the possibilities of false positives, whereas the stochastic MRA methods lack this capability due to lack of appropriate regularization techniques. The proposed BVSA algorithm also performs better than a recently proposed Levenberg-Marquardt optimization based Maximum Likelihood (LMML) method

Additionally, BVSA is vulnerable to collinearity in experimental data

The biggest concern of using statistical network inference algorithms to analyze biological datasets is the reliability of the predicted networks. One way of increasing reliability is to make systematic use of all existing information regarding the biochemical networks which the researcher wants to explore

Methods

The prior distributions of the unknown variables

The prior distribution of the binary variables A_{ij}

Biochemical entities such as genes and proteins interact with only selective groups of partners, making biochemical networks sparse systems. Network sparsity implies that for any two arbitrary nodes A_{ij} has a small probability of being 1, typically A_{ij} = 1)<0.5. Therefore, if we denote A_{ij} = 1) = **A**. By the same rationale, we choose

Following this notion, we assigned A_{ij} = 1)≤0.57 (see Additional file

The prior distribution of the connection coefficients r_{ij}

We conceptually divide a ^{th} subnetwork consists of the interactions between node **r**

In Eq. 6, ^{4}

The value of r_{ij} even when a network has far more nodes than the number of perturbations performed.

The prior distribution of the error ε_{ik}

ε_{ik} is a linear combination of the noise present in individual measurements ε_{ik} is a Gaussian random variable ε_{ik} is equally likely to have positive or negative values and hence its distribution is centered around 0, i.e. it has zero mean. The variance (σ^{2}) of ε_{ik} depends on biological noise and measurement errors and can vary drastically depending on the type of network being investigated and the measurement systems used in the investigation. Therefore, our knowledge about the true nature of the noise variance σ^{2} is uncertain. To account for the uncertainties in the noise variance σ^{2}, we assumed that σ^{2} has an inverse gamma distribution with scale parameter ^{2}. Following this notion, we selected σ^{2} has a flat prior which implies that it can have a wide range of positive values.
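As a minimal sketch of the hierarchical noise model described above, the following Python snippet first draws a noise variance from an inverse gamma distribution and then draws a zero-mean Gaussian error with that variance. The function name `sample_error` and the hyperparameter values `shape=5.0` and `scale=4.0` are illustrative assumptions, not the values used in this work; a flatter prior would correspond to smaller values.

```python
import random

def sample_error(rng, shape=5.0, scale=4.0):
    """Draw (sigma2, eps) with sigma2 ~ InvGamma(shape, scale), eps ~ N(0, sigma2).

    shape and scale are hypothetical hyperparameters chosen for illustration.
    """
    # If X ~ Gamma(shape, scale=1/scale), then 1/X ~ InvGamma(shape, scale).
    sigma2 = 1.0 / rng.gammavariate(shape, 1.0 / scale)
    # Zero-mean Gaussian error whose variance is the sampled sigma2.
    eps = rng.gauss(0.0, sigma2 ** 0.5)
    return sigma2, eps

rng = random.Random(1)
draws = [sample_error(rng) for _ in range(20000)]
mean_sigma2 = sum(s for s, _ in draws) / len(draws)  # theory: scale / (shape - 1)
mean_eps = sum(e for _, e in draws) / len(draws)     # theory: 0
```

With these assumed hyperparameters the sampled variances are always positive and average close to `scale / (shape - 1)`, while the errors average close to zero, as expected for a distribution centered at 0.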

The posterior distribution of the binary variable A_{ij}

The posterior distribution of the binary variables corresponding to each subnetwork was calculated separately. Let us denote by **A**

Step by step analytical calculations which lead to the above expression are illustrated in Figure **A**

To determine the true posterior of **A**

Approximating the posterior distribution of A_{ij} using Gibbs sampling

We implemented a Gibbs sampler for approximating the posterior distribution of **A**

for all

_{1} and _{0} in Eq. 9 can be calculated using Eq. 7.

Repeated successive sampling of Eq. 9 for all components of **A**

The samples drawn after the “burn in” period can be used to calculate the posterior probability of A_{ij} = 1 which represents an individual edge emanating from node _{ij}) was calculated as shown below:

Here, _{c} is the number of Gibbs samplers initiated for each **A**
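The sampling scheme described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the implementation used in this work: `log_post` is a hypothetical user-supplied function returning the unnormalised log posterior of a binary vector (it stands in for Eq. 7, which is not reproduced here), and the chain count, iteration count and burn-in length are arbitrary. Each binary variable is resampled in turn from its conditional distribution, and the post burn-in samples from all chains are averaged to estimate the probability that each variable equals 1.

```python
import math
import random

def gibbs_edge_probabilities(log_post, n_edges, n_chains=4,
                             n_iter=2000, burn_in=500, seed=0):
    """Estimate P(a_j = 1 | data) for every entry of a binary vector a."""
    rng = random.Random(seed)
    counts = [0] * n_edges
    kept = 0
    for _ in range(n_chains):
        a = [rng.randint(0, 1) for _ in range(n_edges)]  # random start
        for it in range(n_iter):
            for j in range(n_edges):
                a[j] = 1
                lp1 = log_post(a)  # log posterior with a_j = 1
                a[j] = 0
                lp0 = log_post(a)  # log posterior with a_j = 0
                d = lp0 - lp1
                # P(a_j = 1 | rest) = p1 / (p1 + p0), with overflow guard.
                p_one = 0.0 if d > 700 else 1.0 / (1.0 + math.exp(d))
                a[j] = 1 if rng.random() < p_one else 0
            if it >= burn_in:  # discard the "burn in" samples
                for j in range(n_edges):
                    counts[j] += a[j]
                kept += 1
    return [c / kept for c in counts]

# Toy target (an assumption for the demo): independent bits with log-odds w.
w = [2.0, -2.0, 0.0]
toy_log_post = lambda a: sum(wj * aj for wj, aj in zip(w, a))
probs = gibbs_edge_probabilities(toy_log_post, n_edges=3)
```

For this toy target the exact answer is the logistic function of each log-odds value, so the estimates can be checked against roughly 0.88, 0.12 and 0.5.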

Thresholding the posterior probabilities of A_{ij}

The topology of the underlying network can be determined by thresholding P_{ij} with a threshold probability P_{th}, i.e., if P_{ij} ≥ P_{th} it can be assumed that node j directly influences node i, and if P_{ij} < P_{th} then node j does not directly influence node i. Therefore, P_{th} should be chosen carefully. During the performance evaluation phase, when the network topology is known, the standard approach is to construct a series of networks for different values of P_{th} in the range [0, 1]. The topology of each network is then compared with the known topology and the overall performance of the algorithm is determined using Receiver Operating Characteristic (ROC) curves. This procedure is discussed in detail in the Results section.
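The evaluation procedure described above can be sketched in a few lines of Python. The helper names `roc_points` and `auroc`, the example probabilities and the threshold grid are all assumptions made for illustration. The code assumes that the reference topology contains at least one true edge and at least one absent edge.

```python
def roc_points(probs, truth, thresholds):
    """Return (FPR, TPR) pairs obtained by thresholding edge probabilities."""
    pos = sum(truth)          # number of true edges
    neg = len(truth) - pos    # number of absent edges
    points = []
    for th in thresholds:
        pred = [1 if p >= th else 0 for p in probs]  # edge kept if p >= th
        tp = sum(1 for y, t in zip(pred, truth) if y and t)
        fp = sum(1 for y, t in zip(pred, truth) if y and not t)
        points.append((fp / neg, tp / pos))
    return points

def auroc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# Hypothetical posterior edge probabilities and the known topology.
probs = [0.9, 0.4, 0.6, 0.1]
truth = [1, 1, 0, 0]
thresholds = [i / 10 for i in range(11)]  # threshold values in [0, 1]
area = auroc(roc_points(probs, truth, thresholds))
```

A coarse threshold grid is sufficient here because the ROC curve only changes at the distinct probability values; for real data, using the sorted unique probabilities as thresholds avoids missing any operating point.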

When the network structure is unknown, determining the correct P_{th} is crucial. In this case, the most commonly used approach is the Median Probability Model (MPM) P_{th} = 0.5. It has been shown that under certain conditions MPM ensures optimal performance. P_{th} = 0.5 no longer yields optimal results P_{ij} is uniformly distributed within the interval [0, 1], P_{th} ≈ 0.5 and our thresholding scheme resembles MPM. However, a high level of multicollinearity often results in P_{ij} < 0.5 even when there is a direct influence from node

Endnotes

^{1} Based on Benjamini-Hochberg corrected t-test between the AUROCs and AUPRs of the best and second best performers.

^{2} All computations were performed on a laptop computer equipped with a Core i7-3610QM processor and 20 gigabytes of random access memory.

^{3} We considered only those perturbations which directly targeted the measured proteins. Only nine out of ten measured proteins were targeted by their corresponding siRNA. pRB was not targeted for siRNA mediated knockdown.

^{4} Precision is the inverse of variance.

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

TS performed the simulations and wrote the manuscript, WK and BNK designed the experiments and wrote the manuscript. All authors read and approved the final manuscript.

Acknowledgments

This project was supported by Science Foundation Ireland under Grant No. 06/CE/B1129 and European Union Grant PRIMES No. FP7-HEALTH-2011-278568. We thank Vladislav Vyshemirsky, Norma Coffey and Nial Friel for stimulating discussions and advice on several statistical aspects of this paper.