Hdl Handle:
http://hdl.handle.net/10147/119156
Title:
Missing value imputation for epistatic MAPs
Authors:
Ryan, Colm; Greene, Derek; Cagney, Gerard; Cunningham, Padraig
Citation:
BMC Bioinformatics. 2010 Apr 20;11(1):197
Issue Date:
20-Apr-2010
URI:
http://hdl.handle.net/10147/119156
Abstract:
Background: Epistatic miniarray profiling (E-MAP) is a high-throughput approach capable of quantifying aggravating or alleviating genetic interactions between gene pairs. The datasets resulting from E-MAP experiments typically take the form of a symmetric pairwise matrix of interaction scores. These datasets have a significant number of missing values (up to 35%) that can reduce the effectiveness of some data analysis techniques and prevent the use of others. An effective method for imputing interactions would therefore increase the types of possible analysis, as well as increase the potential to identify novel functional interactions between gene pairs. Several methods have been developed to handle missing values in microarray data, but it is unclear how applicable these methods are to E-MAP data because of their pairwise nature and the significantly larger number of missing values. Here we evaluate four alternative imputation strategies, three local (nearest neighbor-based) and one global (PCA-based), that have been modified to work with symmetric pairwise data.
Results: We identify different categories for the missing data based on their underlying cause, and show that values from the largest category can be imputed effectively. We compare local and global imputation approaches across a variety of distinct E-MAP datasets, showing that both are competitive and preferable to filling in with zeros. In addition, we show that these methods are effective in an E-MAP from a different species, suggesting that pairwise imputation techniques will be increasingly useful as analogous epistasis mapping techniques are developed in different species. We show that strongly alleviating interactions are significantly more difficult to predict than strongly aggravating interactions. Finally, we show that imputed interactions, generated using nearest neighbor methods, are enriched for annotations in the same manner as measured interactions. Therefore our method potentially expands the number of mapped epistatic interactions. In addition, we make implementations of our algorithms available for use by other researchers.
Conclusions: We address the problem of missing value imputation for E-MAPs, and suggest the use of symmetric nearest neighbor-based approaches, as they offer consistently accurate imputations across multiple datasets in a tractable manner.
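Illustrative sketch (not part of the record): the abstract describes nearest neighbor imputation adapted to symmetric pairwise data but gives no algorithmic detail. The Python below is one possible reading under stated assumptions, not the authors' implementation. It assumes that a missing score for the pair (i, j) is estimated once from the neighbors of gene i and once from the neighbors of gene j, and that the two estimates are averaged. The function name `symmetric_knn_impute`, the RMS distance over shared observations, the inverse-distance weighting, and the fallback to zero are all hypothetical choices.

```python
import numpy as np


def symmetric_knn_impute(scores, k=2):
    """Fill missing (NaN) off-diagonal entries of a symmetric score matrix.

    Hypothetical sketch: each missing pair (i, j) is estimated from the
    k nearest neighbors of gene i and of gene j, then the two are averaged.
    """
    n = scores.shape[0]
    observed = ~np.isnan(scores)

    # RMS distance between gene profiles over mutually observed columns.
    dist = np.full((n, n), np.inf)
    for a in range(n):
        for b in range(a + 1, n):
            shared = observed[a] & observed[b]
            if shared.sum() >= 2:
                diff = scores[a, shared] - scores[b, shared]
                dist[a, b] = dist[b, a] = np.sqrt(np.mean(diff ** 2))

    def estimate(i, j):
        # Nearest neighbors of gene i that have a measured score with gene j.
        ranked = [g for g in np.argsort(dist[i])
                  if g not in (i, j) and np.isfinite(dist[i, g]) and observed[g, j]]
        chosen = ranked[:k]
        if not chosen:
            return np.nan
        weights = np.array([1.0 / (dist[i, g] + 1e-9) for g in chosen])
        values = np.array([scores[g, j] for g in chosen])
        return float(np.sum(weights * values) / np.sum(weights))

    filled = scores.copy()
    for i in range(n):
        for j in range(i + 1, n):
            if not observed[i, j]:
                parts = [v for v in (estimate(i, j), estimate(j, i))
                         if not np.isnan(v)]
                # Assumption: fall back to zero when no neighbor is usable.
                value = float(np.mean(parts)) if parts else 0.0
                filled[i, j] = filled[j, i] = value
    return filled
```

Writing each imputed value to both (i, j) and (j, i) keeps the output symmetric. The diagonal is left untouched, and measured scores are never overwritten.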
Item Type:
Journal Article

Full metadata record

DC Field | Value | Language
dc.contributor.author | Ryan, Colm | -
dc.contributor.author | Greene, Derek | -
dc.contributor.author | Cagney, Gerard | -
dc.contributor.author | Cunningham, Padraig | -
dc.date.accessioned | 2011-01-11T12:42:39Z | -
dc.date.available | 2011-01-11T12:42:39Z | -
dc.date.issued | 2010-04-20 | -
dc.identifier | http://dx.doi.org/10.1186/1471-2105-11-197 | -
dc.identifier.citation | BMC Bioinformatics. 2010 Apr 20;11(1):197 | -
dc.identifier.uri | http://hdl.handle.net/10147/119156 | -
dc.description.abstract | (identical to the text under "Abstract:" above) | -
dc.title | Missing value imputation for epistatic MAPs | -
dc.type | Journal Article | -
dc.language.rfc3066 | en | -
dc.rights.holder | Ryan et al.; licensee BioMed Central Ltd. | -
dc.description.status | Peer Reviewed | -
dc.date.updated | 2010-12-15T21:04:31Z | -
All Items in Lenus, The Irish Health Repository are protected by copyright, with all rights reserved, unless otherwise indicated.