Investigating the effect of paralogs on microarray gene-set analysis

Hdl Handle:
http://hdl.handle.net/10147/121770
Title:
Investigating the effect of paralogs on microarray gene-set analysis
Authors:
Faure, Andre J; Seoighe, Cathal; Mulder, Nicola J
Citation:
BMC Bioinformatics. 2011 Jan 24;12(1):29
Issue Date:
24-Jan-2011
URI:
http://hdl.handle.net/10147/121770
Abstract:
Background: In order to interpret the results obtained from a microarray experiment, researchers often shift focus from analysis of individual differentially expressed genes to analyses of sets of genes. These gene-set analysis (GSA) methods use previously accumulated biological knowledge to group genes into sets and then aim to rank these gene sets in a way that reflects their relative importance in the experimental situation in question. We suspect that the presence of paralogs affects the ability of GSA methods to accurately identify the most important sets of genes for subsequent research.
Results: We show that paralogs, which typically have high sequence identity and similar molecular functions, also exhibit high correlation in their expression patterns. We investigate this correlation as a potential confounding factor common to current GSA methods using Indygene (http://www.cbio.uct.ac.za/indygene), a web tool that reduces a supplied list of genes so that it includes no pairwise paralogy relationships above a specified sequence similarity threshold. We use the tool to reanalyse previously published microarray datasets and determine the potential utility of accounting for the presence of paralogs.
Conclusions: The Indygene tool efficiently removes paralogy relationships from a given dataset and we found that such a reduction, performed prior to GSA, has the ability to generate significantly different results that often represent novel and plausible biological hypotheses. This was demonstrated for three different GSA approaches when applied to the reanalysis of previously published microarray datasets and suggests that the redundancy and non-independence of paralogs is an important consideration when dealing with GSA methodologies.
Item Type:
Journal Article
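
Illustrative note:
The abstract describes Indygene as reducing a gene list until no pairwise paralogy relationship exceeds a chosen sequence similarity threshold. The record does not say how the tool does this, so the following is only a minimal sketch of one possible approach (a greedy filter), not the authors' implementation. The function name `reduce_paralogs`, the placeholder gene names, and the similarity values are all hypothetical and chosen purely for illustration.

```python
from typing import Dict, FrozenSet, Iterable, List


def reduce_paralogs(
    genes: Iterable[str],
    similarity: Dict[FrozenSet[str], float],
    threshold: float,
) -> List[str]:
    """Greedily keep genes so that no kept pair has similarity above threshold.

    Hypothetical helper: pairs missing from `similarity` are treated as
    unrelated (similarity 0.0). The result depends on the input order.
    """
    kept: List[str] = []
    for gene in genes:
        # Keep the gene only if it is not a close paralog of any gene kept so far.
        if all(
            similarity.get(frozenset((gene, other)), 0.0) <= threshold
            for other in kept
        ):
            kept.append(gene)
    return kept


# Made-up example data: geneA and geneB are close paralogs, geneD is distant.
genes = ["geneA", "geneB", "geneC", "geneD"]
similarity = {
    frozenset(("geneA", "geneB")): 0.9,
    frozenset(("geneA", "geneD")): 0.4,
    frozenset(("geneB", "geneD")): 0.4,
}
reduced = reduce_paralogs(genes, similarity, threshold=0.5)
print(reduced)
```

A greedy pass is the simplest way to guarantee that no remaining pair exceeds the threshold, but which member of each paralog group survives depends on the order of the input list; a real tool may well choose differently.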

Full metadata record

DC Field | Value | Language
dc.contributor.author | Faure, Andre J | -
dc.contributor.author | Seoighe, Cathal | -
dc.contributor.author | Mulder, Nicola J | -
dc.date.accessioned | 2011-02-14T09:10:23Z | -
dc.date.available | 2011-02-14T09:10:23Z | -
dc.date.issued | 2011-01-24 | -
dc.identifier | http://dx.doi.org/10.1186/1471-2105-12-29 | -
dc.identifier.citation | BMC Bioinformatics. 2011 Jan 24;12(1):29 | -
dc.identifier.uri | http://hdl.handle.net/10147/121770 | -
dc.description.abstract | Background: In order to interpret the results obtained from a microarray experiment, researchers often shift focus from analysis of individual differentially expressed genes to analyses of sets of genes. These gene-set analysis (GSA) methods use previously accumulated biological knowledge to group genes into sets and then aim to rank these gene sets in a way that reflects their relative importance in the experimental situation in question. We suspect that the presence of paralogs affects the ability of GSA methods to accurately identify the most important sets of genes for subsequent research. Results: We show that paralogs, which typically have high sequence identity and similar molecular functions, also exhibit high correlation in their expression patterns. We investigate this correlation as a potential confounding factor common to current GSA methods using Indygene (http://www.cbio.uct.ac.za/indygene), a web tool that reduces a supplied list of genes so that it includes no pairwise paralogy relationships above a specified sequence similarity threshold. We use the tool to reanalyse previously published microarray datasets and determine the potential utility of accounting for the presence of paralogs. Conclusions: The Indygene tool efficiently removes paralogy relationships from a given dataset and we found that such a reduction, performed prior to GSA, has the ability to generate significantly different results that often represent novel and plausible biological hypotheses. This was demonstrated for three different GSA approaches when applied to the reanalysis of previously published microarray datasets and suggests that the redundancy and non-independence of paralogs is an important consideration when dealing with GSA methodologies. | -
dc.title | Investigating the effect of paralogs on microarray gene-set analysis | -
dc.type | Journal Article | -
dc.language.rfc3066 | en | -
dc.rights.holder | Faure et al.; licensee BioMed Central Ltd. | -
dc.description.status | Peer Reviewed | -
dc.date.updated | 2011-02-11T13:20:32Z | -
All Items in Lenus, The Irish Health Repository are protected by copyright, with all rights reserved, unless otherwise indicated.