Ensemble approach combining multiple methods improves human transcription start site prediction

Hdl Handle:
http://hdl.handle.net/10147/121047
Title:
Ensemble approach combining multiple methods improves human transcription start site prediction
Authors:
Dineen, David G; Schroder, Markus; Higgins, Desmond G; Cunningham, Padraig
Issue Date:
30-Nov-2010
URI:
http://hdl.handle.net/10147/121047
Abstract:
Background: The computational prediction of transcription start sites is an important unsolved problem. Some recent progress has been made, but many promoters, particularly those not associated with CpG islands, are still difficult to locate using current methods. These methods use different features and training sets, along with a variety of machine learning techniques and result in different prediction sets.
Results: We demonstrate the heterogeneity of current prediction sets, and take advantage of this heterogeneity to construct a two-level classifier ('Profisi Ensemble') using predictions from 7 programs, along with 2 other data sources. Support vector machines using 'full' and 'reduced' data sets are combined in an either/or approach. We achieve a 14% increase in performance over the current state-of-the-art, as benchmarked by a third-party tool.
Conclusions: Supervised learning methods are a useful way to combine predictions from diverse sources.
Item Type:
Article
Language:
en
Keywords:
GENETICS; INFORMATION TECHNOLOGY
DOI:
http://dx.doi.org/10.1186/1471-2164-11-677

Full metadata record

DC Field | Value | Language
dc.contributor.author | Dineen, David G | en
dc.contributor.author | Schroder, Markus | en
dc.contributor.author | Higgins, Desmond G | en
dc.contributor.author | Cunningham, Padraig | en
dc.date.accessioned | 2011-02-03T10:35:28Z | -
dc.date.available | 2011-02-03T10:35:28Z | -
dc.date.issued | 2010-11-30 | -
dc.identifier.issn | http://dx.doi.org/10.1186/1471-2164-11-677 | -
dc.identifier.uri | http://hdl.handle.net/10147/121047 | -
dc.description.abstract | Background: The computational prediction of transcription start sites is an important unsolved problem. Some recent progress has been made, but many promoters, particularly those not associated with CpG islands, are still difficult to locate using current methods. These methods use different features and training sets, along with a variety of machine learning techniques and result in different prediction sets. Results: We demonstrate the heterogeneity of current prediction sets, and take advantage of this heterogeneity to construct a two-level classifier ('Profisi Ensemble') using predictions from 7 programs, along with 2 other data sources. Support vector machines using 'full' and 'reduced' data sets are combined in an either/or approach. We achieve a 14% increase in performance over the current state-of-the-art, as benchmarked by a third-party tool. Conclusions: Supervised learning methods are a useful way to combine predictions from diverse sources. | -
dc.language.iso | en | en
dc.subject | GENETICS | en
dc.subject | INFORMATION TECHNOLOGY | en
dc.title | Ensemble approach combining multiple methods improves human transcription start site prediction | en
dc.type | Article | en
dc.language.rfc3066 | en | -
dc.rights.holder | Dineen et al.; licensee BioMed Central Ltd. | -
dc.description.status | Peer Reviewed | -
dc.date.updated | 2010-12-23T18:01:38Z | -
All Items in Lenus, The Irish Health Repository are protected by copyright, with all rights reserved, unless otherwise indicated.