Publication

Ensemble approach combining multiple methods improves human transcription start site prediction.

Dineen, David G
Schröder, Markus
Higgins, Desmond G
Cunningham, Pádraig
Date
2010
Subject Mesh
Base Pairing
Computational Biology
Genome, Human
Humans
Principal Component Analysis
Promoter Regions, Genetic
Software
Transcription Initiation Site
Abstract
The computational prediction of transcription start sites is an important unsolved problem. Some recent progress has been made, but many promoters, particularly those not associated with CpG islands, are still difficult to locate using current methods. These methods use different features and training sets, along with a variety of machine learning techniques, and result in different prediction sets.
We demonstrate the heterogeneity of current prediction sets, and take advantage of this heterogeneity to construct a two-level classifier ('Profisi Ensemble') using predictions from 7 programs, along with 2 other data sources. Support vector machines using 'full' and 'reduced' data sets are combined in an either/or approach. We achieve a 14% increase in performance over the current state-of-the-art, as benchmarked by a third-party tool.
Supervised learning methods are a useful way to combine predictions from diverse sources.
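The either/or combination described in the abstract can be illustrated with a minimal sketch. This is not the Profisi Ensemble implementation: the predictor names (`p1`, `p2`, `p3`), the weights, and the biases are hypothetical, and hand-set linear scorers stand in for the trained support vector machines. The sketch assumes "either/or" means a candidate position is called a transcription start site when either second-level classifier accepts it.

```python
from typing import Callable, Dict

# Hypothetical first-level input: one score per base predictor
# for a single candidate genomic position.
Scores = Dict[str, float]
Classifier = Callable[[Scores], bool]


def make_linear_classifier(weights: Dict[str, float], bias: float) -> Classifier:
    """Stand-in second-level classifier: sign of a weighted sum.

    A linear-kernel SVM decides in the same form (w . x + b > 0),
    but here the weights are set by hand, not learned.
    """
    def classify(scores: Scores) -> bool:
        total = bias + sum(w * scores.get(name, 0.0) for name, w in weights.items())
        return total > 0.0
    return classify


def either_or(full: Classifier, reduced: Classifier) -> Classifier:
    """Accept a position if either second-level classifier accepts it."""
    def classify(scores: Scores) -> bool:
        return full(scores) or reduced(scores)
    return classify


# Assumed setup: the 'full' classifier sees every predictor,
# the 'reduced' classifier sees only a subset (here just p1).
full_clf = make_linear_classifier({"p1": 1.0, "p2": 1.0, "p3": 1.0}, bias=-2.0)
reduced_clf = make_linear_classifier({"p1": 2.0}, bias=-1.0)
ensemble = either_or(full_clf, reduced_clf)

all_agree = {"p1": 1.0, "p2": 1.0, "p3": 1.0}   # accepted by the full classifier
only_p1 = {"p1": 1.0, "p2": 0.0, "p3": 0.0}     # rejected by full, accepted by reduced
only_p2 = {"p1": 0.0, "p2": 1.0, "p3": 0.0}     # rejected by both

for name, scores in [("all_agree", all_agree), ("only_p1", only_p1), ("only_p2", only_p2)]:
    print(name, ensemble(scores))
```

In a real two-level setup the weights would be learned from labelled positions, with the first-level programs' outputs as features; the OR step is the only part of this sketch taken directly from the abstract's wording.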
Language
en
ISSN
1471-2164
DOI
10.1186/1471-2164-11-677
PMID
21118509