dc.contributor.author: Dineen, David G*
dc.contributor.author: Schroder, Markus*
dc.contributor.author: Higgins, Desmond G*
dc.contributor.author: Cunningham, Padraig*
dc.date.accessioned: 2011-02-03T10:35:28Z
dc.date.available: 2011-02-03T10:35:28Z
dc.date.issued: 2010-11-30
dc.identifier.issn: http://dx.doi.org/10.1186/1471-2164-11-677
dc.identifier.uri: http://hdl.handle.net/10147/121047
dc.description.abstract: Background: The computational prediction of transcription start sites is an important unsolved problem. Some recent progress has been made, but many promoters, particularly those not associated with CpG islands, are still difficult to locate using current methods. These methods use different features and training sets, along with a variety of machine learning techniques, and result in different prediction sets. Results: We demonstrate the heterogeneity of current prediction sets, and take advantage of this heterogeneity to construct a two-level classifier ('Profisi Ensemble') using predictions from 7 programs, along with 2 other data sources. Support vector machines using 'full' and 'reduced' data sets are combined in an either/or approach. We achieve a 14% increase in performance over the current state-of-the-art, as benchmarked by a third-party tool. Conclusions: Supervised learning methods are a useful way to combine predictions from diverse sources.
dc.language.iso: en [en]
dc.subject: GENETICS [en]
dc.subject: INFORMATION TECHNOLOGY [en]
dc.title: Ensemble approach combining multiple methods improves human transcription start site prediction [en]
dc.type: Article [en]
dc.language.rfc3066: en
dc.rights.holder: Dineen et al.; licensee BioMed Central Ltd.
dc.description.status: Peer Reviewed
dc.date.updated: 2010-12-23T18:01:38Z
refterms.dateFOA: 2018-08-16T02:09:32Z


Files in this item

Name: 1471-2164-11-677.xml; Size: 67.45Kb; Format: XML
Name: 1471-2164-11-677.pdf; Size: 265.8Kb; Format: PDF
Name: 1471-2164-11-677-S3.ZIP; Size: 0 bytes; Format: Unknown
Name: 1471-2164-11-677-S2.DOC; Size: 0 bytes; Format: Microsoft Word
Name: 1471-2164-11-677-S1.XLS; Size: 0 bytes; Format: Microsoft Excel
