Genetic classification of populations using supervised learning.

Hdl Handle:
http://hdl.handle.net/10147/238790
Title:
Genetic classification of populations using supervised learning.
Authors:
Bridges, Michael; Heron, Elizabeth A; O'Dushlaine, Colm; Segurado, Ricardo; Morris, Derek; Corvin, Aiden; Gill, Michael; Pinto, Carlos
Affiliation:
Astrophysics Group, Cavendish Laboratory, Cambridge, United Kingdom.
Citation:
Genetic classification of populations using supervised learning. 2011, 6 (5):e14802 PLoS ONE
Journal:
PloS one
Issue Date:
2011
URI:
http://hdl.handle.net/10147/238790
DOI:
10.1371/journal.pone.0014802
PubMed ID:
21589856
Abstract:
There are many instances in genetics in which we wish to determine whether two candidate populations are distinguishable on the basis of their genetic structure. Examples include populations which are geographically separated, case-control studies and quality control (when participants in a study have been genotyped at different laboratories). This latter application is of particular importance in the era of large scale genome wide association studies, when collections of individuals genotyped at different locations are being merged to provide increased power. The traditional method for detecting structure within a population is some form of exploratory technique such as principal components analysis. Such methods, which do not utilise our prior knowledge of the membership of the candidate populations, are termed unsupervised. Supervised methods, on the other hand, are able to utilise this prior knowledge when it is available. In this paper we demonstrate that in such cases modern supervised approaches are a more appropriate tool for detecting genetic differences between populations. We apply two such methods (neural networks and support vector machines) to the classification of three populations (two from Scotland and one from Bulgaria). The sensitivity exhibited by both these methods is considerably higher than that attained by principal components analysis and in fact comfortably exceeds a recently conjectured theoretical limit on the sensitivity of unsupervised methods. In particular, our methods can distinguish between the two Scottish populations, where principal components analysis cannot. We suggest, on the basis of our results, that a supervised learning approach should be the method of choice when classifying individuals into pre-defined populations, particularly in quality control for large scale genome wide association studies.
Item Type:
Article
Language:
en
MeSH:
Bulgaria; Case-Control Studies; Genetics, Population; Genome-Wide Association Study; Humans; Learning; Nerve Net; Polymorphism, Single Nucleotide; Principal Component Analysis; Scotland
ISSN:
1932-6203

Full metadata record

DC Field | Value | Language
dc.contributor.author | Bridges, Michael | en_GB
dc.contributor.author | Heron, Elizabeth A | en_GB
dc.contributor.author | O'Dushlaine, Colm | en_GB
dc.contributor.author | Segurado, Ricardo | en_GB
dc.contributor.author | Morris, Derek | en_GB
dc.contributor.author | Corvin, Aiden | en_GB
dc.contributor.author | Gill, Michael | en_GB
dc.contributor.author | Pinto, Carlos | en_GB
dc.date.accessioned | 2012-08-15T14:17:09Z | -
dc.date.available | 2012-08-15T14:17:09Z | -
dc.date.issued | 2011 | -
dc.identifier.citation | Genetic classification of populations using supervised learning. 2011, 6 (5):e14802 PLoS ONE | en_GB
dc.identifier.issn | 1932-6203 | -
dc.identifier.pmid | 21589856 | -
dc.identifier.doi | 10.1371/journal.pone.0014802 | -
dc.identifier.uri | http://hdl.handle.net/10147/238790 | -
dc.description.abstract | There are many instances in genetics in which we wish to determine whether two candidate populations are distinguishable on the basis of their genetic structure. Examples include populations which are geographically separated, case-control studies and quality control (when participants in a study have been genotyped at different laboratories). This latter application is of particular importance in the era of large scale genome wide association studies, when collections of individuals genotyped at different locations are being merged to provide increased power. The traditional method for detecting structure within a population is some form of exploratory technique such as principal components analysis. Such methods, which do not utilise our prior knowledge of the membership of the candidate populations, are termed unsupervised. Supervised methods, on the other hand, are able to utilise this prior knowledge when it is available. In this paper we demonstrate that in such cases modern supervised approaches are a more appropriate tool for detecting genetic differences between populations. We apply two such methods (neural networks and support vector machines) to the classification of three populations (two from Scotland and one from Bulgaria). The sensitivity exhibited by both these methods is considerably higher than that attained by principal components analysis and in fact comfortably exceeds a recently conjectured theoretical limit on the sensitivity of unsupervised methods. In particular, our methods can distinguish between the two Scottish populations, where principal components analysis cannot. We suggest, on the basis of our results, that a supervised learning approach should be the method of choice when classifying individuals into pre-defined populations, particularly in quality control for large scale genome wide association studies. | en_GB
dc.language.iso | en | en
dc.rights | Archived with thanks to PloS one | en_GB
dc.subject.mesh | Bulgaria | -
dc.subject.mesh | Case-Control Studies | -
dc.subject.mesh | Genetics, Population | -
dc.subject.mesh | Genome-Wide Association Study | -
dc.subject.mesh | Humans | -
dc.subject.mesh | Learning | -
dc.subject.mesh | Nerve Net | -
dc.subject.mesh | Polymorphism, Single Nucleotide | -
dc.subject.mesh | Principal Component Analysis | -
dc.subject.mesh | Scotland | -
dc.title | Genetic classification of populations using supervised learning. | en_GB
dc.type | Article | en
dc.contributor.department | Astrophysics Group, Cavendish Laboratory, Cambridge, United Kingdom. | en_GB
dc.identifier.journal | PloS one | en_GB
dc.description.province | Leinster | en