What is Statistical Genetics?

Midway Tutors

Nathnael Bekele


Statistical genetics applies statistical methods to data "collected on large samples of families or individuals" in order to better understand genetic variation, traits, and diseases (Schork). This is useful because identifying the genetic variations associated with a disease can help doctors treat patients before their symptoms become severe. Furthermore, if molecular engineering and immunology advance far enough in the future, knowing that someone is more susceptible to a disease because of their genetic variations could allow gene editing to be used to treat that patient.


“Genome editing, also called gene editing, is an area of research seeking to modify genes of living organisms to improve our understanding of gene function and develop ways to use it to treat genetic or acquired diseases.” (NIH)

Genetic variation within families can contribute to disease (NIH). Statistical genetics therefore enables scientists to understand how the expression and repression of certain genes affect a person's susceptibility to disease. When a gene is repressed, it is switched off (Santer). This means that the gene's products, which the cell uses to produce vital enzymes or cofactors, are no longer made. When a gene is expressed, it is switched on and available for making these products.


Many sophisticated statistical genetics methods exist today. Advances in computer science and a better understanding of genetics have made it possible to analyze collected genetic data with computer programs.


Using my knowledge of statistics, I designed the following simple (and possibly inaccurate) method to test whether gene expression or repression contributes to a disease:


The null hypothesis is that the probability that a specific gene is expressed is the same for people with the disease as for people without it. First, we count the subjects with the disease in whom the gene is expressed. We also count the subjects in the control group (the healthy individuals) in whom the gene is expressed. Then, we run a two-sample proportion test. Because we are looking for genes that are expressed more often in ill subjects, the alternative hypothesis is one-sided, and the p-value is the probability of seeing a difference at least this large if the null hypothesis were true. We do this for every gene. Finally, a barplot of 1 minus the p-value for each gene should show which genes are more likely to be expressed in subjects with the disease than in those without.
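To make the steps above concrete, here is a minimal sketch of a pooled, one-sided two-sample proportion z-test run once per gene. The `GeneCounts` record, the function names, and the gene names and counts in the example are all made up for illustration; a real analysis would load its counts from the collected genetic data.

```python
import math
from dataclasses import dataclass


@dataclass
class GeneCounts:
    """Hypothetical per-gene tallies; the field names are illustrative."""
    gene: str
    expressed_ill: int       # ill subjects in whom the gene is expressed
    n_ill: int               # total ill subjects
    expressed_healthy: int   # healthy subjects in whom the gene is expressed
    n_healthy: int           # total healthy subjects


def normal_cdf(z: float) -> float:
    """Standard normal CDF computed with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expression_p_value(x_ill: int, n_ill: int, x_healthy: int, n_healthy: int) -> float:
    """One-sided pooled two-proportion z-test.

    H0: P(expressed | ill) == P(expressed | healthy)
    Ha: P(expressed | ill) >  P(expressed | healthy)
    """
    p_ill = x_ill / n_ill
    p_healthy = x_healthy / n_healthy
    pooled = (x_ill + x_healthy) / (n_ill + n_healthy)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_ill + 1.0 / n_healthy))
    if se == 0.0:
        # Gene expressed in everyone or in no one: no evidence of a difference.
        return 1.0
    z = (p_ill - p_healthy) / se
    return 1.0 - normal_cdf(z)  # upper-tail area


def rank_genes(counts: list[GeneCounts]) -> list[tuple[str, float, float]]:
    """Return (gene, p-value, 1 - p-value) rows, largest 1 - p first (the bar heights)."""
    rows = []
    for c in counts:
        p = expression_p_value(c.expressed_ill, c.n_ill, c.expressed_healthy, c.n_healthy)
        rows.append((c.gene, p, 1.0 - p))
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Made-up counts for three hypothetical genes, 100 ill and 100 healthy subjects each.
example = [
    GeneCounts("GENE_A", 60, 100, 40, 100),
    GeneCounts("GENE_B", 50, 100, 50, 100),
    GeneCounts("GENE_C", 30, 100, 45, 100),
]
ranked = rank_genes(example)
for gene, p, height in ranked:
    print(f"{gene}: p = {p:.4f}, 1 - p = {height:.4f}")
```

The third column of each row is the bar height, so passing the gene names and those values to matplotlib's `plt.bar` would draw the barplot. Because one test runs per gene, some small p-values will appear by chance alone; a multiple-testing correction such as Bonferroni would be a natural refinement, although the method above does not include one.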


Gene repression should also be tested, that is, whether a gene being repressed is associated with the subject having the disease. The same procedure can test this alternative hypothesis if we focus instead on the genes that are not expressed among ill subjects.
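The repression test is the mirror image of the expression test: counting subjects in whom the gene is *not* expressed and running the same upper-tailed test gives the same p-value as a lower-tailed test on the expressed counts. The sketch below shows this under the same pooled z-test assumption; the function names and the counts are hypothetical, and it assumes the gene is expressed in some but not all subjects (otherwise the standard error is zero).

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal CDF computed with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def pooled_z(x1: int, n1: int, x2: int, n2: int) -> float:
    """Pooled two-proportion z statistic for p1 - p2 (assumes 0 < pooled < 1)."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    return (x1 / n1 - x2 / n2) / se


def repression_p_value(expressed_ill: int, n_ill: int,
                       expressed_healthy: int, n_healthy: int) -> float:
    """One-sided test: is the gene repressed (not expressed) more often in ill subjects?"""
    repressed_ill = n_ill - expressed_ill
    repressed_healthy = n_healthy - expressed_healthy
    z = pooled_z(repressed_ill, n_ill, repressed_healthy, n_healthy)
    return 1.0 - normal_cdf(z)  # upper-tail area on the repressed counts


# Hypothetical gene expressed in 30 of 100 ill and 45 of 100 healthy subjects.
p_repressed = repression_p_value(30, 100, 45, 100)
# Equivalent lower-tail test on the expressed counts.
p_lower_tail = normal_cdf(pooled_z(30, 100, 45, 100))
print(f"repression p = {p_repressed:.4f}, lower-tail p = {p_lower_tail:.4f}")
```

Either formulation works; counting repressed subjects simply keeps the code in the same "is the proportion larger in ill subjects?" shape as the expression test.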


Because the sample of genetic data we have is only a tiny proportion of the population of the entire world, or even just the US, the individual observations can be treated as independent, which the two-sample proportion test requires. I also assume that with a large sample, confounding variables will be distributed roughly evenly between the ill and healthy groups, so that differences in gene expression between the groups can be attributed to the presence or absence of the disease. This is a strong assumption: subjects are not randomly assigned to be ill or healthy, so a large sample alone does not guarantee that confounders are balanced.
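The part of this reasoning that can be checked from the counts themselves is whether the usual textbook conditions for a two-sample proportion test hold. The sketch below assumes the common introductory-statistics conditions (at least 10 "expressed" and 10 "not expressed" subjects in each group, and each sample no more than 10% of its population); `check_conditions` and the population sizes in the example are hypothetical. It cannot detect confounding, which depends on how the subjects were chosen.

```python
def check_conditions(x1: int, n1: int, x2: int, n2: int,
                     population_size: int) -> dict[str, bool]:
    """Textbook conditions for the two-sample proportion z-test.

    Random sampling and the absence of confounding cannot be checked
    from counts, so they are not tested here.
    """
    # Large counts: at least 10 "expressed" and 10 "not expressed" in each group.
    large_counts = all(c >= 10 for c in (x1, n1 - x1, x2, n2 - x2))
    # 10% condition: each sample is at most 10% of the population, so sampling
    # without replacement still gives approximately independent observations.
    ten_percent = n1 <= 0.10 * population_size and n2 <= 0.10 * population_size
    return {"large_counts": large_counts, "ten_percent": ten_percent}


# Illustrative population size; not taken from any real study.
ok = check_conditions(60, 100, 40, 100, population_size=1_000_000)
too_small = check_conditions(3, 20, 5, 20, population_size=1_000_000)
print(ok, too_small)
```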


A drawback of this method is my limited understanding of genetics. A combination of gene expression and repression could be what leads to a certain disease, so it is important to consider both sets of findings together. Another issue is whether the genetic data were collected from related subjects (Van Hout). Genetic traits are more likely to be shared among related individuals, which makes it difficult to compare groups; it does not make sense to compare a group of relatives with a group of unrelated people.


Although this is a simple method that might have many flaws, it still demonstrates the power of statistical genetics. Using well-established statistical methods, one can better understand how likely a person is to develop a disease in the future, which could help save countless lives. However, both statistical genetics and gene editing are still developing fields with a long way to go.





Sources


Van Hout, Cristopher V. "Chapter 10 - Statistical Approaches to Rare Disease Analyses." Genomics of Rare Diseases, edited by Claudia Gonzaga-Jauregui and James R. Lupski, Translational and Applied Genomics, Academic Press, 2021, pp. 205-213, https://doi.org/10.1016/B978-0-12-820140-4.00011-9.


Hubrecht Institute. "Curing genetic disease in human cells." ScienceDaily. ScienceDaily, 20 February 2020. <www.sciencedaily.com/releases/2020/02/200220141740.htm>.


“Gene Editing – Digital Media Kit.” National Institutes of Health, U.S. Department of Health and Human Services, 5 Nov. 2020, www.nih.gov/news-events/gene-editing-digital-press-kit.


Institute of Medicine (US) Committee on Assessing Interactions Among Social, Behavioral, and Genetic Factors in Health. "Genetics and Health." Genes, Behavior, and the Social Environment: Moving Beyond the Nature/Nurture Debate, edited by L. M. Hernandez and D. G. Blazer, National Academies Press, 2006, ch. 3, www.ncbi.nlm.nih.gov/books/NBK19932/.


Santer, Robert. "Chapter 8 - Cellular Mechanisms of Aging." Brocklehurst's Textbook of Geriatric Medicine and Gerontology, edited by Howard M. Fillit, Kenneth Rockwood, and Kenneth Woodhouse, 7th ed., W.B. Saunders, 2010, pp. 42-50, https://doi.org/10.1016/B978-1-4160-6231-8.10008-X.


