As genome sequencing becomes cheaper and faster, producing an exponential increase in data, there is a growing need both to predict gene function more efficiently and to train the next generation of bioinformatics scientists. Researchers in the laboratory of Lukas Mueller, a faculty member at the Boyce Thompson Institute (BTI), have developed a strategy that meets both needs, benefiting students and researchers alike.
The Mueller Lab created a framework that uses the flood of newly sequenced genomes as a training resource for students interested in genome annotation. The framework was published online in PLOS Computational Biology on April 3, 201
What is genome annotation and why is it important?
After researchers have determined the order of the millions of DNA base pairs in an organism's genome, they have to answer two questions: Which DNA segments are genes that encode proteins, and what are these proteins' functions? This process of identifying genes and predicting their functions is called genome annotation.
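To make the first question concrete, here is a deliberately simplified sketch of how a program might flag candidate protein-coding segments: it scans a DNA string for open reading frames (ORFs), stretches that start with the codon ATG and run to a stop codon. This is a toy illustration under stated assumptions, not the method used by the Mueller Lab: it only reads the forward strand, treats any ATG-to-stop stretch of at least `min_codons` codons as a candidate, and ignores introns and splicing, which real gene predictors must handle. The function name `find_orfs` is hypothetical.

```python
# Toy open-reading-frame (ORF) scan as a stand-in for gene prediction.
# Assumptions (not from the article): forward strand only, no introns or
# splicing, and any ATG...stop stretch of >= min_codons codons counts.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 3) -> list[tuple[int, int]]:
    """Return sorted (start, end) 0-based, end-exclusive ORF coordinates."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):  # the three forward reading frames
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                end = i + 3  # include the stop codon
                if (end - start) // 3 >= min_codons:
                    orfs.append((start, end))
                start = None  # look for the next start codon
    return sorted(orfs)
```

For example, `find_orfs("CCATGAAATTTTAGCC")` reports the single candidate `ATG AAA TTT TAG` at coordinates `(2, 14)`. Deciding what that candidate protein actually does is the second, harder question.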
"Predicting genes and their functions is what most biologists are interested in. Most understanding of biological processes happens here," says Prashant Hosmani, bioinformatics analyst at the Mueller Lab and first author of the paper.
A genome is annotated by comparing its sequence with the gene sequences of related organisms. The most accurate method of genome annotation is manual curation, in which a person performs the analysis. Using a computer program to recognize genes and their functions is faster, but sometimes less accurate.
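The comparison idea can be illustrated with a small sketch: a new gene inherits the function of the most similar gene of known function from a related organism, and is left as "unknown" when nothing is similar enough. Real annotation pipelines typically use alignment tools such as BLAST for this step; the sketch below swaps in a much cruder k-mer Jaccard similarity purely for illustration. The reference sequences, function labels, function names and the 0.5 threshold are all hypothetical.

```python
# Toy homology-based function transfer. Real pipelines use alignment tools
# (e.g. BLAST); here k-mer Jaccard similarity is swapped in for illustration.
# Reference genes, labels and the 0.5 threshold are hypothetical.


def kmers(seq: str, k: int = 3) -> set[str]:
    """All overlapping substrings of length k."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Shared k-mers divided by all distinct k-mers (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def transfer_function(
    query: str,
    references: list[tuple[str, str]],  # (reference sequence, known function)
    k: int = 3,
    min_similarity: float = 0.5,
) -> tuple[str, float]:
    """Return (predicted function, best score); 'unknown' if below threshold."""
    query_kmers = kmers(query, k)
    best_function, best_score = "unknown", 0.0
    for ref_seq, function in references:
        score = jaccard(query_kmers, kmers(ref_seq, k))
        if score > best_score:
            best_function, best_score = function, score
    if best_score < min_similarity:
        return "unknown", best_score  # no convincing match: leave for a curator
    return best_function, best_score
```

With two hypothetical references, `"ATGGCTGCTAAA"` labeled "hypothetical kinase" and `"ATGCCCGGGTTT"` labeled "hypothetical transporter", the query `"ATGGCTGCTAAG"` shares 8 of 10 distinct 3-mers with the first and is assigned "hypothetical kinase". This speed-for-accuracy trade-off is exactly why automated calls still benefit from human review.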
"Manual annotation is very time-consuming and therefore expensive," said Surya Saha, senior bioinformatics analyst at the Mueller Lab and project coordinator. "The trick is to do both: use automatic annotation first, and then focus on the genes and biochemical pathways of interest and annotate them manually."
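The "do both" strategy amounts to a triage step: every gene keeps its automatic annotation, and only genes in the pathways a team cares about are queued for manual curation. The sketch below is one hypothetical way to express that; the `GeneModel` fields, the confidence score and the rule of reviewing the least confident calls first are assumptions for illustration, not details from the paper.

```python
from dataclasses import dataclass

# Hypothetical triage for "automatic first, then manual where it matters".
# Field names, pathway labels and the ordering rule are assumptions.


@dataclass
class GeneModel:
    gene_id: str
    predicted_function: str  # output of the automated annotation step
    pathways: tuple[str, ...]  # pathways the gene is predicted to belong to
    confidence: float  # automated confidence score, 0.0 to 1.0


def manual_curation_queue(
    genes: list[GeneModel], pathways_of_interest: list[str]
) -> list[GeneModel]:
    """Return genes in the pathways of interest, least confident first.

    Genes outside those pathways are not returned and simply keep their
    automatic annotation.
    """
    targets = set(pathways_of_interest)
    queue = [g for g in genes if targets.intersection(g.pathways)]
    return sorted(queue, key=lambda g: g.confidence)
```

Ordering by confidence is just one plausible choice; a team could equally sort by pathway or by gene family so that each student curates a coherent set of genes.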
The article lays out a series of logical steps for starting a basic annotation program from scratch. When students first join the project, team leaders and experts train them in the use of annotation tools.
Throughout the project, students carefully record their research and findings, and finally summarize them in a publishable report on the biochemical pathway of interest and its member gene families. In fact, this method was used to produce a peer-reviewed publication with more than 20 authors.
"Doing the work is one thing, but it is also very important to get recognition for it," says Hosmani. "This is a real motivation for the students."
Other benefits for students include working with international collaborators, networking, practicing communication and peer-review skills, and gaining valuable insight into career options. Students can also earn research or capstone project credit for their work, which increases their commitment to the project. And because more and more graduate science programs expect knowledge of bioinformatics, these skills will prove valuable in many areas.
In the end, researchers obtain high-quality genome annotations for any species, not just plants. These annotations provide a better understanding of how the organism works, ultimately benefiting society in areas such as agriculture, biofuels, and medicine.
The authors hope that other institutions will adopt and build on this framework regardless of their size or their access to resources and annotation expertise. To make the framework easier to use, the authors designed the paper's figures and tables to serve as references.
"Anyone with a research problem, a sequenced genome, and interested students can set up a program by building on our workflow," said Saha.
Prashant S. Hosmani et al., A quick guide for student-driven community genome annotation, PLOS Computational Biology (2019). DOI: 10.1371/journal.pcbi.1006682
A Universal Framework Combining Genome Annotation and Primary Education (2019, April 19), retrieved April 20, 2019
This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without written permission. The content is provided for informational purposes only.