This extensively expanded third edition is a practical introduction to Bio Data Science. Its hands-on approach gives readers ample opportunities to practice:
- Installing and using Linux in a virtual machine or remotely
- Processing bio data with the programming language AWK
- Managing data with the relational database system MariaDB
- Analyzing and visualizing data with R
- Implementing good bioinformatics practices with Jupyter Notebook and GitHub
This book targets both students and professionals in the life sciences. While it is aimed at beginners, it also provides valuable tips and tricks for experienced researchers dealing with large datasets.
Worked examples illustrate how to use bioinformatics tools, including BLAST, Clustal, PLINK, IGV, SAMtools, BCFtools, Mason2, Minimap, NCBI Datasets, Velvet, and Jmol, for:
- Identifying bacterial proteins potentially associated with pathogenicity
- Querying molecular structures for redox-regulated enzymes
- Mapping and assembling real or simulated sequence reads
- Identifying mutations in viruses and mapping them onto molecular structures
- Conducting genome-wide association studies
All software tools and datasets mentioned are freely available, and all code is accessible as Jupyter Notebooks on GitHub. Drawing on the author's experience in both academia and industry, this book provides a practical and comprehensive approach to bioinformatics.