Peter White, PhD, is a principal investigator in the Center for Microbial Pathogenesis at The Research Institute at Nationwide Children’s Hospital and an Assistant Professor of Pediatrics at The Ohio State University. He is also Director of Molecular Bioinformatics, serving on the research computing executive governance committee, and Director of the Biomedical Genomics Core (BGC), a nationally recognized microarray and next-generation sequencing facility that helps numerous investigators design, perform and analyze genomics research. His research program focuses on molecular bioinformatics and high-performance computing solutions for "big data", including the discovery of disease-associated human genetic variation and understanding the molecular mechanisms of transcriptional regulation in both eukaryotes and prokaryotes.
The BGC Bioinformatics Unit comprises a dynamic team of computational biologists with the substantial technical and bioinformatics expertise required to oversee the multiple platforms that acquire, store and analyze the very large and complex data sets generated by the BGC Microarray and Sequencing Units. The unit provides advanced bioinformatics analysis on a collaborative basis and serves as an interface between research investigators and the multiple domains required to handle the size and complexity of genomic data, including Research Information Services (RIS) and the Ohio Supercomputer Center (OSC). With our high-performance compute cluster and over 100 TB of clustered high-performance disk space, we are able to support the analysis of both large- and small-scale sequencing projects.
As part of Dr. Peter White's research group, our team focuses on developing analytical pipelines for "big data", including:
- Human genome resequencing and the identification of disease causing genetic variants
- Bacterial genomics, including RNA-Seq, ChIP-Seq and TN-Seq
- Resequencing and assembly of bacterial genome sequencing data
- Analysis pipeline development, automation and optimization
Next-generation sequencing (NGS) has revolutionized genetic research, enabling dramatic increases in the discovery of new functional variants in both syndromic and common diseases. The technology has been widely adopted by the research community and is now being rapidly adopted in the clinic, driven by recognition of its diagnostic utility and by improvements in the quality and speed of data acquisition. Together with declining sequencing costs, this rapid adoption has produced exponential growth in data generation and, with it, a computational bottleneck. Current analysis approaches can take weeks to complete, resulting in bioinformatics overheads that exceed raw sequencing costs and represent a significant limitation for those using the technology.
Churchill is a computational approach that overcomes these challenges. Through novel parallelization techniques, we have dramatically reduced the analysis time for deep whole human genome resequencing from weeks to hours, without the need for specialized analysis equipment or supercomputers. Our approach fully automates the analytical process that takes raw sequencing instrument output through the complex and computationally intensive steps of alignment, post-alignment processing, local realignment, recalibration and genotyping. Furthermore, by making optimal use of available compute resources, the pipeline scales in a near-linear fashion, enabling complete human genome resequencing analysis in ten hours with a single server, three hours with our in-house cluster and under 100 minutes using a larger HPC cluster.
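This page does not say how Churchill divides the work, so the following is only a minimal sketch of one general idea behind this kind of parallelization. It assumes a simple regional scheme: the genome is cut into contiguous, roughly equal-sized intervals that are processed independently and concurrently. The names `partition_genome`, `process_region` and `run_parallel` are hypothetical. `process_region` is a stub standing in for post-alignment steps such as local realignment, recalibration and genotyping restricted to one interval. A real implementation would also have to handle reads and variants that straddle region boundaries, which this sketch ignores.

```python
from concurrent.futures import ThreadPoolExecutor


def partition_genome(chrom_lengths, n_regions):
    """Split chromosomes into contiguous 0-based, half-open intervals.

    chrom_lengths: dict mapping chromosome name -> length in bp.
    The target interval size is ceil(total / n_regions); because intervals
    never span chromosome boundaries, the result may contain somewhat more
    than n_regions intervals.
    """
    total = sum(chrom_lengths.values())
    target = max(1, -(-total // n_regions))  # ceiling division
    regions = []
    for chrom, length in chrom_lengths.items():
        start = 0
        while start < length:
            end = min(start + target, length)
            regions.append((chrom, start, end))
            start = end
    return regions


def process_region(region):
    """Hypothetical placeholder for per-region work.

    A real pipeline would launch its realignment, recalibration and
    genotyping tools on this interval; this stub only returns the number
    of bases it would have covered.
    """
    chrom, start, end = region
    return end - start


def run_parallel(chrom_lengths, n_regions, workers):
    """Partition the genome and process all regions with a worker pool."""
    regions = partition_genome(chrom_lengths, n_regions)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(process_region, regions))
    return regions, results


if __name__ == "__main__":
    toy_genome = {"chr1": 1000, "chr2": 700, "chrM": 16}  # toy lengths, not real
    regions, bases = run_parallel(toy_genome, n_regions=4, workers=2)
    for region, n in zip(regions, bases):
        print(region, n)
    print("total bases:", sum(bases))
```

A thread pool is used here on the assumption that each region's work would be handed off to external tools, so the Python threads mostly wait on those processes. Choosing more, smaller regions gives the scheduler more units to balance across workers, which is one way a pipeline can approach near-linear scaling as nodes are added.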