Our lab's research spans two complementary activities. First, we develop computational methods, such as new tools and pipelines, to analyze high-throughput genomic data. Second, we develop software that makes it easier to analyze, visualize, reproduce, scale, and share high-throughput genomic data. Below are active and past projects in the lab.
Galaxy (http://galaxyproject.org) is a scientific analysis workbench used by thousands of scientists worldwide to analyze genomic, proteomic, imaging, and other large biomedical datasets. Galaxy's user-friendly, web-based interface makes it possible for anyone, regardless of their informatics expertise, to create, run, and share large-scale, robust, and reproducible analyses. Galaxy accelerates biomedical research by bringing together tool developers and end users such as bench scientists and physician-researchers. More than 5,000 analysis tools are available in Galaxy's ToolShed (https://toolshed.g2.bx.psu.edu), and users run more than 200,000 analyses each month on Galaxy's main public server (https://usegalaxy.org). OHSU's precision cancer medicine programs use Galaxy to run clinical and research genomics analyses as well as machine learning workflows. Galaxy is funded by both the NIH and the NSF. You can try Galaxy now using our public server.
Precision Cancer Medicine Informatics
We are developing data analysis methods and data management software to store, analyze, and integrate clinical, imaging, and molecular data for (1) treating cancer with precision therapies that are adapted over time and (2) discovering and understanding mechanisms of resistance in cancer. This initiative brings together and advances many areas, including (a) developing computational analysis workflows to identify key biomarkers such as somatic mutations, gene expression, pathway activity, and tumor composition; (b) combining public genomics, transcriptomics, and biological pathway datasets with patient data to correlate biomarkers with prognosis and predict therapeutic response; and (c) producing patient reports and interactive visualizations that provide precision therapy recommendations based on consensus among methods and that enable differential analysis across timepoints. Key software used in this work includes LabKey for data management and visualization, G2P for finding key biological and clinically actionable biomarkers, and Galaxy for creating and executing analysis workflows.
Eukaryotic Genome Annotation for Research and Education
G-OnRamp is a collaboration between two successful, long-running projects: the Genomics Education Partnership (GEP) and the Galaxy Project. G-OnRamp provides biologists with an integrated, web-based, scalable environment for interactive annotation of eukaryotic genomes using large genomic datasets. It also provides educators with a platform to help undergraduates develop "big data" science skills through eukaryotic genome annotation. GEP is a consortium of over 100 colleges and universities that provides Classroom Undergraduate Research Experiences (CUREs) in bioinformatics and genomics for students at all levels. G-OnRamp extends Galaxy with tools and workflows that create UCSC Assembly Hubs and Apollo/JBrowse genome browsers with evidence tracks for sequence similarity, ab initio gene predictions, RNA-Seq, and repeats. Educators can use this system to design CUREs based on their favorite eukaryotic species (e.g., parasitoid wasps). G-OnRamp is distributed as a VirtualBox virtual appliance for local deployment and as an Amazon Machine Image (AMI) for cloud deployment on Amazon EC2. G-OnRamp is supported by the NIH.
Genotype-to-Phenotype Database (G2P)
G2P is an aggregate public clinical cancer knowledge base for storing and searching connections between genomic biomarkers ("genotypes") and patient diagnosis, prognosis, and response to treatment ("phenotypes"). Key uses of G2P include (a) searching by somatic variant to find drugs known to lead to response or resistance in tumors carrying that variant; (b) searching by drug to identify the mutations for which it can lead to a response; and (c) searching clinical trials to find those associated with particular biomarkers or drugs. G2P combines biomarker-phenotype associations from nine trusted, curated knowledge bases, including CIViC, OncoKB, PMKB, JAX CKB, and the Cancer Genome Interpreter. It also includes clinical trial data from several sources. Users can run full-text searches on G2P and filter results through a web portal with intuitive visualizations.
Web-based Interactive Visual Analysis
Our lab develops frameworks and applications for interactive visual analysis on the Web. Visual analysis combines visualization with analysis tools and pipelines so that visual inspection can guide how those tools and pipelines are used. This work has two aspects: enabling visualization of very large genomic datasets on the Web, and integrating visualizations, tools, and pipelines in a meaningful way.
Venomics and Parasite Infection Strategies
This work combines transcriptomic (RNA-seq) and proteomic (mass spectrometry) methods to identify the proteins in parasitic wasps' venom. This is joint work with Nate Mortimer, and manuscripts from this work have appeared in PLoS One and PNAS.