Explorative visual analytics on interval-based genomic data and their metadata.
AbstractBackground With the wide-spreading of public repositories of NGS processed data, the availability of user-friendly and effective tools for data exploration, analysis and visualization is becoming very relevant. These tools enable interactive analytics, an exploratory approach for the seamless “sense-making” of data through on-the-fly integration of analysis and visualization phases, suggested not only for evaluating processing results, but also for designing and adapting NGS data analysis pipelines.
Results This paper presents abstractions for supporting the early analysis of NGS processed data and their implementation in an associated tool, named GenoMetric Space Explorer (GeMSE). This tool serves the needs of the GenoMetric Query Language, an innovative cloud-based system for computing complex queries over heterogeneous processed data. It can also be used starting from any text files in standard BED, BroadPeak, NarrowPeak, GTF, or general tab-delimited format, containing numerical features of genomic regions; metadata can be provided as text files in tab-delimited attribute-value format. GeMSE allows interactive analytics, consisting of on-the-fly cycling among steps of data exploration, analysis and visualization that help biologists and bioinformaticians in making sense of heterogeneous genomic datasets. By means of an explorative interaction support, users can trace past activities and quickly recover their results, seamlessly going backward and forward in the analysis steps and comparative visualizations of heatmaps.