DNAism
Exploring genomic datasets on the web with Horizon Charts

For the impatient

Open your console (we need git, tar and python for this to work) and run:


  $ git clone https://github.com/drio/dnaism.git
  $ cd dnaism/example/depth
  $ make
  Unpacking bed files ...
  point your browser here: http://localhost:8888
  python3 -m http.server 8888
  Serving HTTP on 0.0.0.0 port 8888 ...
  

And now point your browser to http://localhost:8888

If everything went well you should see several horizon charts encoding the read depth of multiple genomic samples across a small region of the genome.

Here is a video that shows these steps:

Introduction

Effective visualizations help us understand data and facilitate the navigation of intricate and dense data sets.

The web and its open standards present a fantastic environment for building visualizations and reaching many people. You have probably heard of D3, a Javascript library built by Mike Bostock that beautifully abstracts and exposes these technologies for generating and manipulating novel large-scale visualizations.

You are probably less familiar with Cubism, a project from the same author, also based on D3, that helps explore time series data using a visualization technique called Horizon Charts.

Horizon Charts are not only useful in the domain of time series data. Biological datasets, genomic data for example, can greatly benefit from them. That's why we have modified the original library to work with genomic datasets. We call this new library DNAism.

A case for Horizon Charts

Imagine we have a single variable we want to explore (in this case stock market data over time). We could visualize it using a simple line chart (below).

This method is effective as long as we have enough vertical space to encode the visuals. As we decrease the vertical space, we lose graphical perception, and the problem gets worse as more variables, or a greater range of values for those variables, are introduced. At that point the visualization method stops being effective. Try it out by using the slider below:


Horizon Charts can help us here. Below you can see the same dataset, this time visually encoded using a Horizon Chart.

You can find the details of the process for converting a line chart into a horizon chart here. We start by applying different colors to positive and negative values and flipping the negative ones over the baseline. We then divide the chart into horizontal bands (initially a single band is used). As we reduce the vertical space (try it by using the +/- buttons below), we create more bands and collapse them onto the baseline. Different color intensities or shades are applied to the bands so they can be told apart.


Borrowed from Mike Bostock.
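The folding step described above can be sketched as a small function. This is a minimal illustration of the technique under our own assumptions, not DNAism's or Cubism's rendering code; the name `horizonBands` and its return shape are hypothetical. Given a value, the band height in pixels, the number of bands and the largest absolute value in the data, it reports the color group (positive or negative) and how many pixels of each band are filled.

```javascript
// Hypothetical helper illustrating horizon-chart band folding;
// not DNAism's or Cubism's actual rendering code.
function horizonBands(value, height, bands, maxAbs) {
  // Negative values are flipped over the baseline and get their own color.
  var sign = value < 0 ? "negative" : "positive";
  // Scale |value| to the height it would need in an ordinary area chart
  // that is `bands` times taller than the available space.
  var total = Math.min(Math.abs(value), maxAbs) / maxAbs * height * bands;
  var fills = [];
  for (var b = 0; b < bands; b++) {
    // Band b holds the slice [b * height, (b + 1) * height) of that area,
    // collapsed onto the same baseline; higher bands use darker shades.
    fills.push(Math.max(0, Math.min(height, total - b * height)));
  }
  return { sign: sign, fills: fills };
}

// A value at half the maximum, drawn in 3 bands of 30 px each.
var folded = horizonBands(-50, 30, 3, 100);
console.log(folded.sign, folded.fills); // negative [ 30, 15, 0 ]
```

Adding bands while keeping `height` small is what lets the chart keep its resolution as vertical space shrinks: the same value is split across more, darker layers instead of being squashed.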

Requirements

In order to get the most out of DNAism you should become familiar with the DOM, CSS, HTML and D3. You can read more about the first three technologies here.

In the rest of this document we will learn about DNAism by building a visualization to help us explore the read depth for a particular genome region across multiple DNA sequencing samples. Our input data is in BED format:

 
  $ ls *.bed 
  18277.bed       23138.bed       30158.bed       34598.bed  ...  
  19466.bed       27347.bed       32510.bed       34600.bed  ...  

  $ head -3 18277.bed
  Chr17   1100003 1100004 36
  Chr17   1100004 1100005 35
  Chr17   1100005 1100006 36

  $ tail -3 18277.bed
  Chr17   1199991 1199992 41
  Chr17   1199992 1199997 40
  Chr17   1199997 1200000 41
  

These BED files live on the server, in the same location as the web files that contain the code for the article you are reading.
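Each line above holds a chromosome, a start, an end and, in these files, the read depth. BED coordinates are 0-based and half-open, so `Chr17 1199992 1199997 40` covers five bases. The helper below is a hypothetical sketch of how such lines could be parsed in Javascript; DNAism's bedfile() source does its own parsing, which may differ.

```javascript
// Hypothetical parser for the four-column BED lines shown above
// (chromosome, 0-based start, exclusive end, read depth).
function parseBed(text) {
  return text
    .split("\n")
    .map(function (line) { return line.trim(); })
    // Skip blank lines and comment/header lines.
    .filter(function (line) {
      return line.length > 0 && !/^(#|track|browser)/.test(line);
    })
    .map(function (line) {
      var f = line.split(/\s+/);
      return {
        chrom: f[0],
        start: parseInt(f[1], 10),
        end: parseInt(f[2], 10),
        value: parseFloat(f[3])
      };
    });
}

var records = parseBed("Chr17\t1199992\t1199997\t40\nChr17\t1199997\t1200000\t41\n");
console.log(records[0].end - records[0].start); // 5
```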

We will start with a basic HTML document that loads DNAism and its only dependency (D3). This document also loads the CSS styles for the different elements that DNAism will create as we build the visualization.

At the end of our HTML document we add a script tag. Inside it goes all the Javascript code needed to build the visualization.

Creating a context

Our code will use the different DNAism components to create the visualization, starting with context(). We use it to tell the library what region of the genome we want to explore.


  var context = dnaism.context()
                 .start(1100000)
                 .stop(1200000)
                 .size(800)
                 .chrm('Chr17');
  

Here, we are interested in exploring Chr17:1100000-1200000. We do that by setting the chromosome (using chrm()) and the start() and stop() positions of the region of interest. We also need to specify, with size(), the space we have to visualize the data, in pixels.

We haven't rendered any web element yet; we are just telling DNAism what region we are interested in.
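With these settings a 100,000 bp region is drawn in 800 pixels, so each pixel stands for 125 bp. The sketch below makes that mapping explicit, assuming a simple linear scale; `bpPerPixel`, `pixelFor` and `regionFor` are hypothetical helpers for illustration, not DNAism functions.

```javascript
// Hypothetical helpers: how a genomic window maps onto pixel columns,
// using the same values passed to dnaism.context() above.
var region = { start: 1100000, stop: 1200000, size: 800 };

// Number of bases each pixel column has to represent.
function bpPerPixel(r) {
  return (r.stop - r.start) / r.size;
}

// Pixel column that a genomic position falls into.
function pixelFor(r, position) {
  return Math.floor((position - r.start) / bpPerPixel(r));
}

// Genomic interval [from, to) covered by a pixel column.
function regionFor(r, pixel) {
  var from = r.start + pixel * bpPerPixel(r);
  return [from, from + bpPerPixel(r)];
}

console.log(bpPerPixel(region));          // 125
console.log(pixelFor(region, 1100250));   // 2
console.log(regionFor(region, 799));      // [ 1199875, 1200000 ]
```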

Sources

Next we define a source. This component encapsulates the logic for retrieving the actual data.


  var source_bedfile = context.bedfile();
  

For this example we are using the bedfile() source, which requests whole files from the server. Notice that we haven't yet specified which files we want to load; we do that with the next component, metric().

Metrics

Now we can use the source to instantiate as many metrics as necessary. Those metrics will point to a specific file (or sample). To start, we will work on a single sample (sample 18277). We store the metric in a Javascript array so we can add more metrics (samples) later.


  var metrics = [ source_bedfile.metric("data/18277.bed") ];
  

Now DNAism is ready to retrieve data for that particular sample. We can now start creating the visual elements.

Creating the horizon chart

If you are not familiar with Javascript, functional programming and common D3 patterns, this chunk of code may look unreadable. Bear with me here:


  d3.select("body").selectAll(".horizon")
      .data(metrics)
    .enter().insert("div", ".bottom")
      .attr("class", "horizon")
    .call(context.horizon()
      .format(d3.format(".2")));
  

It is a fundamental pattern in D3 and you should master it if you want to use DNAism beyond the basics.

What we are doing is selecting the HTML element we will draw the horizon chart in (or charts, if we are working with more than one sample). We bind the metrics (the data) to a selection of .horizon elements and, for each metric that has no element yet, insert a div with the horizon class (which carries the necessary CSS styles) before the .bottom element. Finally, D3 calls context.horizon() on the new elements to create the visual elements for each metric (sample). Remember, in this case we have only one sample.

And here you have the result:

Notice how easy it is to spot abnormal regions (high and low coverage). Now we can explore the read depth of more samples just by adding them to the metrics array:


  var metrics = [
    source_bedfile.metric("data/18277.bed"),
    source_bedfile.metric("data/19466.bed"),
    source_bedfile.metric("data/23138.bed"),
  ];
  

We can use the axis() component now to help us determine what part of the genome we are exploring:

Finally, we can add a rule() to help us compare the data value at a specific location across the different samples. Try hovering the mouse over the chart:

And here is what the final code looks like:


  // Describe the genomic region to explore (nothing is rendered yet).
  var context = dnaism.context()
                 .start(1100000)
                 .stop(1200000)
                 .size(800)
                 .chrm('Chr17');

  // Top and bottom axes inside the "#with_rule" page element.
  d3.select("#with_rule").selectAll(".axis")
      .data(["top", "bottom"])
    .enter().append("div")
      .attr("class", function(d) { return d + " axis"; })
      .each(function(d) {
        d3.select(this).call(context.axis().ticks(12).orient(d));
      });

  // Vertical rule that follows the mouse across all the charts.
  d3.select("body").append("div")
      .attr("class", "rule")
      .call(context.rule());

  // Source that requests whole BED files from the server.
  var source_bedfile = context.bedfile();

  // One metric per sample.
  var metrics = [
    source_bedfile.metric("data/18277.bed"),
    source_bedfile.metric("data/19466.bed"),
    source_bedfile.metric("data/23138.bed"),
  ];

  // One horizon chart per metric, inserted before the bottom axis.
  d3.select("#with_rule").selectAll(".horizon")
      .data(metrics)
    .enter().insert("div", ".bottom")
      .attr("class", "horizon")
    .call(context.horizon()
      .format(d3.format(".2")));
  
  

Using other sources

If the region of the genome we are exploring is large, the browser will have to load a huge number of data points. That may be a problem depending on the amount of memory available on your system. But there is a solution to this issue.

DNAism is agnostic to data formats and backends. You can extend it to accommodate the specific details of your own datasets. This flexibility allows us, for example, to change our source so that we only send the browser the data points that are necessary for the visualization we intend.

When we create new sources, we extend the ways DNAism can load datasets. A source has a metric method that encapsulates all the logic for retrieving data for a specific genome location. The horizon() component calls each of the metrics to retrieve the data points used in its horizon chart.
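The separation between sources and metrics can be sketched as a factory that receives a backend-specific fetch function and returns an object with a `metric()` method. This is a hypothetical illustration of the design, loosely modeled on the `request(start, stop, step, callback)` convention Cubism uses for metrics; `makeSource`, `fetchRegion` and `request` are assumptions, and DNAism's actual extension points may look different.

```javascript
// Hypothetical source factory: the backend-specific part is injected as
// fetchRegion(name, start, stop, step, callback), so the same metric
// interface can sit on top of static files, a REST service, etc.
function makeSource(fetchRegion) {
  return {
    metric: function (name) {
      return {
        name: name,
        // Ask the backend for one value per `step` bases in [start, stop).
        request: function (start, stop, step, callback) {
          fetchRegion(name, start, stop, step, function (error, values) {
            if (error) return callback(error);
            var expected = Math.ceil((stop - start) / step);
            if (values.length !== expected) {
              return callback(new Error("expected " + expected + " values"));
            }
            callback(null, values);
          });
        }
      };
    }
  };
}

// Usage with an in-memory stub backend that returns a constant depth.
var stubSource = makeSource(function (name, start, stop, step, cb) {
  var n = Math.ceil((stop - start) / step);
  cb(null, new Array(n).fill(30));
});

stubSource.metric("data/18277.bed").request(1100000, 1101000, 125, function (err, values) {
  console.log(err, values.length); // null 8
});
```

Swapping the stub for a function that reads a file or calls a web service changes where the data comes from without changing how the charts ask for it.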

In the simplest source (bedfile), a request is made to the web server to retrieve the whole file associated with the metric (the sample we are working on). When rendering, DNAism has to reduce the number of raw data points to fit the available number of pixels in the visualization.
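For the 100,000 bp region used earlier, drawn in 800 pixels, each pixel has to summarize 125 bp worth of records. One way to do that reduction is to average the depth over each pixel's interval, weighting every BED record by how many of its bases fall inside the bin. The sketch below illustrates that idea under our own assumptions (sorted, non-overlapping, 0-based half-open records); `binRecords` is hypothetical, and DNAism's own reduction step may use a different aggregation.

```javascript
// Hypothetical reduction of BED records to one value per pixel.
// records: [{start, end, value}] with 0-based, half-open coordinates.
// Returns `size` values; bins with no covered bases get NaN.
function binRecords(records, start, stop, size) {
  var bp = (stop - start) / size;
  var sums = new Array(size).fill(0);
  var covered = new Array(size).fill(0);
  records.forEach(function (r) {
    // Clip the record to the visible window.
    var from = Math.max(r.start, start), to = Math.min(r.end, stop);
    if (from >= to) return;
    var first = Math.floor((from - start) / bp);
    var last = Math.min(size - 1, Math.floor((to - 1 - start) / bp));
    for (var i = first; i <= last; i++) {
      // Number of this record's bases that fall inside bin i.
      var binStart = start + i * bp, binEnd = binStart + bp;
      var overlap = Math.min(to, binEnd) - Math.max(from, binStart);
      sums[i] += r.value * overlap;
      covered[i] += overlap;
    }
  });
  // Base-weighted mean depth per bin.
  return sums.map(function (s, i) { return covered[i] ? s / covered[i] : NaN; });
}

// Two records spanning a 10 bp window drawn in 2 pixels (5 bp each).
var bins = binRecords([
  { start: 0, end: 7, value: 10 },
  { start: 7, end: 10, value: 40 }
], 0, 10, 2);
console.log(bins); // [ 10, 28 ]
```

The second pixel mixes 2 bases at depth 10 and 3 bases at depth 40, giving (20 + 120) / 5 = 28.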

The bedserver source implements the same logic, but it sends its requests to a RESTful service. The backend does all the heavy lifting and only returns the necessary data points. If you want to know more about how the backend works, take a look at this. You'll see that the server indexes the BED files to speed up queries and quickly access the regions of interest.
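The benefit of an index is that a region query does not have to scan the whole file. The sketch below shows the idea in memory: when the records of one chromosome are sorted and do not overlap (as in the files above), a binary search finds the first record that reaches the region, and the query then only touches the records it returns. `queryRegion` is a hypothetical helper; the bedserver backend uses its own index, which this does not reproduce.

```javascript
// Hypothetical in-memory region query over sorted, non-overlapping
// BED records [{start, end, value}] (0-based, half-open).
function queryRegion(records, start, stop) {
  // Binary search for the first record whose end lies past `start`.
  var lo = 0, hi = records.length;
  while (lo < hi) {
    var mid = (lo + hi) >>> 1;
    if (records[mid].end <= start) lo = mid + 1;
    else hi = mid;
  }
  // Collect records until one begins at or after `stop`.
  var out = [];
  for (var i = lo; i < records.length && records[i].start < stop; i++) {
    out.push(records[i]);
  }
  return out;
}

// The first three records of 18277.bed shown earlier.
var chr17 = [
  { start: 1100003, end: 1100004, value: 36 },
  { start: 1100004, end: 1100005, value: 35 },
  { start: 1100005, end: 1100006, value: 36 }
];
console.log(queryRegion(chr17, 1100004, 1100006).length); // 2
```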

Questions, comments and feedback

If you encounter any problem when using DNAism or you have feedback, please open a ticket. We promise we'll help you out as soon as possible.