Archive for the ‘Gene Sequencing Post-Processing’ Category

The day when a cost-efficient technology for reading and sequencing DNA becomes available may be close at hand. Intense research is focused on refining the most promising techniques. While IBM’s DNA Transistor Technology is at the forefront, efforts at Oxford Nanopore, Sandia National Labs and elsewhere are validating the trend.

The goal is to reduce costs to the point where an individual’s complete genome can be sequenced for between $100 and $1,000. Once this becomes a reality, the impact could be significant enough to create a brand-new generation of health care capabilities. While there is no way to predict exactly how fast this technology will become available to the average researcher, Manfred Baier, head of Roche Applied Science, maintains an optimistic position:

“We are confident that this powerful technology…will make low-cost whole genome sequencing available to the marketplace faster than previously thought possible.”

The technology involves creating nanometer-sized holes in silicon-based chips and then drawing strands of DNA through them. The key rests with forcing the strand to move through slowly enough for accurate reading and sequencing. Researchers have developed a device that uses the interaction between the discrete charges along the backbone of a DNA molecule and a modulated electric field to trap the DNA in the nanopore. By turning these so-called ‘gate voltages’ on and off, scientists expect to be able to slow and step the DNA through the nanopore at a readable rate. The effort combines the work of experts in nanofabrication, biology, physics and microelectronics.
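
To make the ratcheting idea concrete, here is a minimal toy model of how a pulsed gate voltage could meter DNA through the pore roughly one base at a time. The base spacing, drift speed and pulse widths are illustrative assumptions chosen only for the arithmetic, not parameters of IBM’s actual device:

    # Toy 1-D "ratchet" model of slowing DNA through a nanopore with a pulsed
    # gate voltage. All numbers are illustrative assumptions, not device data.

    BASE_SPACING_NM = 0.34          # approximate base-to-base rise in ssDNA
    DRIFT_SPEED_NM_PER_US = 10.0    # assumed drift speed while the trap is off

    def advance_per_cycle(trap_off_us):
        """Bases that slip past the sensing region during one off-window."""
        return DRIFT_SPEED_NM_PER_US * trap_off_us / BASE_SPACING_NM

    def off_window_for_one_base():
        """Off-window (microseconds) that advances roughly one base per cycle."""
        return BASE_SPACING_NM / DRIFT_SPEED_NM_PER_US

    print(f"1 us off-window -> ~{advance_per_cycle(1.0):.0f} bases per cycle (too fast)")
    print(f"~{off_window_for_one_base() * 1000:.0f} ns off-window -> ~1 base per cycle")

The point of the sketch is simply that the gate’s off-window, not the bias field alone, sets how many bases slip past the sensor per cycle.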

A clarifying cross-section of this transistor technology has been simulated on the Blue Gene supercomputer. It shows a single strand of DNA moving through the nanopore amid (invisible) water molecules.

No matter how long it takes for the technology to become a cost-effective reality, it will be a true game-changer when achieved. While researchers express both optimism and caution on the timing, there is one inevitable result for which keen observers in related fields are preparing: when people’s individual genetic codes can be economically deciphered and stored, the amount of data generated will be massive. The consequent demands on data storage, mining and analytics will in turn generate their own new challenges.
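
A quick back-of-envelope calculation suggests the scale. The coverage depth and bytes-per-base figures below are rough assumptions, but the conclusion holds under almost any reasonable choice:

    # Back-of-envelope estimate of raw storage per sequenced genome.
    # Genome size, coverage depth and bytes-per-base are rough assumptions.

    GENOME_BASES = 3.2e9     # approximate human genome size
    COVERAGE = 30            # assumed sequencing depth
    BYTES_PER_BASE = 2       # rough allowance for a base call plus quality score

    bytes_per_genome = GENOME_BASES * COVERAGE * BYTES_PER_BASE
    print(f"per genome: ~{bytes_per_genome / 1e9:.0f} GB of raw data")
    print(f"one million genomes: ~{1_000_000 * bytes_per_genome / 1e15:.0f} PB")

Even before a single analysis copy is made, population-scale sequencing lands squarely in petabyte territory.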

Using the huge influx of new data to make more informed life science decisions is a key, long-range benefit of the current research efforts in sequencing technology. In health science alone, revolutionary new approaches are expected to allow:

  • Early detection of genetic predisposition to diseases
  • Customized medicines and treatments
  • New tools to assess the application of gene therapy
  • The emergence of DNA-based personal health care

An equally critical benefit is the potential cost savings expected when sequencing technology, data storage and advanced predictive analytics combine, allowing truly preventive medicine to take its place as the new foundation of health care.

In Next Generation Gene Sequencing, Don’t Forget the Data…and the Answers

In the next wave of gene sequencing techniques, the focus is mostly on the inputs. Take the new nanopore approach from a computational physicist at the University of Illinois Urbana-Champaign: pulsing an electric field on and off around a strand of DNA induces the DNA to stretch and relax as it threads through the nanopore…just the behavior needed to read each base. So much innovation on the front end. What about the outputs?
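
For a sense of how that front-end read rate becomes a back-end data question, here is a rough throughput sketch; the per-pore rates and pore counts are hypothetical, chosen only to show the arithmetic:

    # How per-pore read rate translates into whole-genome run time. The rates
    # and pore counts below are hypothetical, not figures from any device.

    GENOME_BASES = 3.2e9

    def hours_per_genome(bases_per_sec_per_pore, pores):
        return GENOME_BASES / (bases_per_sec_per_pore * pores) / 3600

    for rate, pores in [(100, 1), (100, 1_000), (1_000, 10_000)]:
        print(f"{rate} b/s x {pores} pores -> {hours_per_genome(rate, pores):.2f} hours")

The faster those numbers get, the faster the data piles up on the output side.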

In a recent press release, one industry guru wants us to spend more time thinking about what to do with the data than how to generate it:

“[The] difficult challenge is accurately estimating what researchers are going to do with the data downstream. Collaborative research efforts, clever data mash-ups and near-constant slicing and dicing of NGS datasets are driving capacity and capability requirements in ways that are difficult to predict,” said Chris Dagdigian, principal consultant at BioTeam, an independent consulting firm that specialises in high performance IT for research. “Users today need to consider a much broader spectrum of requirements when investing in storage solutions.”

Unfortunately, one of today’s myths is that storage solutions are ready for the ‘near-constant slicing and dicing’ Mr. Dagdigian mentions. Too often, high-performance computing installations (née supercomputers) simply bolt a big storage system onto the end of the pipeline and dump data into it. Without industry-leading tools to get data back out of that storage system, the real challenge doesn’t end with the sequencing…it’s just beginning.
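
One small illustration of the difference: without an index, pulling a single read back out of a big flat archive means scanning the whole thing. A minimal byte-offset index, sketched below over a made-up file layout, turns that scan into a seek:

    # Minimal sketch of the gap between dumping data and slicing it back out.
    # The flat-file layout (one read per line, tab-separated id and sequence)
    # is a made-up example, not a standard NGS format.

    def build_offset_index(path):
        """One pass over the archive, recording where each read starts and its length."""
        index = {}
        with open(path, "rb") as fh:
            offset = 0
            for line in fh:
                read_id = line.split(b"\t", 1)[0].decode()
                index[read_id] = (offset, len(line))
                offset += len(line)
        return index

    def fetch_read(path, index, read_id):
        """Random access: seek straight to one read instead of rescanning everything."""
        offset, length = index[read_id]
        with open(path, "rb") as fh:
            fh.seek(offset)
            return fh.read(length)

Real NGS tooling does this with far more sophistication, but the principle is the same: the value is in getting data out, not just putting it in.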

Is this a new problem? Some think so. For example, George Magklaras, senior engineer at the University of Oslo, says: "The distribution and post-processing of large data-sets is also an important issue. Initial raw data and resulting post-processing files need to be accessed (and perhaps replicated), analyzed and annotated by various scientific communities at regional, national and international levels. This is purely a technological problem for which clear answers do not exist, despite the fact that large-scale cyber infrastructures exist in other scientific fields, such as particle physics. However, genome sequence data have slightly different requirements from particle physics data, and thus the process of distributing and making sense of large data-sets for Genome Assembly and annotation requires different technological approaches at the data network and middleware/software layers."
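
As one hedged sketch of what "accessed (and perhaps replicated)" across sites can require in practice, chunk-level checksums let collaborating institutions confirm their replicas agree without shipping the raw data again; the chunk size and hash choice here are arbitrary assumptions:

    # Chunk-level checksums for verifying replicated copies of a large dataset
    # at different sites. Chunk size and SHA-256 are illustrative choices.
    import hashlib

    CHUNK_BYTES = 64 * 1024 * 1024   # 64 MiB chunks

    def chunk_digests(path):
        """Hash a file chunk by chunk; replicas compare the lists to spot divergence."""
        digests = []
        with open(path, "rb") as fh:
            while True:
                chunk = fh.read(CHUNK_BYTES)
                if not chunk:
                    break
                digests.append(hashlib.sha256(chunk).hexdigest())
        return digests

    def divergent_chunks(local, remote):
        """Indices of chunks that differ or are missing between two replicas."""
        n = max(len(local), len(remote))
        return [i for i in range(n)
                if i >= len(local) or i >= len(remote) or local[i] != remote[i]]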

New problems need new solutions.