A recent article in the WSJ once again highlights the steadily growing list of applications for predictive analytics in health/life science. This time the goal is identifying patients likely to stop their medication regimen before they do so.
According to a report from the nonprofit New England Healthcare Institute, an estimated one-third to one-half of Americans don’t take their medications as prescribed by their doctors, contributing to about $290 billion a year in avoidable medical spending, including excess hospitalization.
With that level of cost at stake along just one vector of health care science, is it any wonder that demand for the best in predictive analytics is becoming more and more critical?
Significant cost savings across the entire spectrum of health care science, as well as more individualized service options for patients, are expected as inevitable results of the steady powering-up now seen in predictive analytics. There are many opportunities to review the positive results in applied case studies. These provide just a glimpse at the successes ahead.
Take for instance the results of SPSS technology and software at Texas Health Resources, where the challenge was to limit health care costs without reducing the quality of service provided to patients. As Texas Health Resources relates:
“With IBM SPSS Statistics Base, Texas Health Resources has greatly enhanced its ability to support its process-improvement initiative. Today, it not only detects process variations, but can determine the underlying causes such as a sicker-than-normal patient population.”
Data quality was improved while data mining costs were reduced by 50 percent.
These high-yield results make it clear why developments in predictive analytics have become front-and-center news for so many segments of the health care industry. Other endeavors are focused on identifying the individuals most likely to develop specific illnesses such as diabetes or cancer. Ingenix, a unit of United Health Group and a customer of Netezza, has already successfully launched an effort to mine and analyze the underlying risk factors in patient data before an illness develops or advances.
As health care science, quality patient care and advancing medical technology struggle with cost factors, the value of predictive analytics in lowering costs increases exponentially.
The range of communities receiving value from electronic health records (EHR) is broad. EHR data has uses running from epidemiology and population health at one end of the spectrum to genotyping specific proteins in computational biology at the other. EHR is prominent in healthcare legislation and in issues of interest to governments vis-à-vis personalized medicine and genetic testing, and the number of lives represented is growing.
At some point soon – if it has not happened already – the patient will realize the value of EHR is not so much “Wow, my doctors are talking to each other and sharing my medical record,” but rather “My daughter was not prescribed x because EHR data mining showed the chance of an adverse reaction was lower for kids like her when they take y.” This ‘behind the curtain’ intelligence, which steadily pulls medicine and society toward deeper personalization as the pool of mineable data grows, will save more lives than it relieves headaches of coordinating care.
But what about ‘Big Pharma’? As the innovators and distribution centers of drug interventions, do they have compelling value propositions for analyzing EHR data? Most certainly, and these areas are just a start:
- Pharmacovigilance. This concept has been around for some time and seems the most obvious: EHR has a critical value proposition in drug safety and interactions. The model is so simple and direct, and the data sizes so large and rich, that – for example – the European Union (through its executive body, the European Commission) sponsored a study identifying the most important adverse events to search for in the EHR pool. The question is not how to use it, but how to use it best.
- In Clinical Trials. Dr. Michael Kahn of Children’s Hospital, Denver, presented a paper to the National Institutes of Health in which the first benefit listed was “Query EHR database to establish number of potential study Candidates”. The ability to use EHR databases to save time during participant selection and recruitment seems compelling.
- Comparative Effectiveness. Earlier this year, Steven Labkoff, MD, FACP, formerly of Pfizer, presented at the National Academy of Science conference “Electronic Health Records: Where Do We Go From Here?” Labkoff discussed the benefits Big Pharma can reap by aggressively using EHR data. A key point from Labkoff is that new players – payors, PBMs, the US government – are now in the market asking for “pharmacoeconomic justification” that a drug is better and cheaper than something already on the formulary. Geisinger Health Network weighed in simply on how EHR can help with comparative effectiveness.
- Marketing & Branding. Bringing together all the insights from these areas tied to a single drug makes a much stronger case that prescribing it has increased therapeutic value. As consumers and prescribers gain more channels of data-centric information on their treatments, sound, large-sample EHR-type data backing up claims of a drug’s benefits becomes critical.
Taken together, the EHR as our collective health repository and ‘effect’ database can make pharma expenditures drop noticeably across many silos, from picking the compounds to bring to market, to testing and validating them, to ensuring they are used well once in the market.
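On the pharmacovigilance point, the classic screening statistic is small enough to state concretely. Below is a minimal sketch of the proportional reporting ratio (PRR), a standard disproportionality measure in drug-safety surveillance; every count here is hypothetical, not drawn from any real EHR pool.

```python
# Proportional reporting ratio (PRR): how much more often an adverse
# event is reported for one drug than for all other drugs combined.
# All counts below are hypothetical, for illustration only.

def prr(a, b, c, d):
    """a: event reports for the drug of interest
       b: all other reports for the drug
       c: event reports for all other drugs
       d: all other reports for all other drugs"""
    return (a / (a + b)) / (c / (c + d))

# Hypothetical pooled counts: the drug shows the event in 40 of its
# 1,040 reports, versus 200 of 20,200 reports for all other drugs.
signal = prr(a=40, b=1000, c=200, d=20000)
print(round(signal, 2))  # → 3.88; a PRR above ~2 is a common screening flag
```

A real study would add confidence intervals and minimum case counts before flagging a signal, but the core arithmetic is this small, which is why the richness of the data pool, not the method, is the limiting factor.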
Obamacare Helps the Middle Man?
The Wall Street Journal reports that the ratings agency Fitch has raised its outlook on McKesson (MCK), Cardinal Health (CAH) and AmerisourceBergen (ABC). “Fitch Ratings announced on Thursday increased optimism about drug distributors, noting low debt levels and ongoing growth in the industry, which is expected to be further helped by the federal health-care overhaul.”
Fitch sees “moderate increases” in prescription volumes when Obamacare goes into effect in 2014, and concomitant leverage for MCK, ABC and CAH. Like others, MCK is keenly aware of the disruptive power and shift in buying due to ObamaCare. McKesson has been giving monthly updates on the Federal Stimulus, especially as it relates to McKesson’s Electronic Health Record offering.
McKesson has had a particularly good run over the last few months. Successes include winning the coveted Walmart Supplier of the Year Award; increasing their shareholder dividend by 50%; and showing increased revenue for the quarter ended June 30, 2010. McKesson cited strength in its Distribution Solutions Group, the same group that grabbed the Walmart Supplier of the Year Award. That Distribution Solutions Group is the one positioned to leverage new buying contracts and volume when ObamaCare goes into place.
Consistent with the patient ‘land-grab’ happening among electronic health record providers including eclipsys, GE Health, Siemens, Cerner, Sage, PracticeFusion and Epic, McKesson took the hit on profit margins as they chose to invest more in developing their EHR offering. They, like others, are keenly aware of and caught in the race to position their software as uniquely designed to qualify providers as achieving ‘meaningful use’. Said their CEO:
“We are pleased that the Department of Health and Human Services released the final rules for meaningful use and certification standards of electronic health record (EHR) systems under the Medicare and Medicaid incentive programs,” said Hammergren. “Providers now have the flexibility to achieve meaningful use by taking different paths to implementing an EHR system based on their needs and priorities, which we believe is critical to broad-based adoption. We remain focused on working with our customers to make sure they have the right resources in place to qualify for stimulus money and improve the quality of care for their patients.”
The future seems to hold a nice mix for McKesson: new, as yet inefficient Rx markets in existing distribution channels, increased Rx volume, and an existing clinical solution driven by ObamaCare stimulus and regulation. Maybe this time the ratings agencies are getting it right.
Recently, the Proceedings of the National Academy of Science (PNAS) published an article describing IBM researchers’ successful mapping of the neural pathways of a macaque monkey. IBM’s interest? One of IBM’s chief interests is design in substrates that can generate lots of intelligence in small spaces – knowledge that can be applied when designing next-generation computer chips. Indeed, one of the findings IBM was interested in is how architecting intelligence in a space-limited network (like the brain, which is bounded by the skull on the upper plane and by quantum physics at the lower plane) differs from the unlimited space of a social network.
The PNAS article included this comment (emphasis ours):
“We derive a unique network incorporating 410 anatomical tracing studies of the macaque brain from the Collation of Connectivity data on the Macaque brain (CoCoMac) neuroinformatic database. Our network consists of 383 hierarchically organized regions spanning cortex, thalamus, and basal ganglia; models the presence of 6,602 directed long-distance connections; is three times larger than any previously derived brain network; and contains subnetworks corresponding to classic corticocortical, corticosubcortical, and subcortico-subcortical fiber systems.”
Why is deeper analysis of brain networks important for genetic predictions of Alzheimer’s? Genetic testing for Alzheimer’s risk is at a standstill of sorts. Consider this snippet from a widely read article on the limitations of genetic screening for Alzheimer’s:
“For the majority of people who are at risk for the late-onset form of Alzheimer’s disease, the most important factors are age, female gender, family history, and presence of the gene APOE4. Even though people who have the APOE4 gene are more likely to develop Alzheimer’s, genetic testing is not very useful because so many people who have APOE4 don’t go on to develop Alzheimer’s, and there are plenty of people who don’t have APOE4 that do develop Alzheimer’s.”
It seems likely that APOE4 alone is not a good indicator. A much better indicator will be multiple markers. One approach to finding multiple markers is looking for single markers like APOE4; a better approach is to look down neural pathways to find where APOE4 is acting together with other genes, whose joint activity, decay, presence, or absence is a much higher correlate. This is impossible without a much deeper understanding of where the brain’s rural routes, highways, freeways, and superhighways exist.
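To make the multiple-marker idea concrete, here is a toy sketch on invented data: ten synthetic patients, where disease tracks the joint presence of APOE4 and a second, entirely hypothetical pathway gene. The second gene and all outcomes are fabricated for illustration; the only point being shown is the general one, that a joint rule can out-predict a single marker.

```python
# Synthetic patients: (has_APOE4, has_geneB, developed_disease).
# geneB is a hypothetical second marker on the same neural pathway.
patients = [
    (1, 1, 1), (1, 1, 1), (1, 0, 0), (1, 0, 0),
    (0, 1, 0), (0, 0, 0), (0, 1, 0), (1, 1, 1),
    (0, 0, 0), (1, 0, 1),
]

def accuracy(rule):
    """Fraction of patients the rule classifies correctly."""
    return sum(rule(a, b) == y for a, b, y in patients) / len(patients)

single = accuracy(lambda a, b: a)        # predict disease from APOE4 alone
joint = accuracy(lambda a, b: a and b)   # require both markers

print(single, joint)  # → 0.8 0.9: the joint rule is the higher correlate
```

On real data the rule would come from statistical models over many candidate genes, but the payoff is the same: knowing which genes sit on the same pathway tells you which interactions are worth testing.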
It’s an amazing day, the beginning of something great! Netezza Corporation’s (www.netezza.com) (NYSE:NZ) new Health & Life Sciences division is launching our voice on Facebook, Twitter, LinkedIn, and more. As the premier analysis engine for all healthcare oriented data, Netezza will be a leading voice in what is now possible as we move toward truly predictive drugs, care, and outcomes.
My friend and colleague Bill Zanine, our Business Solutions Executive for Health & Life Sciences, will be the broadcast presence for where technology meets science meets business in his blog Rx for Analytics. Bill has spent years as feet on the street and in the elite ivory towers of global healthcare data vision, provision, and utilization.
Where will you find Netezza this year, next year, and beyond?
– When insurance payers are trying to identify fraud and abuse to pass those savings on to you the consumer, and do it before the fraudster is paid, not after, you’ll find Netezza
– When drug companies look to gene sequencing post-process analysis to identify which drugs your personalized medicine profile contraindicates, you’ll find Netezza
– When providers want to query a centralized electronic health record for aggregate analytics on a vector, symptom, drug or outcome, you’ll find Netezza
– When pharmaceuticals want to make smarter contracts, more effective distribution, and penetrate the opinion ‘cloud’ of influential doctors and academicians to get better drugs to you, you’ll find Netezza
We can make your doctor smarter. We can make the next drug better. We can make your insurance cheaper. We can help them cure cancer faster. Netezza can do all this. Netezza can.
In Next Generation Gene Sequencing, Don’t Forget the Data…and the Answers
In the next wave of gene sequencing techniques, the focus is mostly on the inputs. Take the new nanopore approach from a computational physicist at the University of Illinois Urbana-Champaign: by pulsing an electric field on and off around a strand of DNA, researchers can induce the DNA to expand and relax as it fits through the nanopore…just the behavior needed to read each base. So much innovation on the front end. What about the outputs?
In a recent press release, one industry guru wants us to spend more time thinking about what to do with the data than how to generate it:
“[The] difficult challenge is accurately estimating what researchers are going to do with the data downstream. Collaborative research efforts, clever data mash-ups and near-constant slicing and dicing of NGS datasets are driving capacity and capability requirements in ways that are difficult to predict,” said Chris Dagdigian, principal consultant at BioTeam, an independent consulting firm that specialises in high performance IT for research. “Users today need to consider a much broader spectrum of requirements when investing in storage solutions.”
Unfortunately, one of today’s myths is that storage solutions are prepared to do the ‘near-constant slicing and dicing’ Mr. Dagdigian mentions. Too often, the high performance computing (née supercomputing) habit is to stick a big storage system on the end and dump data into it. The problem is that without industry-leading tools to get data back out of the storage system, the real challenge doesn’t end with the sequencing…it’s just beginning.
Is this a new problem? Some think so. For example, George Magklaras, senior engineer at the University of Oslo says “The distribution and post-processing of large data-sets is also an important issue. Initial raw data and resulting post-processing files need to be accessed (and perhaps replicated), analyzed and annotated by various scientific communities at regional, national and international levels. This is purely a technological problem for which clear answers do not exist, despite the fact that large-scale cyber infrastructures exist in other scientific fields, such as particle physics. However, genome sequence data have slightly different requirements from particle physics data and thus the process of distributing and making sense of large data-sets for Genome Assembly and annotation requires different technological approaches at the data-network and middleware/software layers.”
New problems need new solutions.
In the Introduction of Tom Peters’ new business book The Little Big Things, he mentions, perhaps unwittingly, a trend in what management consultants were focused on by decade. He writes about being passionate on a number of things, including “…scintillating customer service (I pretty much had that “space” all to myself in the mid-1980s—believe it or not—“everybody” was doing quality, I was doing service)…”. This tiny mention speaks to a macro trend that argues our businesses will have a huge focus on customer analytics in the coming decade. Where businesses have focused on product analytics in the past, tomorrow’s analytics will predominantly focus on the customer.
Consider the following assumption: what if management consultants were a leading indicator of, or even a causal factor in, how managers decided to analyze their businesses? (N.B. One can even ignore causality and say ‘It doesn’t matter whether widely published and listened-to consultants and visionaries talking about a concept drives use of the concept, or whether the concept is right and its wide usage drives consultants to talk about it.’) The fact is Peters marked a trend: the topic of the ’80s was, for the most part, quality. Perhaps the mass of consultants talking quality—a product attribute—drove companies to develop analytics around product…but this implementation took ten years. Thus the 1990s was the decade of quality analytics. Fast forward to the 2000s, the decade of management consultants talking customers. Which means the 2010s will be the decade of customer analytics.
As an aside, if you want to know what analytics will occur in the 2020s for most companies, they’ll be about interactions. Today, innovation has just begun in collecting huge amounts of interaction and social networking data. The bleeding-edge ‘interactions companies’ (mostly Internet companies, since they have the easiest access to a paradigm that drives the most interactions per second and leaves recordable data on those interactions) are just starting to think about how to analyze interactions as a marker of the health of their business and offerings. It follows that once the ‘interactions companies’ work out best practices by the middle of the 2010s and those practices start to seep into leading-edge companies, by the 2020s you’ll see early- and late-majority companies investing heavily in recording and analyzing interactions. These interactions will be employee-to-employee, management-to-employee, customer-employee, customer-customer, influencer-customer, and on and on. More ought to be considered on this topic. And one should not forget that if you read Tom Peters’ passion for service as a proxy for interaction, you can see he’s about 20 years ahead of his peers….
Back to customer analytics: an interesting piece of anecdotal evidence shows the rising focus on customer analytics. Performing a search on Amazon.com for books with the subject of ‘analytics’ shows steady growth by decade: 2,400 come up as published in the 1980s; 4,000 as published in the 1990s; and 5,500 as published in the 2000s–a steady rise. However, change the search term to ‘customer analytics’ and it brings back 0 results for the 1980s, 1 result for the 1990s, and 109 results for the 2000s. The curve of acceleration for books mentioning or about customer analytics is exponential.
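Restated as growth factors, the search counts quoted above make the contrast stark; a quick back-of-the-envelope in Python:

```python
# Amazon.com book counts by decade, as quoted above.
analytics_books = {"1980s": 2400, "1990s": 4000, "2000s": 5500}
customer_analytics_books = {"1980s": 0, "1990s": 1, "2000s": 109}

# Decade-over-decade growth from the 1990s to the 2000s.
general_growth = analytics_books["2000s"] / analytics_books["1990s"]
customer_growth = customer_analytics_books["2000s"] / customer_analytics_books["1990s"]

print(round(general_growth, 2), round(customer_growth))  # → 1.38 109
```

A 1.4x decade means steady interest; a 109x decade means a field being born.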
Assuming the focus on customers of the 2000s will lead to customer analytics in the 2010s, what does this mean for the workers, managers and executives focused on using, investing in, or building these analytics? It means
a) There’s a good chance you’ve had some exposure to customer analytics, although it’s likely failed, given that the majority of business intelligence projects fail—unless they make use of a purpose-built data warehouse appliance,
b) There’s a good chance that more of the dashboards and leading indicators you have been building include the customer as part of a key performance indicator, measure, or attribute,
c) The momentum toward predictive analytics mixed with this emerging focus on customers should be driving a rise in interest in behavioral economics…which it is: meaning you’ll be exposed to more service offerings and interest in predicting customer behavior,
d) Your analytic technology will need to grow and perform with much more data. It’s a rare company that has fewer customers than products and stores. The business intelligence technologies of the past were about analyzing combinations of products and stores; the number of customers typically surpasses that combination by a wide margin.
There are many other events, trends, and factors that emerge from this focus that will happen around customer analytics. The chief drivers to success will be to embrace it and be excellent at it, which means one’s method of collection, strategy, and most importantly delivery will be paramount.
- Healthcare. In this visualization, GE took 500,000 records from the millions in its electronic medical record database and calculated the out-of-pocket and insurer cost of a handful of chronic conditions by age. One of the few good uses – and good designs – of a radar graph, the visual changes as one increases and decreases age. It also has a predictive analytics component in that it answers a user’s question: “If I develop a case of hypertension, what will that cost me when I’m 65?” Like any good analytic, being able to see the data brings up more actionable and specific questions that this analytic doesn’t answer but the data set could.
- Digital Media. This visualization is part of The New York Talk Exchange, a visualization project developed by the Senseable City Lab at MIT. The potential applications of the analytic are perhaps even more explosive than the specific data used in this case. The analytic shows the starting or sourcing neighborhood within NYC and where communications via the AT&T network were destined. Users can see the frequency distribution of endpoints and, by comparison, who was talking to whom geographically across sister boroughs. Imagine creating a site map of your web site, application, team or workflow, and seeing the frequency of where the user, function, business process or capital goes next. Predictive analytics can say: “If we invest in area x, where is that capital, profit opportunity, or waste most likely to go next, and how does that change if I make another investment?” This specific visualization may be especially well suited to that type of comparative predictive analytic.
- Retail. According to Well Formed Data, Sankey diagrams and stacked bar charts informed this (4MB .pdf download) time series visualization of how medical journals in related fields merged into a cohesive ‘basket’ of journals in the emergent field of neuroscience. While on the surface not retail related, it points to a very compelling visualization – one I have never seen produced – that would explain how specific products driving volume and profit affinitize into specific types of market baskets. Replace a) each journal with a specific retail product, which the user can color-code at run time, perhaps for on/off promotion; b) the eigenfactor of each journal, represented by the weight/width of its line, with the amount of profit or volume the item produces – again letting the user choose profit, volume, or some other measure at run time; c) the ten or so ending blocks of journals with specific types of market baskets; and d) the breadth of starting lines, moving from medical disciplines to aisles or departments in a store. In effect, the decade-long time series here is compressed into a single, specific shopping trip. The data can include many trips or a single store, one day or years of history. The predictive angle is being able to answer questions like “If I promote this item, does it move away from its core ‘7 items per basket, quick trip’ basket into a ‘destination item weekly stock-up’ basket?” One can also look historically at how different store consumers shop categories, volume vs. profit items, and more.
These predictive analytic visualizations start with healthcare as the least complex and become progressively more complex. Between how frequently the user needs the data updated, the amount of data (in the digital media/telco example, clearly tens or hundreds of gigabytes), the processing (in the retail example, calculating eigenvalues in an analytic that might be refreshed hourly, especially for promotional out-of-stocks), and the speed required, a predictive analytic visualization is not something to try at home with an off-the-shelf database platform and hardware.
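For the retail idea above, the input to such a Sankey-style basket analytic is nothing more exotic than a weighted flow list from products to basket types. A minimal sketch, with every product name, basket type, and number invented for illustration:

```python
# Flows for a product-to-basket-type Sankey: each tuple is
# (product, basket_type, profit). All values are invented.
flows = [
    ("milk", "quick trip", 120.0),
    ("milk", "weekly stock-up", 80.0),
    ("diapers", "weekly stock-up", 200.0),
    ("soda", "quick trip", 60.0),
]

def basket_totals(flows):
    """Sum each basket type's incoming profit (the width of its end block)."""
    totals = {}
    for _product, basket, profit in flows:
        totals[basket] = totals.get(basket, 0.0) + profit
    return totals

print(basket_totals(flows))
```

Promoting an item would be modeled by shifting its flow weights between basket types and re-running the totals – exactly the “does it move baskets?” question posed above.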
In the mid 1990’s, hearing about someone with a 1 terabyte data warehouse (DWH) was a sort of mystical, illusory event, engendering doubt or even suspicion as being a ‘fish that got away’ story. The person telling the story was never the one who actually built the DWH; they were just exposed to it in some way, and they threw the story around as if it were nothing, loving the awed look on the faces of their audience. Invariably this would be someone from the Information Technology (IT) field, since the business users would be unlikely to know, care, or be surprised that a very large amount of data is needed to answer their questions. So the IT person would also carelessly throw out a rejoinder such as ‘You know, at that size, you can’t simply [insert technique IT people do every day with a ‘normal’ large DWH].’
Fast forward a decade. Today, terabyte+ warehouses are common. However, one hears the same stories with one small difference: replace the word terabyte with petabyte . A petabyte, at 1000 terabytes, is a seemingly unreachable stretch of data. However, as we all witness the increasing power of processing and decreased cost of storage, we seem to be seeing enough examples of PB+ warehouses to say, “yesterday’s terabyte is today’s petabyte”.
Before you get a petabyte DWH, you need a petabyte of operational data. Only when a petabyte of data is present to ‘run’ your business can someone say ‘we need to analyze all this data’. Today’s petabyte-operational business is much more likely to be communication- or information-based. For example, AT&T reported one year ago that “AT&T currently carries about 16 petabytes of total IP and data traffic on an average business day”. (With our exponential growth in storable communication, presumably it’s on its way to doubling…) Other companies with petabyte businesses include Google, all the major telecommunications companies, and all the major web businesses—digital media and telecommunications. It’s nice to know the exception that proves the rule is the PB+ data collection at the Large Hadron Collider.
At a recent conference, a member of Facebook revealed the accelerating growth of their DWH. They reported that in March 2008 they were collecting 200 gigabytes (GB) of data per day. In April 2009, they were collecting 2+ TB of data per day, and in October 2009, 4+ TB per day. If you chart this, you see something approaching a classic exponential curve. While Facebook reports its DWH is closing in on 5 PB today, by the time a reader is absorbing this sentence, it has likely long surpassed that.
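Taking the reported figures at face value and assuming smooth exponential growth between them (an assumption, since only three points were given), the implied doubling times are easy to back out:

```python
import math

def doubling_time_months(start, end, months):
    """Months to double, assuming smooth exponential growth
    from `start` to `end` over `months` months."""
    return months * math.log(2) / math.log(end / start)

# Facebook's reported daily collection: 0.2 TB (Mar 2008),
# 2 TB (Apr 2009), 4 TB (Oct 2009).
early = doubling_time_months(0.2, 2.0, 13)  # 200 GB/day -> 2 TB/day
late = doubling_time_months(2.0, 4.0, 6)    # 2 TB/day -> 4 TB/day

print(round(early, 1), round(late, 1))  # → 3.9 6.0
```

Daily collection doubling every four to six months is exactly the regime in which a warehouse races from terabytes to petabytes within a few years.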
Does this mean that in 2020, more than half of the Fortune 100 will have petabyte-size data warehouses? Probably not. However, they’ll all have TB+ warehouses, and a herd of businesses will be PB+:
• All large and mid-size digital media, social media, and web businesses
• Large and mid-size telecommunication firms, driven by their Call Detail Record databases
• Financial market-based companies (think of tracking all stock market transactions to the microsecond level of granularity), and more and more bricks and mortar companies (e.g. banks) who have done as little as dipped a toe into financial markets, social media, streaming communication, and the like.
• Large energy companies recording all seismic and atmospheric ‘communications’ to a very specific latitude/longitude
• The energy grid will be getting close. Cars will likely be talking to the grid to reduce congestion and enable metered driving in the fast lane, so chances are the cars will be talking to each other too, spitting out signals every second. Just like that, we’ve added another 100M four-wheeled ‘people’ in our country communicating, and someone will want to analyze it.
And, you know, when your car’s antenna is a source for an exabyte data warehouse, you can’t just change the wiper blades, you have to……
In the 1990’s, very soon after the widespread expansion of standalone business intelligence (BI) software implementations, professionals began talking about extending this power to operational applications. Operational applications were front-office, back-office, horizontal, ERP—they created purchase orders and instructed plant machines to start and stop and automated customer communications. All of these software applications that automated and made companies run suddenly became the loci for future instantiations of BI. And the promise was that once BI was integrated into the very business processes operationalized by this software, that the optimization and smarts would happen automatically.
The promise was not immediately realized. At first, some believed it was because the operational software providers didn’t ‘know BI’. However, after a time, these business process software modules included a spectrum of BI modalities – everything from hard-coded SQL, to high-science algorithms, to OEM’ing top BI platforms such as Business Objects or Cognos. No one could say these tools had not stumbled on the ‘right’ way of integrating BI into their workflow. Even today, operational software helps companies act, but without organized, data-based intelligence beyond the pre-coded ‘rules’ of its human operators; the promise latent in that possibility remains unrealized.
Have we finally gotten it right? Indeed, after all this time, Gartner in 2009 said in discussing their BI Magic Quadrant:
“Areas that have traditionally been under corporate performance management (CPM), such as business planning and forecasting, are increasingly being embedded with BI capabilities. This, together with a trend of embedding analytics into business processes, will drive further investment in BI.”
Gartner went further, with a vision that by 2012:
“…business units will increase spending on packaged analytic applications, including corporate performance management (CPM), online marketing analytics and predictive analytics that optimize processes, not just report on them” (emphasis mine).
If we’re still not there after all this time, and the vision has been obvious more than a decade ago, then there has to be one or more significant barriers to this happening. Clearly the barrier is not simply technical. My hypothesis is that the largest barriers are:
a) Managers and analysts like the idea of a black box giving them an answer, but not the idea that the answer will be used—without their intervention—to do their job for them. Professionals would still rather type in the number of widgets to be ordered or destroyed, the amount to budget for x, y, or z, and the number of spin cycles to run the toothbrush vat. Workers love the idea of BI laying out the answer for them, but it’s as if we still want to copy the right answer onto the test, rather than letting the robot take the test for us. Maybe it’s because it’s us—not the robot—who gets stuck with the grade.
b) Our communication culture at work and a continuing dedication to transparency at low levels of grain means the chance that the professional will have to answer the question ‘Why?’ is very real. One can imagine being in a meeting and being asked: ‘We always order one truckload of tomatoes. This week you ordered two. And we had so many left over. Why did you do that?’ Imagining our answer as ‘I let the computer decide’ doesn’t have a nice ring to it.
What does work in these situations is having a crib sheet at hand – BI output – that influences the worker to make better decisions. And this is what some of the more innovative BI applications have migrated toward. Applications are beginning to appear that support business processes. They don’t give you a BI palette and ask you to envision and create a picture so much as they say: you have to paint this room in your house, and here are the questions you will have or ought to have, and their answers. Instead of giving you a servant asking ‘What should I do?’, they give you a vacuuming robot, a coffee-making robot, and a fire-starting robot.
An example of a software application with this approach is QuantiSense (fair warning: QuantiSense is a partner of Netezza). QuantiSense invented what they call ‘Playbooks’. Playbooks are simply specific workflows that any retail merchandiser, planner or allocator is or should be doing anyway. The Playbooks define the workflow, its process, and the points within it that could be optimized if a BI process were inserted at just that point. As a result, the regular analyst can run the ‘plays’ you might see on Monday Night Football, rather than resembling a little-kid quarterback whose entire vision of the right play is ‘Go long…’. This siting of a standalone BI app next to the professional and their operational app, with the BI tool coaching them through a workflow – rather than just handing them some output – seems to be the only method that will get today’s workers past the barriers that have plagued knowledge workers for a decade.