Gurjeet Singh is the co-founder and CEO of data analytics company Ayasdi. He earned his Ph.D. from Stanford in computational mathematics, and prior to founding Ayasdi, he worked at Texas Instruments and Google. This Op-Ed is part of a series provided by the World Economic Forum Technology Pioneers, class of 2015. Singh contributed this article to Live Science's Expert Voices: Op-Ed & Insights.
We live in an extraordinary time. The capacity to generate and store data has reached dizzying proportions. What lies within that data represents the chance for this generation to solve its most pressing problems — from disease and climate change to healthcare and customer understanding. The magnitude of the opportunity is defined by the magnitude of the data created — and it is astonishing.
The world's Internet population grew by more than 750 percent in the past 15 years to more than 3 billion and will pass the 50 percent penetration mark in the near future. This population shares more than 2.5 million pieces of content on Facebook, tweets more than 300,000 times and sends more than 204 million text messages — every minute.
Furthermore, the acceleration in data growth will increase dramatically in the coming years as the Internet of Things takes hold, connecting 20 to 30 billion "things" by 2020. These devices will transmit data on everything from the status of your baby's diaper, to the head trauma experienced by NFL players, to the health of your cattle herd.
Underpinning this explosion are extraordinary advances in data storage technology and architecture. Quality-adjusted prices for data-storage equipment fell at an average annual rate of nearly 30 percent from 2002 to 2014. With the incremental cost of storing data effectively zero, institutions have responded by capturing everything possible, accepting the premise that what lies within will produce meaningful value for the enterprise.
Seeing beyond the numbers
Despite the technical advances in collection and storage, knowledge generation lags. This is a function of how organizations approach their data, how they conduct analyses and how they automate learning through machine intelligence.
At its heart, it is a mathematical problem. For any data set, the total number of possible hypotheses, or queries, grows exponentially with the size of the data. Exponential functions are difficult enough for humans to comprehend; to complicate matters further, the size of the data itself is growing exponentially, and is about to hit another inflection point as the Internet of Things kicks in.
What that means is that we are facing double exponential growth in the number of questions that we can ask of our data. If we stick with the same approaches that have served us over time, iteratively asking questions of the data until we get the right answer, we will miss our generational opportunity.
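To make the double exponential concrete, here is a minimal sketch under stated assumptions. It treats a "hypothesis" as any non-empty combination of a data set's features, and it assumes the number of features doubles each year. Both are illustrative choices, not measurements; the function names and starting values are hypothetical.

```python
# Illustrative sketch only. Assumptions (not from the article's data):
#   - a "hypothesis" is any non-empty subset of the features in a data set
#   - the number of features doubles every year, starting from 4

def hypothesis_count(n_features: int) -> int:
    """Number of non-empty feature subsets: exponential in the data's size."""
    return 2 ** n_features - 1

def features_after(years: int, initial: int = 4) -> int:
    """Feature count after `years` of yearly doubling: exponential in time."""
    return initial * 2 ** years

# Composing the two gives double exponential growth in time.
for year in range(4):
    n = features_after(year)
    print(f"year {year}: {n} features, {hypothesis_count(n)} possible hypotheses")
```

Under these assumptions, three years of doubling takes the count from 15 possible hypotheses to 4,294,967,295. No team of analysts can keep pace with a curve like that by asking one question at a time.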
There are not, and never will be, enough data scientists in the world for that approach to succeed, nor can researchers arm enough citizen data-scientists with new software to meet that need. Software that makes question asking or hypothesis development more accessible or more efficient fails to address a critical concern: Its users will only fall further behind as new data becomes available every millisecond.
Teasing out the shape of data
For society to truly unlock the value that lies within our data, we need to turn our attention to the data, setting aside the questions for later.
This, too, is a mathematical problem. Data, it turns out, has shape. That shape has meaning. The shape of data tells you everything you need to know about your data, from its obvious features to its best-kept secrets:
- Regression produces lines
- Customer segmentation produces groups
- Economic growth and interest rates have a cyclical nature (diseases like malaria have this shape, too)
By knowing the shape and where an analysis is within that shape, we vastly improve our understanding of where we are, where we have been — and perhaps more importantly — what might happen next. In understanding the shape of data, we understand every feature of the data set, immediately grasping what is important, thus dramatically reducing the number of questions to ask and accelerating the discovery process.
By changing our thinking — and starting with the shape of the data, not a series of questions (which often come with significant biases) — we can extract knowledge from these rapidly growing, massive and complex data sets.
The knowledge that lies hidden within electronic medical records, billing records and clinical records is enough to transform how we deliver healthcare and how we treat diseases.
The knowledge that lies within the massive data stores of governments, universities and other institutions will illuminate the conversation on climate change and point the way to answers on what we need to do to protect the planet for future generations.
The knowledge that is obscured by Web, transaction, CRM, social and other data will inform a clearer, more meaningful picture of the customer and will, in turn, define the optimal way to interact.
This is the opportunity for our generation to turn data into knowledge. To get there will require a different approach, but one with the ability to impact the entirety of humankind.
Read more from the Technology Pioneers on their Live Science landing page. Follow all of the Expert Voices issues and debates — and become part of the discussion — on Facebook, Twitter and Google+. The views expressed are those of the author and do not necessarily reflect the views of the publisher. This version of the article was originally published on Live Science.