Data Science – the Emergence of a New Discipline

Data science goes beyond the use of data mining, business analytics and statistical analysis to look for patterns in large data sets.  It is more multidisciplinary in nature.  According to Wikipedia:  “Data science incorporates varying elements and builds on techniques and theories from many fields, including math, statistics, data engineering, pattern recognition and learning, advanced computing, visualization, uncertainty modeling, data warehousing, and high performance computing with the goal of extracting meaning from data and creating data products.”

 Data Science: An Introduction, a wikibook being developed as a tutorial on the subject, describes data science as “a child born of the mature parental disciplines of scientific methods, data and software engineering, statistics, and visualization, . . . a mash-up of several different disciplines.”
.. Perhaps the most exciting part of data science is that it can be applied to just about any domain of knowledge, given our newfound ability to gather valuable data on almost any topic.  However, doing so effectively requires domain expertise to identify the important problems to solve in a given area, the kinds of answers one should be looking for, and how best to present whatever insights are discovered so that domain practitioners can understand them in their own terms.
.. “Merely using data isn’t really what we mean by data science,” Loukides writes. “[Data scientists] are inherently interdisciplinary. They can tackle all aspects of a problem, from initial data collection and data conditioning to drawing conclusions.  They can think outside the box to come up with new ways to view the problem, or to work with very broadly defined problems: ‘here’s a lot of data, what can you make from it?’”
.. According to experts Loukides interviewed for his article, the best data scientists tend to be physicists and other scientists.  “Physicists have a strong mathematical background, computing skills, and come from a discipline in which survival depends on getting the most from the data.”
.. It’s too early to tell whether data science will similarly become a distinct discipline, with its own research agenda and educational programs that will train future generations of data scientists, or whether over time it will be absorbed by its parent disciplines.
.. Analysis is essentially rational decision making and problem solving.  It’s the standard approach underlying management and engineering practice.  It involves a relatively linear set of steps and works quite well when you are looking for a solution to a relatively well-defined problem.

But where do the problems come from in the first place?  How do you decide what problems to work on and try to solve?  This second kind of innovation, which they call interpretation, is very different in nature from analysis.  You are not solving a problem but looking for a new insight about customers and the marketplace, a new idea for a product or a service, a new approach to producing and delivering them, or a new business model.  Their research showed that interpretive innovation generally takes place through a process of conversations among people and organizations with different backgrounds and perspectives, until the problems can be identified and clarified to the point where a solution can be developed.

Challenges and Opportunities Confront the Data-Driven Business

Most companies capture a small fraction of their data’s value

It’s often been said that truly transformative innovations are overhyped in the short term but underhyped in the long term. Think of electricity and automobiles, more recently the internet, and now big data.

When first developed in the late 19th century, electricity was mostly used to replace kerosene lamps and candles with light bulbs. It took several decades for electric appliances, the assembly line and mass production to emerge and help create whole new industries. Similarly, the full impact of automobiles was not felt until the mid-20th century with the rise of suburbs, the Interstate Highway System, and the motels, restaurants and gas stations that sprang up all around them.

.. In 2000, only one-quarter of the world’s stored information was digital and thus subject to search and analysis. Since then, the amount of digital data has been doubling roughly every three years. By now, only around 1% of all stored information isn’t digital. This could not possibly have happened without the digital revolution.
..

  • Micro-segmenting a population based on individuals’ characteristics as revealed by data and analytics;