O’Reilly publishes nine books on data science, and one of them is titled “What is Data Science?” When you open any of these books, you should ask yourself what you are getting into. As a term, data science has come to mean several things.
First, data science has come to mean a body of knowledge, a collection of useful information related to a specific task. In this way, data science is like library science or managerial science. Library science collects the best ways to run a library, and managerial science collects the best ways to run a business. Data science collects the best ways to store, retrieve, and manage data. As a result, a data scientist might know how to set up a Hadoop cluster or run the latest type of non-relational database. This is not the type of data science that you will find here.
Second, data science has come to describe a way of doing science. Data scientists use data, models, and visualizations to make scientific discoveries, just as other scientists use experiments. In fact, you can think of data science as a method of science that complements experimental science. Experimental scientists use the experimental method to solve scientific problems, and data scientists use the data science method. Many scientists use both.
This is the type of data science you will learn in this book. You will learn how to use data to make scientific discoveries, and how to justify those discoveries once they are made. Along the way, you will learn how to visualize data, build models, and make predictions.