Many texts about data science (including machine learning, data mining, and predictive analytics) say little about the very first step of the process: deciding what goal the remaining steps should serve. In traditional science, this might be called forming a hypothesis.
This step is rarely discussed because it is the least formalized stage of the process. I call it “data surfing”. (Although the term is not common, it has some precedent.) Data surfing is how a data scientist has, over time, learned about a subject and gained “domain expertise” in it. The process is often not entirely planned: formal education may be part of it, but it is just as likely to have included a lot of “surfing around”, following whatever knowledge one finds interesting.
Data surfing is also an important stage when teaching data science, because it helps foster curiosity in students, which is critical to all science and scientific thinking.
After spending enough time surfing and learning about what interests them, a data scientist will ultimately need to settle on a more clearly defined goal for the data science process. And to accomplish that goal, there will next be a need for some data wranglin’…