Recently there was a post on LinkedIn by Erle Hall, lead for Information and Communication Technologies (ICT) at the California Department of Education (CDE), with a diagram about machine learning. That diagram had 6 steps: Select Data, Model Data, Validate Model, Test Model, Use the Model, and Tune Model. Those 6 steps mostly encapsulate what traditionally has been called the “data mining” phase. But there are 3 other important phases, which I will call “data surfing”, “data wrangling” and “data artistry”. (These names were chosen to be easier to understand and more interesting for students; each phase also goes by other names.) I also personally prefer the term “algorithm” to “model”: data science traditionally used statistical models, but methods like neural networks, which are less like a traditional statistical model, are now often used as well. In the next few posts, I’ll dive into each of these 4 phases, and give a basic explanation of what each one does and why it is important.
Many texts about data science (including machine learning, data mining, and predictive analytics) don’t include much about the very first step of the process: deciding what your goal is for the steps that follow. In traditional science, this might be called forming your hypothesis.
Before data science/machine learning/data mining/predictive analytics can be done, you need to have the data you are going to use. This may seem obvious, but in many cases there is more to this step than may first be assumed. The whole process is what I will call “data wrangling”, although it has other names like “data munging”.
After the data has been gathered and put in a form that can be used, an appropriate algorithm can be applied to it to accomplish the data mining/machine learning/predictive analytics. This is the stage that traditionally has been called “data mining” because it is the part that extracts additional value from the data in the form of some type of knowledge (this is why, early on, the process was sometimes called “knowledge discovery in data” (KDD)).
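To make this stage a little more concrete, here is a minimal sketch of how the steps from the diagram (select, model, validate, test, use, tune) might look in plain Python. Everything in it is an assumption for illustration: the data is randomly generated "hours studied vs. passed" records, and the algorithm is a simple k-nearest-neighbors vote that I picked only because it is short, not because it is the right choice for any particular problem.

```python
import random
from collections import Counter

# Hypothetical data, invented for illustration: (hours studied, passed?).
def make_data(n=200, seed=0):
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        hours = rng.uniform(0, 10)
        passed = hours + rng.gauss(0, 1.0) > 5  # noisy pass/fail rule
        rows.append((hours, passed))
    return rows

def knn_predict(train, hours, k):
    """Predict a label by majority vote of the k closest training rows."""
    nearest = sorted(train, key=lambda row: abs(row[0] - hours))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def accuracy(train, rows, k):
    """Fraction of rows whose label the algorithm predicts correctly."""
    hits = sum(knn_predict(train, h, k) == label for h, label in rows)
    return hits / len(rows)

# Select data: split the records into three non-overlapping sets.
data = make_data()
train, validation, test = data[:120], data[120:160], data[160:]

# Model, validate and tune: try a few odd values of k on the validation set.
candidates = [1, 3, 5, 7]
scores = {k: accuracy(train, validation, k) for k in candidates}
best_k = max(candidates, key=lambda k: scores[k])

# Test: measure the chosen setting on data it has never seen.
test_accuracy = accuracy(train, test, best_k)

# Use: predict for a new, hypothetical student who studied 8.5 hours.
prediction = knn_predict(train, 8.5, best_k)

print("best k:", best_k)
print("test accuracy:", test_accuracy)
print("prediction for 8.5 hours:", prediction)
```

The point of keeping three separate sets is that the test set is only touched once, after the tuning is finished, so the test accuracy is an honest estimate rather than one flattered by the tuning itself.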
The final stage of doing data science/machine learning/data mining/predictive analytics is to use the results, which generally involves some form of communication to one or more types of audiences. This I will term “data artistry”. (This is not necessarily a commonly used term, but it does have some precedent in specific contexts.)
My first review of various SISes is that of Aspen by Follett. When Highlands Community Charter School was recently looking to switch to a new SIS, Aspen was in our top 3 choices, and only barely lost out to PowerSchool. During our review process, I had the chance to look at a sandbox system (demo) of their product for about a week, and we asked their sales rep, Dylan Holcomb, a lot of questions. As a matter of disclosure, I should note that Dylan was a friend from high school, but I think this review is fairly objective, as there are clearly things I don’t like about the product, along with many things I really like. I have also written about Aspen previously.
Yesterday, I saw a demo of the Aspen SIS from Follett. For full disclosure, Dylan Holcomb, the Sales Consultant who came out, was a friend of mine from high school, but honestly I wasn’t expecting it to be an SIS that we would be interested in, especially because the price tag is high for the size of school that Highlands Community Charter currently is. But after seeing how Aspen works, and how they addressed my blog article about the 3 features that SIS providers are missing, it is on our school’s radar as a potential option. Here is a quick review of what I was impressed with, and what I still think they could do better.
There is a paradox: Humanity’s most developed organizations and systems are based upon what is learned in our education systems; yet, the field of education lags behind nearly all others. One example I have seen is how feature-poor Student Information Systems (SIS) are. Despite such systems being case studies in many database books, most of them do not use any data science methods to improve operations. Specifically, I have rarely seen active security, predictive analytics, or even resource optimization offered as features. Here is why these are important to have, and my invitation for SIS providers to come into the 21st century.