A while back I wrote a “cry for help” on this blog about some different forms of linear regression. Given that it is a fairly deep topic in statistics, and most of my friends and colleagues are not uber statistics nerds, I didn’t really get any replies… But I have persevered and continued to dive in on my own because, as Khan Academy puts it, struggling with ideas improves the brain like lifting weights.
So after also looking at some real data that my initial data mining attempts found (using World Factbook data), I have realized that in order to avoid Type I errors (false positives) due to outliers, I need to use a form of “robust regression”. This is because, while standard “least squares” regression neatly has a closed-form formula for the line that minimizes the sum of squared residuals (which Khan Academy also explains well), each outlier’s residual gets squared, so it has a far greater impact on the results than it should. Least Absolute Deviation (LAD), which is more intuitive, also has issues with outliers, just not as much.
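To make the outlier problem concrete, here is a minimal sketch of the closed-form least squares fit for a single predictor. The data are made up for illustration (they are not the World Factbook data), and the helper name `ols_fit` is my own.

```python
from statistics import mean

def ols_fit(xs, ys):
    """Closed-form least squares line; returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Slope = sum of cross-deviations / sum of squared x-deviations
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

# Hypothetical data lying exactly on y = 2x + 1
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
clean_ys = [2 * x + 1 for x in xs]

# The same data with the last point replaced by one wild outlier
dirty_ys = clean_ys[:-1] + [100]

clean_slope, clean_intercept = ols_fit(xs, clean_ys)
dirty_slope, dirty_intercept = ols_fit(xs, dirty_ys)

print("clean fit:", clean_slope, clean_intercept)
print("fit with one outlier:", dirty_slope, dirty_intercept)
```

On the clean data the fit recovers a slope of 2, while a single bad point out of ten drags the slope to more than three times that, which is the kind of false signal I want to avoid.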
But another commonly used method is “Least Median of Squares” (LMS), which squares all the residuals (the vertical distances between the estimated line of best fit and each of the real data points) and then picks the line for which the median of these squared residuals (the very middle value) is as small as possible. I have been wondering, though, whether Least Median of Squares was invented mainly to stick with the same paradigm as traditional regression, and maybe isn’t the most efficient method.
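As a rough sketch of what LMS does, the snippet below tries the line through every pair of data points and keeps the one with the smallest median squared residual. Searching only lines through pairs of points is a simplifying assumption on my part (it approximates the LMS line rather than guaranteeing the exact optimum), and the data and the names `median_squared_residual` and `lms_fit` are made up for illustration.

```python
from itertools import combinations
from statistics import median

def median_squared_residual(slope, intercept, xs, ys):
    """The LMS criterion: median of the squared residuals for one line."""
    return median((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))

def lms_fit(xs, ys):
    """Approximate LMS: best line among those through two data points."""
    best = None
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        if x1 == x2:
            continue  # vertical line, no finite slope
        slope = (y2 - y1) / (x2 - x1)
        intercept = y1 - slope * x1
        score = median_squared_residual(slope, intercept, xs, ys)
        if best is None or score < best[0]:
            best = (score, slope, intercept)
    return best[1], best[2]

# Hypothetical data on y = 2x + 1, with the last two points made into outliers
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [2 * x + 1 for x in xs[:-2]] + [100, -50]

lms_slope, lms_intercept = lms_fit(xs, ys)
print("LMS fit:", lms_slope, lms_intercept)
```

Because the median ignores the largest residuals, the two outliers here do not move the fitted line away from the nine points that agree with each other. The price is the search itself: trying every pair of points grows quadratically with the number of points, and each candidate needs its own median.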
Instead, maybe it is better to find the Least Median of Absolute Deviation (LMAD), because using the Median Absolute Deviation (MAD) is more common than finding a median of squares for univariate data. More importantly, I think it might require less computational power: I suspect taking the absolute value of a number usually just requires changing one bit in a variable (the sign bit, at least for floating-point numbers), while squaring requires more operations. So if the results from LMAD are not provably worse than the results from LMS, then it would seem that LMAD is likely the better way to go.
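One way to probe the “not provably worse” question is to score the same candidate lines with both criteria. Squaring never changes the order of non-negative numbers, so with an odd number of points (where the median is a single middle value) the median squared residual should simply be the square of the median absolute residual, and both criteria should prefer the same line. The sketch below checks this on made-up data, again searching only lines through pairs of points; the helper names are hypothetical.

```python
from itertools import combinations
from math import isclose
from statistics import median

def candidate_lines(xs, ys):
    """Yield (slope, intercept) for the line through each pair of points."""
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        if x1 != x2:
            slope = (y2 - y1) / (x2 - x1)
            yield slope, y1 - slope * x1

def median_abs_residual(line, xs, ys):
    """LMAD criterion: median of the absolute residuals."""
    slope, intercept = line
    return median(abs(y - (slope * x + intercept)) for x, y in zip(xs, ys))

def median_sq_residual(line, xs, ys):
    """LMS criterion: median of the squared residuals."""
    slope, intercept = line
    return median((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))

# Hypothetical noisy data: 11 points (odd on purpose), two of them outliers
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [1.2, 2.9, 5.1, 7.0, 8.8, 11.3, 13.1, 14.9, 17.2, 60.0, -25.0]

lines = list(candidate_lines(xs, ys))
lmad_line = min(lines, key=lambda ln: median_abs_residual(ln, xs, ys))
lms_line = min(lines, key=lambda ln: median_sq_residual(ln, xs, ys))

# For every candidate, the LMS score should be the LMAD score squared
scores_match = all(
    isclose(median_sq_residual(ln, xs, ys), median_abs_residual(ln, xs, ys) ** 2)
    for ln in lines
)
print("LMAD line:", lmad_line)
print("LMS line: ", lms_line)
print("LMS score == LMAD score squared for all candidates:", scores_match)
```

With an even number of points, where the median averages the two middle values, the two scores are no longer exact squares of each other, so this check says nothing about that case.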