Posts by Tyler Schlosser

March 2, 2018 by Tyler Schlosser

Machine Learning: Finding the signal or fitting the noise?

Before machine learning came along, a typical approach to building a predictive model was to develop a model that best fit the data. But will a model that best fits your data provide a good prediction? Not necessarily. Fortunately, there are machine learning practices that can help us estimate and optimize the predictive performance of models. But before we delve into that, let’s illustrate the potential problem of “overfitting” your data.

Fitting the Trend vs. Overfitting the Data

For a given dataset, we could fit a simple model to the data (e.g., linear regression) and likely have a decent chance of representing the overall trend. We could alternatively apply a very complex model to the data (e.g., a high-degree polynomial) and likely “overfit” the data: rather than representing the trend, we’ll fit the noise. If we apply the polynomial model to new data, we can expect it to make poor predictions, given that it’s not really modeling the general trend.

The example above illustrates the difference between modeling the trend (the red straight line) and overfitting the data (the blue line). The red line has a better chance of predicting values outside of the dataset presented. Due to the powerful...
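To make the straight-line-versus-polynomial comparison concrete, here is a minimal sketch in Python with NumPy. Everything in it is an assumption chosen for illustration and not taken from the post's own figure: the underlying trend (y = 2x + 1), the noise level, the ten sample points, and the polynomial degree of 9. It fits both models to the same noisy sample, then measures the error on the sample itself and on new x values that extend slightly beyond the sampled range.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable


def true_trend(x):
    # Hypothetical underlying signal, chosen only for this illustration.
    return 2.0 * x + 1.0


# Ten noisy observations of the trend on [0, 1].
x_train = np.linspace(0.0, 1.0, 10)
y_train = true_trend(x_train) + rng.normal(0.0, 0.2, size=x_train.size)

# New x values, some of them outside the sampled range.
# They are scored against the noise-free trend.
x_new = np.linspace(0.0, 1.2, 60)
y_new = true_trend(x_new)


def mse(coeffs, x, y):
    # Mean squared error of a fitted polynomial evaluated on (x, y).
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))


# Simple model: a straight line (degree 1).
linear = np.polyfit(x_train, y_train, deg=1)
# Complex model: degree 9, enough freedom to pass through all ten points.
wiggly = np.polyfit(x_train, y_train, deg=9)

linear_train, linear_new = mse(linear, x_train, y_train), mse(linear, x_new, y_new)
wiggly_train, wiggly_new = mse(wiggly, x_train, y_train), mse(wiggly, x_new, y_new)

print(f"linear:   train MSE = {linear_train:.4g}, new-data MSE = {linear_new:.4g}")
print(f"degree 9: train MSE = {wiggly_train:.4g}, new-data MSE = {wiggly_new:.4g}")
```

The degree-9 polynomial should report a training error close to zero, because it can thread through every noisy point, while its error on the new x values should be far larger than the straight line's. That gap between error on the fitted data and error on new data is the overfitting described above.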
