Sunday, November 15, 2015

Week 4

We are halfway through the structured part of the program, and the curriculum is getting more practical each week. Still lots of theory, but the lectures now include more practical guidance on tuning. Meanwhile, our daily exercises are hands-on work with the modeling tools we will be using as practicing data scientists.


Key topics were Random Forest, boosting, SVM, and profit curves. Although one can learn these topics from the sklearn docs, we are being taught how to apply each algorithm effectively.

I started working on an old Kaggle competition, Burn CPU Burn (https://inclass.kaggle.com/c/model-t4), to start applying what I have learned and to see how well I can do against the competition. The data is very wide, so applying my new EDA skills was critical in narrowing the feature set to something more manageable.
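A common way to narrow a wide feature set is to drop near-constant columns and one of each highly correlated pair. This is a minimal sketch of that idea, not the competition's actual data or my exact steps; the toy DataFrame, column names, and thresholds are all assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import VarianceThreshold

# Toy stand-in for a wide dataset (the real competition data is not shown here).
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(100, 6)),
                 columns=[f"f{i}" for i in range(6)])
X["f5"] = 1.0          # near-constant column: carries no signal
X["f4"] = X["f3"] * 2  # perfectly correlated duplicate of f3

# Step 1: drop (near-)constant features.
vt = VarianceThreshold(threshold=1e-3)
X_reduced = X.loc[:, vt.fit(X).get_support()]

# Step 2: drop one column from each highly correlated pair.
corr = X_reduced.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [c for c in upper.columns if (upper[c] > 0.95).any()]
X_reduced = X_reduced.drop(columns=to_drop)

print(sorted(X_reduced.columns))
```

On this toy data, `f5` goes in step 1 and `f4` goes in step 2, leaving four informative columns.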

An initial model with Ridge regression got me to 15th place. A Random Forest improved the model to 8th place. I will keep working on this as I learn new algorithms and tuning techniques, and will likely return to feature engineering.
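The baseline-then-ensemble workflow above can be sketched roughly like this. Since the competition data isn't shown here, `make_regression` stands in for it, and the hyperparameters are placeholder assumptions rather than what I actually tuned.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the competition's training data.
X, y = make_regression(n_samples=300, n_features=10, noise=10.0, random_state=0)

# Baseline: a regularized linear model, scored with 5-fold cross-validation.
ridge_r2 = cross_val_score(Ridge(alpha=1.0), X, y, cv=5, scoring="r2").mean()

# Next step: a Random Forest, which can pick up non-linear structure.
rf = RandomForestRegressor(n_estimators=200, random_state=0)
rf_r2 = cross_val_score(rf, X, y, cv=5, scoring="r2").mean()

print(f"Ridge CV R^2: {ridge_r2:.3f}  Random Forest CV R^2: {rf_r2:.3f}")
```

Comparing both models with the same cross-validation scheme is what makes the leaderboard jump meaningful rather than a fluke of one train/test split.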

Time to start thinking about the capstone project...
