HN Academy

The best online courses of Hacker News.

Hacker News Comments on
How to Win a Data Science Competition: Learn from Top Kagglers

Coursera · National Research University Higher School of Economics · 4 HN points · 4 HN comments

HN Academy has aggregated all Hacker News stories and comments that mention Coursera's "How to Win a Data Science Competition: Learn from Top Kagglers" from National Research University Higher School of Economics.
Course Description

If you want to break into competitive data science, then this course is for you! Participating in predictive modelling competitions can help you gain practical experience, improve and harness your data modelling skills in various domains such as credit, insurance, marketing, natural language processing, sales forecasting and computer vision, to name a few. At the same time, you get to do it in a competitive context against thousands of participants, each trying to build the most predictive algorithm. Pushing each other to the limit can result in better performance and smaller prediction errors. Being able to achieve high ranks consistently can help you accelerate your career in data science.

In this course, you will learn to analyse such predictive modelling tasks and solve them competitively.

When you finish this class, you will:

- Understand how to solve predictive modelling competitions efficiently and learn which of the skills obtained are applicable to real-world tasks.

- Learn how to preprocess the data and generate new features from various sources such as text and images.

- Be taught advanced feature engineering techniques like generating mean-encodings, using aggregated statistical measures or finding nearest neighbors as a means to improve your predictions.

- Be able to form reliable cross validation methodologies that help you benchmark your solutions and avoid overfitting or underfitting when tested with unobserved (test) data.

- Gain experience in analysing and interpreting the data. You will become aware of inconsistencies, high noise levels, errors and other data-related issues such as leakage, and you will learn how to overcome them.

- Acquire knowledge of different algorithms and learn how to efficiently tune their hyperparameters and achieve top performance.

- Master the art of combining different machine learning models and learn how to ensemble.

- Get exposed to past (winning) solutions and code and learn how to read them.
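The mean-encoding technique named in the list above can be sketched in plain Python. This is a minimal illustration under our own assumptions, not code from the course: each category is replaced by the average target of that category computed only on the other folds (out-of-fold), so a row's own label never leaks into its feature, and categories unseen in the other folds fall back to the global out-of-fold mean. The helper name `oof_mean_encode` and the round-robin fold assignment are hypothetical choices.

```python
from collections import defaultdict


def oof_mean_encode(categories, targets, n_folds=3):
    """Out-of-fold mean encoding (hypothetical helper for illustration).

    Each row's encoding is the mean target of its category computed only
    on rows from the other folds, which limits target leakage.
    """
    n = len(categories)
    # Round-robin fold assignment; a real pipeline would usually shuffle first.
    folds = [i % n_folds for i in range(n)]
    encoded = [0.0] * n
    for k in range(n_folds):
        sums = defaultdict(float)
        counts = defaultdict(int)
        total, m = 0.0, 0
        # Accumulate per-category statistics on every fold except k.
        for c, y, f in zip(categories, targets, folds):
            if f != k:
                sums[c] += y
                counts[c] += 1
                total += y
                m += 1
        fallback = total / m if m else 0.0
        # Encode the held-out fold k using only out-of-fold statistics.
        for i in range(n):
            if folds[i] == k:
                c = categories[i]
                encoded[i] = sums[c] / counts[c] if counts[c] else fallback
    return encoded


print(oof_mean_encode(["a", "a", "b", "b", "a", "b"], [1, 0, 1, 1, 1, 0]))
# → [0.5, 1.0, 1.0, 0.5, 1.0, 1.0]
```

Note that rows of the same category can receive different encodings depending on their fold; that variation is the point, since a naive in-sample mean would let each row see its own target.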

Disclaimer: This is not a machine learning course in the general sense. This course will teach you how to get high-rank solutions against thousands of competitors, with a focus on practical usage of machine learning methods rather than the theoretical underpinnings behind them.

Prerequisites:

- Python: work with DataFrames in pandas, plot figures in matplotlib, import and train models from scikit-learn, XGBoost, LightGBM.

- Machine Learning: basic understanding of linear models, K-NN, random forest, gradient boosting and neural networks.


Provider Info
This course is offered by National Research University Higher School of Economics on the Coursera platform.
HN Academy may receive a referral commission when you make purchases on sites after clicking through links on this page. Most courses are available for free with the option to purchase a completion certificate.

Hacker News Stories and Comments

All the comments and stories posted to Hacker News that reference this URL.
I quite liked "How to win a data science competition" (https://www.coursera.org/learn/competitive-data-science) where I learned a lot about validation strategies and machine learning on tabular data. The course has its own Kaggle competition.

I also really liked "Discrete optimization" (https://www.coursera.org/learn/discrete-optimization). At the time I took it, it also had a competitive element: you would solve optimization problems, and a leaderboard compared all the students in the current batch. That was when courses still started in batches and were free, so the experience would probably no longer be the same, unfortunately.

light_hue_1
> I quite liked "How to win a data science competition"

As a machine learning researcher I am on the one hand glad that folks are learning more about the topic. On the other hand, this is totally the wrong approach and it will teach you the wrong lessons.

The idea that you can just treat data as a uniform dump of tables and that grinding your way to high numbers is somehow worthwhile is simply terrible. The resulting systems won't work well in the real world and they produce horrific explanations of what is going on. This class teaches you not just the wrong tools, like boosting, it teaches you the wrong mental model.

I really can't think of a worse introduction to ML than this class. Even not knowing anything would actually be better.

jackallis
ok - what is the alternative?
samvher
Interesting. I definitely would not recommend this course as an only course in machine learning or indeed as an introduction, and I see where you’re coming from with the wrong mental models. I can’t be sure that I do have the right ones but I have taken a number of other courses as well and my sense is they’re ok.

My main takeaway from the course was definitely not that just grinding away for higher numbers is the right thing to do (but it might be a necessary evil in a competition context). The key thing I learned here was much more about paying very close attention to whether your validation strategy and your testing strategy are compatible, because there are many ways to mess this up and end up with models that are valid in-sample only. Most of what I had done before was also centred on SVMs and neural networks, so getting some experience with decision-tree-based algorithms was interesting.
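The point about keeping validation and test compatible can be illustrated with time-ordered data. The sketch below assumes (our assumption, not something stated in the thread) that the hidden test set comes from a later period than the training data; in that case the validation rows should also be the most recent slice, rather than an interleaved or random sample that mixes past and future. Both helper names, `time_based_split` and `split_matches_future_test`, are hypothetical.

```python
def time_based_split(timestamps, valid_fraction=0.2):
    """Hold out the most recent rows, mimicking a test set drawn from the future.

    Hypothetical helper: returns (train_idx, valid_idx) lists of row indices.
    """
    if len(timestamps) < 2:
        raise ValueError("need at least two rows to split")
    # Sort row indices chronologically by their timestamp.
    order = sorted(range(len(timestamps)), key=lambda i: timestamps[i])
    # Keep at least one row on each side of the split.
    n_valid = min(max(1, int(len(order) * valid_fraction)), len(order) - 1)
    return order[:-n_valid], order[-n_valid:]


def split_matches_future_test(timestamps, train_idx, valid_idx):
    """True if no validation row is earlier than any training row."""
    latest_train = max(timestamps[i] for i in train_idx)
    earliest_valid = min(timestamps[i] for i in valid_idx)
    return latest_train <= earliest_valid


ts = list(range(10))
train_idx, valid_idx = time_based_split(ts)
print(split_matches_future_test(ts, train_idx, valid_idx))  # True

# An interleaved hold-out (every 5th row) validates on the "past" as well,
# so it does not resemble a future-only test set.
inter_valid = [i for i in range(10) if i % 5 == 4]
inter_train = [i for i in range(10) if i % 5 != 4]
print(split_matches_future_test(ts, inter_train, inter_valid))  # False
```

A split like the interleaved one can report optimistic scores because the model effectively sees the future during training, which is one of the ways a model ends up valid in-sample only.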

Two courses taught by faculty at the Russian institute HSE (Higher School of Economics).

1. How to Win a Data Science Competition https://www.coursera.org/learn/competitive-data-science

2. Bayesian Methods for Machine Learning https://www.coursera.org/learn/bayesian-methods-in-machine-l...

Kaggle is more than enough to get started. I would hire anyone who's a Master there. There's probably not even a need for Master; just enough knowledge to explain why one thing works and another doesn't.

See this course to get into Kaggle: https://www.coursera.org/learn/competitive-data-science

rasikjain
Thank you for the input and the course reference.
Mostly Kaggle -- reading others' solutions and notebooks and integrating them into my code.

Also there's a great Coursera course on ML for Kaggle: https://www.coursera.org/learn/competitive-data-science

I think once you finish it, you're better than 60% of Silicon Valley data scientists, no kidding.

Oct 26, 2017 · 3 points, 0 comments · submitted by sonabinu
HN Academy is an independent project and is not operated by Y Combinator, Coursera, edX, or any of the universities and other institutions providing courses.