
Hacker News Comments on
Learning From Data

Yaser S. Abu-Mostafa, Malik Magdon-Ismail, Hsuan-Tien Lin · 7 HN comments
HN Books has aggregated all Hacker News stories and comments that mention "Learning From Data" by Yaser S. Abu-Mostafa, Malik Magdon-Ismail, Hsuan-Tien Lin.
Amazon Summary
This book, together with specially prepared online material freely accessible to our readers, provides a complete introduction to Machine Learning, the technology that enables computational systems to adaptively improve their performance with experience accumulated from the observed data. Such techniques are widely applied in engineering, science, finance, and commerce. This book is designed for a short course on machine learning. It is a short course, not a hurried course. From over a decade of teaching this material, we have distilled what we believe to be the core topics that every student of the subject should know. In addition, our readers are given free access to online e-Chapters that we update with the current trends in Machine Learning, such as deep learning and support vector machines. We chose the title `learning from data' that faithfully describes what the subject is about, and made it a point to cover the topics in a story-like fashion. Our hope is that the reader can learn all the fundamentals of the subject by reading the book cover to cover. Learning from data has distinct theoretical and practical tracks. In this book, we balance the theoretical and the practical, the mathematical and the heuristic. Theory that establishes the conceptual framework for learning is included, and so are heuristics that impact the performance of real learning systems. What we have emphasized are the necessary fundamentals that give any student of learning from data a solid foundation. The authors are professors at California Institute of Technology (Caltech), Rensselaer Polytechnic Institute (RPI), and National Taiwan University (NTU), where this book is the text for their popular courses on machine learning. The authors also consult extensively with financial and commercial companies on machine learning applications, and have led winning teams in machine learning competitions.
Hacker News Stories and Comments

All the comments and stories posted to Hacker News that reference this book.
I really enjoyed the books "Learning From Data"[1], and "Programming Collective Intelligence"[2].

Both are accessible to beginners.

Learning From Data gives a more theoretical introduction to machine learning. One of the central ideas from the book that I still think about often is that machine learning is merely function approximation. There exists a function which will drive a car perfectly, but we don't know what that function is, so we try to approximate that function with machine learning.
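To make the function-approximation idea concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration and not taken from the book: the hidden `target` function, the five sample points, and the choice of a straight line as the hypothesis. Real data would be noisy, so the fit would only be approximate.

```python
# Hypothetical "unknown" target function. In a real problem the learner
# never sees this definition, only samples of its inputs and outputs.
def target(x):
    return 3.0 * x + 2.0


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Observed data: the only thing the learner is given.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [target(x) for x in xs]

slope, intercept = fit_line(xs, ys)


def hypothesis(x):
    """The learned approximation of the unknown target."""
    return slope * x + intercept
```

The learner only ever touches `xs` and `ys`; `hypothesis` is its best guess at `target`, and can then be evaluated on inputs it has never seen, such as `hypothesis(10.0)`.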

Programming Collective Intelligence is a more hands-on introduction to machine learning. The book has examples in Python, but I believe the Python code is low quality. Ignoring the example code (and I did ignore it), the book is a very enjoyable introduction to many different machine learning algorithms. If you don't know the difference between linear regression, nearest-neighbors clustering, support vector machines, and neural networks, this book will explain how each of these works and give a good intuition about when to use each.

[1] http://www.amazon.com/gp/product/1600490069 [2] http://www.amazon.com/Programming-Collective-Intelligence-Bu...

Prof Yaser S. Abu-Mostafa's Caltech course "Learning from Data" (http://work.caltech.edu/telecourse) is probably the best introductory course for really understanding the physics of how machine learning works.

See Prof. Yaser's 1-minute overview: http://www.youtube.com/watch?v=KlP0DpiM7Lw

The "Learning from Data" videos are online for free, and the book is on Amazon...

Videos: http://home.caltech.edu/lectures.html

Book: http://www.amazon.com/Learning-From-Data-Yaser-Abu-Mostafa/d...

The course is also available on EdX: https://www.edx.org/course/caltechx/cs1156x/learning-data/11...

eli_gottlieb
I've got the book. It's a great book, even though the Machine Learning course here at Technion is more Bayesian than AML's seemingly PAC and VC-focused book.
For a glimpse into machine learning, check out Professor Yaser Abu-Mostafa's "Learning From Data" course from Caltech. The videos are online for free (http://work.caltech.edu/telecourse.html, https://www.edx.org/course/caltechx/cs1156x/learning-data/11...), and its corresponding book is on Amazon (http://www.amazon.com/Learning-From-Data-Yaser-Abu-Mostafa/d...).

Also Professor Ng's course from Stanford (http://cs.stanford.edu/people/ang/?page_id=22).

Don't worry about needing to catch up. Stuff is moving so fast these days, you're always working with something new. Everyone is in a continual update mode so it's not like you have 10 years of catching up to do. Tech has turned over 10 times since then. You could say 10 years and 2 years are functionally equivalent from a new tech point of view.

And don't worry about corps and recruiters. Focus on a problem you want to solve, and update your skills in the context of learning what you need to know to solve that problem. If you can leverage your industry experience in the problem domain, even better.

Data is driving everything so developing a data analysis/machine learning skillset will put you into any industry you want. Professor Yaser Abu-Mostafa's "Learning From Data" is a gem of a course that helps you see the physics underpinning the learning (metaphorically of course -- ML is mostly vectors, matrices, linear algebra and such). The course videos are online for free (http://work.caltech.edu/telecourse.html), and you can get the corresponding book on Amazon -- it's short (http://www.amazon.com/Learning-From-Data-Yaser-Abu-Mostafa/d...).

Python is a good general purpose language for getting back in the groove. It's used for everything, from server-side scripting to Web dev to machine learning, and everywhere in between. "Coding the Matrix" (https://www.coursera.org/course/matrix, http://codingthematrix.com/) is an online course by Prof Philip Klein that teaches you linear algebra in Python so it pairs well with "Learning from Data".

Clojure (http://clojure.org/) and Go (http://golang.org/) are two emerging languages. Both are elegantly designed with good concurrency models (concurrency is becoming increasingly important in the multicore world). Rich Hickey is the author of Clojure -- watch his talks to understand the philosophy behind the design (http://www.infoq.com/author/Rich-Hickey). "Simple Made Easy" (http://www.infoq.com/presentations/Simple-Made-Easy) is one of those talks everyone should see. It will change the way you think.

Knowing your way around a cloud platform is essential these days. Amazon Web Services (AWS) has ruled the space for some time, but last year Google opened its gates (https://cloud.google.com/). Its high-performance cloud platform is based on Google search, and learning how to rev its engines will be a valuable thing. Relatively few have had time to explore its depths so it's a platform you could jump from.

Hadoop MapReduce (https://hadoop.apache.org/, http://www.cloudera.com, http://hortonworks.com/) has been the dominant data processing framework the last few years, and Hadoop has become almost synonymous with the term "Big Data". Hadoop is like the Big Data operating system, and true to its name, Hadoop is big and bulky and slow. However, there is a new framework on the scene that's true to its name. Spark (http://spark.incubator.apache.org/) is small and nimble and fast. Spark is part of the Berkeley Data Analytics Stack (BDAS - https://amplab.cs.berkeley.edu/software/), and it will likely emerge as Hadoop's successor (see last week's thread -- https://news.ycombinator.com/item?id=6466222).
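For anyone who hasn't seen the MapReduce model before, here is a rough single-process sketch of its three steps (map, shuffle, reduce) using word counting as the example. This is plain Python for illustration only; it does not use Hadoop's or Spark's actual APIs, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence on a list of documents."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data is big", "spark is fast"])
```

In a real cluster the map and reduce calls run in parallel on different machines and the shuffle moves data across the network; the sketch only shows the shape of the computation.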

ElasticSearch (http://www.elasticsearch.org/) is good to know. Paired with Kibana (http://www.elasticsearch.org/overview/kibana/) and LogStash (http://www.elasticsearch.org/overview/logstash/), it's morphed into a multipurpose analytics platform you can use in 100 different ways.

Databases abound. There's a bazillion new databases and new ones keep popping up for increasingly specialized use cases. Cassandra (https://cassandra.apache.org), Datomic (http://www.cognitect.com/), and Titan (http://thinkaurelius.github.io/titan/) to name a few (http://nosql-database.org/). Redis (http://redis.io/) is a Swiss Army knife you can apply anywhere, and it's simple to use -- you'll want it on your belt.

If you're doing Web work and front-end stuff, JavaScript is a must. AngularJS (http://angularjs.org/) and ClojureScript (https://github.com/clojure/clojurescript) are two of the most interesting developments.

Oh, and you'll need to know Git (http://git-scm.com, https://github.com). See Linus' talk at Google to get the gist (https://www.youtube.com/watch?v=4XpnKHJAok8 :-).

As you can see, the opportunities for learning emerging tech are overflowing, and what's cool is the ways you can apply it are boundless. Make something. Be creative. Follow your interests wherever they lead because you'll have no trouble catching the next wave from any path you choose.

jnardiello
Thanks for this. Quite an incredibly valuable comment. This is why I love HN.
christiangenco
I'm a web developer that considers myself "up-to-date" but there was quite a bit in there that I need to read up on (notably Hadoop and ElasticSearch). Thanks for the links!

I'd also recommend, as some alternatives:

* Ruby as an alternative "general purpose language"

* Mongo as an alternative swiss army database

* Backbone + Marionette as an alternative front-end JS framework

* CoffeeScript as a better JavaScript syntax

To tag onto this, I found "Learning from Data" [0] by Abu-Mostafa to be a great intro to the field. It's not heavy on the math, but it doesn't gloss over it either.

[0]:http://www.amazon.com/Learning-From-Data-Yaser-Abu-Mostafa/d...

veven
Unfortunately Amazon won't ship this book outside of the United States.
hypertext
Actually, according to the authors' website (http://amlbook.com/), Amazon does ship the Learning from Data book to many different countries outside the US.
winter_blue
You should be able to get (illegal) PDFs of most popular books with a simple Google search. I found a PDF of the ML book I mentioned earlier as the top result on Google for "<name of book> pdf".

Admittedly epub is a better format, because it naturally reflows on smaller screens, but "free" epubs are harder to come across. I've been thinking of converting some really good PDFs that I have, to ePub myself, but just haven't gotten around to it yet.

scottedwards
Have to agree. And it's very inexpensive because Yaser refused to give in to academic publishers, who would've charged the typical $70-80, and self-published so he could offer it for less than half the cost.

Not only is the book great, but his lectures are PHENOMENAL. He breaks concepts down in such a careful, accessible way. It's a bit late to join the online course, but you can see all the lectures on YouTube (work.caltech.edu/telecourse.html) or iTunesU (I prefer the latter, using the app on iOS - awesome b/c you can bookmark and record notes at those marks - otherwise I notice these video types of courses are way less useful - no way to review - wish Coursera/Udacity/EdX had that feature.)

Yaser is an awesome guy btw - he's very active on the forum (see the link from the above Caltech site - on right hand side). He is very gracious with his time - I'm not a Caltech student, and yet he has answered all my questions and even helped me find a tutor for the course who was a previous student at Caltech (I live in Pasadena). He truly cares - and that comes off in the lectures as well. Enjoy!

antman
I take notes on all videos with http://videonot.es
manish_gill
Agreed with everything you said. The only thing missing from his lectures is the homework assignments, which are only available to those who signed up for the online course (signups are closed now), and I can't even make a post about it on the forums, because I don't have the book. :(
The course teacher's book costs $828 + $4 shipping:

http://www.amazon.com/gp/offer-listing/1600490069/

Is this an error?

andymatuschak
His book is not yet available. That seller is likely not legitimate.
charliel
From http://amlbook.com, which is hidden within the page http://www.amazon.com/gp/product/1600490069, the book will become available on Mar 26 on Amazon for $28, not $828.