Hacker News Comments on
Programming Collective Intelligence: Building Smart Web 2.0 Applications
Hacker News Stories and Comments
All the comments and stories posted to Hacker News that reference this book.
Some resources to get you started - not including any coursera or udacity courses since others have already mentioned it.
Mathematical Monk - https://www.youtube.com/user/mathematicalmonk#p/c/0/ydlkjtov... (includes a probability primer)
Awesome Courses - https://github.com/prakhar1989/awesome-courses - it's a very extensive list of university courses, including subjects apart from Machine Learning as well
Programming Collective Intelligence - http://www.amazon.com/programming-collective-intelligence-bu... - heard very good reviews about this
Many other resources available apart from the above. You can access more such resources at http://www.tutorack.com/search?subject=machine%20learning
I think it's a good idea to go through one or more beginner-level courses, like the one offered by Andrew Ng on Coursera, and then do an actual project.
[Disclaimer - I work at tutorack.com mentioned in the comment]
http://www.amazon.com/Programming-Collective-Intelligence-Bu... has a chapter on recommendation systems
I really enjoyed the books "Learning From Data", and "Programming Collective Intelligence".
Both are accessible to beginners.
Learning From Data gives a more theoretical introduction to machine learning. One of the central ideas from the book that I still think about often is that machine learning is merely function approximation. There exists a function which will drive a car perfectly, but we don't know what that function is, so we try to approximate that function with machine learning.
Programming Collective Intelligence is a more hands-on introduction to machine learning. The book has examples in Python, but I believe the Python code is low quality. Ignoring the example code (and I did ignore it), the book is a very enjoyable introduction to many different machine learning algorithms. If you don't know the difference between linear regression, nearest-neighbors clustering, support vector machines, and neural networks, this book will explain how each of these works and give a good intuition about when to use each.
I liked "Programming Collective Intelligence" http://www.amazon.com/Programming-Collective-Intelligence-Bu..., but it might be a little dated (in not using the latest libraries). It's a good way to learn some simple algorithms (optimization, clustering).
Also, rather than learning ML in 2 months (which is a very unfocussed and unattainable goal) -- try to narrow it down to some problem domain. You'd get better recommendations if you are more specific.
Programming Collective Intelligence: Building Smart Web 2.0 Applications by Toby Segaran is a bit old now but is excellent, 4.5 stars on Amazon from 100+ reviews. A bit of overlap with this one, but there are some great explanations.
I got a lot of mileage out of this book: http://www.amazon.com/Programming-Collective-Intelligence-Bu... All his examples are in Python, so if you already know Python it should work well for you. The only minor thing is that there are a few typos here and there in the code, but usually you can just use your common sense and figure out what the author intended.
Consider this book by Toby Segaran:
Code Complete 2 was one of the first coding books I've read. As with anything else, it's good to look around (HN is a good place) for people who have problems with the book. I think I learn as much reading the commentary people make about books like that as I do from the book itself.
I think I've listened to every podcast on software engineering radio a few times. The older ones are especially nice because they usually pick a specific topic and cover the high points. I liked that I could listen to it while I was driving, or otherwise not in front of a computer.
I've also got Introduction to Algorithms, which I use as a reference, sometimes. I switched over to The Algorithm Design Manual after I saw it referenced in an older Steve Yegge post. I read through the intro and it seemed like a book that would be more appropriate from an autodidactic standpoint. I really have no idea if that's going to pan out, since I'm not that far into it, but we'll see, for sure. Doesn't kill me to have an extra algorithms book laying about, though, and I've always got intro to algorithms for cross reference. I've found that I really need to have as many sources available as possible when I'm learning alone. Usually I don't get something until the fifth person describes it from the tenth different angle.
That's most of what I can think of offhand. I really enjoyed The Joy of Clojure, though I haven't checked out the newer version. Programming Collective Intelligence is a fun book, and is what made me want to go back down the maths route to get more into machine learning.
And of course habitually reading hacker news for an hour or three every night :)
So that's my totally inexpert list of random stuff that I enjoy
I found this to be quite a good introduction.
⬐ gautamnarula
I also have this book and highly recommend it.
I'm in a very similar position.
If you really like Codecademy, there are non-track exercises that involve Python in the API section and a couple of Python challenges that aren't listed.
What I'm doing now:
* Solving exercises on Project Euler in Python. 
* Working through each example in the Python Cookbook. It was just updated to the third edition.
* Watched Guido's Painless Python talks from a few years ago. I found his concise explanations of language features really helpful.
Some things I intend to do:
* Finish working through Collective Intelligence. The examples are written in Python.
* Work through Introduction to Algorithms. The course uses Python.
* Read, understand and give a shot at extending Openstack code.
⬐ westurner
You can search announced, in progress, future, self-paced, and finished MOOCs (Massive Open Online Courses) with class-central.com: http://www.class-central.com/search?q=python
⬐ kilkurdu
Aren't Project Euler's exercises seem more likely maths exercises? It's kinda difficult for those who graduated from social sciences and tries to learn programming from scratch.
⬐ westurner
The Green Tea Press books are great; and free.
Think Python: How To Think Like a Computer Scientist http://www.greenteapress.com/thinkpython/thinkpython.html
Think Complexity: Exploring Complexity Science with Python: http://www.greenteapress.com/compmod/
Think Stats: Probability and Statistics for Programmers: http://www.greenteapress.com/thinkstats/index.html
⬐ brandoncapecci
Yep. Project Euler is a waste of time if you're trying to get up to speed in learning programming.
⬐ incision
>Aren't Project Euler's exercises seem more likely maths exercises?
Project Euler does involve math, but so does efficient programming.
Efficiency can seem a pretty abstract thing and it might not crop up right away in more typical programming tasks. Working a Euler problem and refining to a solution that runs in 1% or 0.001% of the time required for the most straightforward solution is a great demonstrator.
>It's kinda difficult for those who graduated from social sciences and tries to learn programming from scratch.
Sure, but the context of the question here isn't really from scratch. The OP has already completed at least the 296 exercises in the Python track at Codecademy to establish a base.
Personally, I haven't graduated from anything and I treat the Euler exercises as an interesting way to practice/learn a bit of programming and math.
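To make the efficiency point in this comment concrete, here is a minimal sketch based on Project Euler problem 1 (the sum of all multiples of 3 or 5 below a limit). The choice of problem and the function names are mine, not the commenter's. The first function is the most straightforward solution; the second replaces the loop with the arithmetic-series formula and inclusion-exclusion, so its running time does not grow with the limit.

```python
def sum_multiples_naive(limit):
    """Brute force: test every number below `limit` for divisibility by 3 or 5."""
    return sum(n for n in range(limit) if n % 3 == 0 or n % 5 == 0)


def sum_divisible_by(k, limit):
    """Sum of the positive multiples of k below `limit`.

    There are `count` such multiples: k, 2k, ..., count*k.
    Their sum is k * (1 + 2 + ... + count) = k * count * (count + 1) / 2.
    """
    count = (limit - 1) // k
    return k * count * (count + 1) // 2


def sum_multiples_fast(limit):
    """Closed form: add multiples of 3 and of 5, subtract multiples of 15
    (which would otherwise be counted twice)."""
    return (sum_divisible_by(3, limit)
            + sum_divisible_by(5, limit)
            - sum_divisible_by(15, limit))


if __name__ == "__main__":
    # Both approaches agree; only the amount of work differs.
    print(sum_multiples_naive(1000), sum_multiples_fast(1000))
```

For a limit of 1000 the difference is invisible, but for a limit in the billions the loop takes minutes while the closed form stays effectively instant, which is the kind of refinement the comment describes.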
The Signal and the Noise by Nate Silver (Very nice read. Chapters on climate change and GDP forecasting were a bit slow, but everything else was a page turner) http://www.amazon.com/dp/159420411X
Why I Left Goldman Sachs by Greg Smith (Good insight into the 2008 financial breakdown and a look into the day to day operations of Goldman Sachs) http://www.amazon.com/Why-Left-Goldman-Sachs-Street/dp/14555...
Data Mining: Concepts and Techniques (Great intro into data mining) http://www.amazon.com/Data-Mining-Concepts-Techniques-Manage...
Programming Collective Intelligence (You can play around with actual implementations of the concepts in the previous book) http://www.amazon.com/Programming-Collective-Intelligence-Bu...
Ghost in the Wires by Kevin Mitnick (Was really nice to see the details behind Mitnick's adventures) http://www.amazon.com/Ghost-Wires-Adventures-Worlds-Wanted/d...
On War by Clausewitz (Really enjoyed this book.) http://www.amazon.com/War-Carl-von-Clausewitz/dp/1448676290
Ruby's great for recommendation engines and JRuby has all the Java libraries. Programming Collective Intelligence is a good starter book:
Check out [Programming Collective Intelligence](http://www.amazon.com/Programming-Collective-Intelligence-Bu...). I found it to be a good introduction to machine learning because it uses practical (and neat) examples to teach the concepts. One of my top 5 programming books.
Ditto. Here's an Amazon link, for the lazy: http://www.amazon.com/Programming-Collective-Intelligence-Bu...
Just a quick note if you're interested: this book is similar and is an absolutely fantastic 'applied beginner' ML book, "Programming Collective Intelligence".
Most of what I learned was from Programming Collective Intelligence (http://amzn.com/0596529325). If you're in the Bay Area, Noisebridge has a self-taught machine learning class that I went to for a while (great group, I just couldn't make the time commitment).
⬐ seymourz
For most of the fundamentals of Collaborative Filtering, you may check Chapters 8 and 9 of the following online book.
Programming Collective Intelligence (http://www.amazon.com/Programming-Collective-Intelligence-Bu...) is a great resource. It serves as a practical introduction to several different machine learning algorithms. Although they are presented from a specific perspective (collaborative filtering), many of the techniques are general and are used across machine learning. The explanations of the algorithms are clear and simple, and the author does a nice job of building up the level of complexity over the course of the book. Also, you will get much more out of it if you follow along with the provided Python-based implementations.
⬐ phektus
This is great! Python is actually my favorite language at the moment; I use it for freelance work as well as personal projects. Thanks for the link.
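For readers who have not seen collaborative filtering before, here is a minimal sketch of the user-based variant in plain Python. It is not the book's code: the ratings data, the function names, and the choice of cosine similarity are all assumptions made for illustration. The idea is to score each item a user has not rated by averaging other users' ratings, weighted by how similar those users are.

```python
from math import sqrt

# Hypothetical ratings: user -> {item: score}. Made-up data for illustration.
ratings = {
    "alice": {"A": 5.0, "B": 3.0, "C": 4.0},
    "bob":   {"A": 4.0, "B": 3.0, "C": 5.0, "D": 4.0},
    "carol": {"A": 1.0, "B": 5.0, "D": 2.0},
}


def cosine_similarity(prefs, u, v):
    """Cosine similarity between two users over the items both have rated."""
    shared = set(prefs[u]) & set(prefs[v])
    if not shared:
        return 0.0
    dot = sum(prefs[u][i] * prefs[v][i] for i in shared)
    norm_u = sqrt(sum(prefs[u][i] ** 2 for i in shared))
    norm_v = sqrt(sum(prefs[v][i] ** 2 for i in shared))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend(prefs, user):
    """Return (predicted_score, item) pairs for items `user` has not rated,
    best first. Each prediction is a similarity-weighted average of the
    scores given by other users."""
    totals, weights = {}, {}
    for other in prefs:
        if other == user:
            continue
        sim = cosine_similarity(prefs, user, other)
        if sim <= 0:
            continue  # ignore users with nothing in common
        for item, score in prefs[other].items():
            if item not in prefs[user]:
                totals[item] = totals.get(item, 0.0) + sim * score
                weights[item] = weights.get(item, 0.0) + sim
    ranked = [(totals[item] / weights[item], item) for item in totals]
    return sorted(ranked, reverse=True)


if __name__ == "__main__":
    print(recommend(ratings, "alice"))
```

Here "alice" has not rated item "D", so the prediction blends bob's 4.0 and carol's 2.0, leaning toward bob because his ratings are closer to hers. Cosine similarity on raw scores is only one option; other similarity measures can be swapped in without changing `recommend`.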
You can't take enough maths and statistics classes. Machine Learning - these days at least - is very maths and statistics oriented. Linear Algebra is big, so make sure you have that covered.
If you want to get your toes in the water a bit with ML, there are some great ML libraries that encapsulate some of the popular algorithms. Mahout, Weka and Mallet are popular in the Java world.
A lot of folks use Python for ML as well, and there are some good libraries there.
The R language is also popular in ML circles; as is C++. If you learn some combination of Java, Python, C++ and/or R, you'll be in good shape from a programming language standpoint.
Check out http://mloss.org/software/ also.
Some good books to get started with include:
Algorithms of the Intelligent Web
Programming Collective Intelligence
Collective Intelligence In Action
Stanford make a great series of lectures available online that you might find useful.
Python seems to be pretty popular for AI. This is a shot in the dark but could have something to do with the popularity of "Programming Collective Intelligence". It uses and teaches Python and is a top seller in the AI category on Amazon http://www.amazon.com/gp/product/0596529325?ie=UTF8&tag=...
I have really enjoyed going through the examples in that book, which was the first time I did any AI (although one might argue it's light on the AI side) type stuff since doing some Lisp stuff way long ago.
I've been deep into building a geocoder the past month. While we may get rid of Solr eventually, it was a great foot in the door to information retrieval. It helps that I have a problem to solve and a deadline, so I'm motivated to read and work through these books. These three texts have been very helpful. The last book is an excellent overview of text processing and some real world problems you may encounter writing your search engine.
Solr 1.4 Enterprise Search Server http://www.amazon.com/Solr-1-4-Enterprise-Search-Server/dp/1...
Programming Collective Intelligence http://www.amazon.com/Programming-Collective-Intelligence-Bu...
Building Search Applications: Lucene, LingPipe, and Gate http://www.amazon.com/Building-Search-Applications-Lucene-Li...
Programming Collective Intelligence also has high marks from a lot of programmers whom I respect: http://www.amazon.com/Programming-Collective-Intelligence-Bu...
⬐ henriklied
I agree, this was the eye opener for me. When it comes to data mining, I'll take practice over theory any day.
Diving directly into actual code samples, and a large plus for using Python, this book is one of the few I actually keep on my desk.
⬐ gtani
The 2 books in Amazon's "Bought This Item Also Bought" blurb are more rigorous and quite useful, also covering topics like Lucene/SOLR with full Java code listings:
Algorithms of the Intelligent Web by H. Marmanis
Collective Intelligence in Action by Satnam Alag
There are really good books on Data Mining at Borders, and a recent bunch of "collective intelligence" books; the Manning books are excellent, but you have to know Java; you probably also want to install Weka and R, look at the Python suite (numpy, scipy, matplotlib), tools like that; also look up the ~107 (!) algorithms that Bellkor used for the Netflix comp.
⬐ showerst
Strong upvote on that book list. If you want to learn from scratch, start with the "Programming Collective Intelligence" book, then move to "Collective Intelligence in Action", they're both quite good.
If you've covered and understand the material in both, you're probably ready to consider moving to some of the more academic texts.
Since it's a vacation, here's a novel with lots of interesting ideas about ubiquitous computing, augmented reality, etc.: http://en.wikipedia.org/wiki/Rainbows_End
Fun with Python: http://www.amazon.com/Programming-Collective-Intelligence-Bu...
Collective Intelligence book has some examples - http://www.amazon.com/Programming-Collective-Intelligence-Bu...
I am not sure which technology you've used to develop your site, but Project Aura, being developed by Sun developers, looks really promising and is open source. Here is a link to their PDF from this year's JavaOne; they are supposed to launch it within the next month. http://developers.sun.com/learning/javaoneonline/2008/pdf/TS...
Can I recommend the book "Programming Collective Intelligence" (http://www.amazon.com/Programming-Collective-Intelligence-Bu...)? It doesn't actually describe a Reddit-like algorithm but it does describe lots of recommendation algorithms along with practical Python code. Should get you thinking in the right direction.
⬐ mark_ellul
Second that... That book is a great starting point... to get you thinking
⬐ mattdennewitz
The algorithms in this book might get you thinking in generally the right direction, but you might miss out on a couple of critical ingredients -- namely, time-based decay functions that make things float and sink (and float again -- don't forget that!), weighting things based on a user's "input worth" (how much you value someone's vote), and so on.
I see these three components as being the basic broth for a good "organic ranking" site:
1. Time since the article was submitted, probably represented logarithmically.
2. How many votes came in, measured by how quickly they came in from the article's submission and how far apart each vote is.
3. The "weight" of the users voting on the article.
⬐ maryrosecook
4. The amount of comment activity.
5. Votes vs views.
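The first three ingredients in this thread (logarithmic age, vote timing, and voter weight) can be combined into a toy scoring function. The sketch below is an assumption-laden illustration, not the algorithm of Hacker News, Reddit, or any other real site: the function name, the parameters `recent_window` and `recent_boost`, and the exact formula are all hypothetical. Comment activity and votes vs views are left out to keep it short.

```python
import math


def rank_score(votes, age_hours, recent_window=1.0, recent_boost=2.0):
    """Score an article from weighted votes with a logarithmic age penalty.

    votes: iterable of (hours_after_submission, voter_weight) pairs.
    age_hours: hours since the article was submitted.
    recent_window, recent_boost: hypothetical tuning knobs; votes cast
        within the last `recent_window` hours count `recent_boost` times.
    """
    total = 0.0
    for voted_at, weight in votes:
        # A fresh vote on an old article counts extra, so the article
        # can "float again" after it has sunk.
        is_recent = (age_hours - voted_at) <= recent_window
        total += weight * (recent_boost if is_recent else 1.0)
    # log(age + 2) is always positive and grows slowly, so the score
    # sinks as the article ages without ever dividing by zero.
    return total / math.log(age_hours + 2.0)


if __name__ == "__main__":
    early_votes = [(0.5, 1.0), (1.0, 1.0)]
    print(rank_score(early_votes, age_hours=2))    # young article
    print(rank_score(early_votes, age_hours=48))   # same votes, two days on
    print(rank_score(early_votes + [(47.5, 1.0)], age_hours=48))  # revived
```

The same two votes are worth less after 48 hours than after 2, a late vote lifts the old article back up, and a voter with a larger weight moves the score more than one with a weight of 1.0. A real site would need to tune or replace every constant here.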