HN Books @HNBooksMonth

The best books of Hacker News.

Hacker News Comments on
Programming Collective Intelligence: Building Smart Web 2.0 Applications

Toby Segaran · 26 HN comments
HN Books has aggregated all Hacker News stories and comments that mention "Programming Collective Intelligence: Building Smart Web 2.0 Applications" by Toby Segaran.
HN Books may receive an affiliate commission when you make purchases on sites after clicking through links on this page.
Amazon Summary
Want to tap the power behind search rankings, product recommendations, social bookmarking, and online matchmaking? This fascinating book demonstrates how you can build Web 2.0 applications to mine the enormous amount of data created by people on the Internet. With the sophisticated algorithms in this book, you can write smart programs to access interesting datasets from other web sites, collect data from users of your own applications, and analyze and understand the data once you've found it. Programming Collective Intelligence takes you into the world of machine learning and statistics, and explains how to draw conclusions about user experience, marketing, personal tastes, and human behavior in general, all from information that you and others collect every day. Each algorithm is described clearly and concisely with code that can immediately be used on your web site, blog, Wiki, or specialized application. This book explains:

  • Collaborative filtering techniques that enable online retailers to recommend products or media
  • Methods of clustering to detect groups of similar items in a large dataset
  • Search engine features: crawlers, indexers, query engines, and the PageRank algorithm
  • Optimization algorithms that search millions of possible solutions to a problem and choose the best one
  • Bayesian filtering, used in spam filters for classifying documents based on word types and other features
  • Using decision trees not only to make predictions, but to model the way decisions are made
  • Predicting numerical values rather than classifications to build price models
  • Support vector machines to match people in online dating sites
  • Non-negative matrix factorization to find the independent features in a dataset
  • Evolving intelligence for problem solving: how a computer develops its skill by improving its own code the more it plays a game

Each chapter includes exercises for extending the algorithms to make them more powerful.
Go beyond simple database-backed applications and put the wealth of Internet data to work for you.

"Bravo! I cannot think of a better way for a developer to first learn these algorithms and methods, nor can I think of a better way for me (an old AI dog) to reinvigorate my knowledge of the details." (Dan Russell, Google)

"Toby's book does a great job of breaking down the complex subject matter of machine-learning algorithms into practical, easy-to-understand examples that can be directly applied to analysis of social interaction across the Web today. If I had this book two years ago, it would have saved precious time going down some fruitless paths." (Tim Wolters, CTO, Collective Intellect)
HN Books Rankings
  • Ranked #15 all time

Hacker News Stories and Comments

All the comments and stories posted to Hacker News that reference this book.
Some resources to get you started (not including any Coursera or Udacity courses, since others have already mentioned them):

Mathematical Monk (includes a probability primer)

Awesome Courses: a very extensive list of university courses, including subjects other than machine learning

Programming Collective Intelligence: I've heard very good reviews of this

Many other resources are available beyond the above. You can access more such resources at

I think it's a good idea to go through one or more beginner-level courses, like the one offered by Andrew Ng on Coursera, and then do an actual project.

[Disclaimer - I work at mentioned in the comment]

I really enjoyed the books "Learning From Data"[1], and "Programming Collective Intelligence"[2].

Both are accessible to beginners.

Learning From Data gives a more theoretical introduction to machine learning. One of the central ideas from the book that I still think about often is that machine learning is merely function approximation. There exists a function which will drive a car perfectly, but we don't know what that function is, so we try to approximate that function with machine learning.
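To make the "function approximation" framing concrete, here is a minimal sketch (not taken from either book): we sample noisy points from a function we pretend not to know, assumed here to be y = 2x + 1, and recover an approximation with an ordinary least-squares line fit. The target function, the noise level, and the helper name `fit_line` are illustrative assumptions.

```python
import random

def fit_line(xs, ys):
    """Ordinary least-squares fit of y ~ a*x + b; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the intercept passes through the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    b = mean_y - a * mean_x
    return a, b

def unknown_function(x):
    # Stand-in for the "true" function the learner never sees directly.
    return 2.0 * x + 1.0

random.seed(0)
xs = [i / 10 for i in range(100)]
# Observations are the true function plus a little Gaussian noise.
ys = [unknown_function(x) + random.gauss(0, 0.1) for x in xs]

a, b = fit_line(xs, ys)
print(f"approximation: y = {a:.2f}x + {b:.2f}")
```

A straight line is the simplest possible approximator; the same idea scales up to the far more flexible function families (trees, SVMs, neural networks) that the books cover.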

Programming Collective Intelligence is a more hands-on introduction to machine learning. The book has examples in Python, but I believe the Python code is low quality. Ignoring the example code (and I did ignore it), the book is a very enjoyable introduction to many different machine learning algorithms. If you don't know the difference between linear regression, k-nearest neighbors, support vector machines, and neural networks, this book will explain how each of these works and give you a good intuition about when to use each.

[1] [2]
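As a rough illustration of the nearest-neighbors method mentioned above (used here for numeric prediction rather than clustering), the sketch below predicts a value, such as a price, as the average of the k most similar known examples under Euclidean distance. This is not the book's code; the toy data and the names `euclidean` and `knn_predict` are hypothetical.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(training, query, k=3):
    """Predict a numeric value for `query` as the mean of its k nearest neighbors.

    `training` is a list of (features, value) pairs.
    """
    # Sort known examples by distance to the query and keep the closest k.
    nearest = sorted(training, key=lambda item: euclidean(item[0], query))[:k]
    return sum(value for _, value in nearest) / len(nearest)

# Hypothetical data: (quality rating, age in years) -> price
training = [
    ((90, 5), 120.0),
    ((85, 6), 110.0),
    ((88, 4), 125.0),
    ((60, 20), 40.0),
    ((55, 25), 35.0),
]

print(knn_predict(training, (87, 5), k=3))
```

In practice the features would need rescaling so that one large-valued column does not dominate the distance; that step is omitted here to keep the sketch short.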

I liked "Programming Collective Intelligence", but it might be a little dated (it doesn't use the latest libraries). It's a good way to learn some simple algorithms (optimization, clustering).
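For a taste of the clustering side, here is a minimal k-means sketch in plain Python (not the book's implementation). It assumes 2-D points, uses the first k points as the initial centroids so the run is deterministic, and stops once no point changes cluster; `kmeans` is a hypothetical name, and `math.dist` requires Python 3.8+.

```python
import math

def kmeans(points, k, max_iters=100):
    """Cluster 2-D points into k groups; returns (centroids, assignments)."""
    # Deterministic initialization: the first k points become the centroids.
    centroids = [list(p) for p in points[:k]]
    assignments = None
    for _ in range(max_iters):
        # Assignment step: each point goes to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # Converged: no point changed cluster.
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # Leave an empty cluster's centroid where it is.
                centroids[c] = [sum(coord) / len(members) for coord in zip(*members)]
    return centroids, assignments

points = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.5), (1.0, 0.5), (9.5, 8.0)]
centroids, labels = kmeans(points, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

Real implementations usually pick initial centroids randomly (or with k-means++) and run several restarts, since the result depends on the starting positions.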

Also, rather than learning ML in 2 months (which is a very unfocussed and unattainable goal) -- try to narrow it down to some problem domain. You'd get better recommendations if you are more specific.

Programming Collective Intelligence: Building Smart Web 2.0 Applications by Toby Segaran is a bit old now but excellent, with 4.5 stars on Amazon from 100+ reviews.[1] It has a bit of overlap with this one, but there are some great explanations.


I got a lot of mileage out of this book: All his examples are in Python, so if you already know Python it should work well for you. The only minor thing is that there are a few typos here and there in the code, but usually you can just use your common sense and figure out what the author intended.
Code Complete 2 [1] was one of the first coding books I've read. As with anything else, it's good to look around (HN is a good place) for people who have problems with the book. I think I learn as much reading the commentary people make about books like that as I do from the book itself.

I think I've listened to every podcast on Software Engineering Radio a few times [2]. The older ones are especially nice because they usually pick a specific topic and cover the high points. I liked that I could listen to it while I was driving, or otherwise not in front of a computer.

It's specific, but JavaScript: The Good Parts is probably the most used book I have on my shelf. It has such a perfect amount of usable information in it. It's pretty great. Again, it's definitely worth looking up critiques and counterpoints.

I've also got Introduction to Algorithms, which I sometimes use as a reference. I switched over to The Algorithm Design Manual [5] after I saw it referenced in an older Steve Yegge post [6]. I read through the intro and it seemed like a book that would be more appropriate from an autodidactic standpoint. I really have no idea if that's going to pan out, since I'm not that far into it, but we'll see. It doesn't kill me to have an extra algorithms book lying about, though, and I've always got Introduction to Algorithms for cross-reference. I've found that I really need to have as many sources available as possible when I'm learning alone. Usually I don't get something until the fifth person describes it from the tenth different angle.

That's most of what I can think of off hand. I really enjoyed The Joy of Clojure [7], though haven't checked out the newer version. Programming Collective Intelligence [8] is a fun book, and is what made me want to go back down the maths route to get more into machine learning.

And of course, habitually reading Hacker News for an hour or three every night :)

So that's my totally inexpert list of random stuff that I enjoy.

[1] [2] [3] [4] [5] [6] [7] [8]

I found this to be quite a good introduction.

I also have this book and highly recommend it.
I'm in a very similar position.

If you really like Codecademy, there are non-track exercises that involve Python in the API section [0] and a couple of Python challenges [1][2] that aren't listed.

What I'm doing now:

* Solving exercises on Project Euler in Python. [3]

* Working through each example in the Python Cookbook [4]. It was just updated to the third edition.

* Watched Guido's Painless Python talks from a few years ago [5]. I found his concise explanations of language features really helpful.

Some things I intend to do:

* Finish working through Collective Intelligence [6]. The examples are written in Python.

* Work through Introduction to Algorithms [7]. The course uses Python.

* Read, understand, and take a shot at extending OpenStack [8] code.

You can search announced, in progress, future, self-paced, and finished MOOCs (Massive Open Online Courses) with :
Aren't Project Euler's exercises more like math exercises? They're kind of difficult for those who graduated in the social sciences and are trying to learn programming from scratch.
The Green Tea Press books are great, and free.

Think Python: How to Think Like a Computer Scientist

Think Complexity: Exploring Complexity Science with Python

Think Stats: Probability and Statistics for Programmers

Yep. Project Euler is a waste of time if you're trying to get up to speed in learning programming.
>Aren't Project Euler's exercises more like math exercises?

Project Euler does involve math, but so does efficient programming.

Efficiency can seem like a pretty abstract thing, and it might not crop up right away in more typical programming tasks. Working an Euler problem and refining it to a solution that runs in 1% or 0.001% of the time required by the most straightforward solution is a great demonstration.

>They're kind of difficult for those who graduated in the social sciences and are trying to learn programming from scratch.

Sure, but the context of the question here isn't really "from scratch". The OP has already completed at least the 296 exercises in the Python track at Codecademy to establish a base.

Personally, I haven't graduated from anything and I treat the Euler exercises as an interesting way to practice/learn a bit of programming and math.
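As a small illustration of the kind of refinement described above, take Project Euler's first problem: the sum of all multiples of 3 or 5 below 1000. The brute-force version checks every number, while the closed-form version uses the arithmetic-series formula plus inclusion-exclusion and does a constant amount of work regardless of the limit. The function names are our own.

```python
def brute_force(limit):
    """Check every number below `limit` for divisibility by 3 or 5: O(n)."""
    return sum(n for n in range(limit) if n % 3 == 0 or n % 5 == 0)

def sum_of_multiples(d, limit):
    """Sum of d, 2d, 3d, ... below `limit`, via the arithmetic-series formula."""
    m = (limit - 1) // d  # how many multiples of d lie below limit
    return d * m * (m + 1) // 2

def closed_form(limit):
    """Inclusion-exclusion: multiples of 15 are counted by both 3 and 5,
    so subtract them once. Constant time."""
    return (sum_of_multiples(3, limit) + sum_of_multiples(5, limit)
            - sum_of_multiples(15, limit))

print(brute_force(1000), closed_form(1000))  # 233168 233168
```

At a limit of 1000 the difference is invisible, but at a limit of 10**9 the loop takes a noticeable amount of time while the formula still returns instantly.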

The Signal and the Noise by Nate Silver (Very nice read. The chapters on climate change and GDP forecasting were a bit slow, but everything else was a page-turner.)

Why I Left Goldman Sachs by Greg Smith (Good insight into the 2008 financial breakdown and a look into the day-to-day operations of Goldman Sachs)

The Hobbit

Data Mining: Concepts and Techniques (Great intro to data mining)

Programming Collective Intelligence (You can play around with actual implementations of the concepts in the previous book)

Ghost in the Wires by Kevin Mitnick (It was really nice to see the details behind Mitnick's adventures)

On War by Clausewitz (Really enjoyed this book.)

Ruby's great for recommendation engines, and JRuby gives you access to all the Java libraries. Programming Collective Intelligence is a good starter book.

Feb 17, 2012 · joverholt on Hacking Hacker News
Check out Programming Collective Intelligence. I found it to be a good introduction to machine learning because it uses practical (and neat) examples to teach the concepts. One of my top 5 programming books.
Just a quick note, if you're interested: this book, "Programming Collective Intelligence", is similar and is an absolutely fantastic 'applied beginner' ML book.

Most of what I learned was from Programming Collective Intelligence. If you're in the Bay Area, Noisebridge has a self-taught machine learning class that I went to for a while (great group, I just couldn't make the time commitment).
For most of the fundamentals of collaborative filtering, you may check chapters 8 and 9 of the following online book.

Programming Collective Intelligence is a great resource. It serves as a practical introduction to several different machine learning algorithms. Although they are presented from a specific perspective (collaborative filtering), many of the techniques are general and are used across machine learning. The explanations of the algorithms are clear and simple, and the author does a nice job of building up the level of complexity over the course of the book. Also, you will get much more out of it if you follow along with the provided Python-based implementations.
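To make the collaborative-filtering perspective concrete, here is a minimal user-based sketch, in the spirit of the technique but not the book's code. It scores how alike two users are from the items they have both rated (an inverse-Euclidean-distance similarity, 1 / (1 + distance), is assumed here), then recommends unseen items by a similarity-weighted average of other users' ratings. The ratings data and function names are hypothetical.

```python
import math

# Hypothetical ratings: user -> {item: score on a 1-5 scale}
ratings = {
    "alice": {"Dune": 5.0, "Neuromancer": 4.0, "Snow Crash": 4.5},
    "bob":   {"Dune": 4.5, "Neuromancer": 4.0, "Snow Crash": 4.0, "Hyperion": 5.0},
    "carol": {"Dune": 1.0, "Neuromancer": 2.0, "Hyperion": 1.5, "Foundation": 4.0},
}

def similarity(prefs, a, b):
    """Similarity in (0, 1] from Euclidean distance over co-rated items; 0 if none shared."""
    shared = [item for item in prefs[a] if item in prefs[b]]
    if not shared:
        return 0.0
    dist = math.sqrt(sum((prefs[a][i] - prefs[b][i]) ** 2 for i in shared))
    return 1.0 / (1.0 + dist)

def recommend(prefs, user):
    """Rank items `user` hasn't rated by similarity-weighted average rating."""
    totals, sim_sums = {}, {}
    for other in prefs:
        if other == user:
            continue
        sim = similarity(prefs, user, other)
        if sim <= 0:
            continue  # Users with nothing in common contribute nothing.
        for item, score in prefs[other].items():
            if item not in prefs[user]:
                totals[item] = totals.get(item, 0.0) + sim * score
                sim_sums[item] = sim_sums.get(item, 0.0) + sim
    scored = [(totals[i] / sim_sums[i], i) for i in totals]
    return sorted(scored, reverse=True)

print(recommend(ratings, "alice"))
```

Here Bob's tastes are much closer to Alice's than Carol's are, so his high rating for "Hyperion" outweighs Carol's low one. Pearson correlation is a common alternative similarity when users rate on different personal scales.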
This is great! Python is actually my favorite language at the moment; I use it for freelance work as well as personal projects. Thanks for the link.
You can't take enough maths and statistics classes. Machine learning, these days at least, is very maths- and statistics-oriented. Linear algebra is big, so make sure you have that covered.

If you want to get your toes in the water a bit with ML, there are some great ML libraries that encapsulate some of the popular algorithms. Mahout[1], Weka[2] and Mallet[3] are popular in the Java world.

A lot of folks use Python for ML as well, and there are some good libraries there.

The R language is also popular in ML circles; as is C++. If you learn some combination of Java, Python, C++ and/or R, you'll be in good shape from a programming language standpoint.

Check out also.

Some good books to get started with include:

Algorithms of the Intelligent Web[4]

Programming Collective Intelligence[5]

Collective Intelligence In Action[6]

Stanford make a great series of lectures[7] available online that you might find useful.

Python seems to be pretty popular for AI. This is a shot in the dark, but it could have something to do with the popularity of "Programming Collective Intelligence". It uses and teaches Python and is a top seller in the AI category on Amazon.

I have really enjoyed going through the examples in that book. It was the first time I had done any AI-type stuff (although one might argue it's light on the AI side) since doing some Lisp work long ago.

Norvig's Artificial Intelligence: A Modern Approach also has Python code examples.
Oh wow, do you know which edition started doing that?
I've been deep into building a geocoder for the past month. While we may get rid of Solr eventually, it was a great foot in the door to information retrieval. It helps that I have a problem to solve and a deadline, so I'm motivated to read and work through these books. These three texts have been very helpful. The last book is an excellent overview of text processing and some real-world problems you may encounter when writing your own search engine.

Solr 1.4 Enterprise Search Server

Programming Collective Intelligence

Building Search Applications: Lucene, LingPipe, and Gate
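For readers new to information retrieval, the core data structure behind engines like Solr/Lucene is an inverted index, which maps each term to the documents that contain it. The following is a toy sketch, not how Solr actually stores or scores anything: it indexes lowercase word tokens, answers AND queries by intersecting posting lists, and ranks matches by total term frequency. The tokenizer, the scoring, and the address-like sample documents are deliberately naive assumptions.

```python
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase and keep runs of ASCII letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Map each term to {doc_id: term frequency in that document}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, count in Counter(tokenize(text)).items():
            index[term][doc_id] = count
    return index

def search(index, query):
    """Return ids of docs containing every query term, highest total frequency first."""
    terms = tokenize(query)
    if not terms:
        return []
    # Intersect the posting lists so only documents with all terms survive.
    matches = set(index.get(terms[0], {}))
    for term in terms[1:]:
        matches &= set(index.get(term, {}))
    score = {d: sum(index[t][d] for t in terms) for d in matches}
    # Ties are broken by doc id so the output order is stable.
    return sorted(matches, key=lambda d: (-score[d], d))

docs = {
    1: "Main Street, Springfield",
    2: "Springfield Avenue and Main Street, Main town",
    3: "Oak Street, Shelbyville",
}
index = build_index(docs)
print(search(index, "main street"))  # [2, 1]
```

Production engines replace raw term frequency with weightings such as TF-IDF or BM25 and add stemming, synonyms, and fuzzy matching, which matter a lot for messy inputs like addresses.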

Programming Collective Intelligence also has high marks from a lot of programmers whom I respect:
I agree, this was the eye-opener for me. When it comes to data mining, I'll take practice over theory any day.

It dives directly into actual code samples and, a big plus, uses Python; this book is one of the few I actually keep on my desk.

The two books in Amazon's "Customers Who Bought This Item Also Bought" blurb are more rigorous and quite useful, also covering topics like Lucene/Solr with full Java code listings:

Algorithms of the Intelligent Web by H. Marmanis

Collective Intelligence in Action by Satnam Alag

There are really good books on data mining at Borders, and a recent bunch of "collective intelligence" books; the Manning books are excellent, but you have to know Java. You probably also want to install Weka and R, look at the Python suite (NumPy, SciPy, matplotlib), and tools like that; also look up the ~107 (!) algorithms that BellKor used for the Netflix competition.

Strong upvote on that book list. If you want to learn from scratch, start with the "Programming Collective Intelligence" book, then move to "Collective Intelligence in Action"; they're both quite good.

If you've covered and understand the material in both, you're probably ready to consider moving to some of the more academic texts.

Since it's a vacation, here's a novel with lots of interesting ideas about ubiquitous computing, augmented reality, etc.:

Fun with Python:

The Collective Intelligence book has some examples:

I am not sure which technology you've used to develop your site, but Project Aura, being developed by Sun developers, looks really promising and is open source. Here is a link to their PDF from this year's JavaOne; they are supposed to launch it within the next month.

Jun 01, 2008 · almost on Popularity Algorithms
Can I recommend the book "Programming Collective Intelligence"? It doesn't actually describe a Reddit-like algorithm, but it does describe lots of recommendation algorithms along with practical Python code. It should get you thinking in the right direction.
Second that... That book is a great starting point... to get you thinking
The algorithms in this book might get you thinking in generally the right direction, but you might miss out on a couple of critical ingredients: namely, time-based decay functions that make things float and sink (and float again; don't forget that!), weighting things based on a user's "input worth" (how much you value someone's vote), and so on.

I see these three components as being the basic broth for a good "organic ranking" site:

  1. time since the article was submitted, probably represented logarithmically.

  2. how many votes came in, measured by how quickly they came in after the article's submission and how far apart each vote is.

  3. the "weight" of the users voting on the article.
4. The amount of comment activity.

5. Votes vs views.
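One way to fold several of the ingredients listed above into a single ranking number is sketched below. This is a hypothetical formula, not Reddit's or Hacker News's actual algorithm: each vote counts in proportion to the voter's weight ("input worth"), comment activity adds a little, and the total is divided by a power of the item's age so stories sink over time. It uses a power-law age decay rather than the logarithmic treatment suggested in item 1, and the `GRAVITY` and `COMMENT_FACTOR` constants and all names are illustrative assumptions to tune.

```python
from dataclasses import dataclass, field

@dataclass
class Story:
    title: str
    age_hours: float
    # One weight per vote, e.g. 1.0 for a typical user, more for trusted voters.
    voter_weights: list = field(default_factory=list)
    comments: int = 0

GRAVITY = 1.8          # assumed decay exponent: higher values make old stories sink faster
COMMENT_FACTOR = 0.25  # assumed worth of one comment relative to one typical vote

def score(story):
    """Weighted votes plus comment activity, decayed by a power of age."""
    activity = sum(story.voter_weights) + COMMENT_FACTOR * story.comments
    # The +2 offset avoids dividing by zero for brand-new stories.
    return activity / (story.age_hours + 2) ** GRAVITY

stories = [
    Story("old but popular", age_hours=24, voter_weights=[1.0] * 100, comments=40),
    Story("fresh", age_hours=1, voter_weights=[1.0] * 10, comments=2),
    Story("fresh, trusted voters", age_hours=1, voter_weights=[2.0] * 10, comments=2),
]
for s in sorted(stories, key=score, reverse=True):
    print(f"{score(s):.3f}  {s.title}")
```

The decay makes the day-old story with 100 votes rank below an hour-old story with 10, and doubling the voters' weights doubles a story's vote contribution; vote velocity and votes-vs-views (items 2 and 5) could be added as further terms in the same way.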

HN Books is an independent project and is not operated by Y Combinator or