HN Books @HNBooksMonth

The best books of Hacker News.

Hacker News Comments on
Pattern Recognition and Machine Learning (Information Science and Statistics)

Christopher M. Bishop · 1 HN points · 17 HN comments
HN Books has aggregated all Hacker News stories and comments that mention "Pattern Recognition and Machine Learning (Information Science and Statistics)" by Christopher M. Bishop.
View on Amazon [↗]
HN Books may receive an affiliate commission when you make purchases on sites after clicking through links on this page.
Amazon Summary
This is the first textbook on pattern recognition to present the Bayesian viewpoint. The book presents approximate inference algorithms that permit fast approximate answers in situations where exact answers are not feasible. It uses graphical models to describe probability distributions, an approach no other book applies to machine learning. No previous knowledge of pattern recognition or machine learning concepts is assumed. Familiarity with multivariate calculus and basic linear algebra is required, and some experience with probability is helpful though not essential, as the book includes a self-contained introduction to basic probability theory.
HN Books Rankings

Hacker News Stories and Comments

All the comments and stories posted to Hacker News that reference this book.
I’m up to Chapter 6 in ISLR

https://github.com/melling/ISLR

Would Elements of Statistical Learning be my next book?

I’ve seen the Bishop book highly recommended too, and it has been mentioned in this post.

https://www.amazon.com/Pattern-Recognition-Learning-Informat...

scythmic_waves
> Would Elements of Statistical Learning be my next book?

Honestly no I don't think so. ESL is likely too advanced.

I would use that screenshot I posted above as a litmus test. Do you understand that notation? The `E` with the subscript? And why they're using `trace[]`? If you do, then you can likely follow ESL. If not -- which would be understandable, because even early undergrads likely can't -- then I'd say you shouldn't follow ISL directly with ESL. It really is a graduate text.
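For concreteness (the screenshot itself isn't reproduced here, so this is only an illustration of the flavor of notation meant): ESL routinely takes expectations over the random training set and leans on trace identities, as in its expected-prediction-error and effective-degrees-of-freedom formulas:

```latex
% Illustrative ESL-style notation, not the actual screenshot.
% Expectation over the random training set T at a test point x_0,
% and the trace formula for the effective degrees of freedom of ridge.
\[
  \mathrm{EPE}(x_0) = \mathbb{E}_{\mathcal{T}}\!\left[\bigl(y_0 - \hat{f}_{\mathcal{T}}(x_0)\bigr)^2\right],
  \qquad
  \mathrm{df}(\lambda) = \operatorname{trace}\!\left[\mathbf{X}\bigl(\mathbf{X}^{\top}\mathbf{X} + \lambda\mathbf{I}\bigr)^{-1}\mathbf{X}^{\top}\right].
\]
```

If both of those read naturally, ESL is within reach.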

> I’ve seen the Bishop book highly recommended too, and it has been mentioned in this post.

Bishop has a similar problem: [1]. I had to scroll to chapter 2 for this screenshot (pg. 83), but it really is par for the course.

So, and this is totally my opinion here so YMMV, recommendations for foundational ML info tend to be wildly too advanced for the people seeking them out. I'm a math-y person. I really like learning about the math foundations of ML. But ML builds on a lot of other concepts and you can't just jump into the deep end. In my opinion, ML foundations should come at the end of a lengthy sequence of math and statistics courses. Students will just be too lost without them.

I don't mean to be discouraging here. I think nearly anyone who's willing to put in the time can learn this stuff! But here's a more reasonable sequence I found on reddit a while back that would set someone up nicely for being able to follow ESL: [2]. Without the proper foundation, it's just too difficult to follow ESL or Bishop IMO.

Last, I'll note that you don't need to understand the nitty-gritty of ML math to be an ML practitioner. In fact, I'd argue that taking the effort would be distracting because 1) a basic understanding (like you'd get from working through ISL) is probably good enough to start messing with libraries and 2) practitioners need a whole bunch of other knowledge (like general software skills and how to maintain ML datasets) that they also have to take the time to learn.

[1] https://imgur.com/uXWZ6Bv

[2] https://www.reddit.com/r/learnmachinelearning/comments/ggpzk...

melling
It has been a while, but I do understand Σ, e^x, ln, matrices, vectors, etc.

However, as you mentioned, you don't need to work through the proofs to understand logistic regression, lasso, ridge regression, and bootstrapping, for example.
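For what it's worth, that division of labor is exactly what libraries give you. A minimal sketch (scikit-learn on synthetic data; the parameter values are arbitrary) of using all four methods named above without touching a single proof:

```python
import numpy as np
from sklearn.datasets import make_classification, make_regression
from sklearn.linear_model import Lasso, LogisticRegression, Ridge
from sklearn.model_selection import train_test_split

# Ridge and lasso on synthetic regression data.
X, y = make_regression(n_samples=200, n_features=20, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

ridge = Ridge(alpha=1.0).fit(X_tr, y_tr)  # L2 penalty shrinks coefficients
lasso = Lasso(alpha=0.1).fit(X_tr, y_tr)  # L1 penalty zeroes some out
print("ridge test R^2:", ridge.score(X_te, y_te))
print("lasso nonzero coefficients:", int(np.sum(lasso.coef_ != 0)))

# Logistic regression on synthetic classification data.
Xc, yc = make_classification(n_samples=200, n_features=10, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(Xc, yc)
print("logistic train accuracy:", clf.score(Xc, yc))

# Bootstrap: estimate the standard error of a statistic by resampling.
boots = [np.mean(np.random.choice(y, size=len(y), replace=True))
         for _ in range(1000)]
print("bootstrap SE of mean(y):", np.std(boots))
```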

stevegalla
What is your background and what is your goal for learning the methods?

This is somewhat long and there is a disclaimer towards the end, but hopefully some of this is helpful.

Working through a book can mean reading what's on the pages and being able to recall the names of techniques and methods, or using pen and paper to work the examples and solve problems, or even deriving what's in the book from first principles.

This will depend on what you want to do with the material. If you want to apply it using pre-made R packages, you probably don’t need to recreate everything from scratch and you can probably get away with ISL. If you want to be creating new methods or going beyond pre-made R packages, then you probably need to work up to ESL and solve things from first principles.

ISL is used in an undergrad elective course at my uni. The prerequisite stats material covers Devore's Probability and Statistics for Engineering and the Sciences and Montgomery's Introduction to Linear Regression Analysis. ISL would be a third course in stats (see the bottom of this comment for the math background). There are entire courses dedicated to the topics in ISL, so I really think ISL is most useful for bringing previously studied topics together.

ESL is used in a second year MSc course. This assumes knowledge of mathematical statistics (Casella and Berger Statistical Inference + Wasserman All of Statistics), computational statistics (topics: bootstrap, MCMC, EM algorithm, numerical analysis methods, optimization, and matrix decomposition) and courses on linear regression and the general linear model. So it’s a “capstone” of sorts that ties all of the material together. I haven’t taken any of these courses, so I can’t comment on what’s really necessary.

Disclaimers follow: As others have mentioned someone’s background and preparation may be different and more advanced than what is outlined. Above I outlined the course sequences for ISL and ESL at my uni. We do not require a course on real analysis and we do not do measure theoretic probability (PhDs do but ESL is covered in the MSc that is required for PhD admissions). Of course not every chapter in a textbook is covered in each course and I’m sure there is some sort of minimal coverage of topics that will allow you to get to ISL or ESL in a more efficient way. What that is, I am unable to comment on.

Yes there are people admitted to the MSc program without a stats BSc degree. Examples are physics, math, and computer science majors from what I have seen. Usually they have to make up missing BSc math stats courses.

The undergrad-level math background assumes calculus up through multivariable calculus (Stewart's Calculus, omitting the chapters on vector calculus): partial derivatives, Lagrange multipliers, multiple integrals. It also assumes linear algebra: matrix multiplication, determinants, eigenvalues, trace (Lay's Linear Algebra and Its Applications).

kvathupo
I'm a recent graduate from undergrad doing work in deep learning. While by no means an expert, I'm inclined to respectfully disagree with the assessment of /u/scythmic_waves. The Reddit roadmap is total overkill as a pre-req for ESL and Bishop (Analysis, Topology, and proof-based Linear Algebra are certainly not needed for them).

I think the only hard pre-req would be a solid understanding of non-axiomatic probability up through the Law of large numbers. For the rest, I'm of the, perhaps naive, school of thought that one ought to jump in the deep end, and consult a variety of sources as need be. iirc, most of Munkres, and Hoffman & Kunze are not needed for these books. Granted, you might find yourself picking these books up as your focus narrows, but for these books, you don't need them.

With that out of the way, I'd highly, highly recommend Bishop as reading, after ISLR.

Edit: In response to your other comment, I also disagree: proofs, especially for regression problems, are important for understanding why we use them.

scythmic_waves
> While by no means an expert, I'm inclined to respectfully disagree with the assessment of /u/scythmic_waves.

Not a problem! These are all just my opinions.

> Analysis, Topology, and proof-based Linear Algebra are certainly not needed for them

Although this is explained in the prose of the document, I should have highlighted it myself: only the nodes in blue are required. The orange nodes (Analysis, Topology, Functional Analysis, etc.) are extra. They aren't required for ESL.

Honestly, as long as you get up to the level of the Casella & Berger text, you'll probably be fine. And a lot of C&B can be skipped (like the focus on ANOVA or experiment design). But I also like that roadmap because after C&B, there's additional emphasis on Linear models which is helpful for ESL.

> For the rest, I'm of the, perhaps naive, school of thought that one ought to jump in the deep end, and consult a variety of sources as need be.

And I suppose this is where you and I differ. I find it discouraging to need to stop partway through a text and go learn a whole new subject area before continuing. Instead, I find building up the foundation and then working through a text to be a more enjoyable experience, because it's just building on what I know.

But to each their own!

kvathupo
I just wanted to say that I appreciate the courteous reply! :)
From my experience, these resources are worth a read:

[1] Pattern Recognition and Machine Learning (Information Science and Statistics) by Christopher M. Bishop

Andreas Brandmaier's permutation distribution clustering is a method rooted in the dissimilarities between time series, formalized as the divergence between their permutation distributions. Personally, I think this is your "best" option: http://cran.r-project.org/web/packages/pdc/index.html
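To make the idea concrete, a rough Python sketch of the permutation-distribution approach (the pdc package above is the real implementation and uses an alpha-divergence; this toy version substitutes a symmetrized KL divergence, and the embedding dimension m=3 is arbitrary):

```python
from collections import Counter
from itertools import permutations
import math

import numpy as np
from scipy.cluster.hierarchy import linkage

def permutation_distribution(x, m=3):
    """Frequencies of the ordinal patterns of all length-m windows of x."""
    counts = Counter(tuple(np.argsort(x[i:i + m])) for i in range(len(x) - m + 1))
    total = len(x) - m + 1
    # Add-one smoothing keeps the divergence below finite.
    return np.array([(counts.get(p, 0) + 1.0) / (total + math.factorial(m))
                     for p in permutations(range(m))])

def sym_kl(p, q):
    """Symmetrized KL divergence between two permutation distributions."""
    return float(np.sum(p * np.log(p / q) + q * np.log(q / p)))

rng = np.random.default_rng(0)
series = [rng.standard_normal(300).cumsum() for _ in range(6)]
pds = [permutation_distribution(s) for s in series]

# Condensed pairwise-distance vector, then average-linkage clustering.
n = len(pds)
condensed = [sym_kl(pds[i], pds[j]) for i in range(n) for j in range(i + 1, n)]
print(linkage(condensed, method="average"))
```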

Eamonn Keogh's SAX (Symbolic Aggregate Approximation) and iSAX routines develop "shape clustering" for time series:

http://www.cs.ucr.edu/~eamonn/SAX.htm

There are approaches based on text compression algorithms, which remove the redundancy in a sequence of characters (or numbers), creating a kind of distance or density metric that can be used as an input to clustering; see, e.g.:

http://link.springer.com/chapter/10.1007/978-0-387-84816-7_4
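A sketch of the simplest member of this family, the normalized compression distance (NCD), which approximates shared information content with an off-the-shelf compressor (zlib here; any compressor works, with varying quality):

```python
import random
import zlib

def clen(s: bytes) -> int:
    """Compressed length: a practical stand-in for Kolmogorov complexity."""
    return len(zlib.compress(s, 9))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte sequences."""
    cx, cy, cxy = clen(x), clen(y), clen(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

a = b"0101" * 200                                     # highly regular
b = b"0101" * 190 + b"0110" * 10                      # mostly like a
r = bytes(random.getrandbits(8) for _ in range(800))  # incompressible noise

print(ncd(a, b))  # small: the sequences share structure
print(ncd(a, r))  # near 1: little shared structure
```

The resulting pairwise distances can be fed to any standard clustering routine.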

This paper by Rob Hyndman, "Dimension Reduction for Clustering Time Series Using Global Characteristics," discusses compressing a time series down to a small set of global moments or metrics and clustering on those:

http://www.robjhyndman.com/papers/wang2.pdf
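In the same spirit, a toy version of the global-characteristics idea (the paper's feature set is much richer; the four features below are merely placeholders):

```python
import numpy as np
from sklearn.cluster import KMeans

def global_features(x):
    """Reduce a series to a few global characteristics."""
    t = np.arange(len(x))
    slope = np.polyfit(t, x, 1)[0]          # crude linear trend
    ac1 = np.corrcoef(x[:-1], x[1:])[0, 1]  # lag-1 autocorrelation
    return np.array([x.mean(), x.std(), slope, ac1])

rng = np.random.default_rng(0)
series = [rng.standard_normal(300).cumsum() for _ in range(10)]
feats = np.vstack([global_features(s) for s in series])

# Cluster the fixed-length feature vectors instead of the raw series.
print(KMeans(n_clusters=2, n_init=10).fit_predict(feats))
```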

Chapter 15 in Aggarwal and Reddy's excellent book, Data Clustering, is devoted to a wide range (a laundry list, really) of time-series clustering methods (pp. 357-380). The discussion provides excellent background on many of the issues specific to clustering time series:

http://users.eecs.northwestern.edu/~goce/SomePubs/Similarity...

...and a lot more.

-- URL --

[1] https://www.amazon.com/Pattern-Recognition-Learning-Informat...

golanggeek
Thanks for the great links. I will check them out.
> study textbooks. Do exercises. Treat it like academic studying

This. Highly recommend Russel & Norvig [1] for high-level intuition and motivation. Then Bishop's "Pattern Recognition and Machine Learning" [2] and Koller's PGM book [3] for the fundamentals.

Avoid MOOCs, but there are useful lecture videos, e.g. Hugo Larochelle on belief propagation [4].

FWIW this is coming from a mechanical engineer by training, but self-taught programmer and AI researcher. I've been working in industry as an AI research engineer for ~6 years.

[1] https://www.amazon.com/Artificial-Intelligence-Modern-Approa...

[2] https://www.amazon.com/Pattern-Recognition-Learning-Informat...

[3] https://www.amazon.com/Probabilistic-Graphical-Models-Princi...

[4] https://youtu.be/-z5lKPHcumo

jimmy-dean
Oof, those are all dense reads for a newcomer... For a first dip into the waters I usually suggest Introduction to Statistical Learning, then from there move into PRML or ESL. Were you first introduced to core ML through Bishop? +1 for a solid reading list.
sampo
PGMs were in fashion in 2012, but by 2014, when deep learning had become all the rage, I think PGMs had almost disappeared from the picture. Do people even remember PGMs exist now in 2019?
srean
Fashion is relevant only if you want to approach it as a fashion industry.
vazamb
PGMs also provide the intuition behind GANs and variational autoencoders.
KidComputer
You'll find plate models, PGM machinery, etc. in modern papers on explicit-density generative models and on factorizing the latents of such models.
godelmachine
Hands up for Bishop and Russell & Norvig.

Russell & Norvig should be treated as a gentle intro to AI.

Then start Bishop to understand the concepts.

magoghm
I would also include some books about statistics. Two excellent introductory books are:

Statistical Rethinking https://www.amazon.com/Statistical-Rethinking-Bayesian-Examp...

An Introduction to Statistical Learning http://www-bcf.usc.edu/~gareth/ISL/

In machine learning, hands down these are some of the best related textbooks:

- [0] Pattern Recognition and Machine Learning (Information Science and Statistics)

and also:

- [1] The Elements of Statistical Learning

- [2] Reinforcement Learning: An Introduction by Barto and Sutton

- [3] Deep Learning by Ian Goodfellow, Yoshua Bengio, and Aaron Courville

- [4] Neural Network Methods for Natural Language Processing (Synthesis Lectures on Human Language Technologies) by Yoav Goldberg

Then some math tidbits:

[5] Introduction to Linear Algebra by Strang

----------- links:

- [0] [pdf](http://users.isr.ist.utl.pt/~wurmd/Livros/school/Bishop%20-%...)

- [0] [amz](https://www.amazon.com/Pattern-Recognition-Learning-Informat...)

- [2] [amz](https://www.amazon.com/Reinforcement-Learning-Introduction-A...)

- [2] [pdf](http://incompleteideas.net/book/bookdraft2017nov5.pdf)

- [3] [amz](https://www.amazon.com/Deep-Learning-Adaptive-Computation-Ma...)

- [3] [site](https://www.deeplearningbook.org/)

- [4] [amz](https://www.amazon.com/Language-Processing-Synthesis-Lecture...)

- [5] [amz](https://www.amazon.com/Introduction-Linear-Algebra-Gilbert-S...)

Thriptic
+1 for Elements. I started with Introduction to Statistical Learning and then graduated to Elements as I learned more and grew more confident. Those are fantastic books.
caliber
Could you elaborate on how you switched to Elements? I'm curious whether it makes sense to go through both books in sequence.
jrumbut
If reading Elements is difficult, then I would recommend Introduction.

I'm not sure reading Introduction will prepare you for Elements so much as give you some knowledge you can use right away, show you whether this material suits you and what you want to do, and point you toward the math tidbits you'll need to (re)learn for Elements.

turingcompeteme
As an engineer who hadn't studied that type of math in quite a while, Elements was pretty tough and I was getting stuck a lot.

ISLR introduces you to many of the same topics in a less rigorous way. Once I was familiar with the topics and had worked through the exercises, Elements became much easier to learn from.

larrydag
For regression I really like Frank Harrell's Regression Modeling Strategies. http://biostat.mc.vanderbilt.edu/wiki/Main/RmS
rwilson4
I recently read Seber and Lee, Linear Regression Analysis, and highly recommend it.

https://www.amazon.com/Linear-Regression-Analysis-George-Seb...

jrumbut
Frank Harrell writes a lot of great stuff, and his answers on the Cross Validated Stack Exchange site are worth reading even if you never thought to ask the questions they reply to.

His blog, http://www.fharrell.com, also contains interesting posts.

dajohnson89
>[5] Introduction to Linear Algebra by Strang

People seem to love this textbook, and understandably so, because it's very approachable. But I really struggled with how informal and friendly the tone was. Perhaps I'd grown too accustomed to the typical theorem -> proof -> example -> problem set format.

cbHXBY1D
I have to disagree about the Deep Learning book: I don't find it a good book for anyone. For beginners it's too advanced/theoretical, and for experienced ML scientists it's entirely too basic. I very much agree with this review on Amazon [1].

For the former, I would recommend Hands-On Machine Learning with Scikit-Learn and TensorFlow.

[1] https://www.amazon.com/gp/customer-reviews/R1XNPL1BX5IVOM/re...

phonebucket
>For beginners it's too advanced/theoretical and for experienced ML scientists it's entirely too basic.

As a scientist coming to deep learning from another field, I found Courville et al to be pitched at the perfect level.

I made the same transition earlier in my career. One book on deep learning that meets your requirements is [0]. It’s readable, covers a broad set of modern topics, and has pragmatic tips for real use cases.

For general machine learning, there are many, many books. A good intro is [1] and a more comprehensive, reference sort of book is [2]. Frankly, by this point, even reading the documentation and user guide of scikit-learn has a fairly good mathematical presentation of many algorithms. Another good reference book is [3].

Finally, I would also recommend supplementing some of that stuff with Bayesian analysis, which can address many of the same problems, or be intermixed with machine learning algorithms, but which is important for a lot of other reasons too (MCMC sampling, hierarchical regression, small data problems). For that I would recommend [4] and [5].

Stay away from bootcamps or books or lectures that seem overly branded with “data science.” This usually means more focus on data pipeline tooling, data cleaning, shallow details about a specific software package, and side tasks like wrapping something in a webservice.

That stuff is extremely easy to learn on the job and usually needs to be tailored differently for every different project or employer, so it’s a relative waste of time unless it is the only way you can get a job.

[0]: < https://www.amazon.com/Deep-Learning-Adaptive-Computation-Ma... >

[1]: < https://www.amazon.com/Pattern-Classification-Pt-1-Richard-D... >

[2]: < https://www.amazon.com/Pattern-Recognition-Learning-Informat... >

[3]: < http://www.web.stanford.edu/~hastie/ElemStatLearn/ >

[4]: < http://www.stat.columbia.edu/~gelman/book/ >

[5]: < http://www.stat.columbia.edu/~gelman/arm/ >

soVeryTired
+1 for Gelman, but I hate Bishop's book [2]. It was an early go-to reference in the field, but there are better books out there now.
hikarudo
What do you hate about Bishop's book? I'm genuinely curious.
soVeryTired
Honestly, I don't understand the way he explains things. The maths is difficult to follow, and it just never clicks for me. Maybe he's writing for someone with a physics background or something, but I feel stupid when I read Bishop.

I just read over his description of how to transform a uniform random variable into a variable with a desired distribution (p. 526). It's a fairly easy trick, but if I didn't already know it, I wouldn't understand it from his explanation.
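For readers who don't already know it: the trick in question is inverse transform sampling. If U ~ Uniform(0,1) and F is the target CDF, then F^{-1}(U) has distribution F. A minimal sketch for the exponential distribution, where the inverse CDF has a closed form:

```python
import numpy as np

# Inverse transform sampling: for Exponential(rate),
# F(x) = 1 - exp(-rate * x), so F^{-1}(u) = -ln(1 - u) / rate.
rng = np.random.default_rng(0)
rate = 2.0
u = rng.uniform(size=100_000)      # uniform draws on (0, 1)
samples = -np.log(1.0 - u) / rate  # exponential draws via the inverse CDF

print(samples.mean())  # should be close to 1/rate = 0.5
```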

bllguo
I'm trying to read through it and I have to agree, his math isn't that clear to me. What do you recommend?
soVeryTired
David Barber!
Iwan-Zotow
Goodfellow book [0] is available for free, http://www.deeplearningbook.org/
The AI for Humans series is a reasonable, high-level approach: http://www.heatonresearch.com/aifh/

After you've got a grasp of what these things are doing, you can move on to the how. For that you will need some math background, with emphasis on calculus and probability.

After that, you can take a look at PRML. https://www.amazon.com/Pattern-Recognition-Learning-Informat...

Some people might prefer seeing things from another approach. http://pgm.stanford.edu/

Good luck.

Dec 04, 2016 · rm999 on Model-Based Machine Learning
The introduction clarifies what the authors mean. In this context "model" isn't about implementing a supervised model, it's about "modeling" your problem to build a bespoke algorithm that closely matches the problem. Unsupervised methods like clustering would probably fit in here too.

I haven't read much of this early access book yet, but I'd give the authors a lot of benefit of the doubt. Christopher Bishop wrote one of my favorite machine learning books (I read it after my graduate study in machine learning and it filled in a lot of the gaps): https://www.amazon.com/Pattern-Recognition-Learning-Informat...

brudgers
From the Hacker News guidelines:

Please don't insinuate that someone hasn't read an article. "Did you even read the article? It mentions that" can be shortened to "The article mentions that."

It is possible to edit the comment to remove the phrase if you wish.

rm999
It was an honest question, not snark (passive aggressiveness is not my style).

The introduction is kind of hidden on the page, and clarifies the meaning of "model" in this context. Otherwise, GP is correct that "model" is often used to mean a supervised model, and that people generally call it "supervised learning", not "model-based learning".

brudgers
I'm glad it was an honest question. Editing the comment is an option.

I think the guideline exists because, even as an honest question, it does not add anything to the comment; at best an answer changes nothing, and at worst it detracts from meaningful dialog.

One feature of this particular guideline is that it provides an alternative phrasing that is likely to avoid misinterpretation.

rm999
>I think the guideline exists because even as an honest question it does not add anything to the comment

I hope you see the irony here considering how much you're derailing this conversation (I'm only responding because I realize your intentions are good). And I'm pretty confident my comment added plenty of value to the discussion - I realize sometimes tone is lost in text, but after my clarification I don't see why you need to harp on this. Anyway, original comment edited.

brudgers
If I had thought of suggesting editing your comment before posting my second comment, then it might have been different. And in a similar situation in the future I well might. That said, until I thought about it a bit more, it didn't occur to me. Anyway, for me, writing is thinking.
Nov 08, 2016 · 1 point, 0 comments · submitted by jahan
Depending on your level of programming ability, one algorithm a day, IMHO, is completely doable. A number of comments and suggestions say that one per day is an unrealistic goal (yes, maybe it is) but the idea of setting a goal and working through a list of algorithms is very reasonable.

If you are just learning programming, plan on taking your time with the algorithms but practice coding every day. Find a fun project to attempt that is within your level of skill.

If you are a strong programmer in one language, find a book of algorithms using that language (some of the suggestions here in these comments are excellent). I list some of the books I like at the end of this comment.

If you are an experienced programmer, one algorithm per day is roughly doable. Especially so, because you are trying to learn one algorithm per day, not produce working, production level code for each algorithm each day.

Some algorithms are really families of algorithms and can take more than a day of study; hash-based lookup tables come to mind. First there are the hash functions themselves. That would be day one. Next there are several alternatives for storing entries in the hash table, e.g. open addressing vs. chaining; days two and three. Then there are methods for handling collisions: linear probing, secondary hashing, etc.; that's day four. Finally there are important variations: perfect hashing, cuckoo hashing, Robin Hood hashing, and so forth; maybe another 5 days. Some languages are less appropriate for playing around and can make working with algorithms more difficult; instead of a couple of weeks this could easily take twice as long. After learning other methods of implementing fast lookups, it's time to come back to hashing and understand when it's appropriate, when alternatives are better, and how to combine methods for more sophisticated lookup schemes.
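As a taste of the "day four" material, a bare-bones open-addressing table with linear probing (sketch only: no deletion and no resizing, so it would loop if the table filled up):

```python
class LinearProbingMap:
    """Open addressing with linear probing; fixed capacity, no deletes."""

    def __init__(self, capacity=16):
        self.slots = [None] * capacity  # each slot: None or a (key, value) pair

    def _probe(self, key):
        # Start at the hashed slot, then scan forward (wrapping around)
        # until we find the key or hit an empty slot.
        i = hash(key) % len(self.slots)
        while self.slots[i] is not None and self.slots[i][0] != key:
            i = (i + 1) % len(self.slots)
        return i

    def put(self, key, value):
        self.slots[self._probe(key)] = (key, value)

    def get(self, key, default=None):
        slot = self.slots[self._probe(key)]
        return slot[1] if slot is not None else default

m = LinearProbingMap()
m.put("cuckoo", 1)
m.put("robin hood", 2)
print(m.get("cuckoo"), m.get("robin hood"), m.get("missing"))
```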

I think you will be best served by modifying your goal a bit and saying that you will work on learning about algorithms every day and cover all of the material in a typical undergraduate course on the subject. It really is a fun branch of Computer Science.

A great starting point is Sedgewick's book/course, Algorithms [1]. For more depth and theory try [2], Cormen et al.'s excellent Introduction to Algorithms. Alternatively, the theory is also covered in another book by Sedgewick, An Introduction to the Analysis of Algorithms [3]. A classic reference that goes far beyond these other books is of course Knuth [4], suitable for serious students of Computer Science, less so as a book of recipes.

After these basics, there are books useful for special circumstances. If your goal is to be broadly and deeply familiar with Algorithms you will need to cover quite a bit of additional material.

Numerical methods -- Numerical Recipes 3rd Edition: The Art of Scientific Computing by Press, Teukolsky, Vetterling, and Flannery. I love this book. [5]

Randomized algorithms -- Randomized Algorithms by Motwani and Raghavan. [6], Probability and Computing: Randomized Algorithms and Probabilistic Analysis by Michael Mitzenmacher, [7]

Hard problems (like NP) -- Approximation Algorithms by Vazirani [8]. How to Solve It: Modern Heuristics by Michalewicz and Fogel. [9]

Data structures -- Advanced Data Structures by Brass. [10]

Functional programming -- Pearls of Functional Algorithm Design by Bird [11] and Purely Functional Data Structures by Okasaki [12].

Bit twiddling -- Hacker's Delight by Warren [13].

Distributed and parallel programming -- this material gets very hard, so perhaps Distributed Algorithms by Lynch [14].

Machine learning and AI related algorithms -- Bishop's Pattern Recognition and Machine Learning [15] and Russell and Norvig's Artificial Intelligence: A Modern Approach [16]

These books will cover most of what a Ph.D. in CS might be expected to understand about algorithms. It will take years of study to work though all of them. After that, you will be reading about algorithms in journal publications (ACM and IEEE memberships are useful). For example, a recent, practical, and important development in hashing methods is called cuckoo hashing, and I don't believe that it appears in any of the books I've listed.

[1] Sedgewick, Algorithms, 2015. https://www.amazon.com/Algorithms-Fourth-Deluxe-24-Part-Lect...

[2] Cormen, et al., Introduction to Algorithms, 2009. https://www.amazon.com/s/ref=nb_sb_ss_i_1_15?url=search-alia...

[3] Sedgewick, An Introduction to the Analysis of Algorithms, 2013. https://www.amazon.com/Introduction-Analysis-Algorithms-2nd/...

[4] Knuth, The Art of Computer Programming, 2011. https://www.amazon.com/Computer-Programming-Volumes-1-4A-Box...

[5] Press, Teukolsky, Vetterling, and Flannery, Numerical Recipes 3rd Edition: The Art of Scientific Computing, 2007. https://www.amazon.com/Numerical-Recipes-3rd-Scientific-Comp...

[6] https://www.amazon.com/Randomized-Algorithms-Rajeev-Motwani/...

[7]https://www.amazon.com/gp/product/0521835402/ref=pd_sim_14_2...

[8] Vazirani, https://www.amazon.com/Approximation-Algorithms-Vijay-V-Vazi...

[9] Michalewicz and Fogel, https://www.amazon.com/How-Solve-Heuristics-Zbigniew-Michale...

[10] Brass, https://www.amazon.com/Advanced-Data-Structures-Peter-Brass/...

[11] Bird, https://www.amazon.com/Pearls-Functional-Algorithm-Design-Ri...

[12] Okasaki, https://www.amazon.com/Purely-Functional-Structures-Chris-Ok...

[13] Warren, https://www.amazon.com/Hackers-Delight-2nd-Henry-Warren/dp/0...

[14] Lynch, https://www.amazon.com/Distributed-Algorithms-Kaufmann-Manag...

[15] Bishop, https://www.amazon.com/Pattern-Recognition-Learning-Informat...

[16] Russell and Norvig, https://www.amazon.com/Artificial-Intelligence-Modern-Approa...

In retrospect, my other comment was stupidly obtuse. Both too technical (in the sense of specificity) and too unstructured (in the sense of presentation order). A more appropriate path from CS might be analogous (well, inverse if anything) to the path Robert Goldblatt has taken. It dips into nonstandard analysis, but not totally without reason. Some subset of the following, with nLab and Wikipedia supplementing as necessary:

0. Milewski's "Category Theory for Programmers"[0]

1. Goldblatt's "Topoi"[1]

2. McLarty's "The Uses and Abuses of the History of Topos Theory"[2] (this does not require [1], it just undoes some historical assumptions made in [1] and, like everything else by McLarty, is extraordinarily well-written)

3. Goldblatt's "Lectures on the Hyperreals"[3]

4. Nelson's "Radically Elementary Probability Theory"[4]

5. Tao's "Ultraproducts as a Bridge Between Discrete and Continuous Analysis"[5]

6. Some canonical machine learning text, like Murphy [6] or Bishop [7]

7. Koller/Friedman's "Probabilistic Graphical Models"[8]

8. Lawvere's "Taking Categories Seriously"[9]

From there you should see a variety of paths for mapping (things:Uncertainty) <-> (things:Structure). The Giry monad is just one of them, and would probably be understandable after reading Barr/Wells' "Toposes, Triples and Theories"[10].

The above list also assumes some comfort with integration. Particularly good books in line with this pedagogical path might be:

9. Any and all canonical intros to real analysis

10. Malliavin's "Integration and Probability"[11]

11. Segal/Kunze's "Integrals and Operators"[12]

Similarly, some normative focus on probability would be useful:

12. Jaynes' "Probability Theory"[13]

13. Pearl's "Causality"[14]

---

[0] https://bartoszmilewski.com/2014/10/28/category-theory-for-p...

[1] https://www.amazon.com/Topoi-Categorial-Analysis-Logic-Mathe...

[2] http://www.cwru.edu/artsci/phil/UsesandAbuses%20HistoryTopos...

[3] https://www.amazon.com/Lectures-Hyperreals-Introduction-Nons...

[4] https://web.math.princeton.edu/%7Enelson/books/rept.pdf

[5] https://www.youtube.com/watch?v=IS9fsr3yGLE

[6] https://www.amazon.com/Machine-Learning-Probabilistic-Perspe...

[7] https://www.amazon.com/Pattern-Recognition-Learning-Informat...

[8] https://www.amazon.com/Probabilistic-Graphical-Models-Princi...

[9] http://www.emis.de/journals/TAC/reprints/articles/8/tr8.pdf

[10] http://www.tac.mta.ca/tac/reprints/articles/12/tr12.pdf

[11] https://www.springer.com/us/book/9780387944098

[12] https://www.amazon.com/Integrals-Operators-Grundlehren-mathe...

[13] http://www.med.mcgill.ca/epidemiology/hanley/bios601/Gaussia...

[14] https://www.amazon.com/Causality-Reasoning-Inference-Judea-P...

Some good books on Machine Learning:

Machine Learning: The Art and Science of Algorithms that Make Sense of Data (Flach): http://www.amazon.com/Machine-Learning-Science-Algorithms-Se...

Machine Learning: A Probabilistic Perspective (Murphy): http://www.amazon.com/Machine-Learning-Probabilistic-Perspec...

Pattern Recognition and Machine Learning (Bishop): http://www.amazon.com/Pattern-Recognition-Learning-Informati...

There are some great resources/books for Bayesian statistics and graphical models. I've listed them in (approximate) order of increasing difficulty/mathematical complexity:

Think Bayes (Downey): http://www.amazon.com/Think-Bayes-Allen-B-Downey/dp/14493707...

Bayesian Methods for Hackers (Davidson-Pilon et al): https://github.com/CamDavidsonPilon/Probabilistic-Programmin...

Doing Bayesian Data Analysis (Kruschke), aka "the puppy book": http://www.amazon.com/Doing-Bayesian-Data-Analysis-Second/dp...

Bayesian Data Analysis (Gelman): http://www.amazon.com/Bayesian-Analysis-Chapman-Statistical-...

Bayesian Reasoning and Machine Learning (Barber): http://www.amazon.com/Bayesian-Reasoning-Machine-Learning-Ba...

Probabilistic Graphical Models (Koller et al): https://www.coursera.org/course/pgm http://www.amazon.com/Probabilistic-Graphical-Models-Princip...

If you want a more mathematical/statistical take on Machine Learning, then the two books by Hastie/Tibshirani et al are definitely worth a read (plus, they're free to download from the authors' websites!):

Introduction to Statistical Learning: http://www-bcf.usc.edu/~gareth/ISL/

The Elements of Statistical Learning: http://statweb.stanford.edu/~tibs/ElemStatLearn/

Obviously there is the whole field of "deep learning" as well! A good place to start is with: http://deeplearning.net/

yedhukrishnan
Those are really useful. Thank you. Books are pricey though!
shogunmike
I know...some of them are indeed expensive!

At least the latter two ("ISL" and "ESL") are free to download though.

alexcasalboni
Those are great resources!

In case you are interested in MLaaS (Machine Learning as a Service), you can check these as well:

Amazon Machine Learning: http://aws.amazon.com/machine-learning/ (my review here: http://cloudacademy.com/blog/aws-machine-learning/)

Azure Machine Learning: http://azure.microsoft.com/en-us/services/machine-learning/ (my review here: http://cloudacademy.com/blog/azure-machine-learning/)

Google Prediction API: https://cloud.google.com/prediction/

BigML: https://bigml.com/

Prediction.io: https://prediction.io/

OpenML: http://openml.org/

yedhukrishnan
I went through the links and your review. They are really good. Thanks!
I found Christopher Bishop's book to have a good deal of knowledge in it. Plus, he complements many ideas with graphical figures, which is a big plus. http://www.amazon.com/Pattern-Recognition-Learning-Informati...

Also, if you needed more information about optimization methods all of Stephen Boyd's books are really good, just check out his entire website for information. http://www.stanford.edu/~boyd/

My favorite introductory textbook on machine learning is the Tom Mitchell book: http://www.amazon.com/Machine-Learning-Tom-M-Mitchell/dp/007...

The Bishop book is the most popular though: http://www.amazon.com/gp/product/0387310738/ref=pd_rvi_gw_2/...

achompas
So this is a common misconception about the text (and Prof. Mohri's NYU class). In this case, "foundations" does not mean this is an introductory course.

Rather, the class and text provide mathematical foundations for understanding the error bounds and growth complexity of various learning algorithms. So you'll be working with convex optimization, reproducing kernel Hilbert spaces, and Rademacher complexity--definitely not "introductory" in the least!
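For a flavor of what that means, here is the standard definition of empirical Rademacher complexity, one of the central tools in such a course (this is the textbook definition, not a quotation from Mohri's book):

```latex
% Empirical Rademacher complexity of a function class F on a sample
% S = (x_1, ..., x_m); the sigma_i are i.i.d. uniform +/-1 variables.
\[
  \widehat{\mathfrak{R}}_S(\mathcal{F})
  = \mathbb{E}_{\boldsymbol{\sigma}}\!\left[
      \sup_{f \in \mathcal{F}} \frac{1}{m} \sum_{i=1}^{m} \sigma_i \, f(x_i)
    \right].
\]
```

Intuitively, it measures how well the class can correlate with random noise, which is what drives the generalization bounds.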

It's a completely different beast from Mitchell, Bishop, or EoSL (which I'm studying right now!), so I'm not sure comparisons are valid. It also fills a prominent gap in the ideas reviewed by the popular ML texts.

manaskarekar
I have been putting off buying that book because of the price; maybe I should check the library.
exg
The Elements of Statistical Learning, by T. Hastie, R. Tibshirani and J. Friedman [1] is also a very good one. Plus, the book is freely available on the authors' website.

[1] http://www-stat.stanford.edu/~tibs/ElemStatLearn/

Feb 03, 2011 · JamieEi on Machine learning toolkit
Pattern Recognition and Machine Learning, Bishop http://www.amazon.com/Pattern-Recognition-Learning-Informati...
SeanDav
Seems like a good book - a bit surprised that it doesn't seem to cover Genetic Algorithms.
levesque
Genetic Algorithms are rarely covered in (recent) Machine Learning books. They are more often considered as optimization algorithms.
jdj
Note that this is a book for the more math-inclined; one will have trouble if not familiar with the basics of multivariate analysis and linear algebra.
The author is Chris Bishop... who wrote one of the "essential" machine learning books:

http://www.amazon.com/Pattern-Recognition-Learning-Informati...

If you want a fairly easy read without too many equations, try:

Data Mining: Practical Machine Learning Tools and Techniques (Second Edition) http://www.cs.waikato.ac.nz/~ml/weka/book.html

Which goes nicely with the Weka open source ML toolkit http://www.cs.waikato.ac.nz/ml/weka/

(although it is a good read without the toolkit)

If you want a bit more math, I really like the recent (Oct 2007) book:

Pattern Recognition and Machine Learning by Christopher M. Bishop http://www.amazon.com/Pattern-Recognition-Learning-Informati...

It is nicely self-contained, going through all the stats you'll need.

mriley
I didn't particularly like the WEKA book, but Bishop's book is excellent.

If you're interested in introductory data mining stuff, I would recommend Tan's Introduction to Data Mining: http://www-users.cs.umn.edu/~kumar/dmbook/index.php

HN Books is an independent project and is not operated by Y Combinator or Amazon.com.