Hacker News Comments on
Probabilistic Graphical Models: Principles and Techniques (Adaptive Computation and Machine Learning series)
6 HN comments
Hacker News Stories and Comments
All the comments and stories posted to Hacker News that reference this book.

Didn't read the document, but hopefully it mentions PageRank, the prime example of using probabilistic graphical models to rank nodes in a directed graph. More info: https://www.amazon.com/Probabilistic-Graphical-Models-Princi...

I've heard that Google and Baidu essentially started at the same time, with the same algorithm discovery (PageRank). Maybe someone can comment on whether there was idea sharing or whether both teams derived it independently.
⬐ ppsreejith
From the Wikipedia page of Robin Li, co-founder of Baidu: https://en.wikipedia.org/wiki/Robin_Li#RankDex

> In 1996, while at IDD, Li created the Rankdex site-scoring algorithm for search engine page ranking, which was awarded a U.S. patent. It was the first search engine that used hyperlinks to measure the quality of websites it was indexing, predating the very similar algorithm patent filed by Google two years later in 1998.
⬐ nabla9
PageRank is just an application of eigenvalues to ranking. The idea first came up in the 1970s, https://www.sciencedirect.com/science/article/abs/pii/030645... and resurfaced several times before PageRank was developed.
⬐ screye
Given that PageRank was literally invented by and named after Larry Page, I would think that Google had a head start.

That being said, PageRank is more a stellar example of adapting an academic idea into practice than a statistical idea in and of itself.

After all, it is 'merely' the stationary distribution of a random walk over a directed graph. I say 'merely' with a lot of respect, because the best ideas often feel simple in hindsight. But it is that simplicity that makes them even more impressive.
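The "stationary distribution of a random walk" framing can be sketched in a few lines of Python (not from the thread; the tiny graph and the 0.85 damping factor are illustrative assumptions):

```python
# Power iteration for PageRank: repeatedly redistribute rank along out-links,
# with a (1 - damping) teleport term that makes the chain ergodic.

def pagerank(links, damping=0.85, iterations=100):
    """links: dict mapping each node to the list of nodes it links to."""
    nodes = list(links)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}  # start from the uniform distribution
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node, outgoing in links.items():
            if outgoing:
                share = damping * rank[node] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # Dangling node: redistribute its rank uniformly.
                for target in nodes:
                    new_rank[target] += damping * rank[node] / n
        rank = new_rank
    return rank

# A tiny directed graph in which everyone links to "a":
graph = {"a": ["b", "c"], "b": ["a"], "c": ["a"], "d": ["a"]}
ranks = pagerank(graph)
print(sorted(ranks, key=ranks.get, reverse=True))  # "a" comes out on top
```

The damping/teleport term is what guarantees a unique stationary distribution even when the raw link graph is not strongly connected.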
⬐ alanbernstein
> Given that PageRank was literally invented and named after Larry Page, I would think that Google had a head start.

I don't think this means much. The history of science and technology is full of examples of results named after someone other than the first person to find them.

https://en.wikipedia.org/wiki/List_of_examples_of_Stigler%27....

In fact, based on the other comments in this thread, it seems that PageRank being named after Larry Page is itself one of these examples.
⬐ mianos
The sort of methods 'PageRank' uses already existed. It reminds me of Apple 'inventing' (air quotes) the mp3 player. It didn't; it applied existing technology, refined it, and publicized it. They did not invent it, but maybe 'inventing' something is only a very small part of making something useful for many people.
⬐ oneoff786
The basic concept behind PageRank is pretty obvious: if you stare at a graph for a while and try to imagine centrality calculations, it'll probably be your big idea. Implementing it and catching the edge cases isn't trivial.
⬐ andi999
I heard that Google's first approach was using/adapting a published algorithm that ranked scientific publications from the network of citations. Not sure if this is the algorithm you mentioned, though.
⬐ divbzero
⬐ mach1ne
The ranking of scientific publications based on citations you're describing is impact factor [1]. I haven't heard that as an inspiration for Larry Page's PageRank [2], but it is plausible.
⬐ andi999
Impact factor ranks journals, not authors or papers. I googled the original paper: https://scholar.google.de/scholar?hl=de&as_sdt=0%2C5&q=the+p... (I do not want to link directly to the PDF shown in the search result). Section 2.1 deals with related work: "There has been a great deal of work on academic citation analysis [Gar95]. Goffman [Gof71] has published an interesting theory of how information flow in a scientific community is an epidemic process..." (and more)

I think that paper is worth a read.
Didn't Larry Page and Sergey Brin openly publicize the PageRank algorithm? It'd seem more likely that Baidu just copied the idea.
⬐ swyx
Baidu's patent predated theirs by 2 years: https://news.ycombinator.com/item?id=30419414
⬐ bjourne
PageRank actually had a predecessor called HITS (according to some sources HITS was developed before PageRank; according to others they were contemporaries), an algorithm developed by Jon Kleinberg for ranking hypertext documents. https://en.wikipedia.org/wiki/HITS_algorithm However, Kleinberg stayed in academia and never attempted to commercialize his research the way Page and Brin did. HITS was more complex than PageRank and context-sensitive, so queries required much more computing resources than PageRank. PageRank is roughly what you get if you take HITS and remove the slow parts.

What I find very interesting about PageRank is how you can trade accuracy for performance. The traditional way of calculating PageRank, repeatedly multiplying a rank vector by the transition matrix (power iteration) until it converges, gives you correct results but is slow: for a modestly sized graph it could take days. But if accuracy isn't that important, you can use Monte Carlo simulation and get most of the PageRank correct in a fraction of the time of the iterative method. It's also easy to parallelize.
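As a rough illustration of the Monte Carlo approach described above (the graph, walk count, and seed are illustrative assumptions, not from the thread): each simulated surfer follows links with probability 0.85 and stops otherwise, and the endpoint frequencies approximate PageRank.

```python
# Monte Carlo PageRank via random walks with teleport: start each walk at a
# random node, follow out-links with probability `damping`, and count where
# the walks end up. Endpoint frequencies estimate the PageRank vector.
import random

def monte_carlo_pagerank(links, damping=0.85, num_walks=20000, seed=0):
    rng = random.Random(seed)
    nodes = list(links)
    visits = {node: 0 for node in nodes}
    for _ in range(num_walks):
        node = rng.choice(nodes)
        # Walk until a teleport; walk length is geometric, mean 1/(1 - damping).
        while rng.random() < damping:
            outgoing = links[node]
            node = rng.choice(outgoing) if outgoing else rng.choice(nodes)
        visits[node] += 1
    return {node: count / num_walks for node, count in visits.items()}

graph = {"a": ["b", "c"], "b": ["a"], "c": ["a"], "d": ["a"]}
approx = monte_carlo_pagerank(graph)
print(max(approx, key=approx.get))  # "a" should come out on top
```

Each walk is independent, which is what makes this trivially parallelizable; accuracy improves as roughly the square root of the number of walks.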
⬐ jll29
Page's PageRank patent references HITS:

Jon M. Kleinberg, "Authoritative sources in a hyperlinked environment," 1998, Proc. of the 9th Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 668-677.
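For comparison with PageRank, the HITS iteration mentioned above can be sketched as follows (the graph is an illustrative assumption; real HITS runs on a query-specific subgraph, which is what made it context-sensitive and slower per query):

```python
# HITS (hubs and authorities): a page's authority score is the sum of the hub
# scores of pages linking to it; a page's hub score is the sum of the
# authority scores of pages it links to. Iterate with normalization.

def hits(links, iterations=50):
    nodes = list(links)
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority update: sum hub scores of in-neighbors, then L2-normalize.
        auth = {n: sum(hub[m] for m in nodes if n in links[m]) for n in nodes}
        norm = sum(v * v for v in auth.values()) ** 0.5
        auth = {n: v / norm for n, v in auth.items()}
        # Hub update: sum authority scores of out-neighbors, then L2-normalize.
        hub = {n: sum(auth[m] for m in links[n]) for n in nodes}
        norm = sum(v * v for v in hub.values()) ** 0.5
        hub = {n: v / norm for n, v in hub.items()}
    return hub, auth

graph = {"a": ["b", "c"], "b": ["a"], "c": ["a"], "d": ["a"]}
hub, auth = hits(graph)
# "a" is linked to by every other node, so it gets the top authority score.
```

Note the two coupled score vectors: PageRank collapses this into a single score and a single (cheaper) iteration, which is one sense in which it is "HITS with the slow parts removed."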
> study textbooks. Do exercises. Treat it like academic studying

This. Highly recommend Russell & Norvig [1] for high-level intuition and motivation, then Bishop's "Pattern Recognition and Machine Learning" [2] and Koller's PGM book [3] for the fundamentals.
Avoid MOOCs, but there are useful lecture videos, e.g. Hugo Larochelle on belief propagation [4].
FWIW, this is coming from a mechanical engineer by training but a self-taught programmer and AI researcher. I've been working in industry as an AI research engineer for ~6 years.
[1] https://www.amazon.com/Artificial-Intelligence-Modern-Approa...
[2] https://www.amazon.com/Pattern-Recognition-Learning-Informat...
[3] https://www.amazon.com/Probabilistic-Graphical-Models-Princi...
⬐ jimmy-dean
Oof, those are all dense reads for a newcomer... For a first dip into the waters I usually suggest Introduction to Statistical Learning, and from there move into PRML or ESL. Were you first introduced to core ML through Bishop? +1 for a solid reading list.
⬐ sampo
PGMs were in fashion in 2012, but by 2014, when deep learning had become all the rage, PGMs had almost disappeared from the picture. Do people even remember PGMs exist now in 2019?
⬐ srean
Fashion is relevant only if you want to approach it as a fashion industry.
⬐ vazamb
PGMs also provide the intuition behind GANs and variational autoencoders.
⬐ KidComputer
You'll find plate models, PGM junk, etc. in modern papers on explicit-density generative models and on factorizing the latents of such models.
⬐ godelmachine
Hands up for Bishop and Russell & Norvig. Russell & Norvig should be treated as a gentle intro to AI; then start Bishop to understand the concepts.
⬐ magoghm
I would also include some books about statistics. Two excellent introductory books are:

Statistical Rethinking: https://www.amazon.com/Statistical-Rethinking-Bayesian-Examp...

An Introduction to Statistical Learning: http://www-bcf.usc.edu/~gareth/ISL/
I used "Probabilistic Graphical Models" by Koller/Friedman[0]: https://www.amazon.com/Probabilistic-Graphical-Models-Princi...
In retrospect, my other comment was stupidly obtuse: both too technical (in the sense of specificity) and too unstructured (in the sense of presentation order). A more appropriate path from CS might be analogous (well, inverse if anything) to the path Robert Goldblatt has taken. It dips into nonstandard analysis, but not totally without reason. Some subset of the following, with nLab and Wikipedia supplementing as necessary:

0. Milewski's "Category Theory for Programmers"[0]
1. Goldblatt's "Topoi"[1]
2. McLarty's "The Uses and Abuses of the History of Topos Theory"[2] (this does not require [1], it just undoes some historical assumptions made in [1] and, like everything else by McLarty, is extraordinarily well-written)
3. Goldblatt's "Lectures on the Hyperreals"[3]
4. Nelson's "Radically Elementary Probability Theory"[4]
5. Tao's "Ultraproducts as a Bridge Between Discrete and Continuous Analysis"[5]
6. Some canonical machine learning text, like Murphy[6] or Bishop[7]
7. Koller/Friedman's "Probabilistic Graphical Models"[8]
8. Lawvere's "Taking Categories Seriously"[9]
From there you should see a variety of paths for mapping (things:Uncertainty) <-> (things:Structure). The Giry monad is just one of them, and would probably be understandable after reading Barr/Wells' "Toposes, Triples and Theories"[10].
The above list also assumes some comfort with integration. Particularly good books in line with this pedagogical path might be:
9. Any and all canonical intros to real analysis
10. Malliavin's "Integration and Probability"[11]
11. Segal/Kunze's "Integrals and Operators"[12]
Similarly, some normative focus on probability would be useful:
12. Jaynes' "Probability Theory"[13]
13. Pearl's "Causality"[14]
---
[0] https://bartoszmilewski.com/2014/10/28/category-theory-for-p...
[1] https://www.amazon.com/Topoi-Categorial-Analysis-Logic-Mathe...
[2] http://www.cwru.edu/artsci/phil/UsesandAbuses%20HistoryTopos...
[3] https://www.amazon.com/Lectures-Hyperreals-Introduction-Nons...
[4] https://web.math.princeton.edu/%7Enelson/books/rept.pdf
[5] https://www.youtube.com/watch?v=IS9fsr3yGLE
[6] https://www.amazon.com/Machine-Learning-Probabilistic-Perspe...
[7] https://www.amazon.com/Pattern-Recognition-Learning-Informat...
[8] https://www.amazon.com/Probabilistic-Graphical-Models-Princi...
[9] http://www.emis.de/journals/TAC/reprints/articles/8/tr8.pdf
[10] http://www.tac.mta.ca/tac/reprints/articles/12/tr12.pdf
[11] https://www.springer.com/us/book/9780387944098
[12] https://www.amazon.com/Integrals-Operators-Grundlehren-mathe...
[13] http://www.med.mcgill.ca/epidemiology/hanley/bios601/Gaussia...
[14] https://www.amazon.com/Causality-Reasoning-Inference-Judea-P...
Some good books on Machine Learning:

Machine Learning: The Art and Science of Algorithms that Make Sense of Data (Flach): http://www.amazon.com/Machine-Learning-Science-Algorithms-Se...
Machine Learning: A Probabilistic Perspective (Murphy): http://www.amazon.com/Machine-Learning-Probabilistic-Perspec...
Pattern Recognition and Machine Learning (Bishop): http://www.amazon.com/Pattern-Recognition-Learning-Informati...
There are some great resources/books for Bayesian statistics and graphical models. I've listed them in (approximate) order of increasing difficulty/mathematical complexity:
Think Bayes (Downey): http://www.amazon.com/Think-Bayes-Allen-B-Downey/dp/14493707...
Bayesian Methods for Hackers (Davidson-Pilon et al): https://github.com/CamDavidsonPilon/Probabilistic-Programmin...
Doing Bayesian Data Analysis (Kruschke), aka "the puppy book": http://www.amazon.com/Doing-Bayesian-Data-Analysis-Second/dp...
Bayesian Data Analysis (Gelman): http://www.amazon.com/Bayesian-Analysis-Chapman-Statistical-...
Bayesian Reasoning and Machine Learning (Barber): http://www.amazon.com/Bayesian-Reasoning-Machine-Learning-Ba...
Probabilistic Graphical Models (Koller et al): https://www.coursera.org/course/pgm http://www.amazon.com/Probabilistic-Graphical-Models-Princip...
If you want a more mathematical/statistical take on Machine Learning, then the two books by Hastie/Tibshirani et al are definitely worth a read (plus, they're free to download from the authors' websites!):
Introduction to Statistical Learning: http://www-bcf.usc.edu/~gareth/ISL/
The Elements of Statistical Learning: http://statweb.stanford.edu/~tibs/ElemStatLearn/
Obviously there is the whole field of "deep learning" as well! A good place to start is with: http://deeplearning.net/
⬐ yedhukrishnan
Those are really useful. Thank you. Books are pricey though!
⬐ shogunmike
I know... some of them are indeed expensive! At least the latter two ("ISL" and "ESL") are free to download, though.
⬐ alexcasalboni
Those are great resources! In case you are interested in MLaaS (Machine Learning as a Service), you can check these as well:
Amazon Machine Learning: http://aws.amazon.com/machine-learning/ (my review here: http://cloudacademy.com/blog/aws-machine-learning/)
Azure Machine Learning: http://azure.microsoft.com/en-us/services/machine-learning/ (my review here: http://cloudacademy.com/blog/azure-machine-learning/)
Google Prediction API: https://cloud.google.com/prediction/
BigML: https://bigml.com/
Prediction.io: https://prediction.io/
OpenML: http://openml.org/
⬐ yedhukrishnan
I went through the links and your review. They are really good. Thanks!
Self-taught programmer here. I would like to think I have at least a bit of knowledge about each of these topics (though definitely not expertise in most). The one I know the least about is machine learning, but I'm actively working on changing that :) I just bought this book (and am enjoying it!): http://www.amazon.com/gp/product/0262013193