HN Books @HNBooksMonth

The best books of Hacker News.

Hacker News Comments on
The Grammar of Graphics (Statistics and Computing)

Leland Wilkinson, D. Wills, D. Rope, A. Norton, R. Dubbs · 8 HN comments
HN Books has aggregated all Hacker News stories and comments that mention "The Grammar of Graphics (Statistics and Computing)" by Leland Wilkinson, D. Wills, D. Rope, A. Norton, R. Dubbs.
HN Books may receive an affiliate commission when you make purchases on sites after clicking through links on this page.
Amazon Summary
Presents a unique foundation for producing almost every quantitative graphic found in scientific journals, newspapers, statistical packages, and data visualization systems. The new edition features six new chapters and has undergone substantial revision. The first edition has sold more than 2200 copies. Four color throughout.

Hacker News Stories and Comments

All the comments and stories posted to Hacker News that reference this book.
My interest in the different paradigms and approaches to data visualization stems from some problems I'm working on having a lot of parallels with how plotting library APIs work.

Personally, my main encounter with plotting was in Python. I'm not a big fan of matplotlib; I got the impression that code complexity grew exponentially with plot complexity. Then there's bokeh [0], which I preferred to matplotlib because it is more declarative. HoloViews [1] is more declarative than both matplotlib and bokeh, and boasts that "usually [you can] express what you want to do in very few lines of code, letting you focus on what you are trying to explore and convey, not on the process of plotting". I haven't used HoloViews yet.

Then I've heard of R's ggplot [2], which is based on (or inspired by?) The Grammar of Graphics [3]. This book is definitely something I want to check out.

Vega [4], an “assembly language” for visualization, is neither here nor there as far as this discussion goes, but I just stumbled upon it and I'm quite optimistic about the initiative. Some readers may not have heard of it.

[0] https://bokeh.pydata.org/en/latest/docs/user_guide/concepts....
[1] http://holoviews.org/
[2] http://r4ds.had.co.nz/data-visualisation.htm
[3] https://www.amazon.com/Grammar-Graphics-Statistics-Computing...
[4] https://vega.github.io/vega/about/

ggplot2 was inspired by: https://www.amazon.com/Grammar-Graphics-Statistics-Computing...

and Hadley Wickham wrote about it in http://vita.had.co.nz/papers/layered-grammar.pdf.

I'm no expert, but I think that one of the main ideas is to separate the elements of making a plot from the way that the data is presented. For example, in ggplot2, you have the data that will go into the graph, the type of plot (or "geometry") that defines how the data are presented (scatterplot, bar plot, etc.), and then various "layers" that can be added that affect style.

In order to split a plot into subplots, you simply define how it is to be faceted (what column should be used to define groups). Grammar-of-graphics moves plotting away from the "turtle graphics" model and lets you specify what should be done. Then ggplot figures out how to do it, kind of like SQL vs. writing for loops to retrieve information.
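To make the SQL analogy concrete, here is a minimal Python sketch (not ggplot2's actual implementation): the plot is a plain declarative spec, and faceting is just a group-by on the named column before the geometry is applied. The `facet` and `render` helpers, the spec keys, and the toy sales data are all hypothetical. In ggplot2 itself the equivalent would be roughly `ggplot(df, aes(year, sales)) + geom_point() + facet_wrap(~region)`.

```python
from collections import defaultdict

# Hypothetical toy data: each row is a dict, like one row of a data frame.
rows = [
    {"year": 2019, "sales": 10, "region": "north"},
    {"year": 2020, "sales": 12, "region": "north"},
    {"year": 2019, "sales": 7, "region": "south"},
    {"year": 2020, "sales": 9, "region": "south"},
]

# Declarative spec: says *what* to draw, not how to draw it.
spec = {"geom": "point", "x": "year", "y": "sales", "facet": "region"}


def facet(data, by):
    """Split rows into panels keyed by the value of column `by` (like SQL GROUP BY)."""
    panels = defaultdict(list)
    for row in data:
        panels[row[by]].append(row)
    return dict(panels)


def render(spec, data):
    """Hypothetical renderer: for each panel, list the (x, y) points the geom would draw."""
    return {
        key: [(row[spec["x"]], row[spec["y"]]) for row in panel]
        for key, panel in facet(data, spec["facet"]).items()
    }


print(render(spec, rows))
# {'north': [(2019, 10), (2020, 12)], 'south': [(2019, 7), (2020, 9)]}
```

Changing `"facet": "region"` to another column re-panels the plot without touching `render`, which is the point of the declarative style.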

Waterluvian
Aha! Thank you. So it's kind of like a declarative way of plotting.
I'll give you a couple. Note that some of these are rehashes of my earlier comments.

# Elements of Programming

https://www.amazon.com/Elements-Programming-Alexander-Stepan...

This book proposes a way of writing C++-ish code in a mathematical style that makes your code terse. In a talk, Sean Parent, then working on Adobe Photoshop, estimated that the Photoshop codebase could be reduced from 3,000,000 LOC to 30,000 LOC (=100x!!) if they followed ideas from the book: https://www.youtube.com/watch?v=4moyKUHApq4&t=39m30s

Another point of his is that the explosion of written code we are seeing isn't sustainable, and that much of this code is algorithms or data structures with overlapping functionality. As codebases grow and these functionalities diverge even further, reining in the chaos gradually becomes impossible.
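For a flavor of that style, here is a Python sketch (the book itself uses C++) of a generic power function in the spirit of the book's treatment of associative operations: one algorithm, written once against the single requirement "the operation is associative", serves integer exponentiation, multiplication by repeated addition, and string repetition. The name `power` and the exact loop structure are my own illustration, not code from the book.

```python
import operator
from typing import Callable, TypeVar

T = TypeVar("T")


def power(a: T, n: int, op: Callable[[T, T], T]) -> T:
    """Combine n copies of `a` with `op` (n >= 1) using O(log n) calls to op.

    Only requires that `op` is associative; commutativity is not needed
    because every value combined is itself a power of the same `a`.
    """
    if n < 1:
        raise ValueError("n must be >= 1")
    # While n is even, a^n == (a op a)^(n/2): square and halve.
    while n % 2 == 0:
        a = op(a, a)
        n //= 2
    # n is now odd: start the accumulator at one copy of the current base.
    result = a
    n //= 2
    while n > 0:
        a = op(a, a)
        if n % 2 == 1:
            result = op(result, a)
        n //= 2
    return result


print(power(3, 5, operator.mul))     # 243
print(power(2, 10, operator.add))    # 20, i.e. multiplication by repeated addition
print(power("ab", 3, operator.add))  # ababab
```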

Bjarne Stroustrup (aka the C++ OG) gave this book five stars on Amazon, in what is his one and only Amazon product review, lol: https://smile.amazon.com/review/R1MG7U1LR7FK6/

This style might become dominant, because it's only really practical in modern successors of C++ such as Swift or Rust, not so much in C++ itself.

# Grammar of graphics

https://www.amazon.com/Grammar-Graphics-Statistics-Computing...

This book changed my perception of creativity, aesthetics and mathematics and how they relate. Fundamentally, the book provides all the diverse tools to give you confidence that your graphics are mathematically sound and visually pleasing. After reading this, Tufte just doesn't cut it anymore. It's such a weird book because it talks about topics as disparate as Bayes' rule, OOP, color theory, SQL, chaotic models of time (lolwut), style-sheet language design and a bjillion other topics, yet somehow all of them are very relevant. It's like if Bret Victor were a book: a tour de force of polymathic insanity.

The book is in full color and has some of the nicest-looking and most instructive graphics I've ever seen, even for things I already understand, such as the Central Limit Theorem. It makes sense that the best graphics would be in a book written by the guy who worked out how to do visualizations mathematically. The book is also interesting if you build any sort of UI, because UIs are definitely just a subset of graphical visualizations.

# Scala for Machine Learning

https://www.amazon.com/Scala-Machine-Learning-Patrick-Nicola...

This book almost never gets mentioned but it's a superb intro to machine learning if you dig types, scalable back-ends or JVM.

It’s the only ML book that I’ve seen that contains the word monad so if you sometimes get a hankering for some monading (esp. in the context of ML pipelines), look no further.

It discusses setting up actual large-scale ML pipelines using modern concurrency primitives, such as actors via the Akka framework.

# Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques for Building Intelligent Systems

https://www.amazon.com/Hands-Machine-Learning-Scikit-Learn-T...

Not released yet but I've been reading the drafts and it's a nice intro to machine learning using modern ML frameworks, TensorFlow and Scikit-Learn.

# Basic Category Theory for Computer Scientists

https://www.amazon.com/gp/product/0262660717/ref=as_li_ss_tl...

I'm not done with the book, but despite its age it's hands down the best intro to category theory if you care about it only for CS purposes, as it tries to show how to apply the concepts. Very concise (~70 pages).

# Markov Logic: An Interface Layer for Artificial Intelligence

https://www.amazon.com/Markov-Logic-Interface-Artificial-Int...

Have you ever wondered what the relationship between machine learning and logic is? If so, look no further.

# Machine Learning: A Probabilistic Perspective (Adaptive Computation and Machine Learning series)

https://www.amazon.com/gp/product/0262018020/ref=as_li_ss_tl...

Exhaustive overview of the entire field of machine learning. It's engaging and full of graphics.

# Deep Learning

https://www.amazon.com/gp/product/0262035618/ref=as_li_ss_tl...

http://www.deeplearningbook.org/

You probably have heard about this whole "deep learning" meme. This book is a pretty self-contained intro into the state of the art of deep learning.

# Designing for Scalability with Erlang/OTP: Implement Robust, Fault-Tolerant Systems

https://www.amazon.com/Designing-Scalability-Erlang-OTP-Faul...

Even though this is an Erlang book (I don't really know Erlang), 1/3 of the book is devoted to designing scalable and robust distributed systems in a general setting, which I found worth the price on its own.

# Practical Foundations for Programming Languages

https://www.amazon.com/gp/product/1107150302/ref=as_li_ss_tl...

Not much to say, probably THE book on programming language theory.

# A First Course in Network Theory

https://www.amazon.com/First-Course-Network-Theory/dp/019872...

Up until recently I didn't know the difference between graphs and networks. But look at me now, I still don't but at least I have a book on it.

bad_user
Amazon links with your affiliate tag, seriously?
adamnemecek
what about them?
kranner
I see nothing wrong with GP providing their affiliate tag.

They are referring customers to Amazon, and customers don't pay extra.

bad_user
As an ex-Amazon Affiliate myself, I disagree because the incentive to post those links is not aligned with the reader's expectations.

Do you enjoy viewing commercials and product placements without the proper disclaimer? Because this is exactly what this is. I surely don't appreciate hidden advertising, not because of the quality of the advertised products, but because I cannot trust such recommendations, as a salesman can say anything in order to sell his shit.

Notice how this is the biggest list of recommendations in this thread. Do you think that's because the author is very knowledgeable or is it because he has an incentive to post links?

adamnemecek
> As an ex-Amazon Affiliate myself, I disagree because the incentive to post those links is not aligned with the reader's expectations.

Please don't project your behavior onto others. I take book recommendations seriously. I actually really enjoy it, people have told me IRL that my recommendations helped them a lot.

> Notice how this is the biggest list of recommendations in this thread.

They are all books that I've read in the last ~4 monthish (not all in entirety). Just FYI I'm not sure how much money you think I'm making off this but for me it's mostly about the stats, I'm curious what people are interested in.

> Do you think that's because the author is very knowledgeable

I'm more than willing to discuss my knowledgeability.

> or is it because he has an incentive to post links?

It's the biggest list because, due to circumstances, I have the luxury of being able to read a ton. I own all the books on the list, I've read all of them, I stand by all of them, and some of these are really hidden gems that more people need to know about. I've written some of the reviews before. Just FYI, I've posted extensive non-affiliate Amazon links before and I started doing affiliate only very recently.

Furthermore, HN repeatedly upvotes blog posts that contain affiliate links. Why is that any different?

hackermailman
Practical Foundations for Programming Languages by Bob Harper is really good, plus there's a free draft of the second version on the author's site http://www.cs.cmu.edu/~rwh/pfpl.html

I always go to the book author's page first, not only to get the errata but also to discover things such as free lectures, as in the case of Skiena's algorithm design book.

> > And it's really hard to beat ggplot.

> To be honest, matplotlib seems a good contender to me (http://matplotlib.org/).

They're quite different, though, and I can see why many prefer ggplot. It's a declarative, domain-specific language that implements a Tufte-inspired "grammar of graphics" (hence the gg- in the name; see section 1.3 of [1], and [2,3]) for very fast and convenient interactive plotting, whereas matplotlib is just a clone of MATLAB's procedural plotting API.

[1] http://www.amazon.com/ggplot2-Elegant-Graphics-Data-Analysis...

[2] http://www.amazon.com/The-Grammar-Graphics-Statistics-Comput...

[3] http://vita.had.co.nz/papers/layered-grammar.html
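To illustrate the declarative-versus-procedural contrast, here is a minimal Python sketch of the `+`-composition style that ggplot2 popularized: each `+` returns a new, immutable description of the plot, and nothing is drawn until some renderer interprets the finished spec. In a procedural API, by contrast, each call draws immediately, in order. The `Plot`, `Layer`, and `Facet` classes are hypothetical and only loosely modeled on Wickham's layered grammar; they are not ggplot2's or any other library's actual API.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Layer:
    geom: str  # e.g. "point", "line", "bar"


@dataclass(frozen=True)
class Facet:
    by: str  # column whose values define the panels


@dataclass(frozen=True)
class Plot:
    data: tuple          # rows of the data set
    mapping: tuple       # (aesthetic, column) pairs, e.g. (("x", "year"),)
    layers: tuple = ()
    facet: Optional[Facet] = None

    def __add__(self, other):
        # Each "+" returns a new spec; the original is never mutated.
        if isinstance(other, Layer):
            return replace(self, layers=self.layers + (other,))
        if isinstance(other, Facet):
            return replace(self, facet=other)
        return NotImplemented


rows = (
    {"year": 2019, "sales": 10, "region": "north"},
    {"year": 2019, "sales": 7, "region": "south"},
)
base = Plot(data=rows, mapping=(("x", "year"), ("y", "sales")))
p = base + Layer("point") + Layer("line") + Facet("region")

print([layer.geom for layer in p.layers])  # ['point', 'line']
print(p.facet)                             # Facet(by='region')
print(base.layers)                         # () because base is unchanged
```

Swapping `Layer("point")` for `Layer("bar")` changes the chart type without rewriting the rest of the spec.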

If you're looking for a more quantitative version of Tufte, I highly recommend Cleveland (http://www.amazon.com/Visualizing-Data-William-S-Cleveland/d...) and Wilkinson (http://www.amazon.com/The-Grammar-Graphics-Statistics-Comput...)
kylebgorman
The difference between Cleveland and Tufte is that the former is "empirical" (he discusses lots of results from psychological studies on comprehension of graphics) rather than one man's opinion. Also, Cleveland actually has accomplished something other than scolding and praising.

Personally, I liked Cleveland's "Elements of Graphing Data" (http://www.amazon.com/Elements-Graphing-Data-William-Clevela...) better than "Visualizing Data", though they overlap a lot.

Yes, I would recommend "The Grammar of Graphics" for serious attempts to express visualizations: http://www.amazon.com/gp/aw/d/0387245448
jjoonathan
But that's just it -- the various existing implementations of the Grammar of Graphics are solid and proven, with tutorials, documentation, and Q&A collections on Stack Overflow. From my admittedly brief impressions of Vega, the procedural implementations also seem to be more concise. Tack on the benefits of a procedural REPL when you're doing exploratory data analysis or nontrivial manipulations and it becomes very hard to justify the switch.

Like most everyone here I think Vega would be much better suited to the role of an intermediate target format for higher-level tools.

I took Edward Tufte's course (http://www.edwardtufte.com/tufte/). It was interesting, but his books are probably the most valuable part.

I was planning to read Interactive Data Visualization for the Web http://ofps.oreilly.com/titles/9781449339739/index.html I think that's mostly an introduction to the tool (D3) but maybe has some info on visualization itself.

Also, this tutorial was an easy intro to D3: http://code.hazzens.com/d3tut/ Hopefully, the author continues to add to it.

Mike Bostock (author of D3) has some interesting blog posts at http://bost.ocks.org and also some great visualizations at http://bl.ocks.org/mbostock

A PhD in my office recommended The Grammar of Graphics: http://www.amazon.com/The-Grammar-Graphics-Statistics-Comput... It seems much more technical and statistical, but could be an interesting set of ideas to synthesize. I'm planning on borrowing his copy.

Mike Bostock's stuff also led me to this page on Hive Plots which is a pretty cool description of how to do network visualization better: http://www.hiveplot.com

And I like Flowing Data as a blog: http://flowingdata.com/

This is all of course prior to actually having studied it. I'm hoping using these resources and playing with some toy projects will lead me down further paths. (Assuming I have time which I, of course, probably won't.)

There is this as well http://www.amazon.com/Enterprise-Dashboards-Design-Best-Prac...

The Grammar of Graphics (Statistics and Computing) by Leland Wilkinson is also great, though more focused on building graphing systems. I believe it is the inspiration for a lot of d3.js: http://www.amazon.com/The-Grammar-Graphics-Statistics-Comput...

HN Books is an independent project and is not operated by Y Combinator or Amazon.com.