HN Books @HNBooksMonth

The best books of Hacker News.

Hacker News Comments on
Purely Functional Data Structures

Chris Okasaki · 2 HN points · 24 HN comments
HN Books has aggregated all Hacker News stories and comments that mention "Purely Functional Data Structures" by Chris Okasaki.
View on Amazon [↗]
HN Books may receive an affiliate commission when you make purchases on sites after clicking through links on this page.
Amazon Summary
Most books on data structures assume an imperative language such as C or C++. However, data structures for these languages do not always translate well to functional languages such as Standard ML, Haskell, or Scheme. This book describes data structures from the point of view of functional languages, with examples, and presents design techniques that allow programmers to develop their own functional data structures. The author includes both classical data structures, such as red-black trees and binomial queues, and a host of new data structures developed exclusively for functional languages. All source code is given in Standard ML and Haskell, and most of the programs are easily adaptable to other functional languages. This handy reference for professional programmers working with functional languages can also be used as a tutorial or for self-study.
HN Books Rankings

Hacker News Stories and Comments

All the comments and stories posted to Hacker News that reference this book.
You should not approach problems in functional languages with imperative techniques. It's fitting a square peg in a round hole. The approaches & data structures aren't always the same.
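A toy illustration of that mismatch (mine, not from the comment; function names are made up), in Scala: the imperative habit is to mutate an accumulator in a loop, while the functional habit is to describe the result as a whole.

  // Imperative habit carried into an FP setting: mutate an accumulator.
  def sumSquaresImperative(xs: List[Int]): Int = {
    var total = 0
    for (x <- xs) total += x * x
    total
  }

  // The functional shape of the same problem: describe the result as a whole.
  def sumSquares(xs: List[Int]): Int =
    xs.map(x => x * x).sum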

In my experience with FP you start with more of a top down approach in contrast to the bottom up approach discussed in the blog post.

There are many good resources on this. I've referenced a few below. Of course there are many more.

[1] https://www.amazon.com/Pearls-Functional-Algorithm-Design-Ri...

[2] https://www.amazon.com/Purely-Functional-Data-Structures-Oka...

[3] https://www.amazon.com/Algorithm-Design-Haskell-Richard-Bird...

[4] https://www.amazon.com/Structure-Interpretation-Computer-Pro...

rramadass
I think anybody who wants to understand Functional Programming should start with Lambda Calculus.

I highly recommend An Introduction to Functional Programming Through Lambda Calculus by Greg Michaelson.

You can then see how to express functional techniques in your language of choice, i.e. one you already know well (e.g. Functional Programming in C++ by Ivan Cukic), before learning a new functional language. This avoids wrestling with the syntax/semantics of unfamiliar concepts in an unfamiliar language, which is a bit too much for a beginning student to handle simultaneously.

exdsq
Thanks for these links, commenting to find them later.
Jtsummers
If you select the timestamp of a comment (also when you reply to it) you can favorite it. This is much cleaner than leaving scattered "to find them later" comments around this site (in fact, I'm pretty sure it was added precisely to avoid those kinds of comments). You can also favorite submissions themselves. Favorites can be found in your profile.
exdsq
Cool, didn’t know that! Cheers
bckr
you can click the timestamp and just keep the URL somewhere, like your browser's bookmarks
dehrmann
> You should not approach problems in functional languages with imperative techniques. It's fitting a square peg in a round hole.

This might be part of the problem with FP. Every popular CPU architecture is imperative.

chriswarbo
I find this unconvincing. Electronic circuits are well described as a Cartesian Closed Category; and hence by lambda calculus (i.e. we can give semantics to functional programs by interpreting them as descriptions of circuits; as opposed to the "usual" interpretations of x86_64 machine code, or symbolic values, etc.)

Imperative systems are "fitting a square peg in a round hole" by introducing hacks like 'clock signals', 'pipeline flushes', etc.

hither_shores
The category with linear circuits as its morphisms, labelled sets of electronic terminals as its objects, and laying circuits in parallel as the monoidal product (which is the usual notion of the category of circuits; perhaps you have something more sophisticated in mind, but if so you should specify that) is not cartesian closed.

One of the many ways to demonstrate this: all linear circuits have duals (voltage <-> current, series <-> parallel, and so on), but in a cartesian closed category only the terminal object is dualizable.

What you're probably thinking of is the fact that this category is compact closed. The corresponding internal language is going to be some sort of process calculus (I think the pi calculus is too strong, but not sure), not the lambda calculus.

imtringued
No, that's the easy part. Computation with FP is very easy. The problems start the moment you have to do persistence and communication. Those aren't part of any pure functional paradigm.
TuringTest
> Every popular CPU architecture is imperative.

That's arguable, given that every (deterministic) imperative computation can be expressed as a pure functional program. "Imperative" is more of a viewpoint that the developer is using to understand the behavior of the computer, rather than a property of the machine itself.

A pure functional analysis will describe a computation on that "imperative" architecture as an immutable, monotonic sequence of all the states that the CPU goes through over time; or maybe, if you know enough PL theory, as a chain of rewritings of a program-long expression that is updated, one term at a time, as each subexpression is replaced by the result of a function consuming its input parameters.

And no viewpoint is inherently better than the other (except that the functional version can be analyzed in a natural way with mathematical tools, which is often difficult with the imperative description).
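A minimal sketch of that functional description in Scala (a toy, hypothetical instruction set; all names here are illustrative): the CPU step is a pure function from state to state, and a run is the immutable sequence of states it produces.

  // Toy machine state: a program counter and an accumulator.
  final case class State(pc: Int, acc: Int)

  sealed trait Instr
  case object Inc extends Instr                // acc := acc + 1
  final case class Load(n: Int) extends Instr  // acc := n

  // One "clock tick": a pure function from machine state to machine state.
  def step(program: Vector[Instr])(s: State): State =
    program(s.pc) match {
      case Inc     => State(s.pc + 1, s.acc + 1)
      case Load(n) => State(s.pc + 1, n)
    }

  // The whole run: the immutable, monotonic sequence of states over time.
  def trace(program: Vector[Instr])(s: State): List[State] =
    if (s.pc >= program.length) List(s)
    else s :: trace(program)(step(program)(s))

  // trace(Vector(Load(5), Inc, Inc))(State(0, 0))
  //   == List(State(0,0), State(1,5), State(2,6), State(3,7))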

whimsicalism
Not really arguable, IMO.

Just because both lambda calculus and assembly are Turing complete does not mean that CPU architecture isn't clearly imperative.

TuringTest
How do you define "imperative" in a way that precludes a functional description of your definition? I'd say that if you were able to do that, you'd have found a contradiction in Turing completeness, and that would win you a Turing Award.

Also, this: https://news.ycombinator.com/item?id=30883863

whimsicalism
Just because every program in C is also expressible in Haskell does not mean that C is Haskell or that C is a functional programming language.

The same is true of CPUs, which at their base level execute instructions line-by-line to move and load things in and out of stateful registers as well as doing simple mathematics.

Not going to keep arguing, as I can see from your other comments that you are going to try to hold this line, and also your reasoning doesn't really make sense.

Everyone has heard of Turing completeness. It does not imply that all distinctions between the semantics of languages and the structure of hardware thus are collapsed. It means that you can write an equivalent program using different semantics or different hardware.

TuringTest
> It does not imply that all distinctions between the semantics of languages and the structure of hardware thus are collapsed. It means that you can write an equivalent program using different semantics or different hardware.

No, it means that every computable program or device can be described through functional semantics, and therefore that very same program running on that same hardware can be described mathematically with them (not an 'equivalent' program, but the same program with the same runtime behaviour, just described at a low level in a different format).

That's a core principle of computer science that predates computers themselves, and people arguing about the benefits of the imperative style tend to be unaware of this fact, making their arguments look weak.

dahfizz
You're definitely stretching here. I could just as easily argue that "all of main memory" is the output of any function. Therefore every function is a pure function, and C is a functional language. Using your definition, the concept of functional programming is meaningless.

Main memory is state. Registers and cache are state. The entire purpose of the CPU is to mutate state.

nightski
No, I don't believe you could argue C is purely functional by framing memory as the output of the function because C doesn't treat it as such. Memory in C can be changed even if it is not listed in the inputs or outputs of the function.

He's not really stretching the definition, that's just the lens through which FP looks at the world.

Of course there is state. FP doesn't prohibit state, it just makes it explicit and referentially transparent. But this is in the semantics, not the execution model of the program.
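A tiny sketch of "explicit, referentially transparent state" in Scala (names are illustrative, not from the thread): the state is an ordinary value that goes in and comes out, instead of being mutated in place.

  final case class Counter(n: Int)

  // Returns the observed value together with the *new* state.
  def tick(c: Counter): (Int, Counter) = (c.n, Counter(c.n + 1))

  val c0 = Counter(0)
  val (a, c1) = tick(c0)  // a = 0, c1 = Counter(1)
  val (b, c2) = tick(c1)  // b = 1, c2 = Counter(2)
  // c0 is untouched; every state change is visible in inputs and outputs.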

dahfizz
> Memory in C can be changed even if it is not listed in the inputs or outputs of the function.

If we are going to nitpick syntax, nothing about `mov eax, ebx` lists main memory as output. In fact, mov doesn't even have an 'output'.

If you want to model mov as a function with the implicit output of "all of main memory", then you can do the same with any arbitrary C function. If you can't make that leap for C, then you can't make that leap for assembly.

nightski
It's not syntax, it is semantics. There is a lot to programming language theory/design, and this is all formalized. C does not have the semantics you are describing in the formal specifications (C99, etc.). Could they have developed a version of C with these semantics? Possibly. Rust was a step in that direction with its borrow checker.
dahfizz
Yes, and my whole point is that assembly does not have those semantics either. The computer is an imperative machine.
TuringTest
The computer is not an imperative machine. It's a physical machine that you describe with an imperative representation in your mind. Can you tell the difference, or are you unaware of the map-territory distinction going on?

Functional programmers are simply choosing to use a different style of map, but we both represent the same reality.

dehrmann
According to Wikipedia, "the hardware implementation of almost all computers is imperative."

https://en.wikipedia.org/wiki/Imperative_programming

TuringTest
[Citation needed]
kortex
Nope, unless you do some serious contortions and basically say that the universe is a lambda calculus that operates on each moment in time to produce a new view of reality.

Computers that do anything useful have all manner of I/O. Highly oversimplified in modern architecture, but you can think of I/O as a bunch of memory cells that are DMA'd by external forces. Literally the exact opposite of functional, IO registers are the epitome of global mutable state. Then you have interrupts and the CPU operates in response to these events.

The viewpoint of "computers are mutable state" is inherently better because of parsimony: it requires fewer degrees of indirection to explain what is happening. Just because you could shoehorn reality into a physical model does not make it equally valid. You basically need to break it down to the Schrodinger equation level to have a purely functional model when even idealized gates are involved, because propagation delay impacts things like how fast you can DMA.

TuringTest
> Just because you could shoehorn reality into a physical model, does not make it equally valid.

Hey, physicists do it all the time. Having mathematical representations of how things work in the real world often turns out to give you complex devices that you wouldn't get by intuition alone. Thanks to doing it, we have Special Relativity and the Schrodinger equation, which bring you advantages such as GPS and insanely-high-speed networks.

There's nothing wrong with modelling interrupts and mutable state as static mathematical entities, a.k.a. functions; and if you think there is, you don't know enough about functional programming. In fact, doing so allows for advanced techniques like automatic verification and very-large-scale generation of electronic devices.

Those fewer degrees of indirection may make it simple to build some kinds of programs; but they also introduce complexity in modelling the possible states of the system once you allow arbitrary, unchecked side effects. So no, that representation is not inherently better, only better suited to some tasks than others; and the converse is also true for functional models of computation. If you think it is always better, it's because you're not taking all possible scenarios and utilities into account.

(Besides, as I already said elsewhere, "computers are mutable state" is not incompatible with functional representation, as mutable state can be represented with it.) It seems that you're confusing 'mutable', which is a property of computational devices, with 'imperative', which is a style of writing source code and defining relations between statements in your written program.

> Literally the exact opposite of functional, IO registers are the epitome of global mutable state.

Yup, you definitely don't know what functional programming is if you think you can't have global mutable state in it. In fact, every top-level Haskell program is defined as a value in the IO monad, which models exactly that global mutable state...

nightski
I was more talking about how you approach solving problems, not the execution model. You can still get tight imperative loops in FP, it's just approached differently than in a traditionally imperative language.

Everything is a tradeoff. FP gives you a lot of flexibility in how you trade space and time of the execution of your program. It's also not a magic wand, just a really powerful tool.

https://www.amazon.com/Purely-Functional-Data-Structures-Oka...
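To make that concrete, a minimal sketch in Scala (sumTo is an illustrative name, not from the comment): a semantically pure function that the compiler turns into exactly the kind of tight loop an imperative language would emit.

  import scala.annotation.tailrec

  // Summing 1..n with an accumulator: pure in its semantics, compiled as a loop.
  @tailrec
  def sumTo(n: Int, acc: Long = 0L): Long =
    if (n == 0) acc else sumTo(n - 1, acc + n)

  // sumTo(5) == 15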
galaxyLogic
I see. So, let's say I have a tree with an arbitrary number of nodes branching into further nodes and so on. (My 'state'.)

Can I now make a single call to some API-function that gives me an almost identical (immutable) tree as a result except that some leaf somewhere has a modified value, or where some node has been deleted?

And, would this internally happen without much copying?

dwohnitmok
The answer to all of your questions is technically "yes", but the way you've worded them makes me think just saying "yes" would be a bit misleading to you. So let's take a real-world example. I'll use pseudo-Scala syntax since that's fairly similar to a lot of imperative languages.

Let's construct a tree data structure from scratch and see how to update it.

  sealed abstract class BinaryTreeOfInts
  final case class Leaf(value: Int) extends BinaryTreeOfInts
  final case class Node(value: Int, left: BinaryTreeOfInts, right: BinaryTreeOfInts) extends BinaryTreeOfInts
All fields here are by reference.

So because every class here is final and every field is an immutable val, we've constructed an immutable tree. You cannot update any of these values once the tree has been constructed.

Let's see how you might construct the following binary tree

    5
   / \
  6   1
     / \
    2   3

  val myTree = Node(
    5,
    Leaf(6),
    Node(
      1,
      Leaf(2),
      Leaf(3)
    )
  )
Now I want to modify the 5 to be 10.

  val newTree = Node(10, myTree.left, myTree.right)
Done. This is effectively one operation. newTree and myTree are both still immutable, but there's a ton of sharing going on. Now this is the best-case scenario. Modifying a leaf is the worst-case scenario, but even that requires only as many operations as there are levels in your tree, which, as long as the tree is fairly balanced, grows only logarithmically in the number of elements.

To illustrate let's make another change, this time changing 3 to 30.

  val anotherTree = newTree.right match {
    case oldRight: Node =>  // we built newTree above, so its right child is a Node
      Node(
        newTree.value,
        newTree.left,
        Node(
          oldRight.value,
          oldRight.left,
          Leaf(30)
        )
      )
  }
Note I've had to instantiate a new Node for each level of the tree, and I've made a new Leaf, but otherwise everything else is shared. Now this is the low-level way of performing these changes. There are various APIs and tricks to make this look a lot cleaner in code so writing this out isn't as tedious, but I figured I'd present the low-level way to make things less magical.

BTW, the logarithmic time factor to update a leaf is why you will often see logarithmic factors in time complexities of various operations on immutable data structures.

thaumasiotes
> BTW, the logarithmic time factor to update a leaf is why you will often see logarithmic factors in time complexities of various operations on immutable data structures.

A mutable tree requires the same logarithmic time factor to update a leaf, because it also requires logarithmic time to access the leaf. It seems like the real difference between the mutable and the immutable structure is that the mutable structure requires O(1) space to do the update, while the immutable one requires O(log n) space in addition to the common requirement of O(log n) time.

dwohnitmok
Sort of. Structural sharing generally actually results in amortized constant space operations if you don't need the copies. In this case, if you aren't actually using the other tree, the nodes could be GC-ed. In the most ideal case they could be GC-ed as you're constructing the new nodes, though I imagine this essentially never happens (hence amortized). But regardless you're right that the thing to focus on is space rather than time when looking at how it would be different in an immutable vs mutable context.

Also my comment was incomplete. What I should've added is that most immutable data structures are trees under the hood in some form or another, hence many operations have a logarithmic factor. Mutable data structures provide more options when you want constant time access.

A great resource for understanding how persistent data structures are implemented is Okasaki’s Purely Functional Data Structures: https://www.amazon.com/Purely-Functional-Data-Structures-Oka...

Also available as a free PDF: https://www.cs.cmu.edu/~rwh/theses/okasaki.pdf

> A data structure cannot be functional.

I think the author is using the term "purely functional data structure" to mean a data structure that lends itself to an implementation in a purely functional language [1,2].

[1] https://en.wikipedia.org/wiki/Purely_functional_data_structu...

[2] https://www.amazon.com/Purely-Functional-Structures-Chris-Ok...

erikb
I feel like I already said "I understand but I disagree".

A round wheel is more useful for a car than a triangular wheel. That doesn't mean it's a "car wheel". It's just as good on a horse wagon or a bike.

davidcuddeback
You can disagree if you want. I just don't want other readers to be misled into thinking that "purely functional data structure" isn't a term of art. Given the number of references to Okasaki's book in this thread, I feel the interested reader is sufficiently armed to learn more, so I don't see the need to continue this discussion.

You may not find the definition useful, and that's alright. I won't try to convince you.

Yup. Purely Functional Data Structures (thesis as linked, book: https://www.amazon.com/Purely-Functional-Structures-Chris-Ok...) is awesome. I especially found some of the tree-based recursive algorithms eye-opening (once you spend the time to wrap your head around them).

IMHO required reading if you're doing any heavy FP work.

myth_drannon
And there is a UChicago course that follows parts of this book and implements it in the Elm language:

https://www.classes.cs.uchicago.edu/archive/2017/spring/2230...

The canonical text discussing how this is achieved refers to data structures optimized for immutability as "purely functional data structures". [1] These types of data structures are the default for Clojure as well, and there are a number of blog posts and presentations discussing their implementation. [2] [3]

It's a reasonably complicated topic, but the basic idea is that since the data structures are immutable, much of their structure can be shared between versions with few changes. Most of these data structures end up using a tree of some sort. Performance characteristics can be influenced to an extent by using bucketing to control the width/height of the tree.
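The simplest case of that sharing is a persistent list, e.g. with Scala's standard List (an illustrative sketch):

  val xs = List(2, 3, 4)
  val ys = 1 :: xs  // List(1, 2, 3, 4)
  // ys's tail *is* xs: nothing was copied, and xs itself is unchanged.
  // Tree-shaped structures generalize this: only the path from the root
  // to the change is rebuilt; everything else is shared between versions.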

[1] Purely Functional Data Structures (Amazon: https://www.amazon.com/Purely-Functional-Structures-Chris-Ok...)

[2] http://hypirion.com/musings/understanding-persistent-vector-...

[3] https://www.youtube.com/watch?v=7BFF50BHPPo

Going back to the basics to solidify my foundation, one each quarter. Good practice makes one a better engineer!

Digital Electronics using [1]
Operating Systems using [2]
Functional Data Structures using [3]
Graphics Algorithms using [4]

Any recommendations for these subjects sincerely appreciated. Thanks.

[1] https://www.amazon.com/Digital-Design-Computer-Architecture-...

[2] https://www.amazon.com/Modern-Operating-Systems-Andrew-Tanen...

[3] https://www.amazon.com/Purely-Functional-Structures-Chris-Ok...

[4] https://www.amazon.com/Graphics-Visualization-Principles-Alg...

“The more you practice, the more you can, the more you want to, the more you enjoy it, the less it tires you.” ― Robert A. Heinlein, The Cat Who Walks Through Walls

melvin0008
Operating Systems basics are well covered in this course: https://www.ops-class.org/
deepaksurti
Thanks, on a first skim this looks like a very detailed course. And while we are at it, there is also this post on the HN front page, which contains more resources for learning about operating systems: https://news.ycombinator.com/item?id=13258063
signa11
> Digital Electronics using [1] Operating Systems using ...

also, in case you are not aware of it, there is always the nand2tetris [http://www.nand2tetris.org/] thingy (currently running on coursera btw). the book is also pretty good imho.

nojvek
Thanks for posting the link. I just signed up for the course. Always wanted to learn how simple logic gates end up becoming all-purpose CPUs. I've always thought that someday we'll have the same concepts in a cell, which becomes a full Turing machine that anyone can grow.
deepaksurti
thanks signa11 for the nand2tetris reminder. I have worked through that book and it is really awesome. Worth the time and effort for anyone inclined. I had posted my review on Amazon as well. [1]

I think I should enroll for the Coursera thingy and have at least 1 certificate in my kitty ;-)

[1] https://www.amazon.com/gp/customer-reviews/RZ4ME4QH22JML/ref...

signa11
> I have worked through that book and it is really awesome. Worth the time and effort for anyone inclined.

very cool :)

in case you want something more, i have _very_ fond memories of zvi-kohavi's book (switching and finite automata theory) as well. you might find it useful/instructive.

Depending on your level of programming ability, one algorithm a day, IMHO, is completely doable. A number of comments and suggestions say that one per day is an unrealistic goal (yes, maybe it is) but the idea of setting a goal and working through a list of algorithms is very reasonable.

If you are just learning programming, plan on taking your time with the algorithms but practice coding every day. Find a fun project to attempt that is within your level of skill.

If you are a strong programmer in one language, find a book of algorithms using that language (some of the suggestions here in these comments are excellent). I list some of the books I like at the end of this comment.

If you are an experienced programmer, one algorithm per day is roughly doable. Especially so, because you are trying to learn one algorithm per day, not produce working, production level code for each algorithm each day.

Some algorithms are really families of algorithms and can take more than a day of study; hash-based lookup tables come to mind. First there are the hash functions themselves. That would be day one. Next there are several alternatives for storing entries in the hash table, e.g. open addressing vs. chaining, days two and three. Then there are methods for handling collisions, linear probing, secondary hashing, etc.; that's day four. Finally there are important variations, perfect hashing, cuckoo hashing, robin hood hashing, and so forth; maybe another 5 days. Some languages are less appropriate for playing around and can make working with algorithms more difficult; instead of a couple of weeks this could easily take twice as long. After learning other methods of implementing fast lookups, it's time to come back to hashing and understand when it's appropriate, when alternatives are better, and how to combine methods for more sophisticated lookup schemes.

I think you will be best served by modifying your goal a bit and saying that you will work on learning about algorithms every day and cover all of the material in a typical undergraduate course on the subject. It really is a fun branch of Computer Science.

A great starting point is Sedgewick's book/course, Algorithms [1]. For more depth and theory try [2], Cormen and Leiserson's excellent Introduction to Algorithms. Alternatively, the theory is also covered by another book by Sedgewick, An Introduction to the Analysis of Algorithms [3]. A classic reference that goes far beyond these other books is of course Knuth [4], suitable for serious students of Computer Science, less so as a book of recipes.

After these basics, there are books useful for special circumstances. If your goal is to be broadly and deeply familiar with Algorithms you will need to cover quite a bit of additional material.

Numerical methods -- Numerical Recipes 3rd Edition: The Art of Scientific Computing by Teukolsky and Vetterling. I love this book. [5]

Randomized algorithms -- Randomized Algorithms by Motwani and Raghavan. [6], Probability and Computing: Randomized Algorithms and Probabilistic Analysis by Michael Mitzenmacher, [7]

Hard problems (like NP) -- Approximation Algorithms by Vazirani [8]. How to Solve It: Modern Heuristics by Michalewicz and Fogel. [9]

Data structures -- Advanced Data Structures by Brass. [10]

Functional programming -- Pearls of Functional Algorithm Design by Bird [11] and Purely Functional Data Structures by Okasaki [12].

Bit twiddling -- Hacker's Delight by Warren [13].

Distributed and parallel programming -- this material gets very hard so perhaps Distributed Algorithms by Lynch [14].

Machine learning and AI related algorithms -- Bishop's Pattern Recognition and Machine Learning [15] and Norvig's Artificial Intelligence: A Modern Approach [16]

These books will cover most of what a Ph.D. in CS might be expected to understand about algorithms. It will take years of study to work though all of them. After that, you will be reading about algorithms in journal publications (ACM and IEEE memberships are useful). For example, a recent, practical, and important development in hashing methods is called cuckoo hashing, and I don't believe that it appears in any of the books I've listed.

[1] Sedgewick, Algorithms, 2015. https://www.amazon.com/Algorithms-Fourth-Deluxe-24-Part-Lect...

[2] Cormen, et al., Introduction to Algorithms, 2009. https://www.amazon.com/s/ref=nb_sb_ss_i_1_15?url=search-alia...

[3] Sedgewick, An Introduction to the Analysis of Algorithms, 2013. https://www.amazon.com/Introduction-Analysis-Algorithms-2nd/...

[4] Knuth, The Art of Computer Programming, 2011. https://www.amazon.com/Computer-Programming-Volumes-1-4A-Box...

[5] Teukolsky and Vetterling, Numerical Recipes 3rd Edition: The Art of Scientific Computing, 2007. https://www.amazon.com/Numerical-Recipes-3rd-Scientific-Comp...

[6] https://www.amazon.com/Randomized-Algorithms-Rajeev-Motwani/...

[7] https://www.amazon.com/gp/product/0521835402/ref=pd_sim_14_2...

[8] Vazirani, https://www.amazon.com/Approximation-Algorithms-Vijay-V-Vazi...

[9] Michalewicz and Fogel, https://www.amazon.com/How-Solve-Heuristics-Zbigniew-Michale...

[10] Brass, https://www.amazon.com/Advanced-Data-Structures-Peter-Brass/...

[11] Bird, https://www.amazon.com/Pearls-Functional-Algorithm-Design-Ri...

[12] Okasaki, https://www.amazon.com/Purely-Functional-Structures-Chris-Ok...

[13] Warren, https://www.amazon.com/Hackers-Delight-2nd-Henry-Warren/dp/0...

[14] Lynch, https://www.amazon.com/Distributed-Algorithms-Kaufmann-Manag...

[15] Bishop, https://www.amazon.com/Pattern-Recognition-Learning-Informat...

[16] Norvig, https://www.amazon.com/Artificial-Intelligence-Modern-Approa...

Well, when you pass a variable around, it doesn't and cannot change. This means that different threads can't get in each other's way anymore, for instance, but also that you can't make a whole class of mistakes at all.

They also have serious disadvantages: they can't be memory-managed in the traditional way (since they tend to reuse other instances' memory in complex ways), and thus require a GC (refcounting can work, but ...). They are VERY allocation-intensive, and their constant factors are worse than those of most non-persistent data structures. Assuming an O(1) allocator they can match non-persistent data structures in O-ness (i.e. when making an invalid assumption that is quite popular in academia: in practice memory allocation is O(1) for small values, then O(n^2) once you get close to the system's memory capacity (scanning for holes in a long list) but don't go over it, and then O(oh fuck it let's just reboot this bloody BOAT ANCHOR) when crossing that line).

Clojure is famous for having good persistent data structures. Rich Hickey went touring academia touting the benefits of immutable/persistent/functional data structures : https://www.youtube.com/watch?v=dGVqrGmwOAw&feature=youtu.be...

There's also a famous book: https://www.amazon.com/Purely-Functional-Structures-Chris-Ok...

GFK_of_xmaspast
> Well, when you pass a variable around, it doesn't and cannot change

Yeah but how's that different than a const?

iofj
It's not, but it still has update methods. It's a const with update methods. Example for a map:

  x = map{"five": 5}
  y = x.put("six", 6)

Now x is map{"five": 5} and y is map{"five": 5, "six": 6}. If any other thread was using x, it hasn't changed.
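In Scala, for example, the same interaction with the standard immutable Map looks like this (a sketch):

  val x = Map("five" -> 5)
  val y = x + ("six" -> 6)
  // x == Map("five" -> 5); y == Map("five" -> 5, "six" -> 6)
  // Any other thread holding x still sees exactly what it always saw.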

Generally you build the new list using the tail of the old list (which is unchanged), your inserted value, and each of the values in front inserted onto that. So, immutability is preserved. Any existing references will not see their values changed. And the new references will share some memory with the old ones (how much depends on where in the list the value was inserted).
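A hedged sketch of that "rebuild the prefix, share the tail" idea in Scala (insertAt is a hypothetical helper, not a standard API):

  def insertAt[A](xs: List[A], i: Int, x: A): List[A] =
    if (i <= 0) x :: xs  // the entire old list becomes the shared tail
    else xs match {
      case h :: t => h :: insertAt(t, i - 1, x)  // one new cell per skipped element
      case Nil    => List(x)
    }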

You can learn more in Okasaki's great book: http://www.amazon.com/Purely-Functional-Structures-Chris-Oka...

This book is more or less the definitive book on functional persistent data structures. It's more about how to design and reason about them than a collection of recipes.

http://www.amazon.com/Purely-Functional-Structures-Chris-Oka...

I'm curious if anyone who has read (or is familiar with the data structures described in) Purely Functional Data Structures[0] can weigh in on how this persistent vector compares to, say, the real-time deque, which is described as having worst-case O(1) runtime for cons/head/tail/snoc/last/init (along with plenty of other structures with similar amortized time guarantees, etc.). Do these structures not have practical performance characteristics, or is there another reason an O(log n) solution was chosen? Like the other poster mentioned here, it's a bit upsetting, having read something that rigorously analyzes worst-case behavior, to see an article hand-wave away log_32(n) as "effectively constant time". Especially because amortization is even trickier to reason about with persistent data structures, since you may have many operations taking place on your worst-case node (several pushes on your edge-case base vector, etc.). Perhaps these tries actually account for that very elegantly, but I certainly don't know enough to intuit it from what I've seen here.

0. http://www.amazon.com/Purely-Functional-Structures-Chris-Oka...

Edit: PDF version: http://www.cs.cmu.edu/~rwh/theses/okasaki.pdf

lomnakkus
AFAIR the deques in PFDS don't permit anything better than O(log n) random access -- and the ones that do that are hideously complicated (again AFAIR, it's been years since I read it). Also, the thing is constants matter, and this is what the "log_32(n)" vs. "log(n)" discussion is driving at.

(You're still not going to get anywhere close to a linear scan of an array with directly embedded values, but there's just no way to achieve that kind of raw performance for pure functional data structures.)

Kutta
The key point here is that Clojure-style vectors have logarithmic random reads/writes, while Okasaki's aforementioned deques do the same in linear time. Clojure uses the persistent vector by default, so it's essential that people can port their usual imperative vector algorithms and expect to get acceptable performance. Hence the focus on random access.

As to amortization, operations on Clojure-style vectors have pretty much zero variance (modulo the size of the vector). There is no resizing, no reordering, no rotation or whatever, it's just the exact same algorithm every time.
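To make the "same algorithm every time" point concrete, here is the index arithmetic at the core of a 32-way persistent vector trie (a sketch after Bagwell/Hickey, not Clojure's actual source):

  // Each trie level consumes 5 bits of the index, so depth is log_32(n).
  def childIndex(i: Int, level: Int): Int =
    (i >>> (5 * level)) & 31

  // Looking up index 1000 in a two-level trie:
  //   level 1: (1000 >>> 5) & 31 == 31  (which child to descend into)
  //   level 0: 1000 & 31         == 8   (which slot in the leaf node)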

For those of you who aren't familiar with the author's other works, it's worth taking a peek at his other books.

I can wholeheartedly recommend Pearls of Functional Algorithm Design: http://www.amazon.com/Pearls-Functional-Algorithm-Design-Ric...

It's a good cross between two other excellent books:

- Jon Bentley's Programming Pearls http://www.amazon.com/Programming-Pearls-2nd-Jon-Bentley/dp/...

and

- Chris Okasaki's Purely Functional Data Structures http://www.amazon.com/Purely-Functional-Structures-Chris-Oka....

If you haven't read all three, it's well worth your while to do so!

And of course if you are going down the rabbit hole of reading Pearls of Functional Algorithm Design, then you need to read "How to Read Pearls of Functional Algorithm Design" as well.

http://www.atamo.com/blog/how-to-read-pearls-by-richard-bird...

vishnugupta
Seconded! Also, I'd highly recommend "Introduction to Functional Programming using Haskell" by the same author. This was the book that set me forth on the path of FP during my postgraduate studies. Though, professionally, I've been programming in Java for close to a decade now, most of the FP principles have stood me in good stead.

[edit] The preface says "The present book is a completely rewritten version of the second edition of my Introduction to Functional Programming using Haskell", so my recommendation is moot.

Sep 17, 2014 · 2 points, 0 comments · submitted by tosh
Of course it is obligatory to mention the classic: Okasaki's "Purely Functional Data Structures" book.

http://www.amazon.com/Purely-Functional-Structures-Chris-Oka...

Also check out this link for additional data structures worked on since then or simply not included in the book:

http://cstheory.stackexchange.com/questions/1539/whats-new-i...

Arnor
Is this more required reading? Has anyone else's homework load doubled since college?
swannodette
A minor nit: most of the data structures provided through Mori via ClojureScript are not based heavily on Okasaki's work; rather, they are based on Bagwell's paper on mutable Ideal Hash Trees (http://lampwww.epfl.ch/papers/idealhashtrees.pdf) and Rich Hickey's efficient immutable implementations in Java for Clojure.
Jul 20, 2013 · lispm on JavaScript Isn't Scheme
> Lots of Lisps have been backwards-incompatible with previous Lisps. Scheme, Common Lisp, Emacs Lisp, and even MACLISP and LISP 1.5 were all significantly backwards-incompatible with their predecessors.

Right. That's what I'm saying. Clojure does not care to be backwards compatible with Lisp.

> Your taxonomy of immutability and persistence is interesting

That's not mine.

Clojure took its base data structures from Haskell and modern ML.

Not Lisp.

See:

http://www.cs.cmu.edu/~rwh/theses/okasaki.pdf

http://en.wikipedia.org/wiki/Persistent_data_structure

The book comes with examples in ML and Haskell.

http://www.amazon.com/Purely-Functional-Structures-Chris-Oka...

kragen
I don't think we're going to get anywhere further in this conversation, although I really appreciate everything you've posted so far, and I heartily second your recommendation of Okasaki's book, even though I haven't finished it myself. And I hope that I have avoided being anything like Erik Naggum in this conversation, despite my frustrations.
sillysaurus
This conversation was incredible!
Jun 26, 2013 · tel on Clojure Tradeoffs
Immutability means that if you're "modifying" the structure you must actually be "making a new copy with a small change". Structural sharing means that usually only the small change itself gets newly allocated, but it's still more pointer-chasing than a contiguous block of memory.

Lots of algorithms don't work with this kind of copying semantics, but you have all of Purely Functional Data Structures [1] to help.

[1] http://www.amazon.com/Purely-Functional-Structures-Chris-Oka...

kleiba
Fair enough, but I wasn't talking about modification, I asked about iteration.
tel
Oh, true; I skipped past his point.
cgag
Just iterating doesn't make any copies. Replace "iterating over" with "modifying".

Efficient immutable data structures are easily one of Clojure's best features. Easiest trade-off ever (though the actual hit to performance isn't that bad) imo.

dragandj
Not only that, but you can also make a persistent structure locally transient for the duration of the (modifying) iteration, so in lots of cases you do not have to pay the persistence penalty.
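A rough Scala analogue of that transient trick (Clojure's transient/persistent! API is the real thing; this is just the same idea with a local builder, names mine):

  // Batch the edits on a local mutable builder, then freeze the result.
  def squares(n: Int): Vector[Int] = {
    val b = Vector.newBuilder[Int]  // mutable, but never escapes this scope
    for (i <- 1 to n) b += i * i
    b.result()                      // immutable Vector from here on
  }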
http://www.amazon.co.uk/gp/aw/d/0521663504

functional data structures - how to bend your brain in a functional world - nearly twenty years old. Still have to take a run-up to read it.

How different is this PDF from the book version? (see: http://www.amazon.com/Purely-Functional-Structures-Chris-Oka...) Regardless, thanks for the link.
silentbicycle
The book expands upon his thesis, and has Haskell source code in an appendix. I'm more familiar with the book than that PDF (and definitely recommend it), but comparing their tables of contents would probably help.
I have read a few chapters of this very beautiful book by Richard Bird called "Pearls of Functional Algorithm Design": http://www.amazon.com/Pearls-Functional-Algorithm-Design-Ric...

You should definitely check it out. If you already know Haskell and you love algorithms, that book will probably be more interesting than the Intro to Haskell book.

Also check out Okasaki's book of functional data structures: http://www.amazon.com/Purely-Functional-Structures-Chris-Oka...

Not sure if this is what you want, but Chris Okasaki's "Purely Functional Data Structures" covers data structures that can be accessed from multiple threads, because changes result in new versions.

http://www.amazon.com/Purely-Functional-Structures-Chris-Oka...

pmjordan
The data structures in that book are also geared towards shared memory; actually, it doesn't cover parallel programming at all, it's just that functional data structures happen to be a good fit for concurrent access in shared memory systems.

You might have more success finding what you want in HPC literature, or maybe in the Erlang community.

That said, it's a decent book, although not exactly a light read; I did struggle to follow some of the numerous proofs. You'll probably find it easier if you have a CompSci background.

Haven't read it, but there is the book Purely Functional Data Structures http://www.amazon.com/Purely-Functional-Structures-Chris-Oka...
mtrimpe
Have (tried to ;) read it, but this just describes patterns for constructing the smallest building blocks for what I described.
Apr 24, 2008 · carterschonwald on Algorithms in Lisp
Chris Okasaki's "Purely Functional Data Structures" covers exactly what is lacking in a standard reference such as CLRS's "Introduction to Algorithms". Read and work through both and you're well on your way to being great at algorithmic problem solving.

edit: for those who are too lazy to google, here's the book on amazon http://www.amazon.com/Purely-Functional-Structures-Chris-Oka...

pchristensen
He also has a very good blog:

http://okasaki.blogspot.com/

vmcodes
Thanks, it seems to be freely available too; first hit on Google.

http://www.cs.cmu.edu/~rwh/theses/okasaki.pdf

carterschonwald
Actually that's just Bob's copy of Chris's thesis; the book covers that plus a bunch of techniques that other people also cooked up. But the thesis covers enough that it'll give you a good sense of whether or not the work is a worthwhile investment.

Be warned: what you get out of this material in terms of understanding is strictly proportional to your comfort with mathematical reasoning.

vmcodes
Right, that's his thesis ... His blog indicates that the book has more than what's in the thesis ...
Also available in extended dead-tree form: http://www.amazon.com/Purely-Functional-Structures-Chris-Okasaki/dp/0521663504

I highly recommend the book if you're implementing or using a functional programming language. There're lots of data structures that aren't mentioned at all in traditional imperative-language textbooks. Some of them even have decent performance.

HN Books is an independent project and is not operated by Y Combinator or Amazon.com.