HN Theater @HNTheaterMonth

The best talks and videos of Hacker News.

Hacker News Comments on
Lock-Free Algorithms For Ultimate Performance

Martin Thompson · InfoQ · 147 HN points · 2 HN comments
HN Theater has aggregated all Hacker News stories and comments that mention Martin Thompson's video "Lock-Free Algorithms For Ultimate Performance".
Watch on InfoQ [↗]
InfoQ Summary
Martin Thompson discusses the need to measure what’s going on at the hardware level in order to be able to create high performing lock-free algorithms.
HN Theater Rankings

Hacker News Stories and Comments

All the comments and stories posted to Hacker News that reference this video.
Apr 03, 2013 · 42 points, 6 comments · submitted by ari_elle
svag
The presentation in pdf http://qconsf.com/dl/qcon-sanfran-2012/slides/MartinThompson... for off-line viewing.
mgl
If you interested in HPC topic, especially in terms of HFT, you should his previous presentation on LMAX system: http://www.infoq.com/presentations/LMAX
exDM69
Really good presentation, although a bit long.

What was most interesting to me was the actual performance figures and the small optimizations that yield big improvements.

cnlwsu
great presentation
colanderman
Here's one better, if all your data is strictly less than (not equal to) a cache line (usually 64 bytes) in size, and your processor guarantees memory ordering within a cache line (most do):

1) Keep head and tail pointers local to the consumer and producer.

2) Associate a bit with each entry in the queue which denotes whether the entry is full or not. The bit must live within the same cache line as the entry itself.

3) Block on this bit, rather than head & tail pointers.

4) Set an entry's bit after filling the entry; clear it before.

5) You can now elide the memory fence (implicit in the .lazySet() method of the atomic objects). Performance will skyrocket.

justanotheratom
Interestingly, the USB 3.0 xHCI Specification uses the scheme you mention for circular queues shared between software and hardware.

4.9 TRB Ring: http://www.intel.com/content/www/us/en/io/universal-serial-b...

I assume that the mutex's memory is warmed, so the processor doesn't have to go to RAM. But, it does have to synchronize with any other processors/cores in the machine to prevent them from locking the mutex at the same time.

In a Intel Nahalem or Sandy Bridge system, this goes over the QuickPath Interconnect which has a latency of ~20ns. HyperTransport fills the same role in AMD systems, and probably has a similar latency, but I don't have numbers for that.

I'm basing this on this presentation, especially the architecture diagram at 2m 40s:

http://www.infoq.com/presentations/Lock-free-Algorithms

http://www.infoq.com/presentations/Lock-free-Algorithms early-on gives good numbers on-machine.

I like this visualisation too: http://news.ycombinator.com/item?id=702713

matthavener
For anyone thinking about watching: the presentation is incredibly good. The name is kinda misleading -- they talk more about modern x86 architecture and actual numbers of various algorithms than the lock free algorithms themselves. Both of those guys have a high emphasis on measuring and testing to improve performance.
May 25, 2012 · 100 points, 17 comments · submitted by willvarfar
akg
Here is a paper I found extremely useful when studying Lock Free data structures. It focuses on linked lists, but has a nice exposition of the idea: http://www.cse.yorku.ca/~ruppert/papers/lfll.pdf
espeed
Datomic (http://datomic.com) is also built on an immutable, append-only, lock-free design.
aperture
A great talk, but quite sad that it's on infoq.com , given the difficulty of getting pdf, registration, and that everything is run through .swf so those with flash are left without alternatives. Tidbit of viewing the source gives on lines 989 and 990 some insight on controlling slides, since they seem to load based on time delays. :P It's also quite the array that's created, but I suppose that's all part of making a website.
mbucc
This talk is very informative.

The historical bit about NASA using event-sourced design for the Apollo Program (1969) was pretty interesting. He gives IBM big props for hitting 2,300 transactions/second in the mid-60's (with IMS aka DB1).

@17:15 "We've forgetting a lot of this good stuff in our modern designs."

@17:34 "Transaction queues, pulling things off, uncontended, and processing them."

@17:50 "Some of the systems we have today are woeful and can't even get close to that, considering the hardware we have today, and it's ... how we are writing contended designs"

tomp
Can anyone tell me, how can the InfoQ presentations be controlled (i.e. next/previous slide)?
sp332
If you register, you can download the slides separately.
willvarfar
I haven't worked it out other than clicking on the video timeline; I think the dashes on the timeline on the video show when the slides advance.

In this particular talk, I think the talk is more valuable than skimming the slides. Almost better to listen to than to watch.

drstrangevibes
http : //www.infoq.com/resource/presentations/Lock-free-Algorithms/en/slides/{INSERTSLIDENUMBERHERE}.swf
SlipperySlope
I used the Disrupter lock-free queue in my multi-threaded Java application last year. No problems using it and after I replaced the previous concurrent queue, that portion of the code is no longer a profiling hotspot.

It was not in Maven at the time, so I copied the Disrupter packages into my own utilities Maven module.

willvarfar
In the middle of the talk, they mention specifically recent changes to disruptor to avoid degrading massively if you have more producers than physical cores.

Anything to get you to watch the talk all the way through ;)

judofyr
Direct link to the PDF: https://www.dropbox.com/s/dnf2bxtsvu37z3v/pdf-thing.pdf
dochtman
Thanks! InfoQ "presentations" are so crappy...
lucian1900
I find it rather nice to have synced video and slides.

If they didn't use flash, it'd be perfect.

h84ru3a
Is HTML5 going to signal the end of sites structured like infoq? Will we just be able to extract a download link instead of jumping through endless hoops and tolerating annoyance after annoyance?
pi18n
Don't worry, I am sure they'll find a way to separate content slide-by-slide on a number of ad-ridden pages.
bbrtyth
Thanks!
Nrsolis
I'd add that this (http://www.akkadia.org/drepper/cpumemory.pdf) is a good accompaniment to the slides.
Apr 30, 2012 · 5 points, 0 comments · submitted by aespinoza
HN Theater is an independent project and is not operated by Y Combinator or any of the video hosting platforms linked to on this site.
~ yaj@
;laksdfhjdhksalkfj more things
yahnd.com ~ Privacy Policy ~
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.