Hacker News Comments on
Lock-Free Algorithms For Ultimate Performance
Martin Thompson
·
InfoQ
·
147
HN points
·
2
HN comments
- This course is unranked · view top recommended courses
Hacker News Stories and Comments
All the comments and stories posted to Hacker News that reference this video.⬐ svagThe presentation in pdf http://qconsf.com/dl/qcon-sanfran-2012/slides/MartinThompson... for off-line viewing.⬐ mglIf you interested in HPC topic, especially in terms of HFT, you should his previous presentation on LMAX system: http://www.infoq.com/presentations/LMAX⬐ exDM69Really good presentation, although a bit long.What was most interesting to me was the actual performance figures and the small optimizations that yield big improvements.
⬐ cnlwsugreat presentation⬐ colandermanHere's one better, if all your data is strictly less than (not equal to) a cache line (usually 64 bytes) in size, and your processor guarantees memory ordering within a cache line (most do):1) Keep head and tail pointers local to the consumer and producer.
2) Associate a bit with each entry in the queue which denotes whether the entry is full or not. The bit must live within the same cache line as the entry itself.
3) Block on this bit, rather than head & tail pointers.
4) Set an entry's bit after filling the entry; clear it before.
5) You can now elide the memory fence (implicit in the .lazySet() method of the atomic objects). Performance will skyrocket.
⬐ justanotheratomInterestingly, the USB 3.0 xHCI Specification uses the scheme you mention for circular queues shared between software and hardware.4.9 TRB Ring: http://www.intel.com/content/www/us/en/io/universal-serial-b...
I assume that the mutex's memory is warmed, so the processor doesn't have to go to RAM. But, it does have to synchronize with any other processors/cores in the machine to prevent them from locking the mutex at the same time.In a Intel Nahalem or Sandy Bridge system, this goes over the QuickPath Interconnect which has a latency of ~20ns. HyperTransport fills the same role in AMD systems, and probably has a similar latency, but I don't have numbers for that.
I'm basing this on this presentation, especially the architecture diagram at 2m 40s:
http://www.infoq.com/presentations/Lock-free-Algorithms early-on gives good numbers on-machine.I like this visualisation too: http://news.ycombinator.com/item?id=702713
⬐ matthavenerFor anyone thinking about watching: the presentation is incredibly good. The name is kinda misleading -- they talk more about modern x86 architecture and actual numbers of various algorithms than the lock free algorithms themselves. Both of those guys have a high emphasis on measuring and testing to improve performance.
⬐ akgHere is a paper I found extremely useful when studying Lock Free data structures. It focuses on linked lists, but has a nice exposition of the idea: http://www.cse.yorku.ca/~ruppert/papers/lfll.pdf⬐ espeedDatomic (http://datomic.com) is also built on an immutable, append-only, lock-free design.⬐ apertureA great talk, but quite sad that it's on infoq.com , given the difficulty of getting pdf, registration, and that everything is run through .swf so those with flash are left without alternatives. Tidbit of viewing the source gives on lines 989 and 990 some insight on controlling slides, since they seem to load based on time delays. :P It's also quite the array that's created, but I suppose that's all part of making a website.⬐ mbuccThis talk is very informative.The historical bit about NASA using event-sourced design for the Apollo Program (1969) was pretty interesting. He gives IBM big props for hitting 2,300 transactions/second in the mid-60's (with IMS aka DB1).
@17:15 "We've forgetting a lot of this good stuff in our modern designs."
@17:34 "Transaction queues, pulling things off, uncontended, and processing them."
@17:50 "Some of the systems we have today are woeful and can't even get close to that, considering the hardware we have today, and it's ... how we are writing contended designs"
⬐ tompCan anyone tell me, how can the InfoQ presentations be controlled (i.e. next/previous slide)?⬐ sp332⬐ SlipperySlopeIf you register, you can download the slides separately.⬐ willvarfarI haven't worked it out other than clicking on the video timeline; I think the dashes on the timeline on the video show when the slides advance.In this particular talk, I think the talk is more valuable than skimming the slides. Almost better to listen to than to watch.
⬐ drstrangevibeshttp : //www.infoq.com/resource/presentations/Lock-free-Algorithms/en/slides/{INSERTSLIDENUMBERHERE}.swfI used the Disrupter lock-free queue in my multi-threaded Java application last year. No problems using it and after I replaced the previous concurrent queue, that portion of the code is no longer a profiling hotspot.It was not in Maven at the time, so I copied the Disrupter packages into my own utilities Maven module.
⬐ willvarfar⬐ judofyrIn the middle of the talk, they mention specifically recent changes to disruptor to avoid degrading massively if you have more producers than physical cores.Anything to get you to watch the talk all the way through ;)
Direct link to the PDF: https://www.dropbox.com/s/dnf2bxtsvu37z3v/pdf-thing.pdf⬐ dochtmanThanks! InfoQ "presentations" are so crappy...⬐ lucian1900⬐ bbrtythI find it rather nice to have synced video and slides.If they didn't use flash, it'd be perfect.
⬐ h84ru3aIs HTML5 going to signal the end of sites structured like infoq? Will we just be able to extract a download link instead of jumping through endless hoops and tolerating annoyance after annoyance?⬐ pi18nDon't worry, I am sure they'll find a way to separate content slide-by-slide on a number of ad-ridden pages.Thanks!⬐ NrsolisI'd add that this (http://www.akkadia.org/drepper/cpumemory.pdf) is a good accompaniment to the slides.