HN Theater @HNTheaterMonth

The best talks and videos of Hacker News.

Hacker News Comments on
How We've Scaled Dropbox

Stanford · Youtube · 76 HN points · 9 HN comments
HN Theater has aggregated all Hacker News stories and comments that mention Stanford's video "How We've Scaled Dropbox".
Youtube Summary
(February 22, 2012) Kevin Modzelewski talks about Dropbox and its history. He describes the technological issues faced by Dropbox and the actions they have to take in order to continuously improve it.

Stanford University:
http://www.stanford.edu/

Stanford School of Engineering:
http://soe.stanford.edu/

Stanford Computer Systems Colloquium:
http://www.stanford.edu/class/ee380/

Stanford University Channel on YouTube:
http://www.youtube.com/stanford

Hacker News Stories and Comments

All the comments and stories posted to Hacker News that reference this video.
An important detail which this guide leaves out: The "actual location of the row" is usually the leaf node of another B-Tree. That is the primary B-Tree and it's indexed by the primary key.

The major consequence is every non-primary index query involves dereferencing N pointers, which could mean loading N pages from disk/SSD to get leaf nodes spread out across the primary tree. Whereas if you query a range in the primary B-Tree directly, the rows are consecutive so you would only load N/M consecutive pages based on the ratio of rowsize to pagesize.

That's why some people use composite keys for primary key, to get better data locality in the primary B-Tree index.
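
A rough back-of-the-envelope sketch of the page counts described above, in Python. The row size, page size, and function names are assumptions chosen for illustration, not figures from the guide or the talk, and a real engine also has per-page overhead and a buffer cache that this ignores.

```python
import math

def pages_for_secondary_lookup(n_rows: int) -> int:
    """Worst case: every matching row sits on a different primary leaf page."""
    return n_rows

def pages_for_primary_range(n_rows: int, row_size: int, page_size: int) -> int:
    """Rows are consecutive in the primary B-Tree, so they pack into pages."""
    rows_per_page = max(1, page_size // row_size)  # the M in "N/M"
    return math.ceil(n_rows / rows_per_page)

# Hypothetical numbers: 1,000 rows of 200 bytes each, 16 KiB pages.
n = 1000
secondary_pages = pages_for_secondary_lookup(n)
primary_pages = pages_for_primary_range(n, row_size=200, page_size=16 * 1024)
print(secondary_pages, primary_pages)
```

A range that starts mid-page can touch one extra page, so treat the second number as an estimate rather than an exact count.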

See "How We've Scaled Dropbox"[1] to hear the same thing.

At 48:36 they explain why they changed from PRIMARY KEY (id) to PRIMARY KEY (ns_id, latest, id). ns_id is kinda like user_id, so that change groups journal entries for each user together on disk.

Specifically, PRIMARY KEY (id) orders things by creation date, whereas PRIMARY KEY (ns_id, latest, id) orders things by ns_id primarily.

[1]: https://youtu.be/PE4gwstWhmc?t=2914
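
To make the ordering difference concrete, here is a small Python sketch that sorts the same rows by each key and shows where one namespace's rows end up. The sample rows and the helper name are made up for illustration, and `latest` is treated as a 0/1 flag, which is an assumption. Sorting a list only imitates the key order of a clustered index; it is not how the database stores pages.

```python
# Hypothetical journal rows: (id, ns_id, latest). id grows with creation time.
rows = [
    (1, 7, 0), (2, 3, 1), (3, 7, 0), (4, 9, 1), (5, 7, 1), (6, 3, 1),
]

def positions_of_namespace(ordered_rows, ns_id):
    """Return the slots a namespace's rows occupy in the given order."""
    return [i for i, row in enumerate(ordered_rows) if row[1] == ns_id]

# PRIMARY KEY (id): rows are laid out in creation order.
by_id = sorted(rows, key=lambda r: r[0])

# PRIMARY KEY (ns_id, latest, id): rows are laid out by namespace first.
by_ns = sorted(rows, key=lambda r: (r[1], r[2], r[0]))

print(positions_of_namespace(by_id, 7))  # scattered: [0, 2, 4]
print(positions_of_namespace(by_ns, 7))  # adjacent:  [2, 3, 4]
```

With the composite key, the rows for namespace 7 sit next to each other, which is the locality the talk is after.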

iaabtpbtpnn
This is true of MySQL (which the guide uses), but not necessarily of other databases such as Postgres.
srcreigh
You're right. Postgres doesn't give any control over primary data locality. That might make querying 1 row a bit faster in Postgres (no log(N) traversal of the primary B-tree index), but picking out N rows could be a lot slower.

https://www.postgresql.org/docs/13/indexes-index-only-scans....

Aug 02, 2020 · 2 points, 0 comments · submitted by harsilspatel
Sep 25, 2018 · 64 points, 6 comments · submitted by tosh
fermienrico
I’m curious what the state of their affairs is today. 2012-2018 is a huge amount of time in tech world and I’m curious what improvements they’ve made.
zawerf
The evolution of their SQL schema (around ~45:00 on) is pretty cool.

For example to implement undo/version control for files, they just added a single `prev_rev` column. There are some arguably better (but more complicated) ways to do it but it would've been premature optimization since this simple solution clearly worked out for them.
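
A minimal sketch of the `prev_rev` idea, in Python rather than SQL. Only the `prev_rev` name comes from the talk; the `Revision` class, its other fields, and the helper functions are hypothetical, and this is not Dropbox's actual schema or code.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class Revision:
    rev_id: int
    path: str
    contents: str
    prev_rev: Optional[int]  # rev_id of the version this one replaced

# Stand-in for a table keyed by rev_id.
revisions: Dict[int, Revision] = {}

def save(rev_id: int, path: str, contents: str,
         prev_rev: Optional[int] = None) -> int:
    revisions[rev_id] = Revision(rev_id, path, contents, prev_rev)
    return rev_id

def history(rev_id: int) -> List[str]:
    """Walk prev_rev pointers from the newest version to the oldest."""
    chain = []
    current: Optional[int] = rev_id
    while current is not None:
        rev = revisions[current]
        chain.append(rev.contents)
        current = rev.prev_rev
    return chain

def undo(rev_id: int) -> Optional[int]:
    """Return the revision to restore, or None if nothing came before."""
    return revisions[rev_id].prev_rev

r1 = save(1, "/notes.txt", "v1")
r2 = save(2, "/notes.txt", "v2", prev_rev=r1)
r3 = save(3, "/notes.txt", "v3", prev_rev=r2)
print(history(r3))
print(undo(r3))
```

Each version only needs to remember the one it replaced, so undo is a single lookup and full history is a pointer walk.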

nodesocket
If I am understanding correctly a sort_order column would work as well. Essentially then you can just do select with order by sort_order.
redwood
I wonder if their focus on data center build out (rather than differentiation of their offering and customers, yes admittedly debatable) will be deemed a success or misstep in the long run
leowoo91
Looks like an investor-level presentation. Pretty sure there are 10x more layers of controllers needed to handle >1M users.
waz0wski
They've posted about this on their tech blog:

https://blogs.dropbox.com/tech/2016/03/magic-pocket-infrastr...

https://blogs.dropbox.com/tech/2018/06/extending-magic-pocke...

I found the Dropbox lecture [1] at Stanford one of the most riveting things ever. There is just so much technology behind Dropbox, it is staggering.

There is a reason why it is so much better than iCloud sync, Google Drive, Box or OneDrive.

[1] https://www.youtube.com/watch?v=PE4gwstWhmc

travbrack
This led me to find another loosely related but very entertaining piece of dropbox history. The original "Show HN" post: [1]. It's funny to see so much skepticism knowing now what the company became.

[1] https://news.ycombinator.com/item?id=8863

billforsternz
Yes, this is one of the classics - right up there with the "less space than a Nomad, no wireless, lame" comment (which wasn't on HN I don't think - but we all know it could have been :)

Edit: I see the motherlode is in place earlier in the thread "Especially when you could build such a system yourself quite trivially by getting an FTP account, mounting it locally with curlftpfs, and then using SVN or CVS on the mounted filesystem"

mixmastamyk
The iPod comment was from slashdot, if memory serves.
wiredfool
Yes. CmdrTaco, when posting to the home page:

https://slashdot.org/story/01/10/23/1816257/Apple-releases-i...

(that post is old enough to vote in this year's election).

anentropic
ha, that's a nice counterpart to the 'trivial' thread above https://news.ycombinator.com/item?id=18071820
imo the best way is to look at what others have built. Here are some of my favorite talks that go from 0 users to millions.

Dropbox - https://www.youtube.com/watch?v=PE4gwstWhmc

Instagram - https://www.youtube.com/watch?v=oNA2C1vC8FQ

Slack (bonus. not as applicable but good reminder of why initial architecture does matter) - https://www.youtube.com/watch?v=WE9c9AZe-DY

May 07, 2018 · 2 points, 0 comments · submitted by confbase
How we scaled DropBox - Kevin Modzelewski: https://www.youtube.com/watch?v=PE4gwstWhmc

The initial Node.js presentation: https://www.youtube.com/watch?v=ztspvPYybIY&t=597s

Mar 24, 2018 · jesseendahl on Congrats Dropbox
Dropbox is a lot more than a GUI on top of rsync. Even purely from an engineering standpoint (ignoring product & design) that's incorrect.

You might enjoy this talk: https://www.youtube.com/watch?v=PE4gwstWhmc

Applications don't really need to be well architected until they are hitting scale. Then the parts of their system that need to relieve pressure will need to be re-architected. This is almost like a case study, and there are a lot of good talks on YouTube from places like Dropbox and Facebook that explain the problem and solution. Example: https://www.youtube.com/watch?v=PE4gwstWhmc

If you don't want to do YouTube case studies, there are also books to read about distributed systems. Reading about cloud architecture can help too.

LrnByTeach
> Applications don't really need to be well architected until they are hitting scale.

Very true. A "well-architected system" before hitting scale is considered over-engineering.

> Then the parts of their system that need to relieve pressure will need to be re-architected.

> This is almost like a case study and there are a lot of good talks on youtube from places like dropbox and facebook that explain the problem and solution. Example: https://www.youtube.com/watch?v=PE4gwstWhmc

This is a talk about the evolution of Dropbox's architecture from 2012: http://www.youtube.com/watch?v=PE4gwstWhmc. It is incredibly detailed (down to the exact sql schema they use for file metadata etc).
Dec 09, 2012 · 8 points, 0 comments · submitted by bpuvanathasan
HN Theater is an independent project and is not operated by Y Combinator or any of the video hosting platforms linked to on this site.