HN Theater @HNTheaterMonth

The best talks and videos of Hacker News.

Hacker News Comments on
Hosting Notebooks for 100,000 Users - Scott Sanderson

O'Reilly · Youtube · 151 HN points · 0 HN comments
HN Theater has aggregated all Hacker News stories and comments that mention O'Reilly's video "Hosting Notebooks for 100,000 Users - Scott Sanderson".

Hacker News Stories and Comments

All the comments and stories posted to Hacker News that reference this video.
Apr 30, 2018 · 151 points, 32 comments · submitted by jbredeche
sandGorgon
there is Dash (https://plot.ly/products/dash/) and there is Jupyter.

I wish there was some abstraction to generate a Dash like output from Jupyter. There are a lot of people who would pay serious money for that.

Even Airbnb built a framework to extract code from Jupyter notebooks and push them into a machine learning pipeline (https://medium.com/airbnb-engineering/using-machine-learning...).

Jupyter can be so much more by going closer to how it fits within a production pipeline versus just competing against Rstudio.

minimaxir
> I wish there was some abstraction to generate a Dash like output from Jupyter. There are a lot of people who would pay serious money for that.

Notebook-to-dataviz is the value proposition of Mode Analytics, which has been doing well: https://modeanalytics.com/

rb808
Also I always thought notebooks would be a great devops tool, kind of like a super command line that has easily observable steps grouped in chunks and graphical feedback. No one else seems to think so though so maybe I'm wrong.
craigching
How about Emacs + org mode? https://www.youtube.com/watch?v=dljNabciEGg

I thought this video was pretty cool personally!

cup-of-tea
Very cool! Watching someone else using emacs is always a great learning experience.
amirathi
I built a platform exactly for this. Check it out here: https://www.nurtch.com/
sytse
Wow, it is awesome how you managed to integrate runbooks, Jupyter, and monitoring, well done and great video!
amirathi
Thanks :)

Just saw that you are CEO of GitLab. Good job making your runbooks public [1]. I converted one of your runbooks into an executable notebook to convey my point. Check the before/after screenshot here: https://blog.amirathi.com/2018/03/27/codify-infra-runbooks-w...

You should consider using Nurtch :)

[1] https://gitlab.com/gitlab-com/runbooks

wohlergehen
Native Jupyter has the magic %%bash command, so a lot of this should just be possible, i.e. you start a cell with that, and it will invoke bash to execute it.

More generally, there is %%script <x>, which executes the cell using <x>.

0: http://ipython.readthedocs.io/en/stable/interactive/magics.h...
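
For readers who have not used these magics, here is a rough sketch of the idea behind `%%script <x>`: the cell body is handed to program `<x>` on standard input and its output is shown. This is an illustration in plain Python, not IPython's actual implementation; the helper name `run_cell` is hypothetical.

```python
import subprocess
import sys
from typing import Sequence


def run_cell(program: Sequence[str], cell_text: str) -> str:
    """Feed cell_text to `program` on stdin and return what it wrote to stdout.

    Hypothetical helper: it only mimics the general behaviour of a
    `%%script <x>` cell and is not part of IPython.
    """
    result = subprocess.run(
        list(program),
        input=cell_text,      # the "cell body"
        capture_output=True,  # collect stdout and stderr
        text=True,            # work with str instead of bytes
        check=True,           # raise if the program exits non-zero
    )
    return result.stdout


if __name__ == "__main__":
    # The current Python interpreter keeps the example portable;
    # passing ["bash"] instead would be the analogue of %%bash
    # on systems where bash is installed.
    print(run_cell([sys.executable, "-"], "print(6 * 7)"))
```

In a notebook none of this is needed: starting a cell with `%%bash` does the equivalent for you.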

lmeyerov
Yep! We've been experimenting with a variant for security & fraud investigations at Graphistry. Our original experiments with just notebooks resulted in a few people on super early adopter teams getting a kick out of it, but in critical ~operational settings, having to deal with code was... not great. Mostly looked like siloed personal use.

BUT. The core concept is great. We made a form more focused on interactive investigation, so you can jump from an alert in a dashboard into a rich & interactive visual session with pre-wrangled data: https://youtu.be/B3ZZWx9WUEk?t=1m32s . Depending on what you see, can easily pull in more data, or refine what's there. And agreed, I think it'd be fun to try in big devops/netops scenarios!

jacquesm
That makes good sense, in case it helps your confidence.

It's a middle ground between literate programming and a traditional IDE and there is nothing like that aimed at the ops space where it would actually be quite valuable.

nickbarnwell
emacs' org-mode / org-babel is well suited for it. This blog post [1] has some fantastic examples. The problem, of course, is that it's so closely tied to emacs.

1: http://howardism.org/Technical/Emacs/literate-devops.html

craigching
Nice, I just posted the video from that link above ;) Great minds think alike!
nickbarnwell
Howard's blog and github are an amazing resource; I'm a huge fan of his demo-it package as well. Highly recommend checking it out if you ever give technical presentations or talks.
existencebox
As disclaimer/context: I'm a dev on the Azure-hosted Jupyter Notebooks product.

You're not wrong! (Or at the least, it's a topic that has come across our ears before, and is something I certainly agree with.) Obviously I probably shouldn't go off spouting all the pipe dreams I have in this space, but given that I got my start doing Ops work and tried to keep an eye out for things I might have liked back then, I can assure you you're not alone.

I always saw the similarity foremost as a direct upgrade to the "runbooks"/"firefighting/deploy checklists" that crop up all too often.

alexeldeib
Another Azure dev checking in :)

Have you seen Application Insights Workbooks [0]? Basically you can have interactive notebooks and run analytics queries against your telemetry, generate charts, add text cells, etc. It's picking up usage for investigating outages, e.g., have a Workbook with a query that looks at your dependency calls and determines what service is failing + produces a visualization.

Workbooks don't actually execute any external actions, though. It's solely an analysis tool. Runbooks skew the other direction, they are for executing scripts (more or less).

Jupyter/python seems to fit in a nice gap where this could be bridged, especially with the level of existing python support from azure sdk + cli.

PS: a dev from Workbooks has seen Azure Notebooks, and was curious a while back about how he could integrate the functionality [1]

[0]: https://docs.microsoft.com/en-us/azure/application-insights/...

[1]: http://blog.my-is300.com/2017/06/what-i-work-on-application-...

gardnerjr
wow, a link to my blog (that second link) made hackernews? that's exciting!

Anyway, yeah, workbooks in appinsights is almost like notebooks for non-programmers? kinda? you string together markdown, parameters, and analytics queries (and very soon metrics across more of azure) into reports. But the parameters stuff lets you do more interactive things to hide/show sections now. i really need to do a new blog post about all the new stuff that's in there that wasn't last june!

i've prototyped some stuff to export an AI workbook to an azure/jupyter notebook, as there's some support for querying analytics already from a python package. there just hasn't been enough demand for it so far (not as much as we expected, anyway?)

peatmoss
The R community has a really nice answer for this: https://rmarkdown.rstudio.com/flexdashboard/
ssanderson11235
There have been a couple attempts to add dashboarding to Jupyter:

https://github.com/jupyter/dashboards was/is a dashboard system built by a team at IBM. I think the project stalled somewhat after IBM stopped funding it.

There are a few long threads in the currently active jupyter repos about building dashboard systems as extensions: https://github.com/jupyterlab/jupyterlab/issues/1640, https://github.com/jupyter-widgets/ipywidgets/issues/2018.

deven88
I was searching for such a dashboard utility and I found this: https://github.com/oschuett/appmode. It may be useful for some of the cases.
ssanderson11235
Speaker here. If you want to follow along with the slides from the talk, you can find them at https://speakerdeck.com/ssanderson/hosting-notebooks-for-100....

Also, happy to answer any questions that people might have.

porterde
Are the cell sharing extensions mentioned in the slides open sourced? (Sorry if it says either way in the video, I didn't get a chance to watch it in full yet.) Lack of sharing / collaboration extensions for Jupyter Notebooks / Lab is still a weak point I think.
ssanderson11235
The sharing machinery isn't open source, mostly because it's pretty tightly coupled to our community forums, which is a custom rails application.

I know that the jupyterhub team was working on https://github.com/jupyterhub/hubshare for a while as an open source sharing solution. I've also commented in https://github.com/jupyterhub/hubshare/issues/14 and elsewhere that I think PGContents (one of the libraries I talk about in the video) could be used as a basis for many kinds of sharing (though probably not realtime collaboration).

diabeetusman
Do you know if there's any way that we can see what's happening during the mini demo?
ssanderson11235
I gave an earlier version of this talk at JupyterCon 2017 https://www.youtube.com/watch?v=TtsbspKHJGo, which captured my full screen output. The pgcontents demo starts around 19:30 in that video.
chupapuma
Hey Scott, how did you get your unlisted YouTube link for your presentation? I don't think I ever found mine from the same conference.
ssanderson11235
I found it in the YouTube playlist from the event: https://www.youtube.com/playlist?list=PL055Epbe6d5aP6Ru42r7h....
chupapuma
Weird. Mine isn't there, or I am blind. shrugs Though I did find it by searching the O'Reilly user :)
fwdpropaganda
Come on. Quantopian doesn't have 100k users... maybe 5k?

EDIT: Here we go again... downvoted, then probably flagged and reprimanded by mod dang or something. Sigh.

I actually spent hundreds of hours on Quantopian, and from the activity in the forums you wouldn't think it has 100k users. Either that, or it's the most muted community on the internet.

cbanek
I'm working on building an educational environment with Jupyter, and I'm interested in the multiple hubs.

A few basic questions: Why multiple hubs? (was there some point of scale where you needed this) Did multiple hubs allow you to have better migrations? (where you drain one and move it over to the other)

Totally agree that state is the enemy of scale, so having a separate service backing your storage independent of what hub you're on seems like a big win.

Thanks for the great talk!

ssanderson11235
> Why multiple hubs?

A few reasons for this, most of which are related to points you mentioned:

1. Having multiple hubs makes it much easier to do zero-downtime deploys.

2. Having multiple hubs makes us more resilient to transient machine failures.

3. We were worried that having a single proxy for all our notebook traffic might become a system-wide bottleneck. Notebooks with a lot of images can get pretty large, and at the time we were rolling this out JupyterHub was pretty new. We weren't sure how well it was going to scale (the target audience for the JupyterHub team at the time was small labs and research teams), so it seemed safest to aim for horizontal scalability from the start. The JupyterHub team has since done a lot of awesome performance work to support the huge data science classes being taught at UC Berkeley, so it's possible that a single hub with the kubernetes spawner could handle our traffic today, but given points (1) and (2) plus the fact that we already have a working system, I don't have much incentive to find out :).
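
To make points (1) and (2) concrete, here is a minimal sketch of how a front-end router might assign users to one of several hubs and drain a hub before a deploy. It is not the implementation described in the talk: the `Hub` class, the `pick_hub` function, the hub names, and the choice of rendezvous hashing are all assumptions made for illustration. It also assumes notebook files live in a storage service shared by every hub, so a user can be moved between hubs without losing work.

```python
import hashlib
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Hub:
    """Hypothetical record for one hub instance behind the router."""
    name: str
    healthy: bool = True    # flipped to False when a health check fails
    draining: bool = False  # set True before a deploy so no users are routed here


def pick_hub(user_id: str, hubs: Sequence[Hub]) -> Hub:
    """Deterministically map a user to one of the currently available hubs.

    Uses rendezvous (highest-random-weight) hashing: each user ranks every
    hub by a hash score and goes to the best available one. Taking a hub
    out of rotation only moves the users that were assigned to it.
    """
    available = [h for h in hubs if h.healthy and not h.draining]
    if not available:
        raise RuntimeError("no hub available")

    def score(hub: Hub) -> int:
        digest = hashlib.sha256(f"{user_id}:{hub.name}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    return max(available, key=score)


if __name__ == "__main__":
    hubs = [Hub("hub-a"), Hub("hub-b"), Hub("hub-c")]
    print(pick_hub("alice", hubs).name)

    # Drain hub-a ahead of a deploy: its users are reassigned, everyone else stays put.
    hubs[0].draining = True
    print(pick_hub("alice", hubs).name)
```

With a scheme like this, a zero-downtime deploy amounts to draining one hub, upgrading it, putting it back in rotation, and repeating for the next hub; a failed health check takes a hub out of rotation the same way.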

cbanek
That's great, thanks! I was also curious if you hit scale issues on just one hub. I agree, it's best practice to not have all your eggs in one basket. I'd love to see an HA hub where this would be all taken care of for me, but hopefully by the time we go live we'll have this.
HN Theater is an independent project and is not operated by Y Combinator or any of the video hosting platforms linked to on this site.