HN Theater @HNTheaterMonth

The best talks and videos of Hacker News.

Hacker News Comments on
Cgroups, namespaces, and beyond: what are containers made from?

Docker · Youtube · 3 HN points · 4 HN comments
HN Theater has aggregated all Hacker News stories and comments that mention Docker's video "Cgroups, namespaces, and beyond: what are containers made from?".
Youtube Summary
with Jérôme Petazzoni, Tinkerer Extraordinaire, Docker

Linux containers are different from Solaris Zones or BSD Jails: they use discrete kernel features like cgroups, namespaces, SELinux, and more. We will describe those mechanisms in depth, as well as demo how to put them together to produce a container. We will also highlight how different container runtimes compare to each other.

Learn more about Docker http://www.docker.com/what-docker

--

Docker is an open platform for developers and system administrators to build, ship and run distributed applications. With Docker, IT organizations shrink application delivery from months to minutes, frictionlessly move workloads between data centers and the cloud and can achieve up to 20X greater efficiency in their use of computing resources. Inspired by an active community and by transparent, open source innovation, Docker containers have been downloaded more than 700 million times and Docker is used by millions of developers across thousands of the world’s most innovative organizations, including eBay, Baidu, the BBC, Goldman Sachs, Groupon, ING, Yelp, and Spotify. Docker’s rapid adoption has catalyzed an active ecosystem, resulting in more than 180,000 “Dockerized” applications, over 40 Docker-related startups and integration partnerships with AWS, Cloud Foundry, Google, IBM, Microsoft, OpenStack, Rackspace, Red Hat and VMware.
Hacker News Stories and Comments

All the comments and stories posted to Hacker News that reference this video.
>There is controversy about Docker not running on FreeBSD but I believe (like many others) that FreeBSD has a more powerful tool. Jails are older and more mature - and by far - than any containerization solution on Linux.

If FreeBSD jails and Solaris zones were equivalent to Linux containers, we'd have seen them take over the backend already. We haven't. They're really useful, they provided a degree of safety and peace of mind for multi-tenancy but they're not granular enough for what's done with $CONTAINER_RUNTIME these days.

Jérôme Petazzoni has an old talk where he touches upon container primitives and compares them to jails: https://www.youtube.com/watch?v=sK5i-N34im8

area51org
Jails are not a replacement for containers.
yjftsjthsd-h
I think the problem is that docker is an excellent frontend, and zones and jails are excellent backends. People who say jails are better are probably right but they're missing the point, because they're not really solving the same problem; until I can use jails to create a container image, push it to a registry, and pull it from that registry and run it on a dozen servers - and do each of those steps in a single trivial command - jails are not useful for the thing that people care about docker for.
Apr 29, 2020 · 2 points, 0 comments · submitted by crunchbang123
In the same vein:

Building a container from scratch in Go (Liz Rice) @ Container Camp 2016 -> https://www.youtube.com/watch?v=Utf-A4rODH8

What's a container, really? Let's write one in Go from scratch (Liz Rice) @ Golang UK Conference -> https://www.youtube.com/watch?v=HPuvDm8IC-4

Cgroups, Namespaces and beyond: What are containers made from (Jerome Petazzoni) @ DockerCon 2015 - https://www.youtube.com/watch?v=sK5i-N34im8

Building Containers in Pure Bash and C (Jessica Frazelle) @ ContainerSummit 2016: https://containersummit.io/events/nyc-2016/videos/building-c...

First two are basically the same talk, but it doesn't hurt to hear the same ideas more than once.

gt5050
I found this introduction to Linux namespaces very good

https://medium.com/@teddyking/linux-namespaces-850489d3ccf

zerr
So basically it is just the usage of particular API the underlying OS provides.
erikb
That's why some people believe that in 10 years we won't have docker, we won't have kubernetes, but that it will be intuitively integrated into the OS through new system design patterns that still have to appear out of all the crazy experiments we're doing.
hodgesrm
That might be true of Docker. Kubernetes on the other hand is a way to deploy and manage distributed applications, which are by definition bigger than any single OS.
erikb
Most Network protocols are a state machine in the kernel. That the end result of a tcp state machine is a communication between two computers is more of a lucky coincidence, that results from smart state machine design.
monocasa
Not if the OS starts to intrinsically view itself as a node in a distributed system. Mainframes sort of see SMP and separate nodes connected over a network as two sides of the same coin, just loosely vs. tightly coupled.

https://www.ibm.com/support/knowledgecenter/en/SSGU8G_12.1.0...

icebraining
Also Plan9:

"Since CPU servers and terminals use the same kernel, users may choose to run programs locally on their terminals or remotely on CPU servers. The organization of Plan 9 hides the details of system connectivity allowing both users and administrators to configure their environment to be as distributed or centralized as they wish. Simple commands support the construction of a locally represented name space spanning many machines and networks. At work, users tend to use their terminals like workstations, running interactive programs locally and reserving the CPU servers for data or compute intensive jobs such as compiling and computing chess endgames. At home or when connected over a slow network, users tend to do most work on the CPU server to minimize traffic on the slow links. The goal of the network organization is to provide the same environment to the user wherever resources are used."

http://doc.cat-v.org/plan_9/4th_edition/papers/net/

Annatar
That already exists and has existed for several years now: SmartOS. imgadm(1M) and vmadm(1M) are core parts of the OS and do what Docker does, and more. Built not as an experiment, but to power a large scale commercial cloud business. And freeware / open source since before the project went live by virtue of OpenSolaris (now illumos).
ntnn
Yes, I've given talks on what containers actually are and which technologies they combine. The oldest container technology would be mount namespaces from 2002; the youngest are user namespaces from 2013 (on Linux anyway; Solaris and IBM had containers before).

People are always amazed that container products are actually mostly just the glue around what the kernel already provides and that their history goes back almost two decades.
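To make the "glue around what the kernel already provides" point concrete, here is a minimal sketch that lists the namespaces the current process belongs to by reading the symlinks under `/proc/self/ns`. It assumes a Linux system with procfs mounted; the helper names `parse_ns_link` and `list_namespaces` are made up for this illustration and are not part of any container tool.

```python
import os
import re

# Namespace links look like "mnt:[4026531840]": a kind and an inode number.
NS_LINK = re.compile(r"^(?P<kind>[a-z_]+):\[(?P<inode>\d+)\]$")


def parse_ns_link(target):
    """Split a namespace link target into (kind, inode)."""
    match = NS_LINK.match(target)
    if match is None:
        raise ValueError(f"not a namespace link: {target!r}")
    return match.group("kind"), int(match.group("inode"))


def list_namespaces(pid="self"):
    """Return {entry_name: inode} for a process.

    Returns an empty dict where /proc/<pid>/ns is not available
    (for example on a non-Linux system).
    """
    ns_dir = f"/proc/{pid}/ns"
    result = {}
    try:
        entries = sorted(os.listdir(ns_dir))
    except OSError:
        return result
    for name in entries:
        try:
            _kind, inode = parse_ns_link(os.readlink(os.path.join(ns_dir, name)))
        except (OSError, ValueError):
            continue  # skip entries we cannot read or do not understand
        result[name] = inode
    return result


if __name__ == "__main__":
    for name, inode in list_namespaces().items():
        print(f"{name:20} {inode}")
```

Two processes that print the same inode for an entry share that namespace. Running the script on the host and again inside a container should show different inodes for the namespaces the runtime unshared.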

Huggernaut
cgroup namespaces are a little younger and arrived in 4.6, I think: http://man7.org/linux/man-pages/man7/cgroup_namespaces.7.htm...
ntnn
cgroups v2 probably, cgroups v1 is definitely older.
Huggernaut
Nah, I'm talking about the cgroup _namespace_ specifically, not cgroups in general.
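The distinction is visible in `/proc/self/cgroup`, which lists a process's cgroup memberships. The paths in that file are shown relative to the reader's cgroup namespace root, which is what the cgroup namespace virtualizes. The sketch below parses that file's `hierarchy-ID:controller-list:path` lines. It assumes Linux; `parse_cgroup_line`, `read_cgroups` and `is_v2_entry` are hypothetical helper names, and the sample paths in the usage note are invented.

```python
def parse_cgroup_line(line):
    """Parse one 'hierarchy-ID:controller-list:cgroup-path' record."""
    # Split at most twice so that colons inside the path are preserved.
    hierarchy, controllers, path = line.rstrip("\n").split(":", 2)
    return {
        "hierarchy": int(hierarchy),
        "controllers": [name for name in controllers.split(",") if name],
        "path": path,
    }


def read_cgroups(text):
    """Parse the full contents of a /proc/<pid>/cgroup file."""
    return [parse_cgroup_line(line) for line in text.splitlines() if line.strip()]


def is_v2_entry(entry):
    """cgroups v2 records use hierarchy ID 0 and an empty controller list."""
    return entry["hierarchy"] == 0 and not entry["controllers"]


if __name__ == "__main__":
    try:
        with open("/proc/self/cgroup", encoding="utf-8") as handle:
            for entry in read_cgroups(handle.read()):
                print(entry)
    except OSError:
        print("no /proc/self/cgroup here (not Linux?)")
```

For example, `read_cgroups("0::/user.slice\n")` yields a single v2 entry with path `/user.slice`. A process inside its own cgroup namespace would typically see `/` there instead of the full host-side path.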
tyingq
Maybe even further. Mainframes have had similar for decades. Yes, perhaps more like actual VM's, but the namespacing, cgroup stuff, network overlays, etc, are very similar.

My mainframe folks, when talking about docker, universally yawn.

icebraining
So do the Linux guys. Virtuozzo (later OpenVZ) has been around for almost twenty years.
devmunchies
Can someone explain why starting a docker container usually has a 200-300ms start up penalty? It's fine if I want to start a web server, but it's long if it's just a precompiled script that runs on command and then dies.

Is it Docker that is taking 200ms to start the container or just the nature of the OS APIs?

tyingq
Could be as simple as the init() process. Have you tried a slimmed down guest distro like Alpine and compared startup times?
zenlikethat
Mostly it's Docker-specific, but all of the things causing this latency tend to be what makes Docker useful in the first place. In particular, networking and mounts.

For starters, if you're just running the typical Docker set up to talk back and forth using a UNIX or TCP socket, all of _that_ networking and HTTP/JSON encoding+decoding will add some overhead. This is even worse if you're going through a proxy layer like the socket which Docker for Mac creates and forwards to the Linux VM. OK, so there's at least a few milliseconds of latency built in for this, the socket forwarding hop on my Docker for Mac seems to add ~5-15ms for instance.

Once the daemon receives the request, runc, the command that actually does the work to create a container without all that hoopla, is invoked.

Then, to actually start up a container quite a few mounts get set up either for devices (potentially including tty in the case of `run -t`) or for the union filesystem. Take a look at the output of `mount` in a container sometime, around 30-40 mounts (and all those overlay layers...) and a lot of that gets set up on the fly because each container has its own unique mount namespace and view of the world.

Then after all of that, you need to set up networking too (`docker run --net none` will skip this), otherwise containers can't talk to the Internet, to each other, or listen on ports of the host. Remember, each has its own network namespace, created from scratch. So, Docker's doing all manner of adding port proxies, modifying iptables, attaching containers to interfaces, and so on. Otherwise, `docker run alpine wget -qO- https://google.com` or `docker run --net=internalnet` wouldn't work out of the box.

The work isn't easily done concurrently since there are so many dependencies - e.g., you need to have a process in order to set a network namespace, in order to run that process, you need to have a root FS prepared to pivot root into, and so on.

While all of this is happening, the Docker daemon is doing god knows what: it could be pulling images, running other containers, supervising processes, etc. While many of those things are concurrent thanks to goroutines, they do eat up resources, and too much work saturates and slows down the daemon. That's not really unique to Docker though; any program doing all this stuff at once would have that issue.

Anyway, that's why Docker takes way longer to start a process than good old execve() in your shell.
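The "around 30-40 mounts" figure mentioned above is easy to check from inside a container. This is a rough sketch, assuming Linux and the usual whitespace-separated layout of `/proc/self/mounts` (device, mount point, filesystem type, options, ...); the function names are made up for this example.

```python
from collections import Counter


def count_mounts_by_type(mounts_text):
    """Count mount entries per filesystem type (third field of each line)."""
    counts = Counter()
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        counts[fields[2]] += 1
    return counts


def read_own_mounts(path="/proc/self/mounts"):
    """Return the mount table of the current mount namespace, or '' if unavailable."""
    try:
        with open(path, encoding="utf-8", errors="replace") as handle:
            return handle.read()
    except OSError:
        return ""


if __name__ == "__main__":
    counts = count_mounts_by_type(read_own_mounts())
    print(f"total mounts: {sum(counts.values())}")
    for fstype, number in counts.most_common():
        print(f"{fstype:12} {number}")
```

Because `/proc/self/mounts` reflects the mount namespace of the process that reads it, running this on the host and inside a container gives two different tables, and every entry in the container's table had to be set up at start time.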

DannyB2
A container is launching a process in a way that the kernel lets it believe it can be root, have any illusion of a filesystem and network connectivity that you want it to believe, and be fooled into thinking it is the only process running on this kernel.

Therefore, it is as light weight as launching a process?

Is that about right?

eftychis
Pretty much. Usually you run your own init that starts the desired process. If you as a process have id (pid) 1 then you need to manage orphaned processes (and zombies) and general signal handling (e.g. SIGTERM).

There is a lot of cgroups and capabilities tweaking involved also. You can also use BPF (e.g. seccomp-bpf) to restrict system calls, etc.

In general you i) add the illusion of being king in your own little world, ii) provide the handling that the kernel expects from a king, iii) restrict resources (e.g. memory, system calls, ...) and facilitate communication with the rest of the world.
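The PID 1 duties described above (reaping orphans and zombies, forwarding signals such as SIGTERM) can be sketched in a few lines. This is only an illustration of the idea, assuming a POSIX system and Python 3.9+ for `os.waitstatus_to_exitcode`. `reap_children` and `run_as_init` are hypothetical names, and a real init handles more signals and edge cases than this.

```python
import os
import signal
import sys


def reap_children():
    """Collect every child that has already exited; return [(pid, exit_code)]."""
    reaped = []
    while True:
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # this process has no children at all
        if pid == 0:
            break  # children exist, but none has exited yet
        reaped.append((pid, os.waitstatus_to_exitcode(status)))
    return reaped


def run_as_init(argv):
    """Start argv as the main child, forward SIGTERM to it, and reap
    whatever exits until the main child itself is gone."""
    child = os.fork()
    if child == 0:
        try:
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # only reached if exec failed
    # Pass SIGTERM on to the main child instead of dying ourselves.
    signal.signal(signal.SIGTERM, lambda signum, frame: os.kill(child, signum))
    while True:
        pid, status = os.wait()  # also reaps orphans re-parented to us
        if pid == child:
            return os.waitstatus_to_exitcode(status)


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(run_as_init(sys.argv[1:]))
```

Saved under a name of your choosing, the script would be invoked with the real command as its arguments. The loop in `run_as_init` is what keeps exited orphans from piling up as zombies while the main child runs.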

zenlikethat
There is also now a `--init` flag for docker run and for dockerd that will run a Pretty Reasonable PID1 (tini) automatically.
tyingq
Yup. It's all namespaces and cgroups. The base "docker" command adds a little value with OS repositories, network overlays, etc, but not a lot.

Personally, I think K8S with their cri-o (https://github.com/kubernetes-incubator/cri-o/blob/master/RE...) runtime is going to eventually eat Docker's lunch.

As a company, Docker needs Swarm/Enterprise to take off in order to have a differentiator. That isn't happening.

Kubernetes, on the other hand, just needs cri-o (mostly done) and an image repository / builder to kill off Docker.

See this for how small the gap really is: https://github.com/p8952/bocker/blob/master/README.md

Basically, Docker has almost no moat. They have mostly only brand recognition. Not discounting their efforts, but they are playing checkers and Google is playing chess.

raesene9
Well they have multi-platform. Windows containers are intrinsically tied up with Docker (the current Windows containers docs start with a Docker EE Basic install).

My personal expectation is that Docker will be bought by Microsoft, probably this year or next ...

imtringued
>Basically, Docker has almost no moat. They have mostly only brand recognition. Not discounting their efforts, but they are playing checkers and Google is playing chess.

Well how do you build images without docker?

tyingq
Right. That's the entirety of their moat, plus brand recognition. What would Google have to spend to overcome that?
hardwaresofton
I think this is how I most commonly recognize someone that is familiar with lxc +/- docker and containerization in general. A firm grasp of the fact that they're just beefy processes, isolated with namespaces and cgroups, is the best, most succinct way to describe docker (without even mentioning the benefits), but it also requires that the hearer knows what namespaces and cgroups are.

If I had to rank understanding in explanations of docker:

1. "Makes your application really portable"/other explanations that only cover the benefits of docker not how it works

2. "Lightweight VMs"

3. "LXC + some other stuff"

4. "processes + namespaces + cgroups (+/- image management tools, etc)"

100% agreed on the point you made, pretty sure docker+swarm and other orchestration efforts have basically lost the competition. Kubernetes just has the mind share, and even better than that -- it's actually good.

Kubernetes also has multiple competing container runtimes all rushing to fit the CRI (Container Runtime Interface). Just some of the stuff out there:

* cri-containerd - https://github.com/containerd/cri-containerd

* runv (hypervisor based) - https://github.com/hyperhq/runv

* clear containers (hypervisor based) - https://github.com/clearcontainers/runtime

* kata containers (hypervisor based, collaboration of runv + clear container) - https://katacontainers.io/

* frakti (combination of runv & docker, enabling switching @ runtime) - https://github.com/kubernetes/frakti

A bunch of the projects are in their infancy, but the CNCF/kubernetes and the community are doing the right thing investing in lots of them, and letting the good ones bubble to the top. I don't run a production kubernetes cluster but I've found cri-containerd to be really easy to use and haven't had any major problems with it, mostly small configuration things.

The last one, frakti, gets me really excited, because it lets you be flexible about which containers run in more protected environments and which don't. Also really exciting is frakti v2 (https://github.com/kubernetes/frakti/tree/containerd-kata), which is kata + containerd.

incadenza
I’m about a 1.5

Any good resources you could suggest for learning more about what you describe as the relevant areas?

hardwaresofton
I want to point out that I'm by no means an expert -- the real experts are the people in the talks that I mentioned, the core contributors to the libraries.

I think a good place to start is those talks (and stuff from any container-centric conferences), along with lots and lots of practice using containers.

In general, I'm pretty sure if you read up on chrooting, processes on linux, process isolation on linux, LXC, then the relevant standards/tools that underlie docker like runc (https://github.com/opencontainers/runc), you'll have a pretty deep understanding.

Also, for day-to-day use, I honestly think you can do just light research on the above topics and start using docker and know way more than the average developer. As you use it more and more, you'll gain more intuition, and when you bump up against certain issues, you'll probably gain some intuition as to where things are going wrong (though honestly the toolchains are pretty stable now).

The goal of a lot of these projects is to be so stable you don't have to worry about it, so I don't feel too guilty about it, in the same way I don't feel too guilty about not ever having cache-line-optimized a program in my life.

tyingq
Kinda feels like I'm in a foreign country and ran into someone that speaks my language :). Thanks for the additional info, like frakti, for example... didn't know about that.

Explaining docker is frustrating for me because I was a Unix admin back in the 90's. So, for me, it's pretty easy to see what it is. And, it isn't new. There isn't much it does that Solaris zones or BSD jails didn't do. And those predate docker by years. Namespaces, cgroups, and pivot_root. That's basically all of it. Kudos to docker for marketing it better.

Explaining it, on the other hand, to a broad audience...

hardwaresofton
Yeah I didn't either, until not too long ago but now I'm pretty excited to use it.

As a person who wasn't a linux admin back in the 90s, and doesn't normally hang out on mailing lists, the first time I saw lxc was some random article on lwn.net (I don't hang out there, but they have some super high quality articles), and I definitely didn't put together how much of a difference it was poised to make, or even know why it was a good idea (people were still getting used to vagrant everywhere at the time) -- docker definitely did the community a service by bringing the hype train, if only so that once the hype subsided containers would be here to stay.

Annatar
>As a person who wasn't a linux admin back in the 90s,

Nobody was a Linux admin in the ‘90’s: we ran HP-UX, IRIX, OSF/1 DECUnix (and Ultrix), AIX, Solaris and NetBSD; those were our Linux we grew up on like you grew up on a Linux ISO on your parents’ PC bucket.

Torgo
We have a 1999 VA Linux box still "in production" (just to see how long it will go, at this point. Nothing mission-critical.)
js2
I ran a small ISP on a pair of Linux (Slackware probably) boxes, a Livingston Portmaster, a bank of Hayes modems, and a T1 in the mid-90s. So some of us were Linux admins then.
Annatar
That's crazy: Linux isn't rightly usable even now 20+ years later, and you ran it in mid '90's when it could barely boot a shell reliably. And Slackware no less, which meant dumping tape archives everywhere instead of package management.

Crazy.

tyingq
Me too. Maintained a small (100 or so) fleet of Slackware Linux desktops for a support org in the early 90's. Rare, but did exist. Was the install 13 1.44mb floppies? Seems to ring a bell. Lots of waiting for the prompt to switch out the floppy. And a more intimate relationship with "dd", "kermit", etc, than I remember ever encountering again.

Also, whoever wrote x3270. Thank you!

d0mine
From outside (I know nothing about zones, jails) it looks like saying Dropbox is not new (in 2007) because rsync was not new.
tyingq
Look into either. They are pretty much the same thing as docker. It isn't a superficial similarity.

For example, filed in 2003: https://patents.google.com/patent/US20050021788

zbentley
That's sorta true, but the layered-userspace-filesystem support is a pretty big feature that Docker pioneered Linux support for.

Did userspace filesystems/layered filesystem-in-a-box implementations exist before? Sure: from ZFS to squashfs to tar, all the components were around.

But Docker's popularity is due to its integration; not its novelty. By bundling a filesystem-in-a-box abstraction layer with varying degrees of native OS support into something like a mountable artifact/image, hiding the image caching mechanics behind a nice CLI and image-configuration file format (love 'em or hate 'em, Dockerfiles are a phenomenal example of minimalism and convenience: add some meta commands, and everything else is just shell), and integrating all that with the privacy/isolation (namespaces/chroots) and resource management (cgroups/quotas) stuff, Docker made a really, really powerful model of thinking about "containers" as a single concept.

The thought model is Docker's big achievement: other competitors may unseat Docker eventually, but the concept of "container" as a single, unitary thing that's almost like a VM image will stick around, and the more piecemeal understanding of how containers work will be largely unnecessary, and thus save a lot of time for a lot of people.

tyingq
I agree with all of that, but none of it is their intellectual property. It's mostly brand recognition that keeps them afloat.

K8s could easily release a command line clone and unseat docker in fairly short order.

Not diminishing the value of what the docker folks bundled together and marketed. They did a terrific job. It just isn't very protected.

cyphar
LXC had most (if not all) of what you're describing, and a long time ago Docker was just a simple wrapper around LXC. A lot of the inspiration for containers came from other operating systems (FreeBSD Jails and Solaris Zones) as well as previous work such as Xen.

I think what made Docker popular was that it was easier for developers to use, with small bits of information like what ports the container wants to listen on and so on. From an administration standpoint, LXC was already more than good enough.

yoz-y
One important thing for me is that Docker also handles other operating systems than Linux through their respective hypervisors.

Docker might well die but first I will have to be able to locally build and test containers on Windows and macOS, before deploying them to the Kubernetes cluster–without fooling around with Virtualbox, Vagrant and the like.

tyingq
Docker in ~100 lines of bash: https://github.com/p8952/bocker
I watched this talk a couple of years ago, and it made docker/containers finally click for me.

Cgroups, namespaces, and beyond: what are containers made from? - Jérôme Petazzoni

https://www.youtube.com/watch?v=sK5i-N34im8

Jérôme Petazzoni's talk on cgroups, namespaces, etc. I think he has given the talk a bunch of times, I saw it in person once, here is a video I found:

https://www.youtube.com/watch?v=sK5i-N34im8

He goes into cgroups, namespaces, etc. Then he does a demo where he manually does what docker does: untars an image, creates namespaces, and sets up the networking.

Dec 11, 2015 · 1 points, 0 comments · submitted by pandog
HN Theater is an independent project and is not operated by Y Combinator or any of the video hosting platforms linked to on this site.