Chris Hickman and Jon Christensen of Kelsus and Rich Staats from Secret Stache continue their discussion about a session at DockerCon titled, “How to Create Effective Docker Images” by Abby Fuller of Amazon Web Services (AWS). Last week, they dove into Docker files, images, and what they are composed of, such as layers and cacheability. In this episode, they share even more tips on how to create effective container images, as well as multi-stage builds, garbage collection, and security.
Some of the highlights of the show include:
- Squash Tip: It’s an experimental Docker image build flag; when building a Docker image, you can pass this flag to the Docker build command to squash multiple layers into one
- Use it correctly, or the trade off is with cacheability; education is needed
- Multi-stage Builds: As you build an application, you go through various stages; may need to add dependencies to Docker image to build artifacts
- Compile Time vs. Runtime: Make images as efficient and small as possible; final image has only the runtime support, not the compile time support
- Multi-stage Technique: Multiple Docker files to build a final image and different base images at each stage; no Docker Delete command to get rid of what you don’t need
- With a single Docker file, you can have multiple base images; each stage builds artifacts into the image, and subsequent commands can copy artifacts from earlier stages
- To make sure your images are secure, scan them for known vulnerabilities and to receive alerts; paid and free scanning services are available
- Lifecycle of creating and running images takes up space and generates garbage; Docker Garbage Collection manages clean up but not automatically, has Prune commands
- Takeaways: Less layers is definitely more; share layers when possible; build base image wisely; not all languages build the same; and keep it simple and avoid the extras
Links and Resources:
Rich: In episode 21 of Mobycast, we pick up last week’s conversation with part two of how to create effective container images. Welcome to Mobycast, a weekly conversation about containerization, Docker, and modern software deployment. Let’s jump right in.
Jon: Welcome listeners to another episode of Mobycast, and we have Rich, our producer with us, hey, Rich.
Rich: Hey, how’s it going?
Jon: Good, and Chris.
Chris: Hey Jon, hey Rich.
Jon: Rich, what have you been up to this week?
Rich: Last week, I was on vacation, and so this week is pretty much playing catch up from that vacation which hasn’t been terrible, but I’m a little bit under water. Not a lot of fun interesting things, really just answering emails and trying to put out the small fires that I’ve let kind of burn while I was gone.
Jon: Very cool, welcome back from vacation. Although it didn’t seem like you were gone because we worked virtually, and I think I had like two meetings with you while you were on vacation. Welcome back.
Rich: Yeah, good to be back.
Jon: Chris, how about you, what have you been up to?
Chris: Let’s see, I’ve been getting in a lot of bike riding, which is definitely my hobby of choice here during summers in Seattle. The weather is beautiful for it, maybe a little bit on the hot side, but not complaining. And then also yesterday I was fortunate enough to be in a small setting with a famous actor, Kyle MacLachlan, the actor from Twin Peaks, and also the mayor in Portlandia.
I was in just a small setting with like 10 other people and just chatting with Kyle for about 20 minutes. He’s the winemaker for a very interesting wine called Pursued by Bear and just got to hang out with him. He’s a really down to earth guy, really easy to talk to, and it was very fun, I really enjoyed it.
Jon: That’s super fun, we’ll make sure to add him on Twitter about this episode. I have not been doing too much, we have some company in town, but then this morning I did something fun which is giving a webinar about Docker, it was called Agility on Steroids with Docker, and I gave it for the Scrum Alliance Group which is the largest group that gives certifications and training for Scrum and other Agile techniques, that was super cool, and they recorded it, so we’ll put a link to the recording on prodockertraining.com as soon as we can get our hands on it.
Today we’re going to continue discussing this topic that we started last week. Of course we’ve been doing several in a series of just recapping what we learned at DockerCon, which is great because we can digest it a little more slowly than a 3-day conference, and we can pick only the stuff that was super interesting, and it stays relevant for at least a few months after it happened so we can catch everybody up on what went down this year.
Last year we talked about Abby Fuller’s talk. She talked about creating effective Docker images and we got through about half of that, and we always pepper in some of our own opinions, and we did that, and today we’ll be doing the rest. I think we should just recap a little bit just at a very high level, like a few sentences of some of the things we covered last week, and then some of the things we’re about to cover today. Chris, would you mind doing that?
Chris: Sure. Last week we kind of dove into just the Docker file itself and images and what they’re composed of. Basically getting into layers, how those impact your actual Docker image, how starting with the right base layer can set the framework for what your final Docker image is going to be, how small it’s going to be, and how fun it’s going to be when building that image.
Talked about cacheability, how Docker caches these layers, and how that impacts the building of your images. We also talked about some tips around just what are the best practices for making sure that your images are as small as possible, and also as cacheable as possible.
Jon: I noticed in your notes here that there’s one tip that we didn’t talk about, the new experimental Docker build flag called Squash.
Chris: I mean, we could kind of dive right into that a little bit. It’s an experimental flag, it’s not part of the stable builds yet, and really the intent behind it is that when you’re building your Docker image, you can pass this flag to the Docker build command. If in the process of building your Docker image you’re creating four additional layers, that squash flag will say, “No, I want to squash those four layers into a single one.”
It’s a way of, again, reducing the number of layers and making your images smaller. This is one of those things where you can get bitten if you’re not careful, because the tradeoff is with cacheability. You have to use it correctly, and I’m sure this is one of the reasons why it’s still experimental and there’s some education needed around it.
Jon: We haven’t used that in any of our work yet, but it’s interesting, it sounds like something we might like to experiment with, especially if we start running it up against the images that are hard to get into and out of registries that cause builds to take longer than we want.
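As a rough sketch of what that looks like on the command line (assuming the Docker daemon has experimental features enabled; the image name is hypothetical):

```shell
# Experimental features must be enabled on the daemon, e.g.
# { "experimental": true } in /etc/docker/daemon.json

# Build normally, then build again with --squash and compare
docker build -t myapp:layered .
docker build --squash -t myapp:squashed .

# docker history shows the layer list and sizes for each image
docker history myapp:layered
docker history myapp:squashed
```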
Jon: For the rest of the day, it looks like we’re going to talk about multistage builds, security, garbage collection. I don’t even know what multistage build really means, maybe we can start there. What does that mean?
Chris: So this has been an ongoing pattern with Docker in building your images, where usually as you build your application, you go through various different stages, right? Sometimes you may need to add things to your Docker image just to build artifacts, and you may need dependencies as part of that process of building the artifacts. After the fact, you don’t need those dependencies anymore in the final runtime image, if you will.
You can think of it as kind of like compile-time versus runtime dependencies, and so there’s been just kind of an ongoing issue with Docker images. Again, in the spirit of how do you make them as efficient and as small as possible, you basically want your final image to end up with something that just has the runtime support, and doesn’t have the compile-time support, if you will.
Chris: Yeah, that’s a good example. Those LESS files, they’re not used in the final runtime, they’re used to produce generated code. It’s really the generated code that is the runtime stuff, and the LESS code, that intermediate code, is no longer needed. That is exactly what multistage builds address, and then also again, there have been other techniques to get around this issue of, I shouldn’t be including that stuff in my final Docker image if it’s really not part of the runtime environment. That’s a great example.
Maybe another example would be, you need to install a compiler and the tool chain to actually build native code. It could be C++ bindings for a few Node packages.
Jon: Or when a thing has got ImageMagick. That’s the classic thing that you have to build into systems, into a web application.
Chris: Right, yeah, exactly. These are things that you need as part of the process of like building your final artifacts and the actual runtime support that you need, you only need it during the building process. How do you make sure that they’re not part of the final Docker image, and so in the past before–so the multistage builds, this is a newish feature in Docker. It’s been around for a little over a year now, but relatively newish.
Before then, the way around this was basically do it yourself manually through using multiple Docker files. This technique is kind of similar to having your own shared common base image that you might have, that’s shared across multiple applications that you’re building to kind of an optimized trusted base image that has the stuff that you need in it.
This is kind of like the same technique where you would have multiple Docker files to build your final image, and then you have these different base images as each one of these stages in the pipeline, and then you’re just adding the stuff that you need at each one of these stages. It ends up being––you can do it, but definitely a lot more work and also a lot more just not very reusable.
Jon: I guess I’m a little confused by this because my mind is like, “Wait, why don’t I just build this stuff and just delete the stuff I don’t need?” I’m thinking that the reason that I can’t just delete the stuff I don’t need is because either there isn’t Docker file delete command, or if there is, it doesn’t actually reduce the size of my image because the deleting is just another layer on top of the things that got added already. Do you know which of those it is, or if I’m just misunderstanding altogether?
Chris: I think one of the issues is, there is no Docker delete command. In Docker, you have commands like ADD and COPY. Those are actually aware of the content they’re putting in there, and they’re actually doing checksums on the content that’s being added.
Then there are RUN commands. You could delete things with the RUN command, so you would do RUN and then rm. When you do that, it’s not taking into account the actual content that’s being removed; it just sees it as a run command.
Jon: Right, so it’s just adding a layer.
Chris: And not only that, it’s probably not obvious how you delete a compiler tool chain, how you actually clean out all that type of thing. Maybe you can do it using your favorite package manager and do an uninstall. But again, it’s not really going to give you what it is that you need. Instead, you kind of want to start from your base image and just add to it. So it’s more of an additive process than a subtractive one.
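A hypothetical Dockerfile that illustrates the problem being described here: the rm-style step in the second RUN only records the removal in a new layer, so the bits installed by the first layer still ship with the image.

```dockerfile
FROM ubuntu:18.04

# Layer 1: installs the compiler tool chain (hundreds of MB)
RUN apt-get update && apt-get install -y build-essential

# Layer 2: this "delete" just adds a new layer on top; the files
# from layer 1 are still in the image, so it does not get smaller
RUN apt-get remove -y build-essential && apt-get autoremove -y

# The classic workaround was to install, build, and remove all in
# ONE RUN so the tool chain never lands in a committed layer:
# RUN apt-get install -y build-essential && make && apt-get remove -y build-essential
```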
Jon: Right, so then just help me understand what would happen with multiple Docker images, in the first one, I would go and do a bunch of compiling after I get my base image, and then I would put some files on it, and compile those files, and then the results of the compilation would end up in some folder, and then in the second image, I would say, “Okay, get my base image again,” and this time instead of compiling, I’m going to grab the stuff that ended up getting compiled from the other image and just put it directly in already prebuilt, is that sort of what you’re saying? So that the whole compilation doesn’t have to take place?
Chris: It depends on the situation. This is definitely what we get with multistage builds in Docker. Using multiple Docker files, it’s a little bit more complicated than that, and I think typically it’s more along the lines of like I have something that’s for the development environment like debug versus my production build.
It’s very much an open-ended thing. It’s going to be very much specific to how you design it. There’s various techniques and how you’re dealing with the intermediate output from each one of these Docker builds is really kind of up to you, and that’s why Docker implemented this multistage builds feature because it cleans all the stuff up and makes it so much more intuitive, and straightforward, and it really gets to the core use case of what it is that you’re trying to do.
Jon: Got it. It makes sense.
Rich: Hey, this is Rich. You might recognize me as the guy who introduced the show, but is pretty much silent during the meat of the podcast. The truth is, these topics are oftentimes incredibly complex and I’m just too inexperienced to provide much value. What you might not know is that Jon and Chris created a training product to help developers of all skill sets get caught up to speed on AWS and Docker. If you’re like me and feel under water in these conversations, head on over to prodockertraining.com and be on the mailing list for the inaugural course. Okay, let’s dive back in.
Jon: Okay, so then, how does the multistage build kind of work?
Chris: At its core, it’s very simple. With a single Dockerfile, you essentially can have multiple base images. The pattern ends up being: you start off with whatever base image you’re using, it goes and compiles and builds assets that put artifacts out into some area of that particular part of the image, and then you can have subsequent commands that say, I’m going to start with a different base image. Now I can go with my lighter image. Maybe I started off with my full Ubuntu image because that has all my compile-time support and all the other dependencies that I need, and I went and built all my artifacts.
I know my artifacts are going into a certain output directory that’s configured by my tool chain, and now I can have a second stage defined in that Dockerfile that says, “I’m now going to start from maybe an alpine base image or a really small tight image,” and then what I can do is I can say, “I’m going to do a copy command.” With the copy command, you can now give, instead of just saying the files are coming from your local file system, you can actually have them come from one of the previous stages in your Dockerfile.
Now, I can basically copy my artifacts from that previous stage and you specify where you want it to come from. So, really, really powerful. It’s totally aligned with this use case. It does exactly what it is that you need to do where you’re just plucking the artifacts that you need to pull into your image type of thing, so very powerful.
Jon: Yeah, that’s super cool, that makes a lot of sense. I think I get it now, it’s like being able to refer to a different stage and grab files out of that. I don’t think it needs much further explanations. Thanks, that helps. I would guess we probably haven’t had a chance to really try this yet.
Chris: No, we need to do it. It’s definitely on the to-do list to go through, obviously. It’s a pretty big change across all of our projects.
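A minimal sketch of the multistage pattern Chris describes, using hypothetical Node.js stage names and paths: the first stage carries the full compile-time tool chain, and the second, Alpine-based stage copies only the built artifacts across with COPY --from.

```dockerfile
# Stage 1 ("builder"): full base image with compile-time support
FROM node:10 AS builder
WORKDIR /app
COPY package*.json ./
RUN npm install
COPY . .
RUN npm run build          # artifacts land in /app/dist

# Stage 2: small runtime image; pluck only what's needed
FROM node:10-alpine
WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/package*.json ./
RUN npm install --production
CMD ["node", "dist/index.js"]
```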
Jon: It’s like I said today on the webinar that I gave: your CI/CD pipeline is always about halfway done. No matter how long you work on it, it’s like, “Yup, we’re about halfway done.”
Chris: Never quite get there. That’s just true with technology in general. It’s like, “Hey, I’m up to date as of today.” No you’re not, because there were some announcements yesterday.
Jon: The next one makes me nervous because we’ve just already been talking for a long time and you just can’t even say the word security without having an hour long conversation, but maybe let’s see what we have to talk about with security for images?
Chris: Actually this one is pretty simple and straightforward, it’s not going to take much time at all. This is again a recap of the talk Abby Fuller gave at DockerCon 2018 about how to build effective container images.
Again, one of the topics of the discussion was security: what can you do around images? Pretty straightforward: use a scanning service. Basically, scan your images for known vulnerabilities and be alerted of them. It’s things like, maybe there’s a security issue with the Ubuntu distribution at a certain version number, and you happen to be using it.
If you’re using one of these scanning services, they’re going to let you know that you’re using this code that has this known vulnerability. There’s multiple of these scanning services out there. I think we’ve touched on this on some previous Mobycast episodes, so there’s a mixture of paid and free tools out there. There’s really no reason not to do this.
Check out MicroScanner from Aqua; the community edition is an open source, free tool. There’s also Docker itself: there’s security scanning with their Trusted Registry as a paid product. And then there are other offerings; there’s Clair from CoreOS, and there are others out there.
Definitely give that a look. It should be part of your process of just building your Docker images. Put them through a security scan and just see if there’s any binary code there that just is not safe and has known vulnerabilities, so you can go in and patch it up.
Jon: That’s pretty cool, yeah there are scanners at every level now. You can scan your repository, and then later after you build everything, you can scan your image.
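As one hedged example, Aqua’s MicroScanner was designed to run inside the image build itself; the sketch below follows the pattern from Aqua’s documentation at the time, with the token being a value you register with Aqua for (details may have changed since this episode aired):

```dockerfile
FROM ubuntu:18.04
# ... your application layers ...

# Scanning step: fetch MicroScanner and run it against the image
# contents; the build fails if serious vulnerabilities are found
ADD https://get.aquasec.com/microscanner /
RUN chmod +x /microscanner
ARG AQUA_TOKEN
RUN /microscanner ${AQUA_TOKEN}
```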
Jon: Cool. Next, what is a Docker garbage collection all about? I know what it is in programming languages and running Java garbage collection, and objective C garbage collection, but what’s Docker garbage collection?
Chris: Yeah so, using Docker and the whole lifecycle of creating images, running those images, and then lather, rinse, repeat. This actually ends up taking lots of space, especially disk space. It creates a lot of artifacts and garbage.
You go back to back in the day when Bill Gates supposedly said, “I can’t imagine any computer needing more than 640 kilobytes of RAM,” or something like that. He didn’t know about Docker. Just about every one of the primitives of Docker, the fundamental elements of the Docker tool chain, will leave these artifacts behind.
You have images, you have containers, you have volumes, you have things like even networks. All these things, as you are going through the Docker process and running them, will potentially hang around, and they’re not necessarily going to get cleaned up automatically.
Jon: Which could be a good thing. That could be a good thing because what we want when we have lots of containers running on a machine is like, fast startup, so keeping them around can really help with startup time.
Chris: Yeah, absolutely, and there’s also things like cacheability, layers, and making sure that those layers are there. There’s definitely some tradeoffs that’s going to be very dependent on your environment, and that’s one of the reasons why Docker doesn’t do it automatically for you.
You’ll find there’s a difference in garbage collection with Docker; there are different techniques for doing it locally versus in production. Locally, it ends up being definitely much more of a manual process; it’s kind of up to you as the individual to deal with that.
In production, you can rely much more on your orchestrator to do this, and most orchestration systems will clean up for you automatically. Talking about that, for production we use Amazon ECS, the Elastic Container Service. That has its own built-in garbage collection.
There is an agent, a piece of software that sits on each one of the nodes in the ECS cluster, and it takes on the task of going through each node’s local inventory and cleaning up any images that are no longer being referenced by any containers, and any containers that are stopped and no longer being used, making sure things are kept relatively clean.
That’s important because if you don’t do that, at some point you’re going to try to start a container and you’re going to get an error saying, “Sorry, we couldn’t start this container because there’s no more room on disk.” You’ve run out of disk space. So again, it’s much better now; the orchestrators have gotten more sophisticated, they give this to you, and you don’t have to worry about it as much.
A couple of years ago, that definitely wasn’t the case. This was one of the ongoing problems of running Docker in production: how do you keep your disk clean and not run into this issue of running out of space?
On the local side, doing it yourself manually has in the past been more difficult. You had to do this by hand, individually removing images, a very tedious process; but recently Docker has added some nice commands for it. These are the prune commands. Now, if you want to just get rid of all the images that are no longer being used locally, you can run docker image prune -a, and that will blow them all away for you. What used to take minutes of time is now practically instantaneous; it’s really easy to do, and there’s no reason not to do it.
In addition to pruning images, there are also commands for pruning networks and pruning containers, and then there’s an umbrella command called docker system prune that will do everything for you. Definitely check that out, make sure you’re doing that, keeping your disk from getting clogged up with all these Docker artifacts, and we’re good to go.
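The prune commands mentioned here, for reference (exactly what gets removed will depend on what has accumulated on your machine):

```shell
# Remove all local images not referenced by at least one container
docker image prune -a

# Same idea for stopped containers and unused networks/volumes
docker container prune
docker network prune
docker volume prune

# The umbrella command: stopped containers, unused networks,
# dangling images, and build cache in one shot
# (add --volumes to include unused volumes as well)
docker system prune
```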
Jon: Super helpful. We have one more point on the list of notes that says takeaways and I’m not sure if that was an additional thing that we wanted to talk about, or if you just wanted to wrap basically right here.
Chris: We can just kind of wrap up quickly, which is kind of like the overall takeaways from this particular session of DockerCon, just in a nutshell the major takeaways from this were one, less layers is definitely more. Really think about how you’re building your Docker images, and strive to keep the number of layers minimal. You also want to share layers where possible. Basically this concept of reuse, so that you don’t have to reinvent the build for every single one of your applications, so strive for that.
You also want to choose your base image wisely. You’re not going to be successful if you don’t start off with a good base image, with the right base image, so that’s a pretty important choice, and a lot of times people don’t put much thought into it. They choose that base image based upon convenience, not necessarily around what’s going to generate the best possible final image. Definitely consider that strongly.
Another key takeaway would be that not all languages build the same. How you build a Docker image for a Node application is going to be different than how you build one for Rails; don’t treat them the same. There are language-specific techniques that make more sense, and a lot of times this deals with how dependencies get installed, and cacheability and whatnot, so definitely keep that in mind. Finally, just keep it as simple as possible and avoid the extras. Keep the surface area small and only put in there what you need.
Jon: Super good. I certainly learned quite a bit here. I’m about ready to stop my job as a person that runs a company and become a Docker developer. I appreciate it, Chris.
Jon: Thanks so much, and Rich, our producer Rich, thanks for joining us as well.
Rich: Yeah. Thanks guys. Well, dear listener, you made it to the end. We appreciate your time and invite you to continue the conversation with us online. This episode, along with the show notes and other valuable resources is available at mobycast.fm/21. If you have any questions or additional insights, we encourage you to leave us a comment there. Thank you and we’ll see you again next week.