7

I have a Master-Workers style computation model, and I launch the workers from an AMI via an Ansible playbook.

The workers' AMI has all the configuration needed for the computation (which is mostly a pipeline of ML tasks).

However, we update and optimize our algorithms and ML models on a regular basis, which means I have to keep the worker AMIs up to date.

So, is there a tool or technique that would help me do that?

Also, is this a scalable practice, or is the containerized way better? [By which I mean launching an EC2 instance, pulling a container image, and doing the computation inside the container.]
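For reference, the launch step in the playbook looks roughly like this (the AMI ID, names, and counts below are placeholders, using the `amazon.aws` collection):

```yaml
# Hypothetical excerpt of the worker-launch playbook task
- name: Launch worker instances from the current worker AMI
  amazon.aws.ec2_instance:
    name: ml-worker
    image_id: ami-0123456789abcdef0   # placeholder — the ID that must be kept current
    instance_type: c5.2xlarge
    count: 10
```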

Dawny33

5 Answers

4

We are planning to automate this process for our Jenkins EC2 slaves.

Currently we have to manually update the AMI ID every time we build a new AMI. However, Jenkins stores all of its configuration in the config.xml file, so by taking advantage of that we should be able to update the AMI value in this file automatically and then restart Jenkins for the changes to take effect.
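A minimal sketch of that substitution in Python, assuming the EC2 plugin keeps the ID in an `<ami>` element of config.xml (verify the tag name against your plugin version):

```python
import re

def update_ami_id(config_xml: str, new_ami: str) -> str:
    """Replace the AMI ID inside <ami>...</ami> elements of Jenkins' config.xml.

    Assumes the EC2 plugin stores the ID in an <ami> element; adjust the
    pattern if your plugin version uses a different layout.
    """
    return re.sub(r"<ami>ami-[0-9a-f]+</ami>",
                  f"<ami>{new_ami}</ami>",
                  config_xml)

sample = "<slaveTemplates><ami>ami-0123abcd</ami></slaveTemplates>"
print(update_ami_id(sample, "ami-9999ffff"))
# -> <slaveTemplates><ami>ami-9999ffff</ami></slaveTemplates>
```

After writing the file back, a Jenkins restart (or a configuration reload) picks up the new ID.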

EDIT: see the comments below for a better way to make these changes than editing the config.xml file directly.

Michael Pereira
  • If you are building new AMIs via Jenkins itself in any sort of automated way where you can parse the output for the ID, you can update them via a post-build Groovy script; see https://github.com/jenkinsci/ec2-plugin/pull/154 – Michael Bravo Mar 14 '17 at 22:00
  • Wow, thanks a lot for that info, it sounds way better than doing the update in the XML file directly! – Michael Pereira Mar 14 '17 at 22:38
3

I can describe how we actually do it:

We're using a private GitLab instance on AWS and its CI system. We have one repository for our build environment (a Docker image with Packer, scripts, and the JSON templates describing the AMIs) and one with a .gitlab-ci.yml that describes the build and export tasks for each of our base AMIs.
Each project builds its own AMI from our base one on every push, and we use a DynamoDB table to store the versions and AMI IDs for deploys.
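Sketched roughly, such a .gitlab-ci.yml could look like this (the stage name, file names, and the AMI-ID extraction pattern are illustrative assumptions about a typical Packer setup, not our exact configuration):

```yaml
# .gitlab-ci.yml — hypothetical sketch of an AMI-building pipeline
stages:
  - build

build_ami:
  stage: build
  image: hashicorp/packer:light       # build-environment image containing packer
  script:
    - packer validate worker-ami.json
    - packer build -machine-readable worker-ami.json | tee build.log
    # Packer's machine-readable output has a line like
    # <timestamp>,<builder>,artifact,0,id,<region>:<ami-id>
    - grep 'artifact,0,id' build.log | cut -d, -f6 | cut -d: -f2 > ami_id.txt
  artifacts:
    paths:
      - ami_id.txt
  only:
    - master
```

The extracted ID can then be written to the DynamoDB table in a later job.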

This could be an approach for you, where an update to your ML tasks would trigger the AMI build.

If you go the Docker way, you should have a look at AWS ECS, which removes the burden of starting an EC2 instance yourself before pulling the Docker image.

Another option would be to move entirely to Amazon Machine Learning, but in that case you would increase the burden of migrating your jobs out of AWS if the need ever arises.

There's no "best" way in my opinion, all are valid and the one you're the most comfortable with should be your choice :)

Tensibai
  • AMI stands for Amazon Machine Image, and EC2 stands for Elastic Compute Cloud (IIRC). I'm pretty sure Google can answer that ad nauseam :) – Tensibai Mar 15 '17 at 11:46
  • Well, I honestly did my best to shed light on the terms without directing people to Google only :) 2) I don't think so; there is already one about DevOps terminology, and I still feel this doesn't bring value to the site. I don't feel that linking to every tool in the answer helps readers evaluate them for themselves. – Tensibai Mar 15 '17 at 11:59
  • 1
    a good way to help people understand what's behind the acronyms is to link to the aws open guide – Michael Pereira Mar 15 '17 at 17:15