7

I have a Master-Workers style computation model, and I launch the workers from an AMI via an Ansible playbook.

The workers' AMI has all the configuration needed for the computation (which is mostly a pipeline of ML tasks).

However, we update and optimize our algorithms and ML models on a regular basis, which means I have to keep the worker AMIs up to date.

So, is there a tool or technique that would help me do that?

Also, is this a scalable practice, or is the containerized way better? [By which I mean launching an EC2 instance, pulling a container image, and doing the computation inside the container.]
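For reference, the launch step in the playbook looks roughly like this (the AMI ID, names, and counts below are placeholders, using the `amazon.aws` collection):

```yaml
# Hypothetical excerpt of the worker-launch playbook task
- name: Launch worker instances from the current worker AMI
  amazon.aws.ec2_instance:
    name: ml-worker
    image_id: ami-0123456789abcdef0   # placeholder — the ID that must be kept current
    instance_type: c5.2xlarge
    count: 10
```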

Dawny33

5 Answers

4

We are planning to automate this process for our Jenkins EC2 slaves.

Currently we have to manually update the AMI ID every time we build a new AMI. However, Jenkins stores all of its configuration in the config.xml file, so by taking advantage of that we should be able to update the AMI value in this file automatically and then restart Jenkins for the changes to take effect.
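A minimal sketch of that substitution in Python, assuming the EC2 plugin keeps the ID in an `<ami>` element of config.xml (verify the tag name against your plugin version):

```python
import re

def update_ami_id(config_xml: str, new_ami: str) -> str:
    """Replace the AMI ID inside <ami>...</ami> elements of Jenkins' config.xml.

    Assumes the EC2 plugin stores the ID in an <ami> element; adjust the
    pattern if your plugin version uses a different layout.
    """
    return re.sub(r"<ami>ami-[0-9a-f]+</ami>",
                  f"<ami>{new_ami}</ami>",
                  config_xml)

sample = "<slaveTemplates><ami>ami-0123abcd</ami></slaveTemplates>"
print(update_ami_id(sample, "ami-9999ffff"))
# -> <slaveTemplates><ami>ami-9999ffff</ami></slaveTemplates>
```

After writing the file back, a Jenkins restart (or a configuration reload) picks up the new ID.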

EDIT: see the comments below for a better way to make these changes than editing the config.xml file directly.

Michael Pereira
  • If you are building new AMIs via Jenkins itself in any sort of automated way where you can parse the output for the ID, you can update them via a post-build Groovy script; see https://github.com/jenkinsci/ec2-plugin/pull/154 – Michael Bravo Mar 14 '17 at 22:00
  • Wow, thanks a lot for that info, it sounds way better than doing the update in the XML file directly! – Michael Pereira Mar 14 '17 at 22:38
3

I can describe how we actually do it:

We're using a private GitLab instance on AWS and its CI system. We have one repository for our build environment (a Docker image with Packer, scripts, and the JSON templates describing the AMIs) and one with a .gitlab-ci.yml that describes the build and export tasks for each of our base AMIs.
Each project builds its own AMI from our base one on every push, and we use a DynamoDB table to store the versions and AMI IDs for deploys.
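Sketched roughly, such a .gitlab-ci.yml could look like this (the stage name, file names, and the AMI-ID extraction pattern are illustrative assumptions about a typical Packer setup, not our exact configuration):

```yaml
# .gitlab-ci.yml — hypothetical sketch of an AMI-building pipeline
stages:
  - build

build_ami:
  stage: build
  image: hashicorp/packer:light       # build-environment image containing packer
  script:
    - packer validate worker-ami.json
    - packer build -machine-readable worker-ami.json | tee build.log
    # Packer's machine-readable output has a line like
    # <timestamp>,<builder>,artifact,0,id,<region>:<ami-id>
    - grep 'artifact,0,id' build.log | cut -d, -f6 | cut -d: -f2 > ami_id.txt
  artifacts:
    paths:
      - ami_id.txt
  only:
    - master
```

The extracted ID can then be written to the DynamoDB table in a later job.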

This could be an approach for you, where an update to your ML tasks would trigger the AMI build.

If you go the Docker way, you should have a look at AWS ECS, which removes the burden of starting an EC2 instance yourself before pulling the Docker image.

Another option would be to move entirely to Amazon Machine Learning, but in that case you would increase the burden of migrating your jobs out of AWS if the need ever arises.

There's no "best" way in my opinion, all are valid and the one you're the most comfortable with should be your choice :)

Tensibai
  • AMI stands for Amazon Machine Image, and EC2 stands for Elastic Compute Cloud (IIRC). I'm pretty sure Google can answer that ad nauseam :) – Tensibai Mar 15 '17 at 11:46
  • Well, I honestly did my best to shed light on the terms without directing people to Google only :) 2) I don't think so; there is already one about DevOps terminology, and I still feel this doesn't bring value to the site. I don't feel that linking to every tool in the answer helps readers evaluate them for themselves. – Tensibai Mar 15 '17 at 11:59
  • 1
    a good way to help people understand what's behind the acronyms is to link to the aws open guide – Michael Pereira Mar 15 '17 at 17:15