Scaling Puppet in EC2

January 14, 2013

At SmugMug, we’re constantly scaling up and down the number of machines running in EC2. We’re big fans of puppet, using it to configure all of our machines from scratch. We use generic Ubuntu AMIs provided by Canonical as our base, saving ourselves the trouble of building custom AMIs every time there is a new operating system release.

To help us scale automatically without intervention (we use AutoScaling), we run puppet in a nodeless configuration. This means we do not run a puppet master anywhere in our infrastructure. Each machine runs puppet independently of the others, removing dependencies and improving reliability.

I will first explain nodeless puppet, then I will dive into how we use it.

Understanding Nodeless Puppet

Most instructions for setting up puppet tell you to create a puppet master instance that all puppet agents talk to. When an agent needs to apply a configuration, the master compiles a catalog and hands it back to the agent. With nodeless, puppet on each host compiles its own catalog and applies it locally.

We start with a simple puppet.conf file that is pretty generic:

[main]
logdir=/var/log/puppet
vardir=/var/lib/puppet
ssldir=/var/lib/puppet/ssl
rundir=/var/run/puppet
factpath=$vardir/lib/facter
templatedir=$confdir/templates
modulepath=$confdir/modules

Then we create a top-level manifest called mainrun.pp. In our setup this manifest lives in /etc/puppet/manifests. An example mainrun.pp:

include ntp
include puppet
include ssh
include sudo

if $hostname == "foo" {
    include apache2
}

There is also an /etc/puppet/modules directory that contains our puppet modules. Each include statement in the mainrun.pp manifest corresponds to a module in that directory.
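As a minimal illustration of the layout (this ntp class is a sketch, not our actual module), the ntp module would live at /etc/puppet/modules/ntp/manifests/init.pp:

class ntp {
    # Install the ntp package from the Ubuntu repositories.
    package { 'ntp':
        ensure => installed,
    }

    # Keep the service running and enabled at boot.
    service { 'ntp':
        ensure  => running,
        enable  => true,
        require => Package['ntp'],
    }
}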

Once we have all of our modules created and listed appropriately in mainrun.pp, we run puppet with the apply command: sudo puppet apply /etc/puppet/manifests/mainrun.pp. Puppet will then run and do all that our manifests tell it to.

Scaling Puppet

Upon booting, all machines download their entire puppet manifest code tree from an Amazon S3 bucket. Then puppet is run and the machine is configured. By using S3, we’re leveraging Amazon’s ability to provide highly available access to files despite server or data center outages.

To help keep our changes to puppet sane, we use a git repository. When anyone pushes to the central repository server, the server copies the files to our Amazon S3 bucket. The S3 bucket has custom IAM access rules applied so the puppet credentials can only see that bucket and no other.
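The copy can be handled by a post-receive hook on the repository server. The sketch below shows the idea; the checkout path and bucket name are placeholders, not our actual setup:

#!/bin/sh
# post-receive hook (sketch): check out the pushed manifests and sync them to S3.
# The s3cmd config used here must contain credentials with write access to the bucket.
REPO_CHECKOUT="/var/cache/puppet-checkout"
BUCKET_PUPPET="puppet"

mkdir -p "$REPO_CHECKOUT"
GIT_WORK_TREE="$REPO_CHECKOUT" git checkout -f master
s3cmd -c /etc/s3cmd.cfg sync --no-progress --delete-removed \
    "$REPO_CHECKOUT/" s3://$BUCKET_PUPPET/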

When we launch a new instance in ec2, we use the --user-data-file option in ec2-run-instances to run a first-boot script that sets us up with puppet.
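Launching an instance this way looks something like the following; the AMI ID and instance type are placeholders, not our actual values:

ec2-run-instances ami-xxxxxxxx \
    --instance-type m1.small \
    --user-data-file first-boot.sh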

A simple first-boot script:

#!/bin/sh
AWS_ACCESS_KEY="abcdefghijlmnopqrstu"
AWS_SECRET_KEY="abcdefghijlmnopqrstuabcdefghijlmnopqrstu"
BUCKET_PUPPET="puppet"

apt-get update
apt-get --yes --force-yes install puppet s3cmd
S3CMDCFG="/etc/s3cmd.cfg"
wget --output-document=$S3CMDCFG http://s3.amazonaws.com/$BUCKET_PUPPET/s3cmd.cfg
sed -i -e "s#__AWS_ACCESS_KEY__#$AWS_ACCESS_KEY#" \
    -e "s#__AWS_SECRET_KEY__#$AWS_SECRET_KEY#" $S3CMDCFG
chmod 400 $S3CMDCFG

until \
    s3cmd -c $S3CMDCFG sync --no-progress --delete-removed \
    s3://$BUCKET_PUPPET/ /etc/puppet/ && \
    /usr/bin/puppet apply /etc/puppet/manifests/mainrun.pp ; \
do sleep 5 ; done

The s3cmd.cfg file in S3 is a publicly accessible template, containing placeholders for the access key and secret key that are filled in by the first-boot script. As s3cmd.cfg is publicly accessible, do not place any real credential data in it.
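A minimal template might look like this (a sketch; a real s3cmd configuration has many more options, but these are the placeholders the sed command above replaces):

[default]
access_key = __AWS_ACCESS_KEY__
secret_key = __AWS_SECRET_KEY__
use_https = True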

Puppet then installs the additional tooling that keeps puppet running on all machines, described in the next section.

Keeping Puppet Running

As puppet is not running in agent mode, there is no daemon waking up periodically to apply manifests that have changed since boot. Instead, we use cron to run puppet every 30 minutes. Our cron entry:

*/30 * * * * sleep `perl -e 'print int(rand(300));'` && /usr/local/sbin/puppet-run.sh > /dev/null

The random sleep of up to five minutes ensures that machines run puppet at staggered times. This prevents all machines from restarting a service at the same moment (Apache, for example), which would cause an interruption for our customers.

We have three simple scripts, also installed by puppet, that handle these runs:

/usr/local/sbin/puppet-run.sh:

#!/bin/sh
/usr/local/sbin/puppet-update.sh
/usr/local/sbin/puppet-apply.sh

/usr/local/sbin/puppet-update.sh:

#!/bin/sh
BUCKET_PUPPET="puppet"
/usr/bin/s3cmd -c /etc/s3cmd.cfg sync --no-progress \
    --delete-removed s3://$BUCKET_PUPPET/ /etc/puppet/

/usr/local/sbin/puppet-apply.sh:

#!/bin/sh
/usr/bin/puppet apply /etc/puppet/manifests/mainrun.pp

We split puppet runs into three scripts to help with manual maintenance of a box. We run sudo puppet-run.sh when we simply want puppet to run immediately. sudo puppet-apply.sh is handy when making manual changes to the puppet manifests and modules for testing; once testing is complete we copy our changes back into the git repository. sudo puppet-update.sh is rarely run by hand, mostly to reset the puppet tree after those manual testing changes.

Final Thoughts

As you can imagine, there is a lot more involved in our manifests. A large number of conditionals enable and disable different parts of the manifests depending on what role an instance has.

EC2 tags have proven to be invaluable for us; each machine is assigned two tags that exactly describe its role. A script that reads the EC2 tags at boot, combined with a custom fact, exposes those tags to puppet.
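As a rough sketch of that idea (the paths, fact name, and tag values here are hypothetical, and it assumes facter’s external fact support via facts.d rather than describing our actual implementation), a boot script could write the tags to a file and a small fact script could hand them to facter as key=value pairs:

#!/bin/sh
# Hypothetical external fact: /etc/facter/facts.d/ec2_tags.sh
# Assumes a separate boot script has already written the instance's EC2 tags
# to /etc/ec2_tags as lines such as "role=webserver".
if [ -r /etc/ec2_tags ]; then
    cat /etc/ec2_tags
fi

With a fact like $role available, mainrun.pp can branch on an instance’s role instead of its hostname.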

Future posts about puppet may include:

  • How we use EC2 tags for determining instance roles
  • Speeding up initial booting of instances
  • Using custom facts to enable one-off configurations for testing or debugging

— Shane Meyers, SmugMug Operations
