When your company’s growing like a beanstalk on steroids (and weight-gain supplements), and you live by Lean principles and practice JIT operations, you need to figure out how to efficiently manage a rapidly growing staff, and the IT infrastructure that comes with it, using a small, capable team of tech ops engineers.
A lot of ops people are rockstars when it comes to automating their production, testing, and dev environments and customer-facing infrastructure, and the Conductor team is certainly in that category. But sitting in a standup (yes, sitting, legs were tired) a few months ago, we had this update from Ryan (the engineer who runs our internal IT):
Ryan: “Yesterday I provisioned two machines for new hires. Today I have some more to do.”
Me: “How many are you doing today?”
Me: “What about tomorrow?”
Me: “And the rest of the week?”
Ryan: “Two, two, and two.”
If you’ve ever managed or worked in a help desk, IT service bureau, or NOC, this should sound familiar. We had 10 new machines to provision that week, and one guy to do them, two per day. Quick math can tell you why this moved my hairline back a millimeter or two:
- Ryan can provision 2 computers a day (1 machine every 4 hours).
- We need to provision 10 computers a week.
- Ryan was our only internal IT resource at the time who could do the work. He was also responsible for all of our other internal systems.
10 computers x 4 hours per computer = 40 hours of work
40 hour work week – 40 hours of work = 0 hours of time to deal with the rest of the internal service requests
0 hours spent on service requests per week = 0 satisfied, productive, fellow employees who made those requests in the first place.
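That arithmetic is simple enough to sanity-check in a few lines of shell (the numbers are the ones from that week, not a general formula):

```shell
#!/bin/sh
# Capacity math for the week described above.
machines=10          # new hires needing machines
hours_per_machine=4  # manual provisioning time per machine
work_week=40         # one engineer's available hours

provisioning_hours=$((machines * hours_per_machine))
remaining_hours=$((work_week - provisioning_hours))

echo "Provisioning: ${provisioning_hours}h, left for service requests: ${remaining_hours}h"
```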
This was an extreme week, but if the pattern continued, we would be saturated: unable to cover any of the other business-critical internal IT services we provide, and one out-of-band request away from missing our OLAs (Operational Level Agreements). Hiring wasn’t going to slow, and we had to find a way to keep up with our normal ops work while still supporting the high rate of growth we were seeing.
Everyone in the meeting (almost at once): “Hey, why don’t we just automate it?”
The answer to the capacity problem came in the form of automation, and automation came in the form of two easy-to-use, reliable, and proven technologies that many ops engineers already use and love: Puppet and The Foreman.
The Foreman is a complete lifecycle management tool for physical and virtual servers.1
Puppet Open Source is a flexible, customizable framework available under the Apache 2.0 license designed to help system administrators automate the many repetitive tasks they regularly perform.2
The idea was to stop looking at these purely as server management tools, and to apply the same configuration management principles to our employees’ computers. This gave us a number of advantages:
- Provisioning new employee machines went from a 4-hour process of manually installing various packages and configurations to a 5-minute process of registering the machine with Foreman and triggering a Puppet run.
- Rolling out upgrades across the company became a fully automated process, requiring us only to upload a Puppet module to the Puppet master server and wait for (or force) Puppet runs on the client machines.
- Configuration compliance became a non-issue, since we could manage all sensitive configurations with Puppet modules.
- Detailed reporting on the configuration of all employee machines allowed us to troubleshoot issues quickly, and sometimes diagnose and resolve them before the user was even aware of the problem.
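As a concrete illustration of the configuration-compliance point above, a sensitive setting can be pinned down in a few lines of a Puppet module. This sketch is hypothetical, not one of our actual modules; the class name and the macOS `defaults` commands are illustrative:

```puppet
# Illustrative sketch: enforce "require password on wake" on managed Macs.
class screenlock {
  exec { 'require-password-on-wake':
    command => '/usr/bin/defaults write com.apple.screensaver askForPassword 1',
    unless  => '/usr/bin/defaults read com.apple.screensaver askForPassword 2>/dev/null | grep -q 1',
    path    => ['/usr/bin', '/bin'],
  }
}
```

Because Puppet re-applies the catalog on every run, a user who flips the setting off gets it flipped back automatically, which is what makes compliance a non-issue.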
Our basic network architecture looks like this:
Getting this set up involves a few steps. Both theforeman.org and Puppet Labs have excellent installation and getting-started tutorials available, so I’m not going to repost their documentation here. Here are the highlights:
- Install Foreman. We installed it on an Ubuntu 12.04 VM, but you can use your favorite distro (RHEL, Debian, and Fedora are all supported) and physical or virtual hardware. If you don’t already have a Puppet master installed, the Foreman installer will give you an all-in-one option that includes Foreman, a Puppet master, and the Smart Proxy. I recommend running at least the Puppet master on a separate machine because of the resource requirements: Puppet Labs recommends a KVM virtual server with 2-4 processor cores and 4 GB RAM for the master, which lets it manage up to ~1000 machines.
- Install the Smart Proxy. You can separate this from the Foreman machine with very little performance impact, but the proxy is extremely lightweight anyway. We ended up installing it on the same machine using the Foreman installer, and have left it there.
- Install Puppet. We separated our Puppet master from our Foreman box, but if you’re managing a relatively small number of machines, you can run it alongside Foreman. This is a simpler setup, and lets you get all 3 of these steps done with a single run of the Foreman installer.
- Point a DNS record named `puppet` at your Puppet master. The Puppet agent attempts to reach a host named `puppet` by default, so this will save you a headache and the additional step of messing around with your Puppet agent config or hosts file on your agent machines.
- Find a Puppet package that matches your Puppet master version, and save that in your DSL (definitive software library). This is incredibly important. Older agent nodes can get catalogs from a newer Puppet master. The inverse is not always true.3 Running a mixed-version environment can also lead to very hard-to-diagnose issues with the way catalogs are compiled and interpreted.
- Make sure everything can talk to each other (and that nothing outside your network can talk to Foreman or the Puppet master). Your Puppet master is a certificate authority, and once you sign a certificate with it, it will be trusted. If you turn on autosigning (which you can manage from the Smart Proxies page in Foreman, or on the Puppet master directly), and your Puppet master is accessible to the world, or to any untrusted machines, you’re opening a large security hole.
- Port 8140 should be open into the Puppet master from the Smart Proxy machine (you might need to open this to the whole network).
- Port 8443 should be open from the network into your Smart Proxy.
- Ports 80 and 443 should be open from your ops engineers’ machines (at least) into Foreman.
- Download or write your Puppet modules. Puppet Forge has a great library of open source modules, and anything you can’t find there is easily written as a custom module. Our team uses a Puppet IDE called Geppetto for module development.
- Create your Foreman host groups. Create a hostgroup for each machine profile you’re going to manage, and add the relevant Puppet modules to the hostgroup. For example, if your design team is using Macs, and they all need the Adobe Creative Suite installed, create a hostgroup named mac-design and add the adobe-creative-suite Puppet module to it.
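The version-matching rule above (older agents can talk to a newer master, but not the reverse) is worth enforcing in whatever script stocks your DSL. A sketch, with hypothetical version numbers:

```shell
#!/bin/sh
# Guard: refuse an agent package newer than the Puppet master.
master_version="3.4.2"  # hypothetical: what your master runs
agent_version="3.4.2"   # hypothetical: the package headed for your DSL

# sort -V orders version strings; the newest sorts last.
newest=$(printf '%s\n%s\n' "$master_version" "$agent_version" | sort -V | tail -n 1)

if [ "$newest" = "$master_version" ]; then
  echo "ok: agent ${agent_version} is not newer than master ${master_version}"
else
  echo "refusing: agent ${agent_version} is newer than master ${master_version}"
  exit 1
fi
```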
With that done, you should be in good shape to try your first provisioning run. We follow this procedure:
- Install Puppet on the new machine (use the right version!)
- Run `puppet agent --test --waitforcert 15` on the new machine
- Log into Foreman
- Sign the new certificate from the Smart Proxy > Certificates page
- Assign the new machine to a hostgroup with the base Puppet classes you want to apply
- Assign any additional classes to the new machine from the Hosts page
- Run Puppet on the new machine, either manually or from within Foreman using the Puppet Run feature on the machine’s profile page.
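The on-machine half of that procedure is easy to wrap in a script. This sketch only prints the commands it would run, so it is safe anywhere (swap the echos for real invocations when you trust it); the pinned package version is hypothetical:

```shell
#!/bin/sh
# Dry-run sketch of the client-side provisioning steps.
puppet_pkg="puppet=3.4.2-1"  # hypothetical: pin to the version in your DSL

install_cmd="apt-get install -y ${puppet_pkg}"
agent_cmd="puppet agent --test --waitforcert 15"

# Print rather than execute; the cert signing and hostgroup assignment
# still happen on the Foreman side.
echo "would run: ${install_cmd}"
echo "would run: ${agent_cmd}"
```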
After the run completes, your new machine is fully provisioned and ready for its user. And the best part of the process is that it doesn’t take a systems guru or Linux Jedi to run it. With the exception of one quick command-line invocation, everything is done from the intuitive, good-looking Foreman UI.
Although the week I described above isn’t the norm, running our internal machine provisioning this way is saving us a lot of time. We can get new employees up and running faster, with less advance notice than we were ever able to before. Brian (our highly talented tech ops intern who joined us while we were setting this system up) and Ryan can now focus more of their time on new projects, like our internal monitoring and IT self-service system, and on keeping our fast-growing internal infrastructure running.