There are a bunch of best practices that can be followed to make your life easier with Puppet. The following diagram is a visual representation of how I place files on a puppet master to follow a couple of best practices that I've found interesting.
Here's a quick list of points that I follow when placing files on a puppet master:
- Files directly in /etc/puppet should be managed by Puppet itself. This way, you can easily bootstrap a new puppet master if/when needed.
- All your manifests should live in a Git repository, checked out into a directory that contains the different environments. The repository keeps a log of all changes and helps you set up the next point. Working with a distributed VCS like Git also lets your team work on the same files at the same time.
- Pushing a new branch to your repository will check out that branch in a new directory, thus creating a new environment dynamically. This makes it possible to apply the "topic-branch" workflow that distributed VCSes are so good at. Also, with topic branches becoming different environments on your puppet master, you can test your changes on a few test nodes without impacting production by passing the --environment= argument to puppet on the selected nodes. The trick to setting this up is to use the $environment variable in your puppet.conf file. Check out this blog post to see how to set this up.
- Modules should extract values for variables from an external store, instead of expecting to use global variables. Use Hiera for this. Separating data from your modules makes your code a lot easier to understand and to maintain. Global variables tend to suffer from some very bad side effects from class or include hierarchy, and tend to make maintaining your modules a living hell.
- Hiera data files should be inside your repository, and part of environments (the git repository). Changes to data, especially in a hierarchical data store, must be tested. Having your data files in the repository carries all files with your modifications over to a new environment, so that you can test a change without it having an impact on production. The hiera.yaml file decides where Hiera gets its information from. You can use the $environment variable (written %{environment} in hiera.yaml) in the path, as this post describes (see the section titled "Hiera Best Practices"), to have per-environment data stores.
- Your main manifest, site.pp, and your node definitions (if kept inside a manifest) should also be part of environments (the git repository). You can place the site.pp file in the environment directory by adding a line to your puppet.conf file that looks like this: manifest=$confdir/environments/$environment/manifests/site.pp
- Modules should be split in two directories: "modules" should contain generic modules that could very well be pushed to a public repository, while "site" should contain your site-specific modules and module overrides. This will possibly not apply to every organisation, but splitting things in two makes it a lot easier to contribute your modifications to open-source modules. It also helps create a psychological barrier between "general tools for managing different services" and "how do we manage things". I usually define node types and node roles in the site directory.
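As a rough sketch of the "use Hiera instead of global variables" point above, a module class can ask Hiera for its data and fall back to a default. Everything named here is an assumption made for illustration: the ntp module, the ntp::servers key and the pool.ntp.org defaults do not come from the setup described in this post. It also assumes a Puppet version where the hiera() function is available; newer releases offer lookup() for the same purpose.

```puppet
# modules/ntp/manifests/init.pp (hypothetical module, for illustration only)
class ntp (
  # Ask Hiera for the key 'ntp::servers'. The second argument is the
  # default used when no data file in the hierarchy defines the key.
  $servers = hiera('ntp::servers', ['0.pool.ntp.org', '1.pool.ntp.org'])
) {
  package { 'ntp':
    ensure => installed,
  }

  # Render one "server ..." line per entry. No global variable is read:
  # the only input is the class parameter above.
  file { '/etc/ntp.conf':
    ensure  => file,
    content => inline_template("<% @servers.each do |s| %>server <%= s %>\n<% end %>"),
    require => Package['ntp'],
  }
}
```

Because the data files live inside each environment, changing ntp::servers on a topic branch only affects the nodes that are pointed at that environment.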
gabster, this is fantastic. Thanks so much for sharing this info! I'm new to puppet, setting up my first installation and I want to do things right the first time (as much as possible) to avoid major refactoring down the road.
In your setup, under environments/production/site/ I see some directories: "node_type", "role", "site-*". Could you give a specific example of what goes under each of those? For instance, what is "node_type/vps", "role/web_server", and what makes up the wildcard in "site-*"?
Thanks!
Interesting question.
What's under "site", "modules" and "hieradata" is mostly gonna depend on how you want to lay things out and name them. I used those names mostly to show an example of how I do it.
Most of the modules I use come from here:
https://labs.riseup.net/code/projects/sharedpuppetmodules
and those modules search for files under site-something when the module something would let you override configuration files.
Now, I also like to split my own things in two other modules under site/ to have a really clear classification.
I normally use node_type/ to define what kind of hosts there are: so possibly a bare-metal machine, a xen-dom0, a xen-domU, KVM host vs. VM, and so on. Those classes define what should be installed to make the host into such a container:
Note that some of those details can be handled by a generic module and simply included in the node_type::something class. You can also use node_type to build high-level node classifications like "web_cluster_apache_node", "web_cluster_mysql_master" or some such if you intend on building multiple such clusters or infrastructures.
You can also use the classes in node_type to define some general-purpose resources like the usual packages you're gonna want to have installed everywhere.
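To give an idea of the shape such a class could take, here is a minimal sketch. The class name node_type::xen_domu, the package list and the included ntp module are all hypothetical and only stand in for whatever fits your own infrastructure.

```puppet
# site/node_type/manifests/xen_domu.pp (hypothetical node type)
class node_type::xen_domu {
  # General-purpose packages wanted on every host of this type.
  # The list is an assumption, pick your own.
  package { ['vim', 'tmux', 'rsync']:
    ensure => installed,
  }

  # Details that a generic module already handles are simply included.
  # 'ntp' is a placeholder for any such module.
  include ntp
}
```

With the "site" directory on the module path, Puppet's autoloader would find this class under node_type/manifests/xen_domu.pp.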
I usually use the role/ directory to define more fine-grained service roles like "role::mysql_master" or "role::nginx::reverse_proxy" or "role::puppetmaster".
The general idea is this: nodes should include one class from node_type, and may include one or more classes from role, and the specifics should be handled by either "role", modules in "site-*" or hiera (for data, you want to hardcode as few values as possible in your manifests). This way, you can keep your node definitions as simple and understandable as possible.
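Put together, node definitions following that idea could look something like the sketch below. The host names and the node_type classes are made up for the example; the role class names are the ones mentioned above.

```puppet
# manifests/site.pp inside an environment (host names are hypothetical)

# Exactly one node_type class, plus the roles the host fulfils.
node 'web01.example.com' {
  include node_type::xen_domu
  include role::nginx::reverse_proxy
}

node 'db01.example.com' {
  include node_type::bare_metal
  include role::mysql_master
}
```

Nothing host-specific is hardcoded here: values such as ports or replication settings would come from Hiera or from the role classes themselves.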