There are a bunch of best practices you can follow to make your life with Puppet easier. The following diagram shows how I lay out files on a puppet master to follow a couple of best practices that I've found interesting.

[Diagram: file hierarchy on the puppet master (puppet hierarchy.png)]

Here's a quick list of points that I follow when placing files on a puppet master:

  • Files directly in /etc/puppet (puppet.conf, hiera.yaml and the like) should be managed by Puppet itself. That way, you can easily bootstrap a new puppet master if and when you need one.
  • All your manifests should live in a Git repository, checked out inside a directory that holds your different environments. The repository keeps a log of every change and makes the next point possible. Working with a distributed VCS like Git also lets your team work on the same files at the same time.
  • Pushing a new branch to your repository will check out that branch in a new directory, dynamically creating a new environment. This makes it possible to use the "topic branch" workflow that distributed VCSes handle so well. And because each topic branch becomes an environment on your puppet master, you can test your changes on a few test nodes without impacting production, by passing the --environment= argument to puppet on those nodes. The trick to setting this up is to use the $environment variable in your puppet.conf file. Check out this blog post for the details.
  • Modules should look up their values in an external data store instead of relying on global variables. Use Hiera for this. Separating data from your modules makes your code much easier to understand and maintain. Global variables suffer from nasty side effects of the class and include hierarchy (the value a module sees can depend on which class included it, and in what order), and they make maintaining your modules a living hell.
  • Hiera data files should be inside your repository, and therefore part of environments. Changes to data, especially in a hierarchical data store, need to be tested too. With the data files in the repository, a new environment carries your modified data along with it, so you can test the change without any impact on production. The hiera.yaml file decides where Hiera reads its data from. You can use the environment variable in its data path (interpolated as %{::environment} in hiera.yaml) to get per-environment data stores, as this post describes (see the section titled "Hiera Best Practices").
  • Your main manifest, site.pp, and your node definitions (if kept inside a manifest) should also be part of environments (the git repository). You can make the master read site.pp from the environment directory by adding a line like this to the [master] section of your puppet.conf file: manifest=$confdir/environments/$environment/manifests/site.pp.
  • Modules should be split into two directories: "modules" should contain generic modules that could just as well be pushed to a public repository, while "site" should contain your site-specific modules and module overrides. Both directories then need to be listed in the modulepath setting of your puppet.conf. This may not suit every organisation, but splitting things in two makes it a lot easier to contribute your modifications back to open source modules. It also creates a psychological barrier between "general tools for managing different services" and "how we manage things". I usually define node types and node roles in the site directory.
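As a sketch of the first point (and of the $environment settings from the third and sixth points), here is a hypothetical class, puppetmaster::config, that lets the master manage its own puppet.conf. It assumes the Puppet 2.7/3.x style of environments described above, with one repository checkout per branch under /etc/puppet/environments; the class name and the modules/site modulepath layout are illustrative, not taken from the post. Only the environment-related settings are shown, so anything else your master needs would have to go into the same file.

```puppet
# Hypothetical class: lets the puppet master manage its own files in
# /etc/puppet, so applying it on a fresh machine bootstraps a new master.
class puppetmaster::config {
  # Parent directory holding one checkout per branch (one per environment).
  file { '/etc/puppet/environments':
    ensure => directory,
    owner  => 'root',
    group  => 'root',
    mode   => '0755',
  }

  # Single quotes keep $confdir and $environment literal in the written file,
  # so the master expands them for each agent's environment at run time.
  file { '/etc/puppet/puppet.conf':
    ensure  => file,
    owner   => 'root',
    group   => 'root',
    mode    => '0644',
    content => '# Managed by Puppet (puppetmaster::config)
[master]
modulepath = $confdir/environments/$environment/modules:$confdir/environments/$environment/site
manifest = $confdir/environments/$environment/manifests/site.pp
',
  }
}
```

To bootstrap a new master, you would clone the repository and then run something like puppet apply --modulepath=<checkout>/modules:<checkout>/site -e 'include puppetmaster::config', where <checkout> stands for wherever you cloned it.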
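To illustrate the Hiera point, here is a hypothetical generic module, ntp_client, that asks Hiera for its server list instead of reading a global such as $::ntp_servers set somewhere in site.pp. It uses the hiera() lookup function from the Puppet 3 era; the module name, the key ntp_client::servers and the default servers are assumptions made for the example.

```puppet
# Hypothetical generic module (modules/ntp_client/manifests/init.pp).
# All tunable data comes from Hiera; nothing depends on global variables
# or on which class happened to include this one.
class ntp_client {
  # Returns the value from the first hierarchy level that defines the key,
  # or the default list when no level does.
  $servers = hiera('ntp_client::servers', ['0.pool.ntp.org', '1.pool.ntp.org'])

  package { 'ntp':
    ensure => installed,
  }

  # Renders one "server" line per entry in $servers.
  file { '/etc/ntp.conf':
    ensure  => file,
    content => inline_template("<% @servers.each do |server| -%>\nserver <%= server %>\n<% end -%>\n"),
    require => Package['ntp'],
  }
}
```

The matching data would then sit in the repository, for example as an ntp_client::servers key in a common data file, so each environment can carry its own values.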
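A hiera.yaml with a per-environment data directory could look like the following, written as a hypothetical class so that it too is managed by Puppet. It assumes Hiera 1 (the version used with Puppet 3), the YAML backend, a hieradata/ directory at the top of each environment checkout, and an example hierarchy of node certname, then OS family, then common data; the class name and the hierarchy levels are illustrative.

```puppet
# Hypothetical class: manages /etc/puppet/hiera.yaml, the default Hiera
# configuration file location for Puppet 3.
class puppetmaster::hiera {
  # Hiera interpolates %{::environment} at lookup time, so each environment
  # (branch checkout) reads its own hieradata/ directory. Single quotes let
  # the YAML's own double quotes through without escaping.
  file { '/etc/puppet/hiera.yaml':
    ensure  => file,
    owner   => 'root',
    group   => 'root',
    mode    => '0644',
    content => '---
:backends:
  - yaml
:yaml:
  :datadir: "/etc/puppet/environments/%{::environment}/hieradata"
:hierarchy:
  - "%{::clientcert}"
  - "%{::osfamily}"
  - common
',
  }
}
```

Because the datadir follows the environment, a topic branch that edits a data file only affects the nodes you run against that branch's environment.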
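The split between modules and site could look like this: generic modules stay in modules/, while site/ holds classes describing node types and roles that only compose those modules. The role class names, the generic ntp_client and apache modules they include, and the node name are all hypothetical; the comments show which file each piece would live in, assuming both directories are on the modulepath.

```puppet
# site/role/manifests/base.pp
# Node type: what every machine in the organisation gets.
class role::base {
  include ntp_client
}

# site/role/manifests/webserver.pp
# Node role: a base machine plus the generic web server module.
class role::webserver {
  include role::base
  include apache
}

# manifests/site.pp (inside the environment checkout)
# Node definitions only pick a role; the details live in site/ and Hiera.
node 'web01.example.com' {
  include role::webserver
}
```

Keeping node definitions this thin means you can read a node's behaviour off its role, while the modules in modules/ stay free of organisation-specific choices.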