Puppet custom type validation woes

Since I’ve just lost a full day to troubleshooting this issue, I’m documenting it in case it hits anyone else. In puppet versions up to and including at least 4.7.0, global type validation cannot be used to ensure the presence of parameters without breaking `puppet resource`.

Simplified example type:

Puppet::Type.newtype(:entropy_mask) do
  @doc = "Mask packages in Entropy"

  newparam(:name) do
    desc "Unique name for this mask"
  end

  newproperty(:package) do
    desc "Name of the package being masked"
  end

  validate do
    # This will break for `puppet resource`
    raise(ArgumentError, "Package is required") if self[:package].nil?
  end
end

This works fine for validating that the `package` parameter is provided in a puppet manifest, but not when `puppet resource` interrogates the state of the existing system, due to the way the object is constructed.

Puppet calls `provider.instances` to obtain a list of the resources on the system managed by that provider. In the example above, the provider was a child of ParsedFile and took care of parsing the contents of /etc/entropy/packages/package.mask, splitting the lines into the various properties including package.

Puppet then tries to create an instance of the Type/resource for each object retrieved by the provider. It does so by instantiating an instance of the type, but passing in the namevar and provider only. It then attempts to iterate through all the provider properties and set them on the resource one by one. The problem is that validation happens on the call to new() and so the required properties have not yet been set.
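The ordering problem can be reproduced without Puppet at all. What follows is a minimal plain-Ruby sketch: `FakeResource` and `build_like_instances` are made-up names standing in for the type and for what `puppet resource` does, not Puppet APIs.

```ruby
# Plain-Ruby stand-in for a Puppet type; none of these names are Puppet APIs.
class FakeResource
  def initialize(attrs = {})
    @attrs = attrs.dup
    validate # global validation fires here, inside new()
  end

  def [](name)
    @attrs[name]
  end

  def []=(name, value)
    @attrs[name] = value
  end

  def validate
    raise ArgumentError, "Package is required" if self[:package].nil?
  end
end

# Manifest-style construction: every attribute is known up front, so this passes.
FakeResource.new(:name => "mask-foo", :package => "app-misc/foo")

# `puppet resource`-style construction: only the name is passed to new(),
# so validation raises before :package can ever be assigned.
def build_like_instances(discovered)
  resource = FakeResource.new(:name => discovered[:name])
  resource[:package] = discovered[:package] # never reached
  resource
end

begin
  build_like_instances(:name => "mask-foo", :package => "app-misc/foo")
rescue ArgumentError => e
  puts "validation failed early: #{e.message}"
end
```

The second construction fails even though the discovered data contains a perfectly good package, which is the same shape of failure described above.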

Here’s the code from lib/puppet/type.rb from puppet 4.7.0, with irrelevant bits stripped out and comments added by me:

def self.instances
  # Iterate through all providers for this type
  providers_by_source.collect do |provider|
    # Iterate through all instances managed by this provider
    provider.instances.collect do |instance|
      # Instantiate the resource using just the namevar and provider
      result = new(:name => instance.name, :provider => instance)
      # Oops, type.validate() got called here, but not all properties
      # have been set yet

      # Now iterate through all properties on the provider and set
      # them on the resource
      properties.each { |name| result.newattr(name) }

      # And add this to the list of resources to return
      result
    end
  end.flatten.compact
end

Of course, once the problem is understood, finding out that someone else already discovered this 2 years ago becomes much easier. Here’s the upstream bug report: https://tickets.puppetlabs.com/browse/PUP-3732


I’ve just uploaded my first puppet module to the forge, optiz0r-sabayon, which improves support for the Sabayon Linux distribution in puppet.

This does the following things:

  • Overrides the operatingsystem fact for Sabayon hosts
  • Adds a provider for entropy package manager, and sets this as the default for Sabayon
  • Adds a shim service provider, that marks systemd as the default for Sabayon hosts
  • Adds an enman_repo type which manages installation of Sabayon Community Repositories onto a Sabayon host
  • Adds entropy_mask and entropy_unmask types, which manage package masks and unmasks

I’ll add more features as and when I need them. In the meantime, pull requests welcome!

Removing stale facts from PuppetDB

PuppetBoard and PuppetExplorer are both excellent tools but can be slowed down significantly if there are a very large number of facts in PuppetDB. I recently had an issue with some legacy facts tracking stats about mounted filesystems causing a significant amount of bloat, and this is how I cleaned them up.

The problem

A long time ago, someone decided it would be useful to have some extra fact data recording which filesystems were mounted, the types and how much space was being used on each. These were recorded such as:


It turned out that none of these ever got used for anything useful, but not before we had amassed 1900 unique filesystems being tracked across the estate; at three facts each, that accounted for almost 6000 useless facts.

Too many facts!

The PuppetDB visualisation tools both have a page that lists all the unique facts, retrieved from the PuppetDB API using the /fact-names endpoint. Having several thousand records to retrieve and render caused each tool to delay page loads by around 30 seconds, and typing into the realtime filter box could take minutes to update, one character appearing at a time.
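To see which family of facts is responsible for the bloat, it can help to group the fact names by prefix. The sketch below assumes the list of names has already been fetched from the /fact-names endpoint as a JSON array of strings. The sample fact names are invented for illustration, and splitting on the first underscore is just one possible heuristic.

```ruby
require "json"

# Count fact names by their leading component (the text before the first
# underscore) and return [prefix, count] pairs, biggest group first.
def fact_prefix_counts(fact_names)
  counts = Hash.new(0)
  fact_names.each { |name| counts[name.split("_", 2).first] += 1 }
  counts.sort_by { |prefix, count| [-count, prefix] }
end

# Hypothetical response body; real data would come from the /fact-names endpoint.
body = '["fs_home_size", "fs_home_type", "fs_var_size", "kernel", "osfamily"]'

fact_prefix_counts(JSON.parse(body)).each do |prefix, count|
  puts format("%-10s %d", prefix, count)
end
```

A prefix with thousands of entries is a good candidate for removal or restructuring.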

Removing the facts

Modifying the code to stop the fact being present on the machine is the easy part. Since the /fact-names endpoint reports the unique fact names across all nodes, in order to make them disappear completely we must make sure all nodes check in with the updated fact list that omits the removed facts.

How you do this depends on your setup. Perhaps you have the puppet agent running on a regular schedule; maybe you have mcollective or another orchestration tool running on all your nodes; failing any of those, a mass-SSH run will do.

So we update all the nodes and refresh PuppetExplorer… and it’s still slow. Damn, missed something.

Don’t forget the deactivated nodes!

If we take a closer look at the documentation for the /fact-names endpoint we see the line:

This will return an alphabetical list of all known fact names, including those which are known only for deactivated nodes.

Ah ha! The facts are still present in PuppetDB for all the deactivated nodes, but since they’re not active we didn’t/cannot do a puppet run on them to update the list of facts. We’re going to have to remove them from the database entirely.

Purging old nodes from PuppetDB

By default, PuppetDB doesn’t ever remove deactivated nodes, which means the facts hang around forever. You can tweak this by enabling node-purge-ttl in PuppetDB’s database.ini. As a once-off tidy up, I set node-purge-ttl = 1d and restarted PuppetDB. Tailing the logs, I saw PuppetDB run a garbage collection on startup, and all of my deactivated nodes were purged immediately.


Now... to deal with the thousand entries from the built-in network facts…

Setting up hiera-eyaml-gpg

It’s inevitable at some point while writing puppet manifests that you’ll need to manage some sensitive configuration; be that a database password, an SSH deploy key, etc. One way to deal with this is to lock down your puppet code so that only trusted developers can see the contents. Another approach is to encrypt the secrets within the puppet code, which is where hiera-eyaml comes in. Hiera-eyaml provides a pluggable backend for your hiera data that can contain secrets encrypted through different means. By default hiera-eyaml protects the secrets with a single PKCS7 keypair, but hiera-eyaml-gpg adds a GPG backend allowing secrets to be encrypted to several recipients, each holding their own key.

The puppetmaster of course will need access to the secrets in order to provide them to end machines (whichever encryption backend is used), so hiera-eyaml does nothing to help secure the master itself. You should work on the basis that the secrets are effectively plaintext on the puppetmaster and protect it appropriately. If someone does compromise your puppetmaster, you’ll have bigger problems than someone being able to read the hiera secrets. However, hiera-eyaml does protect the secrets while they reside outside of the puppetmaster, for example on workstations and in version control systems.

I prefer the GPG backend because it means developers can have passphrase-protected keys on workstations and use the gpg-agent to securely access the key. It means workstation machines don’t need the same rigorous protection as the puppetmasters to keep the secrets secure.

Installing the hiera backends

On Sabayon, use my community repo and install using entropy:

equo install dev-ruby/hiera-eyaml dev-ruby/hiera-eyaml-gpg -a

Or install using portage from my overlay:

emerge dev-ruby/hiera-eyaml dev-ruby/hiera-eyaml-gpg -av

On Red Hat-type systems, use fpm to build RPMs from the ruby gems and install natively. Note we’re building with sudo so that fpm picks up the right GEM_PATH and doesn’t build packages that install to the builder’s home directory.

cd ~/rpmbuild
sudo -E fpm -s gem -t rpm -n hiera-eyaml -a noarch --version 2.0.2 --iteration 1 -p RPMS/noarch/hiera-eyaml-VERSION-ITERATION.ARCH.rpm hiera-eyaml
sudo -E fpm -s gem -t rpm -n hiera-eyaml-gpg -a noarch --version 0.4 --iteration 1 -p RPMS/noarch/hiera-eyaml-gpg-VERSION-ITERATION.ARCH.rpm -d hiera-eyaml hiera-eyaml-gpg
sudo rpm -Uvh RPMS/noarch/hiera-eyaml{,-gpg}*.rpm

I’m assuming here that the ~/rpmbuild environment is already set up for rpmbuild to use. If not, you will need to do this first.

Creating your puppetmaster keys

On each of your puppetmasters, you’ll need to create a PGP keypair that the master can use to decrypt the secrets.

First up, create a directory to contain the keyrings:

sudo mkdir /etc/puppet/keyrings

Now generate the keypair and export the public part (be sure not to set a passphrase here):

sudo gpg --homedir /etc/puppet/keyrings --gen-key
sudo gpg --homedir /etc/puppet/keyrings/ --export -o /tmp/puppetmaster.pub

Copy the puppetmaster.pub to your local workstation using something like scp so you can encrypt data using it later.

You can reuse the same key on all puppetmasters by copying the keyrings directory around, but it would be better to repeat this process to generate a unique key on each of your puppetmasters. This means you can later revoke a single master’s key without having to re-key every machine.

Creating your personal keys

If you don’t already have a personal GPG keypair, create one for yourself now (be sure to set a strong passphrase here):

gpg --gen-key

Next up you need to import all of the puppetmaster keys into your keyring:

gpg --import puppetmaster.pub

We need to list and sign each of the puppetmaster keys with your personal key:

gpg --list-keys
pub 4096R/CDADE567 2013-05-04
uid [ultimate] Ben Roberts <ben@example.com>
sub 2048R/FDF62278 2013-05-04

pub 4096R/CBF58456 2013-05-04
uid [ full ] master1.example.com <hostmaster@example.com>
sub 2048R/234E54BF 2013-05-04

pub 4096R/427659C4 2014-11-22
uid [ full ] master2.example.com <hostmaster@example.com>
sub 4096R/C645C3FB 2014-11-22

gpg --sign-key CBF58456
gpg --sign-key 427659C4

Setting up hiera

Now we need to configure hiera to use the hiera-eyaml and hiera-eyaml-gpg backends. Configure your hiera.yaml to contain the following so that it queries the eyaml backend:

---
:backends:
  - eyaml
:hierarchy:
  - common
:eyaml:
  :datadir: /etc/puppet/environments/%{::environment}/data
  :extension: 'yaml'
  :encrypt_method: gpg
  :gpg_gnupghome: /etc/puppet/keyrings

Next up we need to tell eyaml which keys to encrypt new data with. This is a simple text file that goes inside your hiera data directory and contains the names of the PGP keys we generated earlier. You’ll need to include the names of all the puppetmasters so the catalogs can be generated plus all the developers who need to be able to make changes to the secrets.


Make sure the updated hiera.yaml and hiera-eyaml-gpg.recipients files are available on all puppet masters and reload/restart the master to pick up the changes.

Your first secret

Now we’re all set, it’s time for the fun bit. We’ll use the eyaml tool to edit the common.yaml to add a new secret value.

eyaml edit data/common.yaml

Add a new value into the file, wrapping the plaintext secret in DEC::GPG[ and ]!. This will tell eyaml that the secret is in decrypted form, and should be encrypted with the GPG backend.

host::root_password: "DEC::GPG[superseekrit]!"

Now save and exit the file, then re-open it, and you’ll see something like this instead:

 host::root_password: "ENC[GPG,hQEMA4LkLtcnPlS+AQf/SabmYb9US3HTv8B1Bxx3CN9Tw29Lt3WcC4OeOnq1a5xzlhP5dolMcSV/qPqo4j3hq+ z2D1e+POZSd+ 3cH4lD6wRr3IWjJkyHyGmibVlIUPv2Y7CNMOXPcGJaAEFCKTpTEKlS87zDied19b9jS6yoCDVtGgLlUF32Et66P6pimVelWSb4REnv3rRVR7goCLmlaFk30/ UqeJfwmwNxPPsO+Ne8SreA0dfukkkyZ3JnSTmbtXlGJfMPLA7bjW8+Jexb/0c6WJiEDXCxuncvzkBeMz6+ cuKjZ6SHLIxiQtZUDrxAvkpiId6cWM49nYpbxdZVvzfoyiQkDtK7uw/hF92wxYUCDAOVcE7kxjXDjgEP/ 2XcJQnRSuagdOUPMZMW4RkC3pNXRV8IcoLWQVDP08YuICCdL5iVaNbU66fU034UyJmHRyZREU+ NiTUvxj92gkuNSG4jqMiDEdehNTnkCmij9qSjiZGaHHcIx6OwfYanLsWm5b0R+HBRCg1EXqwjmeUqi3sFCu6qlRPaDLc77xRCxJdvGRHZ04JUnyjYS/ leRxdVo2FEzJVHAW/Psm2wa+wkTcuW6g2Uv65WzANxaNBcP+vWAlErMHbxmkFiRvYHPBxbS6L/w5+Umh+5LLrx6M/op3iQWAialqNd8NKFYKkVqb/ Y7Tmfaj6W0XV+JiEkwoYY0SMD6wTtQwH6OPk99VfDUPiU7uQ+i8Q8doK8J8OH7sQTj/ye1Rq0e6dF7xGhvhm7YOa3UMSx/V33eZAr4EQ/n+ bMVZxDfZ6Qmi5wVw9oZ9KO826zkUy1K/ 4QrxjQZfz0YZTzDIrc8lGcHXuroIbiUemPbgkX6GEiXInha5tt7chTiiyjFgfCtSOcekeQ4VAMcBb66LUp2M8D3k4Aqp3j+ wK7KesDTaoTF1gN4FyVsXuest6YB6v67Zv+Wox30z+AG97RIzHZlWqioPxtB98QAbg5pT2a5brnRuD2/6rllO4dCRE1lMO1Sh8v5ZiV824rxVMo4z+ NzybSB2kDN4DoeubDUCzExeJXM9MRqpz7hQEMA5Fhm/f79iLoAQf9EZ8XH2jNgHY8K4oJ/TKapivcEqZm5a/ 35eWzFigBHKaBwag05q2M5imtFbI4Ez7ugFrwSdeUFeQHW16Mt9Jka7KfAmo9CuxYuOcc5/3T6qjzwf1nQtRiX/ 9LMxAQWz5vQRYXbIPhPzMif6JfUxGfT5fg4oNBsDc2mIo6K7gxUg1EDhqznVpnclVuv4LrTieZgq2FPue95IM1SGsFFHak5y3f+sbQUl8xvVQohq+ hyXhsxGmMASkt6ZPIQE1v3u35FUA8ovKQg5cIOdt5sYp1EV7tDL6kPieaVF0Ba20v01MY0dsHFxuGmeAIHWJxukxXDB8bPOQBoW45TRTZ0u0GOdJHAaGi6dM JwzVz2Dt/IQZlhjG3Yh0VPkUgQ78bsHKYuL7k0CDpDr3vb4mT0PljNEot7wDb4pBUL/3KtumvmDRxJ20TA4sFaNI=]"


Encrypt all the things

Running eyaml edit again will decrypt the file and open it in your text editor. Saving and quitting the editor will re-encrypt the file again. While editing the file, the wrapper string will change to DEC(N)::GPG[...]! where N is a unique number within the file. This is used by eyaml to keep track of which values have been edited, so you don’t see all the encrypted blobs change every time the file is edited, and only the values that actually changed will show up in version control diffs. Neat.
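Because the decrypted and encrypted wrappers look so different, it is easy to check a value (or a whole file) for plaintext that was left behind by accident. Here is a rough sketch using regular expressions. The patterns and the `classify` helper are my own guesses based on the wrapper strings shown in this post, including an optional numeric index on the DEC marker, and are not taken from hiera-eyaml’s parser.

```ruby
# Hypothetical patterns for the wrapper forms shown in this post.
# The optional "(N)" after DEC allows for a numeric index while editing.
DECRYPTED = /DEC(?:\(\d+\))?::(?<method>[A-Z0-9]+)\[(?<value>.*?)\]!/m
ENCRYPTED = /ENC\[(?<method>[A-Z0-9]+),(?<blob>[^\]]*)\]/

# Report whether a string holds a decrypted marker, an encrypted blob,
# or no eyaml wrapper at all, along with the encryption method named in it.
def classify(value)
  if (m = DECRYPTED.match(value))
    [:decrypted, m[:method]]
  elsif (m = ENCRYPTED.match(value))
    [:encrypted, m[:method]]
  else
    [:unwrapped, nil]
  end
end

p classify("DEC::GPG[superseekrit]!")
p classify("ENC[GPG,hQEMA4LkLtcnPlS+AQf/Sabm]")
p classify("just a normal value")
```

Something along these lines could be wired into a commit hook to refuse files in which any value still classifies as `:decrypted`.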

Adding a new developer/puppetmaster

If in future you need to deploy a new puppetmaster or a new developer joins the team there are three things you need to do.

  1. Import the new user’s key into your keyring and sign it
  2. Update data/hiera-eyaml-gpg.recipients with the name of the new keys
  3. Run eyaml recrypt <filename> for each file that contains secrets so they are re-encrypted with the new keys.

Preventing information leaks

Secrets are now protected on disk and in version control, which is good, but that’s not the only place secrets can be leaked. Recall when you do a `puppet agent --test --noop` run to see what things puppet would change? That shows diffs of files, secrets and all. Oops. And it’s logged to syslog by default. Oops. And if you’re running with stored configs/puppetdb, the reports including the diffs will be stored there as well. Oops.

To prevent this, make sure that when managing any file resource in puppet that could potentially include secrets, you make use of the show_diff option (set it to false) to hide the diffs. This has the slight downside that doing a --noop run no longer shows you what would change in the file, but that’s probably better than having a password available in plaintext somewhere.

Also remember that just because the current version of the hiera data is protected doesn’t mean that any history is. If you start using hiera-eyaml to protect existing secrets, remember that previous revisions may contain the plaintext versions. Bringing in hiera-eyaml is a good excuse to change passwords too.


If you see a message like this, then your gpg-agent or pinentry programs may not be working properly:

[gpg] !!! Warning: General exception decrypting GPG file
[hiera-eyaml-core] !!! Bad passphrase

Make sure you have a pinentry application installed (e.g. pinentry-curses), and that your gpg-agent is running:

eval $(gpg-agent --daemon)

Then try again.

Using puppet on Sabayon Linux

I like puppet and I like Sabayon, but out of the box they don’t play nicely together. Sabayon is a Gentoo derivative and looks to puppet like a Gentoo system, which causes it to use the Gentoo providers for package and service resources. Unlike a stock Gentoo install, Sabayon hosts use systemd and a binary package manager (entropy). While entropy is compatible with portage, I don’t want to compile things from source on my Sabayon boxes.

I’ve put together a puppet module (https://github.com/optiz0r/puppet-sabayon/) which adds support for Sabayon by doing the following things:

  • Overriding the operatingsystem fact to distinguish between Gentoo and Sabayon (osfamily still describes it as Gentoo)
  • Adding an init fact which reports whether systemd is running
  • Subclassing the Systemd service provider to make it the default when the init fact reports systemd is in use
  • Adding a has_entropy fact that checks whether equo is available
  • Adding an Entropy package provider that’s the default for systems where has_entropy is true (and when operatingsystem is Sabayon, but that’s mostly a hack to override the portage default)
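As an illustration of the kind of fact involved, here is a minimal sketch of how a has_entropy fact could be written. It is not the module’s actual code: `executable_on_path?` is a helper invented for this example, and the fact is only registered when Facter is loaded so the helper can be tried on its own.

```ruby
# Return true when an executable called `name` exists in one of the
# directories listed in `path` (defaults to the current PATH).
def executable_on_path?(name, path = ENV.fetch("PATH", ""))
  path.split(File::PATH_SEPARATOR).any? do |dir|
    candidate = File.join(dir, name)
    File.file?(candidate) && File.executable?(candidate)
  end
end

# Only register the fact when Facter is available (as it is under pluginsync);
# outside Facter this file just defines the helper.
if defined?(Facter)
  Facter.add(:has_entropy) do
    setcode { executable_on_path?("equo") }
  end
end

puts executable_on_path?("equo")
```

Keying the provider default off a capability fact like this, rather than off the distribution name alone, means it follows whether equo is really installed.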

To use it, just add the module to your environment, and pluginsync will do all the hard work.

Known issue: The entropy provider doesn’t behave well when it tries to install a package which has a new license not already included in license.accept. Equo will present a prompt to accept the license in this case, which fails since there’s no stdin to read the answer from. This causes the prompt to be reprinted in a tight loop while puppet captures the output and fills up /tmp. I’ve got a bug report open to change the equo behaviour. Once this issue is resolved I’ll package up the module properly and submit it to the forge, but in the meantime it’s around if people might find it useful.

Puppetenvsh Mcollective Agent

There is no shortage of different ways to set up Puppet and to manage how code is deployed. Like many people, I’m using git to store my puppet code. Perhaps a little less normally, I have multiple puppetmasters. For me these solve two problems: resilience in case one master needs to be taken offline, and geographic diversity, which means I can target puppet runs at the nearest master and save a bit of time during puppet runs. This does however raise a different problem: how to keep both masters in sync so that each serves the same content. My answer to this is puppetenvsh, an MCollective agent which is triggered via a git pre-receive hook and updates all the puppet environments on all masters concurrently.
