Removing stale facts from PuppetDB

PuppetBoard and PuppetExplorer are both excellent tools but can be slowed down significantly if there are a very large number of facts in PuppetDB. I recently had an issue with some legacy facts tracking stats about mounted filesystems causing a significant amount of bloat, and this is how I cleaned them up.

The problem

A long time ago, someone decided it would be useful to have some extra fact data recording which filesystems were mounted, the types and how much space was being used on each. These were recorded such as:


It turned out that none of these ever got used for anything useful, but not before we amassed 1900 unique filesystems being tracked across the estate and with three facts each that accounted for almost 6000 useless facts.

Too many facts!

The PuppetDB visualisation tools both have a page that lists all the unique facts, retrieved from the PuppetDB API using the /fact-names endpoint. Having several thousand records to retrieve and render caused each tool to delay page loads by around 30 seconds, and typing into the realtime filter box could take minutes to update, one character appearing at a time.

Removing the facts

Modifying the code to stop the fact being present on the machine is the easy part. Since the /fact-names reports the unique fact names across all nodes, in order to make them disappear completely we must make sure all nodes check in with the updated fact list that omits the removed facts.

How you do this depends on your setup. Perhaps you have the puppet agent running on a regular schedule; maybe you have mcollective or another orchestration tool running on all your nodes; failing any of those a mass-SSH run.

So we update all the nodes and refresh PuppetExplorer… and it’s still slow. Damn, missed something.

Don’t forget the deactivated nodes!

If we take a closer look at the documentation for the /fact-names documentation we see the line:

This will return an alphabetical list of all known fact names, including those which are known only for deactivated nodes.

Ah ha! The facts are still present in PuppetDB for all the deactivated nodes, but since they’re not active we didn’t/cannot do a puppet run on them to update the list of facts. We’re going to have to remove them from the database entirely.

Purging old nodes from PuppetDB

By default, PuppetDB doesn’t ever remove deactivated nodes, which means the facts hang around forever. You can tweak this by enabling node-purge-ttl in PuppetDB’s database.ini. As a once-off tidy up, I set node-purge-ttl = 1d and restarted PuppetDB. Tailing the logs I see PuppetDB runs a garbage collection on startup and all of my deactivated nodes were purged immediately.


Now.. to deal with the thousand entries from the built-in network facts…