Basics of Puppet code: terminology

Puppet code is composed primarily of resource declarations, which describe the desired state of the system: for example, that a certain user or file should exist, or that a package should be installed.

Resources are the fundamental building blocks used to model system state in Puppet. They describe the desired end state of unique elements managed by Puppet on the system. Everything that Puppet manages is expressed as a resource.

Puppet uses a declarative language to define our configuration items (resources). Being declarative creates an important distinction between Puppet and many other configuration tools. A declarative language makes statements about the state of our configuration. For example, it declares that a package should be installed or a service should be started.

Most configuration tools, such as shell or Perl scripts, are imperative or procedural: they describe how things should be done rather than the desired end state. Most custom scripts used to manage configuration would be considered imperative.

Puppet users just declare what the state of their hosts should be: what packages should be installed, what services should be running, and so on. With Puppet, the system administrator doesn't care how this state is achieved. Instead, we abstract our host's configuration into resources.

Let's take a look at an example of a user resource declaration.

The format for resource declarations is as follows:

resource_type { 'resource_name':
  attribute => value,
  ...
}

Or (though it's the same):

<type> { <title> :
 attribute1 => value1,
 attribute2 => value2,
}

Type: The type of a resource determines the system component Puppet manages. Some common types are: user, group, file, service and package. A resource declaration always contains the type of resource being managed.
Title: The title of a resource identifies an instance of that resource type. The combination of type and title refers to a single unique element managed by Puppet, such as a package named 'nginx'.
Attributes: Each resource supports a list of key value pairs called attributes. These attributes provide a detailed description that Puppet uses to manage the resource. For example, the package 'nginx' should be present.

Here is a real sample that describes a user resource named 'k' with the specified attributes.

user { 'k':
  ensure     => present,
  uid        => '2008',
  gid        => '4008',
  shell      => '/bin/bash',
  home       => '/home/k'
}

The following example is applied by Puppet to ensure that a package named apache2 is installed:

package { 'apache2':
  ensure => present,
}

To list all of the default resource types that are available to Puppet, enter the following command:

root@ip-172-31-50-172:~# puppet resource --types
augeas
computer
cron
exec
file
filebucket
...
user
vlan
whit
yumrepo
zfs
zone
zpool
root@ip-172-31-50-172:~#

Namevar

A namevar is a special kind of attribute that serves as the identity of a resource on the underlying system. When creating a new resource type, the first task is to choose a namevar. The most important property of a namevar is that it must uniquely identify the resource. In this sense, the namevar can be thought of as the resource's primary key. Most resources that need to be managed have unique identifiers:

Path of a file
Name of a user, group, package, or service

If we don't specifically assign a value for the namevar, its value will default to the title of the resource.

file { '/etc/passwd':
  owner => root,
  group => root,
  mode  => '0644',
}

In this example, /etc/passwd is the title of the file resource; other Puppet code can refer to the resource as File['/etc/passwd'] to declare relationships. Because path is the namevar for the file type and we did not provide a value for it, the value of path will default to /etc/passwd.

Although the title and namevar are commonly the same, they serve two different purposes in Puppet. The title is used to reference the resource in the Puppet catalog, and the namevar indicates the system's name for the resource.

The example below demonstrates a situation where the namevar is not the same as a resource's title. The title of that resource is apache and the value of its namevar (name) is httpd. This resource can be referenced as apache, but the package under management is httpd:

custom_package { 'apache':
 name => 'httpd',
}
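The same split between title and namevar works with built-in types. As a minimal sketch (the title sshd_config, the path, and the ssh service name are illustrative assumptions for a Debian-style host, not taken from the text above), a file resource can carry a short title while its path attribute holds the real location:

```puppet
# The title is a short label used inside the catalog;
# the namevar (path) is the file's real location on disk.
file { 'sshd_config':
  ensure => file,
  path   => '/etc/ssh/sshd_config',
  owner  => 'root',
  group  => 'root',
  mode   => '0600',
}

# Other resources refer to the file by its title, not by its path.
service { 'ssh':
  ensure    => running,
  subscribe => File['sshd_config'],
}
```

Keeping the title independent of the path means references such as File['sshd_config'] do not have to change if the path differs between platforms.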

Providers

A user account may contain settings like username, group, and home directory. These attributes are defined as a part of its type. User accounts are managed differently on Windows, Linux, or LDAP. The methods to create, destroy, and modify accounts are implemented as a separate provider for each of these.

Providers implement the procedure used to manage resources. A resource is simply declared as a list of attributes because all of the instructions for managing that resource have been encapsulated in the provider. Additionally, multiple providers can be implemented for a single type, allowing the same resource to be applied on different operating systems.

Puppet includes one or more providers for each of its native types. For example, Puppet's User type includes eight different providers that implement support across a variety of Unix, Linux, and even Windows platforms.

Although Puppet will automatically select an appropriate default provider, we can override the default with the provider attribute. (For example, package resources on Red Hat systems default to the yum provider, but we can specify provider => gem to install Ruby libraries with the gem command.) - Docs: Type Reference
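As a small sketch of that override, the resource below asks Puppet to install a Ruby library through the gem provider instead of the platform's default package provider. The package title json is only an illustrative assumption.

```puppet
# Install a Ruby library with the gem provider rather than the
# default package provider chosen for this operating system.
package { 'json':
  ensure   => installed,
  provider => 'gem',
}
```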

Properties

Puppet's Resource Abstraction Layer (RAL) provides a clear separation between types and providers. Properties are the key to this separation. They describe the attributes of a resource that its providers are responsible for managing.

ensure is a special property that models the existence of a resource. Until a type implements ensure, its resources cannot be created or destroyed.

package { 'apache2':
  ensure => present,
}

Properties are also the main integration point between types and providers. Types specify which properties exist, and providers supply the implementation details for how those properties are managed on the system.

Figuring out if an attribute should be a property is one of the most important design decisions for a resource type. In general, we can decide if an attribute should be a property by asking the following questions:

Can I discover the state of this attribute?
Can I update the state of this attribute?

If the answer to both of those questions is yes, then that attribute should be implemented as a property. In general, if the answer to one or both of these questions is no, then the characteristic should not be a property. - from Puppet Types and Providers

Parameters vs Properties

Parameters supply additional information to providers, which use it to manage a resource's properties. In contrast with properties, parameters are not discovered from the system and cannot be created or updated.

Parameters allow us to specify additional context or the ability to override a provider's default behavior. For example, the service resource supports the following parameters: start, stop, status, and restart. None of these attributes reflect the state of a service. Instead, they override the commands a provider uses to interact with services on the system. - from Puppet Types and Providers
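To make the distinction concrete, here is a hedged sketch of a service resource that mixes both kinds of attributes. The service name myapp and the script path are hypothetical; ensure, enable, restart, and status are attributes of the built-in service type.

```puppet
service { 'myapp':
  # Properties: state that Puppet can discover and change.
  ensure  => running,
  enable  => true,
  # Parameters: hints that tell the provider which commands to run.
  # They describe how to manage the service, not its state.
  restart => '/usr/local/bin/myapp-ctl reload',
  status  => '/usr/local/bin/myapp-ctl status',
}
```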

RAL - Resource Abstraction Layer

With our resource created, Puppet takes care of the details of managing that resource when our agents connect. Puppet handles the how by knowing how different platforms and operating systems manage certain types of resources. Each type has a number of providers. A provider contains the how of managing packages using a particular package management tool.

The package type, for example, has more than 20 providers covering a variety of tools, including yum, aptitude, pkgadd, ports, and emerge.

When an agent connects, Puppet uses a tool called Facter to return information about that agent, including what operating system it is running. Puppet then chooses the appropriate package provider for that operating system and uses that provider to check if a specific package is installed. If the package is not installed, Puppet will install it. If the package is already installed, Puppet does nothing. This important feature is called idempotency.

Puppet will then report its success or failure in applying the resource back to the Puppet master.

Facts and Facter

Puppet gathers facts about each of its nodes with a tool called Facter. By default, Facter gathers information that is useful for system configuration, such as OS names, hostnames, IP addresses, and SSH keys. These facts are gathered when the agent runs. The facts are then sent to the Puppet master and automatically made available to Puppet as top-scope variables.

It is possible to add custom facts if we need other facts to perform our configurations.

We can see the facts available on our clients by running the facter binary from the command line. Each fact is returned as a key => value pair. For example, to see the list of facts that are automatically gathered on our EC2 agent node, we run:

ubuntu@puppetagent:~$ facter
architecture => amd64
augeasversion => 1.2.0
blockdevice_xvda_size => 8589934592
blockdevices => xvda
domain => example.com
facterversion => 1.7.5
filesystems => ext2,ext3,ext4,iso9660,vfat
fqdn => puppetagent.example.com
hardwareisa => x86_64
hardwaremodel => x86_64
hostname => puppetagent
id => ubuntu
interfaces => eth0,lo
ipaddress => 172.31.37.15
ipaddress_eth0 => 172.31.37.15
ipaddress_lo => 127.0.0.1
is_virtual => true
kernel => Linux
kernelmajversion => 3.13
kernelrelease => 3.13.0-36-generic
kernelversion => 3.13.0
lsbdistcodename => trusty
lsbdistdescription => Ubuntu 14.04.1 LTS
lsbdistid => Ubuntu
lsbdistrelease => 14.04
lsbmajdistrelease => 14
macaddress => 0a:b2:46:83:a8:c1
macaddress_eth0 => 0a:b2:46:83:a8:c1
memoryfree => 874.86 MB
memoryfree_mb => 874.86
memorysize => 992.44 MB
memorysize_mb => 992.44
memorytotal => 992.44 MB
mtu_eth0 => 9001
mtu_lo => 65536
netmask => 255.255.240.0
netmask_eth0 => 255.255.240.0
netmask_lo => 255.0.0.0
network_eth0 => 172.31.32.0
network_lo => 127.0.0.0
operatingsystem => Ubuntu
operatingsystemrelease => 14.04
osfamily => Debian
path => /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games
physicalprocessorcount => 1
processor0 => Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
processorcount => 1
ps => ps -ef
puppetversion => 3.4.3
rubysitedir => /usr/local/lib/site_ruby/1.9.1
rubyversion => 1.9.3
selinux => false
sshdsakey => AAAA...
sshecdsakey => AAAAE2V...
sshfp_dsa => SSHFP 2 1 c07b...
sshfp_ecdsa => SSHFP 3 1 86a...
sshfp_rsa => SSHFP 1 1 273eab17...
sshrsakey => AAAAB3NzaC1yc2EA...
swapfree => 0.00 MB
swapfree_mb => 0.00
swapsize => 0.00 MB
swapsize_mb => 0.00
timezone => UTC
uniqueid => 1fac0f25
uptime => 18:28 hours
uptime_days => 0
uptime_hours => 18
uptime_seconds => 66506
virtual => xenu
ubuntu@puppetagent:~$

These facts are made available as variables that can be used in our Puppet configuration. When combined with the configuration we define in Puppet, they allow us to customize that configuration for each host. For example, they allow us to write generic resources, like our network settings, and customize them with data from our agents.
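As a minimal sketch of using facts in a manifest, the snippet below picks a package name from the osfamily fact and prints two other facts. It assumes the flat, top-scope fact names shown in the Facter output above (newer Puppet versions also expose a structured facts hash); the package names per OS family are illustrative assumptions.

```puppet
# Choose the web server package name based on the osfamily fact.
$web_package = $::osfamily ? {
  'Debian' => 'apache2',
  'RedHat' => 'httpd',
  default  => 'apache2',
}

package { $web_package:
  ensure => present,
}

# Facts can also be interpolated into strings.
notify { "Running on ${::operatingsystem} ${::operatingsystemrelease}": }
```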

Facter also helps Puppet understand how to manage particular resources on an agent. For example, if Facter tells Puppet that a host runs Ubuntu, then Puppet knows to use the apt provider to install packages on that agent. Facter can also be extended to add custom facts for specific information about our hosts.

Puppet's engine

Puppet's engine is its transactional layer. A Puppet transaction encompasses the process of configuring each host, including these steps:

Interpret and compile our configuration.
Communicate the compiled configuration to the agent.
Apply the configuration on the agent.
Report the results of that application to the master.

The first step Puppet takes is to analyze our configuration and calculate how to apply it to our agent. To do this, Puppet creates a graph showing all resources, with their relationships to each other and to each agent. This allows Puppet to work out the order, based on relationships we create, in which to apply each resource to our host.

Puppet then takes the resources and compiles them into a catalog for each agent. The catalog is sent to the host and applied by the Puppet agent. The results of this application are then sent back to the master in the form of a report.

The transaction layer allows configurations to be created and applied repeatedly on the host. Again, Puppet calls this capability idempotency, meaning that multiple applications of the same operation will yield the same results. Puppet configuration can be safely run multiple times with the same outcome on our host, ensuring that our configuration stays consistent.
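Most built-in types are idempotent on their own, but exec runs an arbitrary command, so it needs a guard to stay safe across repeated runs. The following is a sketch under assumed, hypothetical paths (/tmp/app.tar.gz and /opt/app); creates is a parameter of the built-in exec type.

```puppet
# Without a guard, this command would run on every Puppet run.
# 'creates' tells Puppet to skip the command once the path exists,
# so applying the manifest repeatedly yields the same end state.
exec { 'extract-app':
  command => '/bin/tar -xzf /tmp/app.tar.gz -C /opt',
  creates => '/opt/app',
}
```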

Puppet is not fully transactional, though: our transactions aren't logged (other than informative logging), so we can't roll back transactions as we can with some databases. We can, however, model transactions in a noop, or no-operation, mode that allows us to test the execution of our changes without applying them.

Manifests

Puppet programs are called manifests. Manifests are composed of Puppet code, and their filenames use the .pp extension. The default main manifest in Puppet installed via apt is /etc/puppet/manifests/site.pp.

We've already written a very simple manifest in the previous chapter, Puppet with Amazon AWS III - Puppet running Hello World.

The puppet agent periodically checks in with the puppet master, typically every 30 minutes. It sends facts about itself to the master and pulls a current catalog, which is a compiled list of resources and their desired states that are relevant to the agent, as determined by the main manifest. The agent node then attempts to make the appropriate changes to achieve its desired state. This cycle continues as long as the Puppet master is running and communicating with the agent nodes.
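A main manifest usually maps nodes to the resources or classes they should receive. The sketch below assumes the agent's certificate name matches the fqdn shown in the Facter output earlier (puppetagent.example.com); the file path and content are purely illustrative.

```puppet
# /etc/puppet/manifests/site.pp

# Resources compiled into the catalog for this one agent.
node 'puppetagent.example.com' {
  file { '/tmp/hello':
    ensure  => file,
    content => "Hello from Puppet\n",
  }
}

# Any other agent that checks in receives an empty catalog.
node default {
}
```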

Classes

In Puppet, classes are named blocks of code that can be called from elsewhere in our code. Using classes allows us to reuse Puppet code and can make manifests easier to read.

A class definition is where the code that composes a class lives. Defining a class makes the class available to be used in manifests, but does not actually evaluate anything.

Here is a sample:

class sample_class {
...
}

The above defines a class named "sample_class", and the Puppet code would go between the curly braces.

A class declaration occurs when a class is called in a manifest. A class declaration tells Puppet to evaluate the code within the class. Class declarations come in two different flavors: normal and resource-like:

A normal class declaration occurs when the include keyword is used in Puppet code:

include sample_class

This will cause Puppet to evaluate the code in sample_class.

A resource-like class declaration occurs when a class is declared like a resource:

class { 'sample_class': }

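Resource-like class declarations allow us to specify class parameters, which override the default values of class attributes. The following node definition uses this syntax to declare the apache class (here without overriding any parameters) and then declares a virtual host: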

node 'host10' {
  class { 'apache': }             # use apache module
  apache::vhost { 'example.com':  # define vhost resource
    port    => '8080',
    docroot => '/var/www/html'
  }
}
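To show a parameter actually being overridden, here is a minimal sketch of a parameterized class and its resource-like declaration. The class name sample_class is reused from the earlier example; the parameters port and docroot and their defaults are assumptions made for illustration.

```puppet
# Definition: parameters with default values.
class sample_class (
  $port    = '80',
  $docroot = '/var/www/html'
) {
  notify { "sample_class: port ${port}, docroot ${docroot}": }
}

# Resource-like declaration: override port, keep the default docroot.
class { 'sample_class':
  port => '8080',
}
```

A class can be declared in the resource-like style only once per catalog, whereas include can safely be repeated for the same class.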

Modules

A module is a collection of manifests and data with a specific directory structure. Modules are useful for organizing our Puppet code, because they allow us to split our code into multiple manifests. It is considered best practice to use modules to organize almost all of our Puppet manifests.

To add a module to Puppet, place it in the /etc/puppet/modules directory.

Noop mode (dry-run mode)

Noop mode is a way for Puppet to simulate manifests and report pending changes. When noop mode is enabled (using the --noop flag), Puppet queries each resource and reports differences between the system and its desired state. This is useful for seeing what changes Puppet will make without actually executing the changes.

When Puppet is run in noop mode, it skips steps for updating the underlying system, and records differences between desired and observed state as events without making any modifications to the system.
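Besides the run-wide --noop flag, noop can be requested for an individual resource through the noop metaparameter. The sketch below uses an illustrative file path and content; with it, Puppet reports what it would change for this one resource while applying the rest of the catalog normally.

```puppet
# Puppet reports any pending change to this file as a noop event
# but does not modify the file, even in a normal (non-noop) run.
file { '/etc/motd':
  ensure  => file,
  content => "Managed by Puppet\n",
  noop    => true,
}
```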

Catalog

When configuring a node, puppet agent uses a document called a catalog, which it downloads from a puppet master server. The catalog describes the desired state for each resource that should be managed, and may specify dependency information for resources that should be managed in a certain order.

A Puppet catalog is a collection of resources compiled from a set of manifests. The catalog is a composition of resources that are used to model a service or a system. The catalog is easily introspected to better understand how a system should be configured, and what dependencies might exist.

Graph & dependencies

The data structure of the catalog is a graph. Graphs are characterized as a collection of objects where some of the object pairs are interconnected. The objects are referred to as vertices (Puppet resources) and the links between pairs of those objects are edges (dependencies).

Resources deploying an application often require individual components to be configured in a specific order. These dependencies are expressed as relationships in Puppet.

The order of resources can be specified using the require and before resource metaparameters.

Let's look at the following example for dependency:

# modules/nginx/manifests/init.pp
# Manage nginx webserver
class nginx {
  package { 'apache2.2-common':
    ensure => absent,
  }
  package { 'nginx':
    ensure => installed,
    require => Package['apache2.2-common'],
  }
  service { 'nginx':
    ensure  => running,
    require => Package['nginx'],
  } 
}

On Ubuntu, the default setup includes the Apache web server, which would conflict with nginx if we tried to run it at the same time. So, by specifying ensure => absent, we remove the Apache package. Then, the next section declares the nginx package:

class nginx {
  ...
  package { 'nginx':
    ensure => installed,
    require => Package['apache2.2-common'],
  }
  ...
}

The require attribute tells Puppet that this resource depends on another resource (here, Apache), which must be applied first. In this case, we want the removal of Apache to be applied before the installation of nginx.

Next, we declare the nginx service:

service { 'nginx':
  ensure  => running,
  require => Package['nginx'],
}
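The same ordering can be stated from the other side with the before metaparameter. The sketch below is an alternative to the require form, not an addition to it, since a resource may only be declared once per catalog:

```puppet
# 'before' on the package has the same effect as
# 'require => Package['nginx']' on the service.
package { 'nginx':
  ensure => installed,
  before => Service['nginx'],
}

service { 'nginx':
  ensure => running,
}
```

Puppet also provides a chaining arrow for the same purpose: the statement Package['nginx'] -> Service['nginx'] orders two already-declared resources without adding a metaparameter to either one.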

Features

Features are abilities that some providers may not support. Generally, a feature will correspond to some allowed values for a resource attribute; for example, if a package provider supports the purgeable feature, we can specify