Cumin - An automation and orchestration framework

Cumin provides a flexible and scalable automation framework to execute multiple commands on multiple hosts in parallel.

It makes it easy to perform complex selections of hosts through a user-friendly query language that can interface with different backend modules and combine their results for a fine-grained selection. The transport layer can also be selected and can provide multiple execution strategies. The outputs of the executed commands are automatically grouped for an easy-to-read result.

It can be used both via its command line interface (CLI), cumin, and as a Python 2 library. Python 3 support will be added soon, now that the last Python 2-only dependency has recently added support for Python 3.

The documentation is available on Wikimedia Documentation and Read the Docs. Details on how Cumin is used at the Wikimedia Foundation are available on Wikitech.

Main components

Query language

Cumin provides a user-friendly generic query language that combines the results of subqueries from multiple backends. The details of the main grammar are:

  • Each query part can be composed with any other query part using boolean operators: and, or, and not, xor.
  • Multiple query parts can be grouped together with parentheses: (, ).
  • Each query part can be one of:
    • Specific backend query: I{backend-specific query syntax} (where I is an identifier for the specific backend).
    • Alias replacement, according to the aliases defined in the configuration: A:group1.
  • If a default_backend is set in the configuration, Cumin first tries to execute the query directly with the default backend. Only if the query cannot be parsed by that backend does it fall back to parsing it with the main grammar.
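
The way the boolean operators combine the results of the query parts can be pictured with plain Python sets. This is only an illustrative sketch, not Cumin's parser: the host names and the combine() helper are hypothetical, and each set stands in for the hosts returned by one backend subquery.

```python
# Hypothetical host sets, standing in for the results of two backend subqueries.
frontends = {'host1', 'host2', 'host3'}
backends = {'host3', 'host4'}


def combine(left, operator, right):
    """Combine two sets of hosts with one of the boolean operators of the main grammar."""
    if operator == 'and':
        return left & right   # Hosts matched by both query parts
    if operator == 'or':
        return left | right   # Hosts matched by at least one query part
    if operator == 'and not':
        return left - right   # Hosts matched by the left part but not by the right one
    if operator == 'xor':
        return left ^ right   # Hosts matched by exactly one of the two query parts
    raise ValueError('Unknown operator: {op}'.format(op=operator))


for op in ('and', 'or', 'and not', 'xor'):
    print('{op}: {hosts}'.format(op=op, hosts=sorted(combine(frontends, op, backends))))
```

Grouping with parentheses then corresponds to nesting such calls, for example combine(combine(a, 'or', b), 'and not', c).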


The backends are what select the target hosts. Each backend is free to define its own grammar. These are the available backends:


The transport layer is what conveys the commands to be executed to the selected hosts. The transport abstraction allows different execution strategies to be specified. These are the available transports:



Simple example of CLI usage without fine-tuning the options:

  • Execute the single command systemctl is-active nginx in parallel on all the hosts matching the query for the alias cp-esams, as defined in the aliases.yaml configuration file.
$ sudo cumin 'A:cp-esams' 'systemctl is-active nginx'
23 hosts will be targeted:
Confirm to continue [y/n]? y
===== NODE GROUP =====
(23) cp[3007-3008,3010,3030-3049].esams.wmnet
----- OUTPUT of 'systemctl is-active nginx' -----
PASS:  |████████████████████████████████████████████████| 100% (23/23) [00:01<00:00, 12.61hosts/s]
FAIL:  |                                                             |   0% (0/23) [00:01<?, ?hosts/s]
100.0% (23/23) success ratio (>= 100.0% threshold) for command: 'systemctl is-active nginx'.
100.0% (23/23) success ratio (>= 100.0% threshold) of nodes successfully executed all commands.
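
The NODE GROUP line above prints the 23 hosts in a folded form. As a reading aid, the sketch below expands that notation back into individual host names. It is a simplified, hypothetical helper that handles a single [...] group only, and it is not the code Cumin uses to build or print node groups.

```python
import re


def expand(folded):
    """Expand a folded name with a single [...] range group into a list of host names."""
    match = re.match(r'^(?P<prefix>[^\[]*)\[(?P<ranges>[^\]]+)\](?P<suffix>.*)$', folded)
    if match is None:
        return [folded]  # No range group, nothing to expand

    hosts = []
    for part in match.group('ranges').split(','):
        start, _, end = part.partition('-')
        end = end or start  # A single number such as '3010' is a range of one
        width = len(start)  # Preserve any zero padding
        for number in range(int(start), int(end) + 1):
            hosts.append('{prefix}{number:0{width}d}{suffix}'.format(
                prefix=match.group('prefix'), number=number, width=width,
                suffix=match.group('suffix')))
    return hosts


hosts = expand('cp[3007-3008,3010,3030-3049].esams.wmnet')
print(len(hosts))
print(hosts[0], hosts[-1])
```

The count printed for this input agrees with the (23) shown in front of the folded name in the output above.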

More complex example fine-tuning many of the parameters using the long form of the options for clarity:

  • Execute two commands on each host in sequence, in a moving window of 2 hosts at a time, moving to the next host 5 seconds after the previous one has finished.
  • Each command will be considered timed out if it takes more than 30 seconds to complete.
  • If the percentage of successful hosts drops below 95% at any point, no more hosts are scheduled for execution.
$ sudo cumin --batch-size 2 --batch-sleep 5 --success-percentage 95 --timeout 30 --mode async \
  '(P{R:class = role::puppetmaster::backend} or P{R:class = role::puppetmaster::frontend}) and not D{rhodium.eqiad.wmnet}' \
  'date' 'ls -la /tmp/foo'
4 hosts will be targeted:
Confirm to continue [y/n]? y
===== NODE GROUP =====
(2) puppetmaster[2001-2002].codfw.wmnet
----- OUTPUT -----
Thu Nov  2 18:45:18 UTC 2017
===== NODE GROUP =====
(1) puppetmaster2002.codfw.wmnet
----- OUTPUT -----
ls: cannot access /tmp/foo: No such file or directory
===== NODE GROUP =====
(1) puppetmaster2001.codfw.wmnet
----- OUTPUT -----
-rw-r--r-- 1 root root 0 Nov  2 18:44 /tmp/foo
PASS:  |████████████▌                                      |  25% (1/4) [00:05<00:01,  2.10hosts/s]
FAIL:  |████████████▌                                      |  25% (1/4) [00:05<00:01,  2.45hosts/s]
25.0% (1/4) of nodes failed to execute command 'ls -la /tmp/foo': puppetmaster2002.codfw.wmnet
25.0% (1/4) success ratio (< 95.0% threshold) of nodes successfully executed all commands. Aborting.: puppetmaster2001.codfw.wmnet
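
The interaction between the batch size and the success threshold in the run above can be sketched as follows. This is a deliberately simplified model, not Cumin's scheduler: run_in_batches() and run_on_host are hypothetical names, hosts are processed sequentially in fixed batches rather than in a moving window, and the sleep and timeout options are left out. It also assumes that scheduling stops once the failures already recorded make the threshold unreachable.

```python
def run_in_batches(hosts, run_on_host, batch_size=2, success_percentage=95.0):
    """Run run_on_host(host) on the hosts in batches, stopping early on too many failures.

    Returns a tuple with the lists of succeeded, failed and never scheduled hosts.
    """
    total = len(hosts)
    succeeded, failed, pending = [], [], list(hosts)
    while pending:
        # Best success ratio still reachable if every remaining host succeeds.
        best_case = 100.0 * (total - len(failed)) / total
        if best_case < success_percentage:
            break  # Threshold can no longer be met: do not schedule more hosts
        batch, pending = pending[:batch_size], pending[batch_size:]
        for host in batch:
            (succeeded if run_on_host(host) else failed).append(host)
    return succeeded, failed, pending


# Hypothetical run: 4 hosts, the command fails only on host2.
succeeded, failed, pending = run_in_batches(
    ['host1', 'host2', 'host3', 'host4'], lambda host: host != 'host2')
print(succeeded, failed, pending)
```

With 4 hosts and a 95% threshold, a single failure in the first batch is enough to leave the remaining two hosts unscheduled, which mirrors the 1/4 passed and 1/4 failed counters in the output above.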


Simple example of usage as a Python library, without fine-tuning the optional parameters:

import cumin

from cumin import query, transport, transports

# Load configuration files /etc/cumin/config.yaml and /etc/cumin/aliases.yaml (if present).
config = cumin.Config()
# Assuming default_backend: direct is set in config.yaml, select 5 hosts with the direct backend.
hosts = query.Query(config).execute('host[1-5]')
target = transports.Target(hosts)
worker = transport.Transport.new(config, target)
worker.commands = ['systemctl is-active nginx']
worker.handler = 'sync'
exit_code = worker.execute()  # Execute the command on all hosts in parallel
for nodes, output in worker.get_results():  # Cycle over the results
    print('{nodes}:\n{output}\n------'.format(nodes=nodes, output=output))

More complex example fine-tuning many of the parameters:

import cumin

from cumin import query, transport, transports

config = cumin.Config(config='/path/to/custom/cumin/config.yaml')
hosts = query.Query(config).execute('A:nginx')  # Match hosts defined by the query alias named 'nginx'.
# Moving window of 5 hosts at a time, with a 30s sleep before adding a new host once the previous one has finished.
target = transports.Target(hosts, batch_size=5, batch_sleep=30.0)
worker = transport.Transport.new(config, target)
worker.commands = [
    transports.Command('systemctl is-active nginx'),
    # On each host, apply a timeout of 30 seconds to this command and consider an exit code of 0 or 42 successful.
    transports.Command('depool_command', timeout=30, ok_codes=[0, 42]),
    transports.Command('systemctl restart nginx'),
    transports.Command('systemctl is-active nginx'),
    transports.Command('repool_command', ok_codes=[0, 42]),
]
# On each host perform the above commands in sequence, each one only if the previous command was successful.
worker.handler = 'async'
exit_code = worker.execute()
for nodes, output in worker.get_results():
    print('{nodes}:\n{output}\n------'.format(nodes=nodes, output=output))