
Testing Integration in Ansible Playbooks

To improve reliability and reduce feedback loops during deployments, we integrate automated tests directly into the Ansible Playbook that provisions the platform. In the playbook and this documentation, these tests are also referred to as checks.

Why Integrate Tests?

Integrating tests into the playbook serves two main purposes:

  1. Early Error Detection: Failures are caught as early as possible, reducing the risk of discovering issues only at the end of a long execution.
  2. Fail Fast: By detecting critical problems early, we can halt the deployment process immediately, saving time and avoiding cascading errors.

This strategy ensures faster debugging and a more reliable deployment process.

Best Practices for Writing Tests

When implementing tests, follow these principles to maintain performance and clarity:

1. Keep Tests Fast and Lightweight

Tests should execute quickly to avoid significantly slowing down the playbook. Avoid complex or long-running checks. Instead, focus on verifying critical assumptions that are likely to cause downstream failures if unmet.
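
For example, a lightweight check can simply verify that a Deployment exists instead of running a full end-to-end test. The following is a minimal sketch assuming the kubernetes.core collection is available; the <tool> placeholders are illustrative:

- name: "<tool>: [Check] Deployment exists"
  kubernetes.core.k8s_info:
    kind: Deployment
    namespace: "<tool>"
    name: "<tool>"
  register: deploy_info
  # Fail the check if the Deployment has not been created at all
  failed_when: deploy_info.resources | length == 0
  when: inv_checks.enable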

2. Minimize False Positives

Be conservative about failing a test. In distributed systems, timing issues can lead to transient states; when in doubt, prefer to wait or retry rather than fail prematurely. For example, services might need a few seconds longer to become ready.
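
For instance, instead of failing as soon as an endpoint is unreachable, a check can wait for it within a bounded timeout. This is a minimal sketch using the built-in wait_for module; the host and timeout values are illustrative assumptions:

- name: "<tool>: [Check] Service port reachable"
  ansible.builtin.wait_for:
    host: "<tool>.{{ DOMAIN }}"
    port: 443
    timeout: 60   # tolerate transient startup delays instead of failing immediately
  when: inv_checks.enable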

3. Use Template-Based Tests

To avoid repetitive boilerplate, tests should be implemented using reusable task templates. Use include_tasks along with input variables to invoke these test templates. Platform-specific defaults can be embedded in the templates, keeping test invocation concise and maintainable.

This modular approach:

  • Encourages code reuse
  • Makes tests easier to maintain
  • Reduces duplication across the playbook

4. Allow Tests to Be Skipped

Each test should support being skipped via a variable toggle (e.g., for development or specific environments). For now, either all tests are executed or none: each template is guarded by the condition when: inv_checks.enable.
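
A minimal sketch of the corresponding inventory variables; only inv_checks.enable and inv_checks.api.default_max_retries appear in the templates below, the rest of the structure is an assumption:

inv_checks:
  enable: true                # set to false to skip all checks
  api:
    default_max_retries: 10   # platform-wide default for API health checks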

When to Write Tests

When possible, every application should have a test that verifies its internal state, e.g. pod health. If an application is exposed to the internet, it should also have an API test that checks its reachability. These tests should cover the most common problems. Further tests can be added if a problem with an application occurs regularly. When writing tests, keep different platform configurations in mind.
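
A pod health check can be implemented as its own task template. The following is a sketch of a hypothetical tasks/templates/pods_ready.yml, assuming the kubernetes.core collection; the field selectors and defaults are illustrative:

- name: "[Pods ready] Check that all pods are running"
  kubernetes.core.k8s_info:
    kind: Pod
    namespace: "{{ namespace }}"
    # Only list pods that are not (yet) in a healthy phase
    field_selectors:
      - status.phase!=Running
      - status.phase!=Succeeded
  register: pending_pods
  # Retry until no unhealthy pods are left, instead of failing on a transient state
  until: pending_pods.resources | length == 0
  retries: "{{ max_retries | default(10) }}"
  delay: 5
  when: inv_checks.enable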

How to Write Tests

Verify whether a step was successful as soon as possible; otherwise the deployment might continue even though the platform is in an unstable state. Some components, such as APISIX routes, are added later on, so the corresponding tests are only possible at that point. To write a test, define what you want to test and, if possible, find an appropriate task template that does this. Include this task and set all required variables. Optional variables such as max_retries can also be set if the application is known to take more time to start. A single deployment step can be verified with multiple checks, for example by first testing whether all pods are ready and then whether the API is reachable. The following is an example of what a task template can look like and how it is called.

Example Structure of a Test

# Deploy tool ...

- name: "<tool>: [Check] API reachable"
include_tasks: tasks/templates/api_health.yml
vars:
url: "https://<tool>.{{ DOMAIN }}"

Inside tasks/templates/api_health.yml:

- name: "[API health] Check status code"
uri:
url: "{{ url }}"
status_code: "{{ status_code | default(200) }}"
ca_path: "{{ inv_k8s.ingress.ca_path }}"
register: result
until: "result.status == (status_code | default(200))"
retries: "{{ max_retries | default(inv_checks.api.default_max_retries) | default(10) }}"
delay: 2
when: inv_checks.enable
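
Combining multiple checks for one deployment step, as described above, could then look like this sketch; pods_ready.yml refers to the hypothetical template outlined earlier, following the same pattern as api_health.yml:

# Deploy tool ...

- name: "<tool>: [Check] All pods ready"
  include_tasks: tasks/templates/pods_ready.yml
  vars:
    namespace: "<tool>"
    max_retries: 30   # this tool is known to take longer to start

- name: "<tool>: [Check] API reachable"
  include_tasks: tasks/templates/api_health.yml
  vars:
    url: "https://<tool>.{{ DOMAIN }}"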