# Testing Integration in Ansible Playbooks
To improve reliability and shorten feedback loops during deployments, we integrate automated tests directly into the Ansible playbook that provisions the platform. In the playbook and in this documentation, these tests are also referred to as checks.
## Why Integrate Tests?
Integrating tests into the playbook serves two main purposes:
- Early Error Detection: Failures are caught as early as possible, reducing the risk of discovering issues only at the end of a long execution.
- Fail Fast: By detecting critical problems early, we can halt the deployment process immediately, saving time and avoiding cascading errors.
This strategy ensures faster debugging and a more reliable deployment process.
## Best Practices for Writing Tests
When implementing tests, follow these principles to maintain performance and clarity:
### 1. Keep Tests Fast and Lightweight
Tests should execute quickly to avoid significantly slowing down the playbook. Avoid complex or long-running checks. Instead, focus on verifying critical assumptions that are likely to cause downstream failures if unmet.
### 2. Minimize False Positives
Be conservative about failing a test. In distributed systems, timing issues can lead to transient states, so a check that fails immediately may report a problem that would have resolved itself. When in doubt, prefer waiting or retrying over failing early; for example, services might need a few extra seconds to become ready.
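A minimal sketch of this retry-over-fail approach, assuming an illustrative health endpoint:

```yaml
# Sketch: absorb transient startup delays by retrying before failing
- name: "[Check] Service ready (retry instead of failing immediately)"
  ansible.builtin.uri:
    url: "https://service.{{ DOMAIN }}/healthz"  # illustrative endpoint, not part of the playbook
  register: result
  until: result.status == 200
  retries: 10  # combined with delay: 2 this grants ~20 seconds of grace
  delay: 2
```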
### 3. Use Template-Based Tests
To avoid repetitive boilerplate, tests should be implemented as reusable task templates. Use `include_tasks` together with input variables to invoke these templates. Platform-specific defaults can be embedded in the templates themselves, keeping test invocations concise and maintainable (see the sketch after the list below).
This modular approach:
- Encourages code reuse
- Makes tests easier to maintain
- Reduces duplication across the playbook
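As a minimal sketch of the pattern, a template can embed its own defaults so that callers only pass what differs. The `pod_health.yml` template below is hypothetical and not part of the playbook:

```yaml
# tasks/templates/pod_health.yml (hypothetical template)
- name: "[Pod health] All pods in {{ namespace }} are running"
  kubernetes.core.k8s_info:
    kind: Pod
    namespace: "{{ namespace }}"
  register: pods
  until: >-
    pods.resources | length > 0 and
    pods.resources | rejectattr('status.phase', 'equalto', 'Running') | list | length == 0
  retries: "{{ max_retries | default(10) }}"  # embedded default; callers may override
  delay: 2
  when: inv_checks.enable
```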
### 4. Allow Tests to Be Skipped
Each test should support being skipped via a variable toggle (e.g., for development or specific environments).
For now, either all tests are executed or none, controlled by the condition `when: inv_checks.enable` in the templates.
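A finer-grained toggle is not implemented yet; one possible sketch (the `inv_checks.api.enable` variable is hypothetical) would combine a per-area switch with the global one:

```yaml
# Hypothetical refinement: per-area toggle on top of the global switch
when: inv_checks.enable and (inv_checks.api.enable | default(true))
```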
## When to Write Tests
Where possible, every application should have a test that verifies its internal state, i.e. pod health. If an application is exposed to the internet, it should also have one API test that checks its reachability (see the sketch below). These tests should cover the most common problems. Further tests can be added if a problem with an application occurs regularly. When writing tests, keep the different platform configurations in mind.
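As a sketch of this guideline for an internet-exposed application (the tool name is illustrative, and `pod_health.yml` is the hypothetical template from above):

```yaml
- name: "grafana: [Check] Pods ready"            # internal state
  include_tasks: tasks/templates/pod_health.yml  # hypothetical template
  vars:
    namespace: grafana

- name: "grafana: [Check] API reachable"         # reachability from outside
  include_tasks: tasks/templates/api_health.yml
  vars:
    url: "https://grafana.{{ DOMAIN }}"
```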
## How to Write Tests
Prefer testing/verifying that a step was successful as soon as possible.
Otherwise the deployment might continue although the platform is in an unstable state.
Some components, such as APISIX routes, are added later in the playbook run, so some tests only become possible at that point.
To write a test, define what you want to verify and, if possible, find an appropriate task template that does this.
Include this template and set all necessary variables.
If necessary, variables like `max_retries` can also be set when it is known that the application takes more time to start.
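At the call site this could look as follows (the retry count is illustrative):

```yaml
- name: "<tool>: [Check] API reachable"
  include_tasks: tasks/templates/api_health.yml
  vars:
    url: "https://<tool>.{{ DOMAIN }}"
    max_retries: 60  # with delay: 2 in the template, allows ~2 minutes for startup
```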
You can verify a deployment step with multiple checks.
For example, first test whether all pods are ready, and then whether the API is reachable.
The following is an example of what a task template can look like and how it can be called.
## Example Structure of a Test
```yaml
# Deploy tool ...
- name: "<tool>: [Check] API reachable"
  include_tasks: tasks/templates/api_health.yml
  vars:
    url: "https://<tool>.{{ DOMAIN }}"
```
Inside `tasks/templates/api_health.yml`:
- name: "[API health] Check status code"
uri:
url: "{{ url }}"
status_code: "{{ status_code | default(200) }}"
ca_path: "{{ inv_k8s.ingress.ca_path }}"
register: result
until: "result.status == (status_code | default(200))"
retries: "{{ max_retries | default(inv_checks.api.default_max_retries) | default(10) }}"
delay: 2
when: inv_checks.enable
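Note the layered fallbacks in `retries`: an explicitly passed `max_retries` wins, then the inventory-wide `inv_checks.api.default_max_retries`, and finally the hard-coded 10. Combined with `delay: 2`, the default gives a check roughly 20 seconds to succeed before it fails.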