Ansible Network Automation: Configuration Backup & Restore
In any network infrastructure, maintaining reliable backups of device configurations is paramount. These backups serve as a safety net, allowing you to recover swiftly from unexpected failures, configuration errors, or security incidents. Ansible provides robust tools for automating the backup and restoration process.
When it comes to hybrid cloud network automation, one of the most impactful early wins is establishing a robust backup process. Here’s why it matters:
1. Certified Collections and Common Data Model:
- Ansible’s certified collections, provided by networking partners, adhere to a common data model.
- This consistency allows you to create vendor-agnostic backups.
- Imagine being able to compare backups across a diverse fleet of devices, even when they don’t perfectly align with a corporate standard (which is often the reality).
2. Vendor-Agnostic Backups:
- Ansible’s standardized approach ensures that backups work seamlessly across different network devices.
- Whether it’s Routers, Switches, or Firewalls, the same playbook can handle them all.
3. Setting Up for NetOps Workflow:
- Establishing a backup and restore process is a crucial step toward a NetOps workflow.
- NetOps involves automating network operations, and confidence in automation is essential.
- We’ll dive deeper into NetOps in a future post.
A significant benefit you can gain quickly with Ansible for network automation is establishing a dependable backup system. Because all the approved collections from our network vendors follow a consistent data structure, these backups are compatible with any vendor.
This allows you to analyze backups across a group of devices that might not strictly follow a company standard (which is often the case).
Creating a backup and restore process also lays the groundwork for an Ansible NetOps workflow, which requires a higher level of trust in the automation, but I’ll delve into that in a future post. For now, let’s explore how to set up a backup and restore process using Git and Ansible.
High Level Design Automation Workflow
The backup and restore playbooks are designed to function collaboratively. The backup playbook creates a YAML configuration file tailored to each machine in your inventory, while the restore playbook retrieves a specific backup version for deployment.
A popular implementation of this workflow involves incorporating intermediate steps where you can deploy an update and execute a smoke test to guarantee no disruptive modifications were introduced. If the smoke test encounters an issue, you can automatically revert to the backed-up configuration and diagnose the problem.
Backup Playbook Considerations
Here are some key points to remember about the backup playbook:
- Initial Configuration Retrieval: The playbook starts by cloning the configuration Git repository.
- Vendor-Neutral Tasks: The playbook leverages the ansible.netcommon collection, ensuring compatibility across different network device vendors.
- Resource Discovery and Backup: The first step identifies the resources supported by the target device. Then, for each resource, the configuration is retrieved and written to a file.
- A commit and tag are pushed to the remote repository if any changes are detected compared to the previous backup.
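The Git steps in this list (the clone at the start and the commit, tag, and push at the end) are not part of the task listings in this post. A minimal sketch of how they might look is below. The variables backup_repo_url, backup_repo_dir, and backup_tag are hypothetical names, and the sketch assumes the control node already has Git credentials and a commit identity configured.

```yaml
# Sketch only: backup_repo_url, backup_repo_dir, and backup_tag are
# hypothetical variables that you would define yourself.
- name: Clone the configuration repository
  delegate_to: localhost
  run_once: true
  ansible.builtin.git:
    repo: "{{ backup_repo_url }}"
    dest: "{{ backup_repo_dir }}"
    version: main

# ... gather the device configs and write them into backup_repo_dir ...

- name: Check whether the backup changed anything in the working tree
  delegate_to: localhost
  run_once: true
  ansible.builtin.command:
    cmd: git status --porcelain
    chdir: "{{ backup_repo_dir }}"
  register: r_git_status
  changed_when: false

- name: Commit, tag, and push only when there are changes
  delegate_to: localhost
  run_once: true
  when: r_git_status.stdout | length > 0
  ansible.builtin.shell:
    cmd: |
      git add -A
      git commit -m "Config backup {{ backup_tag }}"
      git tag "{{ backup_tag }}"
      git push origin HEAD --tags
    chdir: "{{ backup_repo_dir }}"
```

Checking git status first keeps the repository history free of empty commits, so each tag corresponds to a real configuration change.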
tasks:
  - name: Identify supported network resources
    # called with no arguments, the module lists the resource modules
    # available for the target device
    ansible.netcommon.network_resource:
    register: supported_resources

  - name: Retrieve configuration for each resource
    loop: "{{ supported_resources.modules }}"
    loop_control:
      loop_var: resource
    ansible.builtin.include_role:
      name: get_network_config
      tasks_from: get_resource_config.yml
    vars:
      resource_name: "{{ resource }}"

  - name: Create configuration file locally
    delegate_to: localhost
    ansible.builtin.template:
      src: config_template.j2
      dest: "{{ config_dest }}"
      mode: "0755"
    vars:
      device_configs: "{{ gathered_configs }}"
...
tasks:
  - name: List resource modules
    ansible.netcommon.network_resource:
    register: r_net_modules

  - name: Gather resource configs
    loop: "{{ r_net_modules.modules }}"
    loop_control:
      loop_var: resource
    ansible.builtin.include_tasks:
      file: tasks/get_resource_config.yml

  - name: Write the config to a file
    delegate_to: localhost
    ansible.builtin.copy:
      content: "{{ device_configs | to_nice_yaml }}"
      dest: "{{ config_dest }}"
      mode: "0755"
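The included file tasks/get_resource_config.yml is referenced above but not shown. As a rough sketch, it could gather one resource and merge it into the device_configs dictionary that is later written to disk. The contents below are an assumption, including the register name r_resource_config and the expectation that the gathered data comes back under a gathered key.

```yaml
# tasks/get_resource_config.yml (hypothetical contents, not from the original playbook)
- name: Gather the current config for one resource
  ansible.netcommon.network_resource:
    name: "{{ resource }}"
    state: gathered
  register: r_resource_config

- name: Add the result to the per-device dictionary
  ansible.builtin.set_fact:
    # the dictionary key is the resource name, e.g. interfaces or vlans
    device_configs: >-
      {{ device_configs | default({})
         | combine({resource: r_resource_config.gathered | default({})}) }}
```

Keying the dictionary by resource name is what lets the restore side loop over the file with dict2items and apply each resource independently.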
Restore Playbook Configuration Options:
The restore playbook allows you to define its operation using a couple of parameters:
- Target Selection (_hosts): This variable restricts the playbook's execution to specific devices. You can specify a group or an individual hostname.
- Backup Version Selection (backup_config_tag): This parameter determines the specific backup version that will be deployed to the targeted devices.
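To make the two parameters concrete, here is one way the top of a restore playbook might consume them. This is a sketch under assumptions: backup_repo_url, backup_repo_dir, config_path, and the one-file-per-host layout are hypothetical, while _hosts, backup_config_tag, and r_config_path are the names used in this post.

```yaml
# Sketch only: the repository variables and file layout are assumptions.
- name: Restore network device configs
  hosts: "{{ _hosts }}"
  gather_facts: false
  vars:
    config_path: "{{ backup_repo_dir }}/{{ inventory_hostname }}.yaml"
  tasks:
    - name: Check out the requested backup version
      delegate_to: localhost
      run_once: true
      ansible.builtin.git:
        repo: "{{ backup_repo_url }}"
        dest: "{{ backup_repo_dir }}"
        version: "{{ backup_config_tag }}"

    - name: Check that a backup file exists for this host
      delegate_to: localhost
      ansible.builtin.stat:
        path: "{{ config_path }}"
      register: r_config_path
```

Both values would then be passed as extra variables at run time, for example with -e _hosts=<group> -e backup_config_tag=<tag>. Leaving _hosts without a default means a restore cannot accidentally run against the whole inventory.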
# backup repository is cloned
...
- name: Restore device configurations (if backup exists)
  ansible.builtin.include_tasks: restore_configs.yml
  when: r_config_path.stat.exists

- name: (Optional) Verify restored configuration
  ansible.builtin.include_tasks: verify_config.yml
  # include_tasks does not register a result, so gate on the backup check
  when:
    - r_config_path.stat.exists
    - verify_config is defined
...
# tasks/restore_configs.yml
- name: Identify resources needing restoration
  # this role is expected to set a restore_resources fact with a resources list
  ansible.builtin.include_role:
    name: identify_resources

- name: Apply configuration for each resource (if needed)
  loop: "{{ restore_resources.resources }}"
  loop_control:
    loop_var: resource
  when: resource.config is defined
  delegate_to: "{{ resource.device }}"
  ansible.netcommon.network_resource:
    name: "{{ resource.name }}"
    config: "{{ resource.config }}"
    state: overridden
...
# tasks/verify_config.yml (Optional)
# Add tasks to verify restored configuration (diff, compare commands etc.)
...
- name: Restore device configs from the backup
  # ensure a config backup exists for the target hosts
  when: r_config_path.stat.exists
  block:
    - name: Apply resource config
      when:
        - resource.value != {}
        - resource.value != []
      loop: "{{ lookup('file', config_path) | from_yaml | dict2items }}"
      loop_control:
        loop_var: resource
        label: "{{ resource.key }}"
      # configs are applied per resource
      ansible.netcommon.network_resource:
        name: "{{ resource.key }}"
        config: "{{ resource.value }}"
        # other states are available, for a restore we do a complete override
        state: overridden
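The optional verification step is left open in this post. One possible approach, sketched below, is to re-gather every restored resource and assert that it matches the backup file. The register name r_verify and the loop variable check are hypothetical, and the sketch assumes the gathered data is returned under a gathered key. A strict equality check can also report false differences if a platform returns list entries in a different order, so treat this as a starting point.

```yaml
# tasks/verify_config.yml (hypothetical contents)
- name: Re-gather each restored resource, skipping empty ones
  loop: >-
    {{ lookup('file', config_path) | from_yaml | dict2items
       | selectattr('value') | list }}
  loop_control:
    loop_var: resource
    label: "{{ resource.key }}"
  ansible.netcommon.network_resource:
    name: "{{ resource.key }}"
    state: gathered
  register: r_verify

- name: Compare the live config with the backup
  loop: "{{ r_verify.results }}"
  loop_control:
    loop_var: check
    label: "{{ check.resource.key }}"
  ansible.builtin.assert:
    that:
      - check.gathered | default({}) == check.resource.value
    fail_msg: "{{ check.resource.key }} differs from the backup"
```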
Integrate Network Automation Playbooks into a Comprehensive Workflow
Now that you have the backup and restore playbooks in place, consider incorporating them into a broader workflow that encompasses the intermediate steps involved in applying updates. In addition to scheduling regular backups at a frequency that aligns with your organization’s needs, it’s highly recommended to establish a restore point immediately before implementing any changes.
This allows for a seamless rollback in case of unexpected issues.
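Within a single playbook, the restore-point idea can be approximated with a block and rescue section. The sketch below is an illustration rather than the workflow described in this post: tasks/backup.yml, tasks/apply_configs.yml, and tasks/smoke_test.yml are hypothetical file names, and only tasks/restore_configs.yml appears earlier.

```yaml
# Sketch only: the backup, apply, and smoke test files are hypothetical.
- name: Apply a change with a restore point
  hosts: "{{ _hosts }}"
  gather_facts: false
  tasks:
    - name: Take a backup immediately before the change
      ansible.builtin.include_tasks:
        file: tasks/backup.yml

    - name: Apply the change and roll back on failure
      block:
        - name: Apply the new configuration
          ansible.builtin.include_tasks:
            file: tasks/apply_configs.yml

        - name: Run the smoke test
          ansible.builtin.include_tasks:
            file: tasks/smoke_test.yml
      rescue:
        - name: Restore the most recent backup
          ansible.builtin.include_tasks:
            file: tasks/restore_configs.yml

        - name: Report the rollback as a failure
          ansible.builtin.fail:
            msg: "Change rolled back on {{ inventory_hostname }}"
```

Failing explicitly at the end of the rescue section keeps the run marked as failed, so the rollback is visible instead of being silently absorbed.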
Here’s an example of a workflow (not shown) where I integrate custom configuration updates and incorporate automated restoration from the most recent backup upon encountering failures.
Following the “Apply Configs” step, it’s highly advisable to incorporate a dedicated testing job. This job’s purpose is to verify that the implemented changes haven’t caused any disruptive functionality issues.
A potential approach would be to test the connectivity of various routes and confirm that they behave as anticipated. By integrating this testing phase into your workflow, you gain the confidence to implement changes with a robust safety net in place, minimizing the risk of unintended consequences.
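As one example of such a test, the net_ping module in ansible.netcommon sends pings from the network device itself and fails the task when the destination is unreachable. The sketch below assumes a hypothetical list variable smoke_test_targets and a platform that net_ping supports, which is not the case for every network OS.

```yaml
# tasks/smoke_test.yml (hypothetical contents)
# smoke_test_targets is an assumed list of IP addresses that must stay
# reachable after a change, defined in inventory or group_vars.
- name: Check reachability of key destinations from the device
  loop: "{{ smoke_test_targets }}"
  loop_control:
    loop_var: target
  ansible.netcommon.net_ping:
    dest: "{{ target }}"
    count: 3
    state: present
```

With state set to present, an unreachable destination fails the task, which is the signal a rollback step can react to.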
References
ansible.netcommon: Collection documentation for vendor-agnostic network modules.
netops-issues-with-event-driven-ansible: A simple example, "No Shut, No Problem".
Ansible Cisco Network Assurance Engine Playbooks
developing_resource_modules_network: Example report built from the Ansible common data model for network devices.