Is the internet down?!?!
We all know how heavily daily life depends on internet connections and how quickly frustration mounts when they fail. For years, Cisco IOS has offered solutions to this problem through its IP SLA and Track features, which detect failures and reroute traffic via alternative interfaces.
While this solution works reliably, there are some drawbacks, particularly where NAT is employed, due to stale translations mapped to failed interfaces. Inconveniently, not all failures result in an interface going down.
So what?
Stale NAT translations that persist after a fail-over extend the interruption until they time out. By default, non-DNS UDP translations time out after 5 minutes and DNS translations after 1 minute. TCP translations time out after 24 hours, unless a RST or FIN is seen on the stream, in which case they time out after 1 minute.
The common solution
The common solution is to use Cisco IOS Embedded Event Manager (EEM) to automate the clear ip nat translation * command following a fail-over. This sledgehammer approach has drawbacks. Clearing all translations effectively resets all NAT sessions on all interfaces, not just the failed interface. Compounding the issue, no TCP RST or FIN is sent when a translation is cleared, so packets are silently dropped until the client times out and re-initiates the connection.
Sadly, IOS lacks a command to clear NAT translations per interface, but it does provide a command to clear individual translations, which is not exactly useful on its own when you have thousands of translations:
clear ip nat translation $protocol inside $inside_global_ip $inside_global_port $inside_local_ip $inside_local_port outside $outside_local_ip $outside_local_port $outside_global_ip $outside_global_port
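To make the argument order concrete, here is a minimal Tcl sketch that turns one row of show ip nat translations output into that command. The proc name build_clear_command and the sample addresses (documentation ranges) are made up for illustration, and the five-column row layout (Pro, Inside global, Inside local, Outside local, Outside global) is an assumption about the output format, so check it against your own router. It sticks to older Tcl constructs on purpose, since the interpreter on a router tends to lag behind desktop Tcl.

```tcl
# Hypothetical helper: build the per-translation clear command from one
# row of "show ip nat translations".
# Assumed column order: Pro, Inside global, Inside local, Outside local, Outside global.
proc build_clear_command {line} {
    # Split the row on runs of whitespace.
    set cols [regexp -all -inline {\S+} $line]
    if {[llength $cols] != 5} { return "" }
    foreach {protocol ig il ol og} $cols break
    # Only tcp and udp rows carry the ip:port pairs the command needs.
    if {[lsearch -exact {tcp udp} $protocol] < 0} { return "" }
    # Break each ip:port column into two separate arguments.
    set parts {}
    foreach addr [list $ig $il $ol $og] {
        if {![regexp {^(\d+\.\d+\.\d+\.\d+):(\d+)$} $addr -> ip port]} { return "" }
        lappend parts $ip $port
    }
    foreach {ig_ip ig_port il_ip il_port ol_ip ol_port og_ip og_port} $parts break
    return "clear ip nat translation $protocol inside $ig_ip $ig_port $il_ip $il_port outside $ol_ip $ol_port $og_ip $og_port"
}

# Sample row using documentation addresses (not real traffic).
set sample "tcp 203.0.113.10:51000    192.168.1.20:51000   198.51.100.5:443   198.51.100.5:443"
puts [build_clear_command $sample]
```

Rows that do not fit the pattern, such as the header line or non-tcp/udp entries, yield an empty string and can simply be skipped.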
Dual ISP config
Before we get into the specifics, I suggest familiarizing yourself with the standard Dual ISP setup under Cisco IOS. You'll find plenty of examples online, explaining how to configure a Cisco ISR with multiple ISP connections, employing route maps and leveraging the IP SLA and Track features of IOS to detect failures and reroute traffic.
One such example can be found here
The challenge - Clearing NAT translations per interface.
One of the lesser-known and less-used features of Cisco IOS is TCL scripting. The script below selectively clears NAT translations per interface and can be triggered via EEM in the same manner.
Clearing only the NAT translations on failed interfaces ensures the smoothest fail-over and avoids unnecessary interruptions, particularly where failures occur in close succession.
This method only clears the translations of an interface that has failed its SLA check.
Part 1 - The TCL Script
If you want to verify the operation of the script, you can invoke it manually from the enable (#) prompt using: tclsh bootflash:clearnat.tcl <interface>
tclsh bootflash:clearnat.tcl dialer0
The script is basic.
- Parses the output of show ip interface for the interface given as the argument to grab its IPv4 address
- If an IPv4 address is found, it continues
- Parses the output of show ip nat translations
- For each translation, checks whether the inside global IP matches the interface IP
- If a match is found, executes the command clear ip nat translation with the translation's parameters.
#!/usr/bin/env tclsh
# Clear only the NAT translations whose inside global address is the
# IPv4 address of the given interface.
proc clear_nat_translations {interface} {
    set interface_details [exec "show ip interface $interface"]
    # Stop if the interface has no IPv4 address.
    if {![regexp {Internet address is ([0-9]+\.[0-9]+\.[0-9]+\.[0-9]+)} $interface_details -> interface_ipv4]} {
        puts ""
        puts "No IPv4 found for $interface"
        return
    }
    puts "Clearing NAT translations on $interface. IPv4: $interface_ipv4"
    set nat_entries [exec "show ip nat translations"]
    foreach line [split $nat_entries "\n"] {
        # Columns: Pro, Inside global, Inside local, Outside local, Outside global
        set fields [regexp -all -inline {\S+} $line]
        if {[llength $fields] != 5} { continue }
        set protocol [lindex $fields 0]
        # The clear command below is built for tcp and udp entries only.
        if {[lsearch -exact {tcp udp} $protocol] < 0} { continue }
        # Split each ip:port column; skip lines that do not parse cleanly,
        # so values from a previous line are never reused.
        set parts {}
        foreach addr [lrange $fields 1 4] {
            if {[regexp {^([0-9]+\.[0-9]+\.[0-9]+\.[0-9]+):(\d+)$} $addr -> ip port]} {
                lappend parts $ip $port
            }
        }
        if {[llength $parts] != 8} { continue }
        foreach {inside_global_ip inside_global_port inside_local_ip inside_local_port
                 outside_local_ip outside_local_port outside_global_ip outside_global_port} $parts break
        # Only touch translations that belong to the failed interface.
        if {![string equal $inside_global_ip $interface_ipv4]} { continue }
        set command "clear ip nat translation $protocol inside $inside_global_ip $inside_global_port $inside_local_ip $inside_local_port outside $outside_local_ip $outside_local_port $outside_global_ip $outside_global_port"
        puts $command
        exec $command
    }
}

# The interface name is expected as the first command line argument.
if {[llength $argv] == 0} {
    puts ""
    puts "no command line argument passed"
    return
}
set target_interface [lindex $argv 0]
clear_nat_translations $target_interface
Part 2 - EEM Trigger
To trigger the script, add the following EEM configuration.
event manager applet clear_nat_2 authorization bypass
event track 5 state any maxrun 10
action 1.0 syslog msg "Cellular Failover. Clearing NAT Translations"
action 2.0 cli command "enable"
action 3.0 cli command "tclsh bootflash:clearnat.tcl cell0/2/0"
action 4.0 syslog msg "Cleared NAT Translations"
event manager applet clear_nat_1 authorization bypass
event track 10 state any maxrun 10
action 1.0 syslog msg "Dialer Failover. Clearing NAT Translations"
action 2.0 cli command "enable"
action 3.0 cli command "tclsh bootflash:clearnat.tcl dialer0"
action 4.0 syslog msg "Cleared NAT Translations"
Hopefully you find this script useful. Please leave a comment if you do, have issues, or want to suggest improvements. 👍