It is possible to configure a highly available IPSec VPN tunnel on IOS so that SA information is replicated between the routers. This makes a failover transparent to users and requires no adjustment or reconfiguration on the remote peers.
Two features are used to deploy this: HSRP and Stateful Switchover (SSO). HSRP is one of the First Hop Redundancy Protocols, which provide redundancy for IP networks so that user traffic recovers quickly and transparently from failures in network edge devices. The protocol monitors the interfaces so that if either interface goes down, the whole router is deemed to be down and ownership of the IKE and IPSec SAs passes to the standby router (which now transitions to the HSRP active state). SSO lets the active and standby routers share IKE and IPSec state information, so both routers have enough information to become the active router at any time.
Before we look at the configuration, let’s have a few words about our topology. The internal network (VLAN 146 below) configuration is outside the scope of this post, but it would normally be configured with a separate HSRP instance that tracks not only the internal but also the external interfaces. The goal is to make sure that return traffic headed for the tunnel enters the active router (active for both SSO and HSRP). So a default route pointing to the internal VIP, or RRI, is something you would definitely want to look at.
Our focus will be the “outside” part, that is, where the tunnel is terminated. The session will land on a virtual address (6.6.156.100) that is associated with our HSRP instance. R2 is configured like a regular L2L tunnel; R10 and R11 are where HSRP and SSO will be deployed. Let’s first look at the R2 config :
crypto isakmp policy 10
encr aes
authentication pre-share
group 2
crypto isakmp key cisco address 6.6.156.100
crypto isakmp keepalive 10 3 periodic
crypto ipsec transform-set SET2 esp-aes esp-sha-hmac
crypto ipsec security-association replay window-size 1024
ip access-list extended HA_VPN
permit ip 6.6.2.0 0.0.0.255 6.6.146.0 0.0.0.255
crypto map MAP2 10 ipsec-isakmp
set peer 6.6.156.100
set transform-set SET2
set pfs group2
match address HA_VPN
int g0/0
crypto map MAP2
Again, this is pretty much a regular IKEv1 L2L configuration: we define our Phase I and II policies, authentication credentials and encryption domain, then use a crypto map to bind them together and associate them with an interface. Two additional things were done: DPD was enabled (legacy keepalives are not supported by this feature) and the anti-replay window was expanded. DPD is used to detect the liveness of the remote peer, while the anti-replay window was increased to avoid potential problems related to how SSO replicates sequence number updates to the standby SA. By default this happens every X packets, and this “X” is explicitly set to its minimum value (1000 packets) on R10 and R11. Also note that we are using pre-shared keys for authentication, which is another limitation of Stateful IPSec Failover.
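To make the anti-replay terminology a bit more concrete, here is a minimal Python sketch of a generic sliding-window replay check. It is a toy model written for this post, not the IOS implementation; the class name `ReplayWindow` and its method are made up for illustration, and only the window size of 1024 comes from the configuration above.

```python
class ReplayWindow:
    """Toy sliding-window anti-replay check (generic, NOT the IOS code)."""

    def __init__(self, size=1024):
        self.size = size      # window size, 1024 as configured on R2
        self.highest = 0      # highest sequence number accepted so far
        self.seen = set()     # accepted sequence numbers still in the window

    def accept(self, seq):
        """Return True if a packet with this sequence number is accepted."""
        if seq <= 0:
            return False
        if seq > self.highest:
            # New highest number: slide the window forward.
            self.highest = seq
            self.seen.add(seq)
            floor = self.highest - self.size
            self.seen = {s for s in self.seen if s > floor}
            return True
        if seq <= self.highest - self.size:
            return False      # too far behind the window
        if seq in self.seen:
            return False      # duplicate inside the window
        self.seen.add(seq)
        return True


wide = ReplayWindow(size=1024)
wide.accept(5000)
late_ok = wide.accept(4000)        # 1000 behind, still inside a 1024 window
late_again = wide.accept(4000)     # same number twice is a replay

narrow = ReplayWindow(size=64)
narrow.accept(5000)
late_narrow = narrow.accept(4000)  # 1000 behind, outside a 64 window
print(late_ok, late_again, late_narrow)
```

In this model a sequence number that trails the highest one by about 1000 is still tolerated by a 1024-packet window but rejected by a 64-packet one, which gives a rough feel for why a counter that is only synchronized every 1000 packets might call for a wider window.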
All right, what do we have to configure on R10 and R11? The same regular settings plus HSRP and SSO :
R10 :
crypto isakmp policy 10
encr aes
authentication pre-share
group 2
crypto isakmp key cisco address 6.6.25.2
crypto isakmp keepalive 10 3 periodic
ip access-list extended HA_VPN
permit ip 6.6.146.0 0.0.0.255 6.6.2.0 0.0.0.255
crypto ipsec transform-set SET2 esp-aes esp-sha-hmac
crypto map MAP2 10 ipsec-isakmp
set peer 6.6.25.2
set transform-set SET2
set pfs group2
match address HA_VPN
crypto ipsec security-association replay window-size 1024
Now let’s look at that “extra” configuration. Here’s how you can tune the anti-replay updates :
crypto map MAP2 redundancy replay-interval in 1000 out 1000
Next, we need to tell the router what addresses and ports will be used by SSO to replicate the state information (timeout settings shown are based on the default values). A similar configuration will be done on R11 but the addresses and ports will be reversed (local will be R11, remote R10) :
ipc zone default
association 1
no shutdown
protocol sctp
local-port 5000
local-ip 6.6.146.10
retransmit-timeout 300 10000
path-retransmit 5
assoc-retransmit 5
remote-port 5000
remote-ip 6.6.146.11
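Since the two routers differ only in which address is local and which is remote, the mirrored blocks can be generated from one template. The following Python sketch does just that; the helper name `sso_transport` is hypothetical, and the emitted lines are simply the ones shown in this post, printed flat as they appear here.

```python
def sso_transport(local_ip, remote_ip, port=5000):
    """Return the SSO transport lines from this post for one router.

    Hypothetical helper for illustration; it only swaps the local and
    remote addresses, everything else is copied from the config above.
    """
    return [
        "ipc zone default",
        "association 1",
        "no shutdown",
        "protocol sctp",
        f"local-port {port}",
        f"local-ip {local_ip}",
        "retransmit-timeout 300 10000",
        "path-retransmit 5",
        "assoc-retransmit 5",
        f"remote-port {port}",
        f"remote-ip {remote_ip}",
    ]


r10_lines = sso_transport("6.6.146.10", "6.6.146.11")
r11_lines = sso_transport("6.6.146.11", "6.6.146.10")  # mirrored for R11
print("\n".join(r11_lines))
```

Generating both sides from the same function is one way to avoid a typo where local and remote addresses end up identical on both routers.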
Now we should build a tracking object. Instead of looking only at G0/1, we will be looking at two interfaces (inside and outside), to ensure that failover takes place no matter which of the two interfaces fails (remember, our Active Router must be active for both networks to avoid traffic black-holing).
track 1 interface GigabitEthernet0/0 line-protocol
track 2 interface GigabitEthernet0/1 line-protocol
track 3 list boolean and
object 1
object 2
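The effect of the boolean AND list can be sketched in a few lines of Python. This is only a model of the arithmetic, not of HSRP itself; the function names are invented, while the base priority of 100 and the decrement of 30 are the values used in the HSRP interface configuration in this post.

```python
def track_and(*line_protocol_up):
    """Model of 'track 3 list boolean and': up only if every member is up."""
    return all(line_protocol_up)


def hsrp_priority(base, decrement, tracked_object_up):
    """Effective priority: the decrement applies while the tracked object is down."""
    return base if tracked_object_up else base - decrement


# R10 loses G0/0 (inside) while G0/1 (outside) stays up.
r10_priority = hsrp_priority(100, 30, track_and(False, True))
# R11 has both interfaces up.
r11_priority = hsrp_priority(100, 30, track_and(True, True))
print(r10_priority, r11_priority)
```

With either interface down the tracked list goes down, so the affected router drops below its peer's priority and the peer, which has preempt configured, can take over.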
It is important to keep the HSRP priorities the same on both routers, because the SSO-standby device always reboots to sync its state with the active box. If one router had a higher priority and failed, it would reclaim the active role once it came back up (preempt is configured), and the other router would then reboot itself.
interface GigabitEthernet0/1
ip address 6.6.156.10 255.255.255.0
standby 2 ip 6.6.156.100
standby 2 preempt
standby 2 priority 100
standby 2 name HSRP
standby 2 track 3 decrement 30
crypto map MAP2 redundancy HSRP stateful
Note that the crypto map was applied with the “redundancy stateful” option. Finally we need to activate inter-device SSO communication:
redundancy inter-device
scheme standby HSRP
A very similar configuration is done on R11. The only changes from R10 config are done to SSO (as explained earlier) :
R11 :
crypto isakmp policy 10
encr aes
authentication pre-share
group 2
crypto isakmp key cisco address 6.6.25.2
crypto isakmp keepalive 10 3 periodic
ip access-list extended HA_VPN
permit ip 6.6.146.0 0.0.0.255 6.6.2.0 0.0.0.255
crypto ipsec transform-set SET2 esp-aes esp-sha-hmac
crypto map MAP2 10 ipsec-isakmp
set peer 6.6.25.2
set transform-set SET2
set pfs group2
match address HA_VPN
crypto ipsec security-association replay window-size 1024
crypto map MAP2 redundancy replay-interval in 1000 out 1000
ipc zone default
association 1
no shutdown
protocol sctp
local-port 5000
local-ip 6.6.146.11
retransmit-timeout 300 10000
path-retransmit 5
assoc-retransmit 5
remote-port 5000
remote-ip 6.6.146.10
track 1 interface GigabitEthernet0/0 line-protocol
track 2 interface GigabitEthernet0/1 line-protocol
track 3 list boolean and
object 1
object 2
interface GigabitEthernet0/1
ip address 6.6.156.11 255.255.255.0
standby 2 ip 6.6.156.100
standby 2 preempt
standby 2 priority 100
standby 2 name HSRP
standby 2 track 3 decrement 30
crypto map MAP2 redundancy HSRP stateful
redundancy inter-device
scheme standby HSRP
NOTE : If you are testing with 15.2(3)T (which is what our routers run), it is definitely advisable to disable the hardware VPN module on both R10 and R11 because of a bug :
no crypto engine onboard 0
Once the devices are configured, the HSRP standby unit reloads to synchronize SAs with the active unit.
Time to verify our configuration:
R11#sh crypto engine brief
crypto engine name: Virtual Private Network (VPN) Module
crypto engine type: hardware
State: Disabled
Location: onboard 0
Product Name: Onboard-VPN
HW Version: 1.0
Compression: Yes
DES: Yes
3 DES: Yes
AES CBC: Yes (128,192,256)
AES CNTR: No
Maximum buffer length: 0000
Maximum DH index: 0000
Maximum SA index: 0000
Maximum Flow index: 3200
Maximum RSA key size: 0000
crypto engine name: Cisco VPN Software Implementation
crypto engine type: software
serial number: 12D23801
crypto engine state: installed
crypto engine in slot: N/A
R10#sh standby brief
P indicates configured to preempt.
|
Interface Grp Pri P State Active Standby Virtual IP
Gi0/1 2 100 P Active local 6.6.156.11 6.6.156.100
R10#sh crypto ha
IKE VIP: 6.6.156.100
stamp: C0 7F F5 AE 96 7E 06 E3 FC 92 F3 60 92 51 49 88
IPSec VIP: 6.6.156.100
R10#sh redundancy states
my state = 13 -ACTIVE
peer state = 8 -STANDBY HOT
Mode = Duplex
Unit ID = 0
Maintenance Mode = Disabled
Manual Swact = enabled
Communications = Up
client count = 13
client_notification_TMR = 60000 milliseconds
RF debug mask = 0x0
R11#sh redundancy states
my state = 8 -STANDBY HOT
peer state = 13 -ACTIVE
Mode = Duplex
Unit ID = 0
Maintenance Mode = Disabled
Manual Swact = cannot be initiated from this the standby unit
Communications = Up
client count = 14
client_notification_TMR = 60000 milliseconds
RF debug mask = 0x0
R10#sh redundancy inter-device
Redundancy inter-device state: RF_INTERDEV_STATE_ACT
Scheme: Standby
Groupname: HSRP Group State: Active
Peer present: RF_INTERDEV_PEER_COMM
Security: Not configured
R11#sh redundancy inter-device
Redundancy inter-device state: RF_INTERDEV_STATE_STDBY
Scheme: Standby
Groupname: HSRP Group State: Standby
Peer present: RF_INTERDEV_PEER_COMM
Security: Not configured
R2#ping 6.6.146.4 source g0/0 rep 5
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 6.6.146.4, timeout is 2 seconds:
Packet sent with a source address of 6.6.2.2
.!!!!
Success rate is 80 percent (4/5), round-trip min/avg/max = 28/28/28 ms
R10#sh crypto session detail
Crypto session current status
Code: C - IKE Configuration mode, D - Dead Peer Detection
K - Keepalives, N - NAT-traversal, T - cTCP encapsulation
X - IKE Extended Authentication, F - IKE Fragmentation
Interface: GigabitEthernet0/1
Uptime: 01:02:50
Session status: UP-ACTIVE
Peer: 6.6.25.2 port 500 fvrf: (none) ivrf: (none)
Phase1_id: 6.6.25.2
Desc: (none)
IKEv1 SA: local 6.6.156.100/500 remote 6.6.25.2/500 Active
Capabilities:D connid:1002 lifetime:22:57:09
IPSEC FLOW: permit ip 6.6.146.0/255.255.255.0 6.6.2.0/255.255.255.0
Active SAs: 2, origin: crypto map
Inbound: #pkts dec'ed 4 drop 0 life (KB/Sec) 4355669/3236
Outbound: #pkts enc'ed 4 drop 0 life (KB/Sec) 4355669/3236
Let’s now start sending traffic from R2 and disable G0/0 on R10 (this causes it to reboot since it lost the SSO-active status) – see what happens :
R2#ping 6.6.146.4 source g0/0 rep 50000
Type escape sequence to abort.
Sending 50000, 100-byte ICMP Echos to 6.6.146.4, timeout is 2 seconds:
Packet sent with a source address of 6.6.2.2
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!U.
...!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!
It looks like five echoes were lost, but then connectivity is back.
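If you would rather not count the marks by eye, a short Python sketch like the one below can tally them. It assumes the usual IOS ping characters (`!` for a reply, `.` for a timeout, `U` for destination unreachable); the function name and the shortened sample string are made up for illustration and are not the full output above.

```python
from collections import Counter


def summarize_ping(marks):
    """Count sent, received and lost probes from a string of ping marks."""
    counts = Counter(ch for ch in marks if not ch.isspace())
    sent = sum(counts.values())
    received = counts.get("!", 0)
    return {"sent": sent, "received": received, "lost": sent - received}


# Shortened, hypothetical sample with the same loss pattern (U followed by dots).
sample = "!!!!!!!!U.\n...!!!!!!!!"
summary = summarize_ping(sample)
print(summary)
```

Pasting the real mark lines into `sample` gives the loss count for the whole failover test.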
R11#
*Dec 11 21:44:26.123: %HSRP-5-STATECHANGE: GigabitEthernet0/1 Grp 2 state Standby -> Active
*Dec 11 21:44:26.127: %CRYPTO-5-IPSEC_SA_HA_STATUS: IPSec sa's if any, for vip 6.6.156.100 will change from STANDBY to ACTIVE
R11#sh crypto session detail
Crypto session current status
Code: C - IKE Configuration mode, D - Dead Peer Detection
K - Keepalives, N - NAT-traversal, T - cTCP encapsulation
X - IKE Extended Authentication, F - IKE Fragmentation
Interface: GigabitEthernet0/1
Session status: UP-ACTIVE
Peer: 6.6.25.2 port 500 fvrf: (none) ivrf: (none)
Desc: (none)
Phase1_id: (none)
IKEv1 SA: local 6.6.156.100/500 remote 6.6.25.2/500 Active
Capabilities:D connid:1001 lifetime:23:49:39
IPSEC FLOW: permit ip 6.6.146.0/255.255.255.0 6.6.2.0/255.255.255.0
Active SAs: 2, origin: crypto map
Inbound: #pkts dec'ed 138 drop 0 life (KB/Sec) 3742576/3083
Outbound: #pkts enc'ed 135 drop 0 life (KB/Sec) 4203376/3083
Finally, I want to mention that this feature is very buggy, especially in the IOS code that we run on our devices; disabling the hardware VPN module does not appear to solve all of the problems (by the way, don’t try this in production). You may see that SA synchronization works only until the first failure occurs; after that, the SA data does not appear to be mirrored to the standby device, even though the SSO states are shown correctly.