SIP registration on HA cluster

phcjpp
Posts: 19
Member Since:
2007-09-05

Hi All,

I have followed the posts on this forum on creating an HA Trixbox cluster, with success. One question: I have 8 Linksys SPA942 SIP phones, all pointing at the floating IP (which hops back and forth happily). When I lose the master node, the phones don't seem to re-register with the backup node (now master, obviously). Am I doing something wrong? Do I need a very short SIP registration period for the phones (currently set at 3600 seconds)?

Thanks
Chris



SkykingOH
Posts: 9681
Member Since:
2007-12-17

Are you using Rainlink or a similar VRRP driver? Once the gratuitous ARP is sent, the backup server has the same IP as the original master, so the phones do not have any reason to re-register.

I am thinking of ways to perform a force register. Do you have the SIP call flows mapped out for the failure scenarios?

If so could you share them with us?

Thanks....Scott

--

Scott

aka "Skyking"



jahyde
Posts: 2002
Member Since:
2006-06-02

What needs to happen is that you build a script into your ha.cf that sends a SIP NOTIFY message to all the phones, something like a chk-cfg on the Linksys, I think. It would basically be a shell script that runs asterisk -rx "sip notify <type> <extension>" for all your extensions.

Or shorten your registration timeouts in the config files.

--

--my PBX is run on 2 V8's



phcjpp
Posts: 19
Member Since:
2007-09-05

I don't think simply changing the IP is enough. The issue is registration with Asterisk, I think. The new Asterisk is not aware it has phones registered; the old (dead) one had the registrations.

I am using the Heartbeat / RAID 1 filesystem method of clustering (DRBD, is it?).

Ta
Chris



phcjpp
Posts: 19
Member Since:
2007-09-05

Thanks jahyde

Now the obvious question - anyone got an example lying around?

Chris



phcjpp
Posts: 19
Member Since:
2007-09-05

Also, I guess the message would have to be IP-based, as the registration has been lost at the Asterisk end.

Chris



SkykingOH
Posts: 9681
Member Since:
2007-12-17

That would work. I assume a failover event could trigger the script?
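On the trigger question: as far as I understand the haresources style of Heartbeat setup, each resource is a script that gets called with start when a node takes over and stop when it gives the resource up. Below is a minimal sketch of such a script. The on_takeover hook and what it prints are placeholders of mine, and the "stopped" status reply assumes Heartbeat only skips start when status reports the resource as running.

```shell
#!/bin/sh
# Sketch of a Heartbeat (haresources-style) resource script.
# Assumption: on_takeover is a hypothetical hook; replace its body with
# whatever should run once this node holds the floating IP.

on_takeover() {
    echo "takeover"
}

handle_action() {
    case "$1" in
        start)
            # Called when this node acquires the resource group.
            on_takeover
            ;;
        stop)
            # Nothing to undo for a one-shot hook.
            :
            ;;
        status)
            # Always report "stopped" so start is attempted on every takeover
            # (assumed behaviour, check against your Heartbeat version).
            echo "stopped"
            ;;
        *)
            echo "usage: $0 {start|stop|status}" >&2
            return 1
            ;;
    esac
}

# In a real resource script the final line would be:
# handle_action "$1"
```

The script would then be listed after the IP address and Asterisk entries on the haresources line, so that it only runs once the floating IP is already up on the new node.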

If the phone is not registered, how will the server have the IP in the peer definition?

Scott

--

Scott

aka "Skyking"



phcjpp
Posts: 19
Member Since:
2007-09-05

I know every phone IP. There are only 2 servers. Easy.

Just need to work out what the message is.

Chris



SkykingOH
Posts: 9681
Member Since:
2007-12-17

So you are going to use static IP settings for the peers? If you send the sip-info message and Asterisk can't send the message to the phone it won't work.

I also want to mention another potential problem. Your switch is going to have all of the ARP entries for server A, then the failover will try to send a new ARP update out. In my experience this takes a bit to settle down. A blast to all the phones may not get to them on the first try.

Scott

--

Scott

aka "Skyking"



phcjpp
Posts: 19
Member Since:
2007-09-05

Not sure I understand.

I have a floating IP address, 192.168.0.14, that can point to 1 of 2 servers. When one goes down, the other gets the IP address. All the phones point to 192.168.0.14 no matter what.

The phones' IP addresses are known: 192.168.0.90 through 98.

Is it not a matter of looping through each phone IP saying, 'Oi! Re-register now'?

Sorry if I misunderstand.

Chris



joshpatten
Posts: 733
Member Since:
2007-01-20

Try a transparent SIP proxy. Endian has one built into their firewall software. Not sure if this will solve the issue, but it's worth a shot.



phcjpp
Posts: 19
Member Since:
2007-09-05

A little further:

/etc/asterisk/sip_notify.conf

[spa-reboot]
Event=>reboot
Content-Length=>0

linksys-942 config - Ext1 page - >

Auth Resync-Reboot: = no

asterisk -rx "sip notify spa-reboot 101"
gives
Sending NOTIFY of type 'spa-reboot' to '101'

Phone then reboots !!
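For anyone wanting to script this, here is a minimal sketch of the loop over extensions suggested earlier in the thread. It assumes the [spa-reboot] entry above is in sip_notify.conf and that the phones are currently registered with the Asterisk that runs it. The extension numbers in the example call and the DRY_RUN switch are illustrative assumptions of mine, not part of the trixbox setup.

```shell
#!/bin/sh
# Sketch only: send one NOTIFY template to a list of extensions.
# Set DRY_RUN=1 to print the commands instead of running them
# (DRY_RUN is a hypothetical switch added for safe testing).

run_cli() {
    # $1: a single Asterisk CLI command string
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf 'asterisk -rx "%s"\n' "$1"
    else
        asterisk -rx "$1"
    fi
}

notify_all() {
    # $1: template name from sip_notify.conf; remaining args: extensions
    template="$1"
    shift
    for ext in "$@"; do
        run_cli "sip notify $template $ext"
    done
}

# Example call (extension numbers are placeholders):
# notify_all spa-reboot 101 102 103
```

Running it once with DRY_RUN=1 shows exactly which CLI commands would be issued before any phone is actually rebooted.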

Now how to do this if the phone is not registered ???

asterisk -rx "sip notify spa-reboot 192.168.0.90"

does nothing to the phone

any ideas ?

Almost there

Chris



phcjpp
Posts: 19
Member Since:
2007-09-05

Even closer ... I think I need to send

asterisk -rx "sip notify spa-reboot 101@192.168.0.90"

It just won't let me do it. 101 works, and it accepts the IP on its own, but then nothing happens: as you can see from the SIP messages below, the request is missing the 101@ part.

Transmitting (no NAT) to 192.168.0.90:5060:
NOTIFY sip:192.168.0.90 SIP/2.0
Via: SIP/2.0/UDP 127.0.0.1:5060;branch=z9hG4bK5531cb71;rport
From: "Unknown" ;tag=as4a362dd4
To:
Contact:
Call-ID: 6118feb95394bed34596445d4f646197@127.0.0.1
CSeq: 102 NOTIFY
User-Agent: Asterisk PBX
Max-Forwards: 70
Event: reboot
Content-Length: 0

---
Scheduling destruction of SIP dialog '70a1a4e664c60fda2f7f688f0701118a@192.168.0.12' in 32000 ms (Method: NOTIFY)
trixbox1*CLI>

SIP/2.0 404 Not Found
To: ;tag=8625ec2573853100i0
From: "Unknown" ;tag=as4a362dd4
Call-ID: 6118feb95394bed34596445d4f646197@127.0.0.1
CSeq: 102 NOTIFY
Via: SIP/2.0/UDP 127.0.0.1:5060;branch=z9hG4bK5531cb71
Server: Linksys/SPA942-5.2.8
Content-Length: 0

Transmitting (NAT) to 192.168.0.90:5060:
NOTIFY sip:101@192.168.0.90:5060 SIP/2.0
Via: SIP/2.0/UDP 127.0.0.1:5060;branch=z9hG4bK3a2c5f62;rport
From: "Unknown" ;tag=as326df00a
To:
Contact:
Call-ID: 162a36772eb0e1c250044674197430bd@127.0.0.1
CSeq: 102 NOTIFY
User-Agent: Asterisk PBX
Max-Forwards: 70
Event: reboot
Content-Length: 0

---
Scheduling destruction of SIP dialog '788554aa489bea8352082fff1c1bbba6@192.168.0.12' in 6400 ms (Method: NOTIFY)
trixbox1*CLI>

SIP/2.0 200 OK
To: ;tag=1cd8a9f094bb1694i0
From: "Unknown" ;tag=as326df00a
Call-ID: 162a36772eb0e1c250044674197430bd@127.0.0.1
CSeq: 102 NOTIFY
Via: SIP/2.0/UDP 127.0.0.1:5060;branch=z9hG4bK3a2c5f62
Server: Linksys/SPA942-5.2.8
Content-Length: 0



joshelson
Posts: 243
Member Since:
2006-12-07

As far as I am aware, the SIP notification method only works when a valid registration is held on the Asterisk server. In the HA / failover case, it seems like it would be pretty difficult to guarantee that for script execution in haresources.

Given that even in larger environments, registration traffic is pretty negligible, I would set the registration period on the phones to 30 seconds or so. That is by far the simplest solution (avoids having to maintain IP address lists, etc..). In the case of a failover, you can guarantee that inbound calling will be restored within 30 seconds. Outbound calling should immediately work, assuming you're using the default TB configuration.
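To put rough numbers on "negligible", here is a quick sketch of how many registration refreshes per hour a given interval produces. It assumes each phone refreshes exactly once per interval, which is a simplification: phones often refresh a little early, and each refresh can take more than one SIP message if the server challenges it.

```shell
#!/bin/sh
# refreshes_per_hour PHONES INTERVAL_SECONDS
# Assumes one refresh per phone per interval (a simplification).
refreshes_per_hour() {
    phones="$1"
    interval="$2"
    echo $(( $phones * 3600 / $interval ))
}

# 8 phones at the default 3600 s versus 30 s:
# refreshes_per_hour 8 3600   -> 8
# refreshes_per_hour 8 30     -> 960
```

For the 8 phones in this thread, 960 refreshes per hour works out to roughly one every four seconds across the whole site.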

Josh

--

FluentStream Technologies - Integrate * Communicate



SkykingOH
Posts: 9681
Member Since:
2007-12-17

Do the phones have a static IP? This is getting messier.

I still think it is an ARP issue. If you crank down the phones' registration timers to 1 minute, the ARP table should have settled down by the time they re-register. It's a lot more elegant a solution than rebooting all the phones on the network.

This seems to be a whole lot of work for an 8 phone system. The solutions being discussed would not scale to a large install.

Scott

--

Scott

aka "Skyking"



jahyde
Posts: 2002
Member Since:
2006-06-02

I can instantly make calls after a failover, so ARP doesn't matter.

--

--my PBX is run on 2 V8's



SkykingOH
Posts: 9681
Member Since:
2007-12-17

John,

That makes sense, if the phone can register it should be able to place the call. Why is the OP wanting to reboot his phones? If the registration timer is set in the phone all should be fine.

I truly want to understand this as I am going to try my hand at my first HA cluster within the next 30 days.

Thanks.....Scott

--

Scott

aka "Skyking"



phcjpp
Posts: 19
Member Since:
2007-09-05

The 30 secs will fix it. By default it's set at 3600; I had assumed it had to be a long time.

I will change the Linksys endpoint manager code from 3600 to 30.

Chris



joshelson
Posts: 243
Member Since:
2006-12-07

Echoing what Scott said. You will be able to immediately make calls. With the most common of TB configurations, you don't even really need a registration to make calls.

You won't, however, be able to receive calls on the system without a valid registration on the Asterisk system.

In my testing of this solution, failing over doesn't consistently hold registration information. If you do a 'show hints' on the primary, stop heartbeat on the primary (causing the secondary to assume all system functions), and then do a 'show hints' on the secondary, all extensions show up as unavailable initially. They do eventually re-register, but with a default registration period it can take a while before inbound calling works. Outbound calling works immediately. There likely is an ARP or other lower-level networking issue that would need to be looked into. Simple DRBD mirroring definitely doesn't carry registration information across.

My preference would be for a solution that maintained registrations in failover condition. That said, the reregister method is perfectly acceptable, but it does generate some junk Asterisk traffic for the 99.99% of the time you're (hopefully) not trying to fail over hardware.

Josh

--

FluentStream Technologies - Integrate * Communicate



langolier42
Posts: 1
Member Since:
2009-04-09
phones ignoring gratuitous arp

I'm having the same problem with a different HA solution and Linksys SPA942 phones. When I fail over to the backup server during a call, I can see the backup server sending out gratuitous ARP, but the phone continues to send RTP to the old MAC of the virtual IP. I have tried upgrading to a fairly recent version of the SPA942 firmware (6.1.3a), but with no success. The problem also occurs when not in a call: the phone receives the gratuitous ARP but still sends SIP signalling to the previous MAC address of the IP.

Problem seems not to occur with Cisco 7960 phones.



SkykingOH
Posts: 9681
Member Since:
2007-12-17

This is a different problem than I spoke of and is not a GARP issue.

The phones have switches in them. These switches join the spanning tree instance, and you need to make sure that all of the switches in the network are electing the switch the trixbox cluster is on as the root bridge.

If you have unmanaged switches in the mix you could have this issue.

--

Scott

aka "Skyking"


