SIP extensions becoming unreachable

Rich_uk
Posts: 7
Member Since:
2008-06-28

Hi everyone,

The summary is that I have a server running Windows 2008 with VMware Server 2.0. Over time I have created some VMs and tried different versions of Trixbox, including the latest download, 2.8.0.4. I have been trying to get this working for a while and have edged a bit closer each time. For a brief moment I had everything working for SIP extensions and SIP trunks, and then suddenly the SIP extensions became "Unreachable" and I was stumped! I have not found a solution yet.

Hardware:
The Windows 2008 Server hosting VMware 2.0 is a 4 Processor system with 8GB RAM and some RAIDed SATA HDDs.

Topology:
The server is in a colocation facility behind a Linksys BEFSR41 router, with the following ports forwarded:
*5060-5100 - SIP registration
*10001-20000 - RTP
*8000-8040 - X-Lite additional ports
The Linksys router holds the static IP and can either put the Trixbox behind NAT or put it in the DMZ with no NAT. The Trixbox is currently behind NAT.

SIP Phones:
X-Lite phones are used to connect to the Trixbox remotely, these are behind their own router in a NAT environment.

Other Systems:
I have a separate dedicated physical Asterisk system (no VMware) with a similar network topology (behind NAT), and both Asterisk and X-Lite work there, even when the phones don't work with the Windows 2008 system in colocation.

To give you an overview: when I get through my installation and perform basic SIP testing, everything works, meaning that I can call between two internal SIP extensions using X-Lite. Great! After confirming this I start to configure the trunks, dialplans and eventually the menus, but sometime before finishing the config I lose the ability to call between the SIP extensions. This is because the SIP extensions become "Unreachable" after being reachable at the beginning.

When the SIP extensions get into the "Unreachable" state, I found that it is due to the "Qualify" setting on the SIP extension. The problem is that I use this option to improve the interaction between the Trixbox and the extension; without it there is an arbitrary delay before a connection is made, and sometimes no connection is made at all and the call fails.

As I understand it, the "Qualify" option tells the Trixbox to perform a type of ping (a SIP message sent from the Trixbox that requires a response from the SIP phone, I think). After the Trixbox does this ping, it knows how far away the extension is in milliseconds, and the interaction is greatly improved. I have used this option with success on a non-virtual Asterisk installation, and I cannot understand why it would be an issue on a VM running Trixbox, given that it works at first and breaks later.
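
For anyone wanting to picture the "ping": the qualify check is a SIP OPTIONS request that the phone has to answer, and the round-trip time is what gets reported in milliseconds. Below is a minimal Python sketch of that idea, not what Asterisk itself runs. The helper names (build_options_request, measure_rtt_ms), the user "100" and the CSeq value are made up for illustration.

```python
import socket
import time
import uuid


def build_options_request(peer_ip, peer_port, local_ip, local_port=5060, user="100"):
    """Return a minimal SIP OPTIONS request as text (hypothetical helper)."""
    branch = "z9hG4bK" + uuid.uuid4().hex[:16]  # RFC 3261 branch magic cookie
    lines = [
        f"OPTIONS sip:{user}@{peer_ip}:{peer_port} SIP/2.0",
        f"Via: SIP/2.0/UDP {local_ip}:{local_port};branch={branch};rport",
        "Max-Forwards: 70",
        f"From: <sip:qualify@{local_ip}>;tag={uuid.uuid4().hex[:8]}",
        f"To: <sip:{user}@{peer_ip}:{peer_port}>",
        f"Call-ID: {uuid.uuid4().hex}@{local_ip}",
        "CSeq: 102 OPTIONS",
        f"Contact: <sip:qualify@{local_ip}:{local_port}>",
        "Content-Length: 0",
        "",
        "",
    ]
    return "\r\n".join(lines)


def measure_rtt_ms(peer_ip, peer_port, local_ip, timeout_s=2.0):
    """Send one OPTIONS over UDP; return the reply time in ms, or None on timeout."""
    request = build_options_request(peer_ip, peer_port, local_ip).encode()
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout_s)
        start = time.monotonic()
        sock.sendto(request, (peer_ip, peer_port))
        try:
            sock.recvfrom(4096)  # any SIP response counts as "alive"
        except socket.timeout:
            return None
        return (time.monotonic() - start) * 1000.0
```

Running measure_rtt_ms from the Trixbox side towards a phone's public address is one way to check whether replies make it back through both routers at all.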

In one installation of Trixbox I tried to retrace my steps and undo everything I did, but once the SIP extensions are in this state I cannot seem to fix it. I have been all over the internet and tried some timing settings in the VM for CentOS to work at 100MHz, installed VMware Tools, made it the only VM on the machine, and provided different amounts of RAM (1GB, 2GB). I have even used Wireshark behind the routers and tcpdump on the Trixbox to try to verify this ping-type "Qualify" message.

So instead of banging my head on this thing, I want to put the question to you guys for any suggestions.

Thanks!!

Rich.



Rich_uk
Posts: 7
Member Since:
2008-06-28
Fixed it, but now i do not know why?

Basically, I went through line by line and found that the "localnet" entry in sip_nat.conf was making all the SIP extensions that connect externally over the internet (NAT) become Unreachable. In other installations of Asterisk I added this entry and all was fine. I believe this entry tells the Trixbox not to use NAT for any devices on the network specified in the entry, which is why it should be the same subnet that the Trixbox is on.
But with this line present in sip_nat.conf, it totally breaks Trixbox, because no calls can be made to or received from extensions that are unreachable.
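
A rough way to picture what "localnet" and "externip" are meant to do, going by the description above: peers inside localnet are given the box's internal address, and everyone else is given the external one. The Python sketch below is a simplified model under that assumption, not Asterisk's actual code. The function name advertised_address and the sample peer addresses are made up for illustration.

```python
import ipaddress


def advertised_address(peer_ip, local_ip, externip, localnets):
    """Pick the address to present to a peer (simplified, hypothetical model).

    Peers inside any localnet are treated as local and see local_ip;
    all other peers are assumed to be behind NAT and see externip.
    """
    peer = ipaddress.ip_address(peer_ip)
    for net in localnets:
        if peer in ipaddress.ip_network(net):
            return local_ip
    return externip


# Values from this thread: Trixbox at 192.168.6.150, externip=1.2.3.4,
# localnet=192.168.6.0/24. The peer addresses are examples only.
print(advertised_address("192.168.6.20", "192.168.6.150", "1.2.3.4", ["192.168.6.0/24"]))
print(advertised_address("203.0.113.7", "192.168.6.150", "1.2.3.4", ["192.168.6.0/24"]))
```

If the model holds, a remote X-Lite phone should be unaffected by a localnet that covers only 192.168.6.x, which is why the behaviour reported here is surprising.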

Anyone have any ideas why??

TIA, anything appreciated.

Rich.



SkykingOH
Posts: 9681
Member Since:
2007-12-17
Can you post the content of sip_nat.conf that breaks the system?

Quote:
tell the Trixbox to not use NAT on any devices that are on the network specified in this entry, which is why it should be the same subnet that the Trixbox is on.

This is correct.

--

Scott

aka "Skyking"



Rich_uk
Posts: 7
Member Since:
2008-06-28
SIP_Nat.conf

Hi Scott

Here is the sip_nat.conf file:

externip=1.2.3.4
#localnet=192.168.6.0/24
nat=yes
stunaddr=stun.counterpath.net

The Trixbox started working as soon as I commented out the "localnet" line. I have also just retested this: as soon as I uncomment that line and reload, the SIP extensions become unreachable.

Thanks,

Rich.



SkykingOH
Posts: 9681
Member Since:
2007-12-17
What is the IP of the trixbox?

--

Scott

aka "Skyking"



Rich_uk
Posts: 7
Member Since:
2008-06-28
The Internal IP of the Trixbox is 192.168.6.150.

Thanks,



obeliks
Posts: 878
Member Since:
2010-03-14
Here is some required reading for people thinking about using STUN:

http://fonality.com/trixbox/forums/trixbox-forums/open-discussion...



Atcom Alberta
Posts: 220
Member Since:
2008-07-14
I've run into the same issues on Trixbox 2.6 - localnet settings totally kill SIP trunks with qualify turned on. When I watch the activity in asterisk I see messages coming and going to the correct IP but asterisk doesn't seem to accept the reply. No solution found to date...



Rich_uk
Posts: 7
Member Since:
2008-06-28
Hi Obeliks,

Thanks for pointing out that useful link; I did not know that STUN did not work in the current Asterisk. The link inside that post (http://forums.digium.com/viewtopic.php?t=74252) mentions that this might be in the Asterisk 1.8 release.

However, I followed another link to a closed Asterisk issue, which states that STUN was recently removed in Asterisk revision 282304 (2010-08-16 14:31).

So it looks like a safe move to not use STUN.

Thanks again for this info,

Rich.



Rich_uk
Posts: 7
Member Since:
2008-06-28
Same same

Hi Atcom Alberta,

I too have sniffed the packets between the SIP phones and the Trixbox with the "localnet" line in use and the "qualify" option turned on for SIP extensions, and I have seen the same as you. The SIP extensions send the replies just fine (many replies, if I recall correctly), but the Trixbox does not seem to do anything with them.

I can use this Trixbox without the "localnet" line, but I am looking to upgrade another one soon which has local extensions as well as external SIP extensions. Does this mean I cannot implement external SIP extensions alongside local ZAP devices? If that were true, I would also need to find a Trixbox version that supports both "localnet" and "qualify".

Thanks,

Rich.



Rich_uk
Posts: 7
Member Since:
2008-06-28
Were you deploying Trixbox to a VM or to a physical machine?

Hi Atcom Alberta,

I have one question: were you deploying Trixbox to a VM or to a physical machine?

BTW (for anyone googling: installing Trixbox on a VM can be done), I have my VM running and successfully taking calls on VMware Server 2.0 with Trixbox 2.8.0.4 (which has Asterisk 1.6 under the hood). This VM was built to run lean: it has 2GB RAM but only a 3GB HDD, with approx 2.25 GB assigned to the main partition (not boot) and 0.5 GB to the swap partition. This config may change if necessary, but it is on test and performing very well right now. The aim was to run as much of Trixbox as possible in RAM and avoid swapping (I have not yet configured the swappiness setting). This Trixbox is for low usage and has successfully handled 10 simultaneous calls.

Thanks for everyone's input ;)

Rich.



Atcom Alberta
Posts: 220
Member Since:
2008-07-14
Each case where I've run into this issue involved a physical server used for nothing other than Trixbox (& DHCP, VLAN, etc.). The only workarounds I've found are to turn off Qualify for any external SIP trunks and set NAT=no for internal extensions.



hbasbay
Posts: 19
Member Since:
2010-06-23
I have the same problem with my Trixbox server. Over time, SIP extensions that connect externally over the internet (NAT) become Unreachable. I cannot find a solution either. I am following this post.



hbasbay
Posts: 19
Member Since:
2010-06-23
I would like to present you with a quote - http://www.asteriskguru.com/tutorials/peer_is_now_unreachable.htm...

-----------------------------------------------------------------------------------------------------------------------------------------------
sip_poke_noanswer: Peer 'XXX' is now UNREACHABLE!

1. Description

In sip.conf there is an option for every peer called qualify.
If qualify=yes or a numeric value, then asterisk will sometimes poke this peer by sending a "SIP OPTIONS" request to phones or other pbx's.

If they do not reply on time, they will be considered unreachable, and this message will be printed on the asterisk CLI.

When the phone is back online (first time it replies on time) then asterisk will tell you Peer 'XXX' is now REACHABLE, if we got a reply from the phone, but not on time, the message Peer 'XXX' is now too LAGGED will be printed on the CLI.

The timeout is set to 2000ms by default. (If you specify qualify=yes).
But you could also set it to any other value.

e.g. qualify=3000

2. Reasons for seeing this message:

When a phone is rebooted, or when a phone hangs, or when its shut down this message might pop up.

(Or when there is a too big delay on the network).

If all your phones become unreachable at the same time, its probably your asterisk server that has network problems instead of the phone.

When a phone is unreachable, asterisk will not try to call it. (So you might want to set this value not too low, or you might want to completely disable it).

If the phone that has unreachable messages all the time is behind a NAT, it might be that the UDP timeout is set too low on the firewall.

-----------------------------------------------------------------------------------------------------------------------------------------------
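
The three states in the quoted tutorial can be summed up in a few lines. This is a small Python sketch of the rules exactly as the quote states them (qualify=yes means 2000 ms; a late reply is LAGGED; no reply is UNREACHABLE). The function names and the "UNMONITORED" label for qualify=no are my own, not Asterisk's.

```python
DEFAULT_QUALIFY_MS = 2000  # per the quote: qualify=yes means 2000 ms


def qualify_limit_ms(setting):
    """Turn a qualify= value ("yes", "no" or a number) into a limit in ms."""
    value = str(setting).strip().lower()
    if value == "no":
        return None
    if value == "yes":
        return DEFAULT_QUALIFY_MS
    return int(value)


def peer_state(setting, rtt_ms):
    """Classify a peer from its qualify setting and the last OPTIONS reply time.

    rtt_ms is None when no reply arrived at all.
    """
    limit = qualify_limit_ms(setting)
    if limit is None:
        return "UNMONITORED"  # hypothetical label: qualify is switched off
    if rtt_ms is None:
        return "UNREACHABLE"
    if rtt_ms > limit:
        return "LAGGED"
    return "REACHABLE"
```

For example, peer_state("yes", 2500) gives "LAGGED" while peer_state("3000", 2500) gives "REACHABLE", so raising the value only helps when replies are actually arriving, just slowly.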

What do you think about this? I am going to try setting "qualify=3000". Let's see whether that solves the problem.



SkykingOH
Posts: 9681
Member Since:
2007-12-17
I think if you have 3 second delays on your network you have bigger problems.

--

Scott

aka "Skyking"



hbasbay
Posts: 19
Member Since:
2010-06-23
Dear SkykingOH,

I set "qualify=3000" (on only one extension, to try it out). There is no problem in TB so far. I am waiting now.



SkykingOH
Posts: 9681
Member Since:
2007-12-17
3000ms = 3 seconds. Something else is wrong if your endpoints take 3 seconds to turn a packet around.

We have a switch in Hong Kong and it's only 300ms lagged from Cleveland.

--

Scott

aka "Skyking"



hbasbay
Posts: 19
Member Since:
2010-06-23
The wait has come to an end, and the result is a big disappointment: all the extensions went UNREACHABLE one by one. I have not found another solution.



cougarmast
Posts: 201
Member Since:
2007-02-05
Why not try Xen in PV mode? It will give much more stability and is way faster than other types of virtualization. I have many running this way with 4-port cards passed through, and they work flawlessly. I have many remote extensions. The same server also runs Zentyal for Samba and FTP and pfSense for firewalls, plus a few Ubuntu and Windows instances for remote desktop. We deploy these servers here in Hong Kong, though the uphill battle to use Linux is horrendous as most people here are M$ zombies with Apple iPhone (brain dead) users. Anyway, just a suggestion.



hbasbay
Posts: 19
Member Since:
2010-06-23
I changed the SIP port on every extension and gave them 5061, 5062, 5063, and so on. It looks like it works, but I am not sure.



hbasbay
Posts: 19
Member Since:
2010-06-23
up up



SkykingOH
Posts: 9681
Member Since:
2007-12-17
Quote:
up up

And away?

Did you fix your network?

--

Scott

aka "Skyking"



hbasbay
Posts: 19
Member Since:
2010-06-23
dear SkykingOH,

Yes, I fixed it, but the problem is still going on. I cannot understand it.



mjaber
Posts: 5
Member Since:
2011-01-22
Unreachable sip phones

Greetings all,
Thanks in advance for your help with my issue. My Trixbox and Polycom SoundPoint 430 phones are on the same subnet. I don't have any SIP phones outside my LAN. Every now and then our phones become unreachable and we are no longer able to make or receive phone calls. I notice in the call log that during this time (when the phones are unreachable) incoming calls go directly to our main voicemail.
Do you recommend changing NAT to No? What about the "Qualify" setting: is it safe to change it to No? Our phones are assigned IP addresses via DHCP. Do I need to assign static IP addresses? All PCs and phones are on the same VLAN. Do you recommend separating them?

Here is log:
[Mar 8 16:42:39] DEBUG[2766] pbx.c: FONALITY: This thread has already held the conlock, skip locking
[Mar 8 16:46:20] NOTICE[2779] chan_sip.c: Peer '202' is now UNREACHABLE! Last qualify: 15
[Mar 8 16:46:20] DEBUG[2766] pbx.c: FONALITY: This thread has already held the conlock, skip locking
[Mar 8 16:46:27] NOTICE[2779] chan_sip.c: Peer '204' is now UNREACHABLE! Last qualify: 16
[Mar 8 16:46:27] DEBUG[2766] pbx.c: FONALITY: This thread has already held the conlock, skip locking
[Mar 8 16:46:37] NOTICE[2779] chan_sip.c: Peer '200' is now UNREACHABLE! Last qualify: 14
[Mar 8 16:46:37] DEBUG[2766] pbx.c: FONALITY: This thread has already held the conlock, skip locking
[Mar 8 16:46:39] NOTICE[2779] chan_sip.c: Peer '205' is now UNREACHABLE! Last qualify: 16
[Mar 8 16:46:39] DEBUG[2766] pbx.c: FONALITY: This thread has already held the conlock, skip locking
[Mar 8 16:47:00] NOTICE[2779] chan_sip.c: Peer '201' is now UNREACHABLE! Last qualify: 16
[Mar 8 16:47:00] DEBUG[2766] pbx.c: FONALITY: This thread has already held the conlock, skip locking
[Mar 8 16:47:09] NOTICE[2779] chan_sip.c: Peer '203' is now UNREACHABLE! Last qualify: 16
[Mar 8 16:47:14] NOTICE[2779] chan_sip.c: Peer '206' is now UNREACHABLE! Last qualify: 16
[Mar 8 16:47:14] DEBUG[2766] pbx.c: FONALITY: This thread has already held the conlock, skip locking
Your help is appreciated!!!
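
One thing worth pulling out of a log like this is the "Last qualify" value on each UNREACHABLE notice. Here it is 14 to 16 ms on every peer, so the previous reply came back quickly and the phones were not getting gradually slower before they dropped. A small Python sketch for extracting those values is below. The regex is written against the lines posted above; the function name unreachable_events is made up.

```python
import re

# Matches lines such as:
# [Mar 8 16:46:20] NOTICE[2779] chan_sip.c: Peer '202' is now UNREACHABLE! Last qualify: 15
UNREACHABLE_RE = re.compile(
    r"^\[(?P<when>[^\]]+)\] NOTICE\[\d+\] chan_sip\.c: "
    r"Peer '(?P<peer>[^']+)' is now UNREACHABLE!\s+Last qualify: (?P<last_ms>\d+)"
)


def unreachable_events(lines):
    """Return (timestamp, peer, last_qualify_ms) for each UNREACHABLE notice."""
    events = []
    for line in lines:
        match = UNREACHABLE_RE.match(line.strip())
        if match:
            events.append(
                (match.group("when"), match.group("peer"), int(match.group("last_ms")))
            )
    return events
```

Feeding it the Asterisk log (for example open("/var/log/asterisk/full"), assuming that is where your install writes it) gives a list that is easy to sort by time and see whether every peer dropped within the same minute.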



A.Salah
Posts: 99
Member Since:
2011-02-16
I have the same issue now. My TB was great, then suddenly 10 days ago everything started:
a hacking attempt
fail2ban
everything OK...
then SIP phones and trunks become unreachable from time to time. It goes down and then fixes itself!

my Log file states that:

[Jan 10 16:00:11] ERROR[4321] /usr/src/redhat/BUILD/asterisk16-1.6.0.26/include/asterisk/lock.h: chan_sip.c line 20825 (restart_monitor): Deadlock? waited 30 sec for mutex '&monlock'?
[Jan 10 16:00:11] ERROR[4321] /usr/src/redhat/BUILD/asterisk16-1.6.0.26/include/asterisk/lock.h: chan_sip.c line 20806 (do_monitor): '&monlock' was locked here.
[Jan 10 16:00:12] ERROR[4342] /usr/src/redhat/BUILD/asterisk16-1.6.0.26/include/asterisk/lock.h: chan_sip.c line 20825 (restart_monitor): Deadlock? waited 5 sec for mutex '&monlock'?
[Jan 10 16:00:12] ERROR[4342] /usr/src/redhat/BUILD/asterisk16-1.6.0.26/include/asterisk/lock.h: chan_sip.c line 20806 (do_monitor): '&monlock' was locked here.
[Jan 10 16:00:12] ERROR[4336] /usr/src/redhat/BUILD/asterisk16-1.6.0.26/include/asterisk/lock.h: chan_sip.c line 20825 (restart_monitor): Deadlock? waited 10 sec for mutex '&monlock'?
[Jan 10 16:00:12] ERROR[4336] /usr/src/redhat/BUILD/asterisk16-1.6.0.26/include/asterisk/lock.h: chan_sip.c line 20806 (do_monitor): '&monlock' was locked here.
[Jan 10 16:00:16] ERROR[4321] /usr/src/redhat/BUILD/asterisk16-1.6.0.26/include/asterisk/lock.h: chan_sip.c line 20825 (restart_monitor): Deadlock? waited 35 sec for mutex '&monlock'?
[Jan 10 16:00:16] ERROR[4321] /usr/src/redhat/BUILD/asterisk16-1.6.0.26/include/asterisk/lock.h: chan_sip.c line 20806 (do_monitor): '&monlock' was locked here.
[Jan 10 16:00:16] NOTICE[3131] chan_sip.c: Peer '2226' is now UNREACHABLE! Last qualify: 14
[Jan 10 16:00:16] NOTICE[3131] chan_sip.c: Peer '2111' is now UNREACHABLE! Last qualify: 2891
[Jan 10 16:00:16] NOTICE[3131] chan_sip.c: Peer '2151' is now UNREACHABLE! Last qualify: 9
[Jan 10 16:00:16] NOTICE[3131] chan_sip.c: Peer '2223' is now UNREACHABLE! Last qualify: 16
[Jan 10 16:00:16] NOTICE[3131] chan_sip.c: Peer '2124' is now UNREACHABLE! Last qualify: 9
[Jan 10 16:00:16] NOTICE[3131] chan_sip.c: Peer '2258' is now UNREACHABLE! Last qualify: 8
[Jan 10 16:00:16] NOTICE[3131] chan_sip.c: Peer '2195' is now UNREACHABLE! Last qualify: 12

.
.
.
.
then

[Jan 10 16:00:16] NOTICE[3131] chan_sip.c: Peer '2136' is now UNREACHABLE! Last qualify: 12
[Jan 10 16:00:16] NOTICE[3131] chan_sip.c: Peer '2219' is now UNREACHABLE! Last qualify: 196
[Jan 10 16:00:16] NOTICE[3131] chan_sip.c: Peer '2202' is now UNREACHABLE! Last qualify: 227
[Jan 10 16:00:17] ERROR[4342] /usr/src/redhat/BUILD/asterisk16-1.6.0.26/include/asterisk/lock.h: chan_sip.c line 20825 (restart_monitor): Deadlock? waited 10 sec for mutex '&monlock'?
[Jan 10 16:00:17] ERROR[4342] /usr/src/redhat/BUILD/asterisk16-1.6.0.26/include/asterisk/lock.h: chan_sip.c line 20806 (do_monitor): '&monlock' was locked here.
[Jan 10 16:00:17] ERROR[4336] /usr/src/redhat/BUILD/asterisk16-1.6.0.26/include/asterisk/lock.h: chan_sip.c line 20825 (restart_monitor): Deadlock? waited 15 sec for mutex '&monlock'?
[Jan 10 16:00:17] ERROR[4336] /usr/src/redhat/BUILD/asterisk16-1.6.0.26/include/asterisk/lock.h: chan_sip.c line 20806 (do_monitor): '&monlock' was locked here.
[Jan 10 16:00:17] NOTICE[3131] chan_sip.c: Peer '2133' is now UNREACHABLE! Last qualify: 10
[Jan 10 16:00:17] NOTICE[3131] chan_sip.c: -- Registration for '8959264693@ippbx.net2phone.com' timed out, trying again (Attempt #1)
[Jan 10 16:00:17] DEBUG[3112] pbx.c: FONALITY: This thread has already held the conlock, skip locking
[Jan 10 16:00:17] DEBUG[3112] pbx.c: FONALITY: This thread has already held the conlock, skip locking
[Jan 10 16:00:17] DEBUG[3112] pbx.c: FONALITY: This thread has already held the conlock, skip locking
[Jan 10 16:00:17] DEBUG[3112] pbx.c: FONALITY: This thread has already held the conlock, skip locking
[Jan 10 16:00:17] DEBUG[3112] pbx.c: FONALITY: This thread has already held the conlock, skip locking
[Jan 10 16:00:17] DEBUG[3112] pbx.c: FONALITY: This thread has already held the conlock, skip locking
[Jan 10 16:00:17] DEBUG[3112] pbx.c: FONALITY: This thread has already held the conlock, skip locking
[Jan 10 16:00:17] DEBUG[3112] pbx.c: FONALITY: This thread has already held the conlock, skip locking
[Jan 10 16:00:17] DEBUG[3112] pbx.c: FONALITY: This thread has already held the conlock, skip locking
[Jan 10 16:00:17] DEBUG[3112] pbx.c: FONALITY: This thread has already held the conlock, skip locking

any idea guys ?
it was great then suddenly all this shit happened
?!!!


