Phones are down every morning. Any Ideas?

carrothospital
Posts: 37
Member Since:
2009-01-28

Recently, when we come in in the morning, all the phones are down. No incoming/outgoing calls work, only internal. The only way to get things running again is to run setup-rhino.

I've looked through all the logs and haven't found anything. The asterisk console shows that it doesn't detect anything when a call is coming in.
Has anyone else had this problem?

I have a ticket open with Rhino, but they need access to the system when it is down. Unfortunately it always happens in the morning and since they are 2 hours behind us, we can't wait until 10 or after to get our phones back up. Maybe if we're "lucky" it will go down during the day.

System: Trixbox 2.6.2.3
Card: Rhino R8FXX-EC modular card (Firmware 2.1) with 4 FXO modules.

If you have any tips, please let me know!



cvander
Posts: 637
Member Since:
2006-06-26
logs logs logs!

You should probably go through all the logs on the box for the 24-hour period (or less) between known good working phones and the "all phones are down" condition. Obviously something bad is happening with your system, and it should show up in the logs. See if you can find anything suspicious and post the (scrubbed) logs here; that will give us a good place to start.

-Chris



carrothospital
Posts: 37
Member Since:
2009-01-28

Well, I've been through the logs and can't find a thing! I went through all the logs in /var/log and /var/log/asterisk and can't find anything that looks wrong. I looked through the full log in /var/log/asterisk/ and went through each line from one of the nights when it went down. Besides the normal VERBOSE garbage in there, the only things I can find are a few instances where it says:


[May 7 04:44:08] VERBOSE[2151] logger.c: -- Remote UNIX connection
[May 7 04:44:09] VERBOSE[6634] logger.c: -- Remote UNIX connection disconnected

But it also did this last night at the exact same time, except it didn't crash last night. I thought that I remembered someone mentioning an issue with Comcast Business Voice (which is what we're using), where trixbox and Comcast's equipment get out of sync. Like Comcast resets their equipment, or just resyncs, but trixbox doesn't know it and it screws things up. Like I said, I *thought* I had read that here before, but I can't find it when searching, so I may be wrong.

Are there any other logs I should be looking at besides the ones in /var/log?
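
A minimal sketch of how the line-by-line pass over the full log could be narrowed, assuming every line starts with a "[Mon D HH:MM:SS]" timestamp as in the excerpt above. The function name log_window is made up for illustration, and the window is applied to every day in the file, so it is only a first filter:

```shell
#!/bin/bash
# Print non-VERBOSE lines from an Asterisk log that fall inside a time window.
# Usage: log_window LOGFILE START END   (START and END are HH:MM:SS strings)
# Assumes the window does not cross midnight.
log_window() {
    local logfile=$1 start=$2 end=$3
    grep -v ' VERBOSE\[' "$logfile" | awk -v s="$start" -v e="$end" '
        # Pull the first HH:MM:SS on the line and compare it as a string.
        match($0, /[0-9][0-9]:[0-9][0-9]:[0-9][0-9]/) {
            t = substr($0, RSTART, RLENGTH)
            if (t >= s && t <= e) print
        }'
}
```

For example, log_window /var/log/asterisk/full 04:00:00 07:00:00 would show only the warnings, notices, and errors logged in the early morning.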

Below is a dmesg, just for reference.

dmesg | grep rcbf

rcbfx 1: Rhino PCI BAR0 efffe000 IOMem mapped at f8a1e000
rcbfx 1: Waiting for response from card .........
rcbfx 1: Firmware Version 2.1
rcbfx 1: Firmware File Version is 2.0
rcbfx 1: Hardware version 11
rcbfx 1: G168 07 08 DSP Loader file size = 170 App file size = 48414
rcbfx 1: G168 DSP Ping DSP Version 106
rcbfx 1: G168 DSP Active and Servicing 8 Channels - ff
rcbfx 1: Starting DMA
rcbfx 1: Spotted a Rhino: Rhino RCB8FXX (4 modules)
rcbfx 1: Released a Rhino
rcbfx 1: Rhino PCI BAR0 efffe000 IOMem mapped at f8a20000
rcbfx 1: Waiting for response from card .........
rcbfx 1: Firmware Version 2.1
rcbfx 1: Firmware File Version is 2.0
rcbfx 1: Hardware version 11
rcbfx 1: G168 07 08 DSP Loader file size = 170 App file size = 48414
rcbfx 1: G168 DSP Ping DSP Version 106
rcbfx 1: G168 DSP Active and Servicing 8 Channels - ff
rcbfx 1: Starting DMA
rcbfx 1: Spotted a Rhino: Rhino RCB8FXX (4 modules)
rcbfx 1: Released a Rhino
rcbfx 1: Rhino PCI BAR0 efffe000 IOMem mapped at f8a22000
rcbfx 1: Waiting for response from card .........
rcbfx 1: Firmware Version 2.1
rcbfx 1: Firmware File Version is 2.0
rcbfx 1: Hardware version 11
rcbfx 1: G168 07 08 DSP Loader file size = 170 App file size = 48414
rcbfx 1: G168 DSP Ping DSP Version 106
rcbfx 1: G168 DSP Active and Servicing 8 Channels - ff
rcbfx 1: Starting DMA
rcbfx 1: Spotted a Rhino: Rhino RCB8FXX (4 modules)



cvander
Posts: 637
Member Since:
2006-06-26
We have really similar setups...

I am also using Comcast Digital Voice in my office, with a Rhino card as the interface. I'm not sure where to tell you to start looking. It certainly seems like you have gone through the logs... perhaps turn the verbosity up to 10 or so for one night and review more thoroughly. Just for reference, here's my dmesg output:

rcbfx 1: Rhino PCI BAR0 fe9ff000 IOMem mapped at f893e000
rcbfx 1: Waiting for response from card ......... 
rcbfx 1: Firmware Version 1.f
rcbfx 1: Firmware File Version is 1.f
rcbfx 1: Hardware version 10
rcbfx 1: G168 07 08 DSP Loader file size = 170 App file size = 48414
rcbfx 1: G168 DSP Ping DSP Version 106
rcbfx 1: G168 DSP Active and Servicing 6 Channels - 3f
rcbfx 1: Starting DMA
rcbfx 1: Spotted a Rhino: Rhino RCB8FXX (3 modules)

There are some differences... notably the firmware version, but I'm not too sure that matters at this point... Are you running more than one card in the system?

I've never had an issue with the Comcast Digital Voice getting "out of sync" with the rhino card... so I doubt that's the issue... but who the heck knows at this point?

-Chris



carrothospital
Posts: 37
Member Since:
2009-01-28

Thanks for the response. Currently I'm only running one card in the system. It's an RMA one I got from Rhino a few months back. I think it's got the newest firmware, so hopefully it's not a bug with that, right?

It went down again this morning and I tried to do a little testing before I brought it back up, although I still haven't gone through the logs.

I did a few basic commands like ztcfg -v, cat /proc/zaptel/*, zap show channels, zap show status, etc, and everything looked the exact same as it does when it's working.
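
One way to make "looked the exact same" checkable is to save the output of those commands while the phones work and again while they are down, then diff the two files. This is only a sketch: the function name snapshot_state and the file paths in the usage note are made up, and the two zap commands are assumed to be run through asterisk -rx.

```shell
#!/bin/bash
# Save the output of the checks listed above to one file, so a snapshot taken
# while the phones work can be diffed against one taken while they are down.
# Usage: snapshot_state OUTFILE
snapshot_state() {
    local outfile=$1 cmd
    : > "$outfile"
    for cmd in 'ztcfg -v' \
               'cat /proc/zaptel/*' \
               'asterisk -rx "zap show channels"' \
               'asterisk -rx "zap show status"'; do
        echo "== $cmd ==" >> "$outfile"
        # A failing or missing command is recorded rather than aborting the run.
        bash -c "$cmd" >> "$outfile" 2>&1 || echo "(exit status $?)" >> "$outfile"
    done
}
```

Running snapshot_state /root/zap-good.txt on a good day and snapshot_state /root/zap-down.txt during an outage, followed by diff on the two files, would show any difference the eye missed. Note that ztcfg applies /etc/zaptel.conf when it runs, so it is not a purely passive check.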

Then I fired up zttool and had a look. It looks like it detects an incoming call because the first bit on RxB and RxD turns from 1 to 0 when I dial in. Here's a crappy example.

No Activity:

    1 2 3 4 5 6 7 8
TxA 0 0 0 0 0 0 0 0
TxB 1 1 1 1 1 1 1 1
TxC 0 0 0 0 0 0 0 0
TxD 1 1 1 1 1 1 1 1

RxA 0 0 0 0 1 1 1 1
RxB 1 1 1 1 1 1 1 1
RxC 0 0 0 0 1 1 1 1
RxD 1 1 1 1 1 1 1 1

Incoming call:

    1 2 3 4 5 6 7 8
TxA 0 0 0 0 0 0 0 0
TxB 1 1 1 1 1 1 1 1
TxC 0 0 0 0 0 0 0 0
TxD 1 1 1 1 1 1 1 1

RxA 0 0 0 0 1 1 1 1
RxB 0 1 1 1 1 1 1 1
RxC 0 0 0 0 1 1 1 1
RxD 0 1 1 1 1 1 1 1

I guess that just means it's detecting the call and doesn't mean too much. I was just happy that it was actually detecting something!
I also tried ztmonitor 1 -vv and called in, but it didn't show a thing.
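
The by-eye comparison of the two zttool views can be written down as a small helper. This is a sketch only; changed_channels is a made-up name, and it expects one row of space-separated bits from each view, channel 1 first.

```shell
#!/bin/bash
# Print the channel numbers whose bit differs between two zttool rows.
# Usage: changed_channels "IDLE_ROW" "ACTIVE_ROW"
changed_channels() {
    local -a idle=($1) active=($2)
    local i changed=()
    for i in "${!idle[@]}"; do
        if [ "${idle[$i]}" != "${active[$i]}" ]; then
            changed+=("$((i + 1))")
        fi
    done
    echo "${changed[*]}"
}

# RxB rows copied from the two views above; prints 1.
changed_channels "1 1 1 1 1 1 1 1" "0 1 1 1 1 1 1 1"
```

With the rows above, only channel 1 changes, which matches the line that was dialed.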

I'm at a loss at this point. I guess I'll go through the logs again and see if I notice anything different. Thanks for your help and feel free to shoot any suggestions my way.



anishp55
Posts: 2
Member Since:
2010-05-13

I had the same problem, but it was with a Digium card and analog peel-offs from my ChoiceOne PRI. The only solution I had was to have the system reboot itself every night.



carrothospital
Posts: 37
Member Since:
2009-01-28

I was afraid that might be the only solution... Drats. Thanks for the input though. I'm still hoping that it crashes sometime during the day so I can get Rhino support on the phone. This week it has happened every other day, as opposed to every day like last week. Improvement? Who knows.
I'll post back with updates when I have them.



carrothospital
Posts: 37
Member Since:
2009-01-28

Well, Rhino says that the output from zttool verifies that the card, the RCBFX driver, and the zaptel driver are all working and that the problem must be with zap_chan or asterisk.

I guess it's good that the problem has been narrowed down a little bit, but I'm not sure where to go from here. Fortunately it stayed up again today, which is a welcome change of pace. I'll keep an eye on it and see how it is over the weekend. If you have any advice, let me know. Otherwise, my best option is probably restarting it every night.



carrothospital
Posts: 37
Member Since:
2009-01-28

Arg! After a few weeks of the scheduled reboot working OK, the past two days it has been doing it again, somewhere between 5:30am (when the reboot happens) and 7. I guess I'll have to scour the logs, but I can't believe it's hanging after only 1.5 hrs of uptime. Anyone else have any suggestions? I was thinking of recompiling zaptel. I see the driver loaded in lsmod when it's down, but maybe something is screwy? I fixed the problem by running setup-rhino, so I'll have to find out everything that script does and hopefully narrow it down to the one step that's fixing it.

Any help would be appreciated.



obeliks
Posts: 877
Member Since:
2010-03-14

Why are you using Rhino with Comcast? Just move your numbers to a VoIP provider. It will be cheaper too ;-)



carrothospital
Posts: 37
Member Since:
2009-01-28

Well, unfortunately this is a customer's phone system, so I just have to deal with what they have and try to make it work!

I'm thinking of installing DAHDI drivers, but am not sure of how to set it up with Rhino cards.



carrothospital
Posts: 37
Member Since:
2009-01-28

So it looks like running /etc/init.d/zaptel restart fixes the issue. I guess it just needs to reload the zaptel modules. Why would it need to do that? I checked lsmod and all the appropriate modules were loaded when the system was down.
Arrrgggggg.



carsys
Posts: 49
Member Since:
2007-03-05
So is it solved? Did you fix it?

A customer hired me a long time ago to solve something similar to what you are describing. The issue there was that the connection to the VoIP line dropped from time to time and had to be brought back up. Rebooting the box will solve the problem, but it can be impractical: going to the customer's site, or getting a call that their lines are down, and then rebooting to bring everything back to life is not something you want to keep doing. The issue can really be solved by automating a check of the SIP channels.

For this business I wrote a script that monitors whether the connection is alive. I created a cron job that runs every 60 seconds and pings the SIP trunks and each of the hosts in /etc/hosts. If for some reason there is no response from the VoIP line, the script reloads the SIP configuration, forcing a re-registration of the trunk peers. That was my solution.

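# Reload chan_sip when the check is enabled, forcing the trunks to re-register.
if [ "$ACTIVATED" = "yes" ]; then
    asterisk -rx "sip show peers"
    asterisk -rx "reload chan_sip.so"
fi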
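
The full script is not posted, so the following is only a sketch of the watchdog as described: ping each trunk host and reload chan_sip when one does not answer. The function name check_trunks and the PING_CMD and RELOAD_CMD variables are made up for illustration, so that the logic can be dry-run without touching Asterisk.

```shell
#!/bin/bash
# Ping each trunk host and reload chan_sip if any of them fails to answer.
# PING_CMD and RELOAD_CMD are hypothetical hooks, overridable for a dry run.
PING_CMD=${PING_CMD:-"ping -c 1 -W 2"}
RELOAD_CMD=${RELOAD_CMD:-'asterisk -rx "reload chan_sip.so"'}

# Usage: check_trunks HOST...
# Returns 0 when every host answered, 1 when a reload was triggered.
check_trunks() {
    local host
    for host in "$@"; do
        if ! $PING_CMD "$host" > /dev/null 2>&1; then
            echo "no reply from $host, reloading chan_sip"
            eval "$RELOAD_CMD"
            return 1
        fi
    done
    return 0
}
```

Saved as a script that ends with a call such as check_trunks followed by the trunk host names, it could be run from cron once a minute with a "* * * * *" schedule. The ping only proves the host is reachable, not that the trunk is registered, so this covers the case described here and nothing more.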

Greetings!
Christian Romero
FTOCC 2007
http://www.lawise.com




carrothospital
Posts: 37
Member Since:
2009-01-28

Hey, Carsys, thanks for the advice. I never actually got this fixed. My temporary patch was to have the phone system reboot every morning at 5:30. Even then, they've still had a few mornings where the phones were down again and it needed a reboot. I'll keep what you said in mind and use it if this comes up again. Right now the customer doesn't want to pay me to do maintenance and things like that, so I had to give them the ability to reboot their own system. Since they know how to remedy the situation that way, they don't want to spend the money for me to figure out a permanent fix.

Oh well....at least I'll have your fix to help me out next time this happens. I'm sure it will come in handy down the road, or if the client ever decides they want a permanent resolution.



SkykingOH
Posts: 9541
Member Since:
2007-12-17

This script is not a fix. Nor do competent professionals leave a customer's system in a condition that requires a daily reboot.

They should not pay you for this "maintenance" and actually should not pay you for the first work until the system is in an acceptable condition. Did you actually have the customer sign off on this?

When we (and I am speaking collectively as the supporters of Open Source Telephony) do shoddy work we continue to feed the perception that OST is not ready for prime time.

--

Scott

aka "Skyking"



carrothospital
Posts: 37
Member Since:
2009-01-28

Well, I agree with you on some of this, but there are some things that should be addressed.

The system itself was in place for a few months working just fine before this issue started, so it was in an acceptable condition for a while. This isn't one of those initial setup issues, it's a random one that came well after deployment.

I don't do freelance work, I work for a company, so it's over my head as far as what they need to pay for and what we need to eat. The problem is that the customer not only does not want to pay me to figure out why it's not working...they don't want the downtime either. As you know, it's hard to diagnose something that is "broken" when it's currently working. When the phones are down, they want them back up asap without waiting more than 15 minutes for me to figure out what's going on. That and Rhino support doesn't open until 10 my time.

And lastly...I don't claim to be a completely competent professional with Trixbox/Asterisk. I've set it up before and done my share of tinkering, but it's not my area of expertise. I wasn't keen on setting it up for them in the first place, but it's what they wanted.



SkykingOH
Posts: 9541
Member Since:
2007-12-17
Quote:
I wasn't keen on setting it up for them in the first place, but it's what they wanted.

You really nailed the point. Not only did you take on responsibility for a system you have no control over, it's not even your primary area of expertise.

Also, the customer is using an Internet-based voice provider, something most integrators do not recommend for primary phone service.

The sad thing is that the reputation of the Asterisk based solution takes the bad rap when it is a design issue.

--

Scott

aka "Skyking"



carrothospital
Posts: 37
Member Since:
2009-01-28

Well, my customer is using Comcast Business Voice, not a SIP provider. So that doesn't count as an Internet voice provider, right?

Unfortunately, I didn't have an option to refuse the responsibility. When the boss tells you he's promised a customer a phone system, then you've gotta do it!

On the other hand, I've set up and maintained our system here in the office, as well as other clients with no major issues. Of course the only time something goes wrong is when it's for a client that's slowly inching towards open source. Go figure!


