WCS often restarts

mbedial

Member
Hi, after solving a Prometheus metrics issue and periodic restarts (https://forum.flashphoner.com/threads/prometheus-sip-metrics-details.14029/), we still see service downtime every 2-3 days (see the JPEG graphs).
We have been investigating the issue and have collected some logs, but we are not sure what's going on. I can attach some logs, but since you have access to our machine, it may be better if you take a look yourself.
We don't see anything in the machine syslog.
In the Flashphoner logs we see:

12:28:53,760 ERROR EventScanner - EventScannerThread-27 We cannot let this thread die under any circumstances. Protect ourselves by logging errors to the console but continue.
java.lang.NullPointerException
at com.flashphoner.sip.D.processTimeout(Unknown Source)
at gov.nist.javax.sip.EventScanner.deliverEvent(EventScanner.java:426)
at gov.nist.javax.sip.EventScanner.run(EventScanner.java:569)
at java.lang.Thread.run(Thread.java:745)
12:29:00,485 WARN SipUserAgentListener - EventScannerThread-27 REGISTER timeout
12:29:03,758 WARN SipUserAgentListener - EventScannerThread-27 REGISTER timeout

and

grep "REGISTER timeout" flashphoner.log.2022-01-26-12
12:28:47,003 WARN SipUserAgentListener - EventScannerThread-27 REGISTER timeout
12:28:50,898 WARN SipUserAgentListener - EventScannerThread-27 REGISTER timeout
12:28:53,760 WARN SipUserAgentListener - EventScannerThread-27 REGISTER timeout
12:29:00,485 WARN SipUserAgentListener - EventScannerThread-27 REGISTER timeout
12:29:03,758 WARN SipUserAgentListener - EventScannerThread-27 REGISTER timeout
12:29:03,764 WARN SipUserAgentListener - EventScannerThread-27 REGISTER timeout
12:29:04,934 WARN SipUserAgentListener - EventScannerThread-27 REGISTER timeout
12:29:09,169 WARN SipUserAgentListener - EventScannerThread-27 REGISTER timeout
12:29:09,378 WARN SipUserAgentListener - EventScannerThread-27 REGISTER timeout
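
To see how densely these warnings occur, the grep output above can be bucketed per minute. This is a minimal Python sketch, not part of WCS: `timeouts_per_minute` is a hypothetical helper name, and the regular expression assumes the `HH:MM:SS,mmm WARN ... REGISTER timeout` line layout shown above.

```python
import re
from collections import Counter

# Assumed layout, taken from the log excerpt in this thread:
# 12:28:47,003 WARN SipUserAgentListener - EventScannerThread-27 REGISTER timeout
LINE_RE = re.compile(r"^(\d{2}):(\d{2}):\d{2},\d{3}\s+WARN\s+.*REGISTER timeout")


def timeouts_per_minute(lines):
    """Return a Counter mapping 'HH:MM' to the number of REGISTER timeout warnings."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            counts[f"{match.group(1)}:{match.group(2)}"] += 1
    return counts


# The nine lines returned by the grep above.
SAMPLE = [
    "12:28:47,003 WARN SipUserAgentListener - EventScannerThread-27 REGISTER timeout",
    "12:28:50,898 WARN SipUserAgentListener - EventScannerThread-27 REGISTER timeout",
    "12:28:53,760 WARN SipUserAgentListener - EventScannerThread-27 REGISTER timeout",
    "12:29:00,485 WARN SipUserAgentListener - EventScannerThread-27 REGISTER timeout",
    "12:29:03,758 WARN SipUserAgentListener - EventScannerThread-27 REGISTER timeout",
    "12:29:03,764 WARN SipUserAgentListener - EventScannerThread-27 REGISTER timeout",
    "12:29:04,934 WARN SipUserAgentListener - EventScannerThread-27 REGISTER timeout",
    "12:29:09,169 WARN SipUserAgentListener - EventScannerThread-27 REGISTER timeout",
    "12:29:09,378 WARN SipUserAgentListener - EventScannerThread-27 REGISTER timeout",
]

for minute, count in sorted(timeouts_per_minute(SAMPLE).items()):
    print(minute, count)
```

The function accepts any iterable of lines, so an open log file handle can be passed instead of `SAMPLE`.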

The attached logs show this:

flashphoner.log.2022-01-26-12 flashphoner.log.2022-01-26-20

Thanks in advance
 


mbedial

Member
Additional info:

One of our WCS servers didn't accept users for 8 hours. This is not the first time we have seen this behaviour. The Flashphoner Java process has been running for 2 days and was not restarted:

root 26997 5.5 2.1 3019316 716216 ? Sl ene25 160:08 java -Xmx1024M -XX:+UseConcMarkSweepGC -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=70 -Djava.net.preferIPv4Stack=true -Dcom.sun.management.jmxremote=true -Dcom.sun.management.jmxremote.local.only=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.port=50999 -Dcom.sun.management.jmxremote.host=localhost -Djava.rmi.server.hostname=localhost -XX:ErrorFile=/usr/local/FlashphonerWebCallServer/logs/error%p.log -XX:+PrintGCDateStamps -XX:+PrintGCDetails -Xloggc:/usr/local/FlashphonerWebCallServer/logs/gc-core-2022-01-25_09-34.log -XX:+ExplicitGCInvokesConcurrent -Dsun.rmi.dgc.client.gcInterval=36000000000 -Dsun.rmi.dgc.server.gcInterval=36000000000 -Dcom.flashphoner.fms.AppHome=/usr/local/FlashphonerWebCallServer -Djava.library.path=/usr/local/FlashphonerWebCallServer/lib/so:/usr/local/FlashphonerWebCallServer/lib -DWCS_NON_ROOT=false -DsessionDebugEnabled=false -Djdk.tls.client.protocols="TLSv1,TLSv1.1,TLSv1.2" -cp /usr/local/FlashphonerWebCallServer/lib/* com.flashphoner.server.Server

8 hours later, without any action from us, the service started to work again. We attach the 12 pm logs and flashphoner.log.2022-01-26-20, which covers the time when the service was restored.

At 12:28:47 it starts to print "REGISTER timeout":
12:28:47,003 WARN SipUserAgentListener - EventScannerThread-27 REGISTER timeout

and the last "REGISTER timeout" was at 20:28:06:

20:28:06,545 WARN SipUserAgentListener - EventScannerThread-27 REGISTER timeout
20:28:06,782 WARN SipUserAgentListener - EventScannerThread-27 REGISTER timeout
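
The length of the window between the first and last warning can be computed directly from the two timestamps. This is a rough Python sketch: `parse_ts` and `timeout_window` are hypothetical helper names, and it assumes both lines come from the same calendar day, since the log lines carry no date.

```python
from datetime import datetime


def parse_ts(line):
    """Parse the leading 'HH:MM:SS,mmm' timestamp of a log line."""
    return datetime.strptime(line.split()[0], "%H:%M:%S,%f")


def timeout_window(first_line, last_line):
    """Return the time between two log lines (assumes the same day)."""
    return parse_ts(last_line) - parse_ts(first_line)


FIRST = "12:28:47,003 WARN SipUserAgentListener - EventScannerThread-27 REGISTER timeout"
LAST = "20:28:06,782 WARN SipUserAgentListener - EventScannerThread-27 REGISTER timeout"

print(timeout_window(FIRST, LAST))
```

For the two lines quoted above this gives a window of just under 8 hours, which matches the observed outage.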

There is nothing relevant in syslog at this time.
 

Max

Administrator
Staff member
Hello

Here is a set of recommendations:

1. Update WCS server to the latest available version.

Current version: 5.2.597
Latest: 5.2.1127

There have been a lot of SIP fixes since then which may cover such behavior.

MAKE A BACKUP BEFORE UPDATING

2. Update Java version to Java 14.

3. Set up memory for the Java machine

-Xmx16g -Xms16g


4. Set up ZGC


5. Enable additional logs

File: log4j.properties

log4j.logger._com.flashphoner.sip.SipUserAgentListener=DEBUG

6. Provide logfile /var/log/messages

You can either update WCS first and then make the other changes, or start with the other changes without updating WCS.
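
After changing the heap and garbage collector settings (items 3 and 4), it may help to confirm that the restarted process actually picked them up, for example by inspecting its command line as in the `ps` output earlier in this thread. This is a minimal Python sketch: `jvm_summary` is a hypothetical helper, and it only reports flags that appear on the command line, not JVM defaults.

```python
def jvm_summary(cmdline):
    """Extract heap limits and -XX:+Use...GC collector flags from a java command line."""
    summary = {"xmx": None, "xms": None, "gc": []}
    for arg in cmdline.split():
        if arg.startswith("-Xmx"):
            summary["xmx"] = arg[len("-Xmx"):]
        elif arg.startswith("-Xms"):
            summary["xms"] = arg[len("-Xms"):]
        elif arg.startswith("-XX:+Use") and arg.endswith("GC"):
            summary["gc"].append(arg[len("-XX:+Use"):])
    return summary


# Shortened version of the command line posted earlier in the thread.
CURRENT_CMDLINE = (
    "java -Xmx1024M -XX:+UseConcMarkSweepGC -XX:+UseCMSInitiatingOccupancyOnly "
    "-XX:CMSInitiatingOccupancyFraction=70 -XX:+ExplicitGCInvokesConcurrent "
    "-cp /usr/local/FlashphonerWebCallServer/lib/* com.flashphoner.server.Server"
)

print(jvm_summary(CURRENT_CMDLINE))
```

With the recommended settings applied, the summary would be expected to show `16g` for both heap values and `ZGC` as the collector.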
 