Server goes down after restart (started, but is not healthy)

Michael

Member
Hello

We recently updated Flashphoner from version 5.2.1140 to 5.2.1536 on our servers (Debian 10 Linux).

Since then, the service sometimes fails to come up after a restart. It starts and opens its listening ports, but does not respond on any of them.

After many attempts and some time, a restart eventually works, so the issue seems intermittent.

After some investigation, we found that the server goes down with this error:
Code:
ERROR start - FlashphonerWebCallServer started, but is not healthy, please try to restart

We already searched this forum for possible solutions. The answers suggest checking whether the server is listening on port 8081 and whether localhost is reachable on the network. We can confirm both:

Bash:
userhost:~$ sudo service webcallserver restart
userhost:~$
userhost:~$ sudo netstat -nlp | grep 8081
tcp        0      0 0.0.0.0:8081            0.0.0.0:*               LISTEN      92818/java  
userhost:~$      
userhost:~$ telnet localhost 8081
Trying ::1...
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
^]
telnet> quit
Connection closed.
userhost:~$
userhost:~$ ping localhost
PING localhost(localhost (::1)) 56 data bytes
64 bytes from localhost (::1): icmp_seq=1 ttl=64 time=0.027 ms
64 bytes from localhost (::1): icmp_seq=2 ttl=64 time=0.028 ms
64 bytes from localhost (::1): icmp_seq=3 ttl=64 time=0.031 ms
64 bytes from localhost (::1): icmp_seq=4 ttl=64 time=0.028 ms
64 bytes from localhost (::1): icmp_seq=5 ttl=64 time=0.030 ms
^C
--- localhost ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 85ms
rtt min/avg/max/mdev = 0.027/0.028/0.031/0.006 ms
userhost:~$
userhost:~$ curl --max-time 1 --insecure -s -i -w '%{http_code}' -o /dev/null http://localhost:8081/health-check
000
userhost:~$
userhost:~$
userhost:~$ curl --max-time 1 --insecure -s -i -w '%{http_code}' -o /dev/null http://localhost:8081/health-check
000
userhost:~$
userhost:~$
userhost:~$ curl --max-time 1 --insecure -s -i -w '%{http_code}' -o /dev/null http://localhost:8081/health-check -v
* Expire in 0 ms for 6 (transfer 0x5565885ddfb0)
* Expire in 1000 ms for 8 (transfer 0x5565885ddfb0)
* Expire in 1 ms for 1 (transfer 0x5565885ddfb0)
* Expire in 0 ms for 1 (transfer 0x5565885ddfb0)
* Expire in 0 ms for 1 (transfer 0x5565885ddfb0)
* Expire in 0 ms for 1 (transfer 0x5565885ddfb0)
*   Trying ::1...
* TCP_NODELAY set
* Expire in 500 ms for 3 (transfer 0x5565885ddfb0)
* Expire in 200 ms for 4 (transfer 0x5565885ddfb0)
* connect to ::1 port 8081 failed: Connection refused
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Expire in 499 ms for 3 (transfer 0x5565885ddfb0)
* Connected to localhost (127.0.0.1) port 8081 (#0)
> GET /health-check HTTP/1.1
> Host: localhost:8081
> User-Agent: curl/7.64.0
> Accept: */*
>
* Operation timed out after 1000 milliseconds with 0 bytes received
* Closing connection 0
000
userhost:~$
userhost:~$
userhost:~$ sudo service webcallserver status
● webcallserver.service - Flashphoner WebCallServer
   Loaded: loaded (/etc/systemd/system/webcallserver.service; disabled; vendor preset: enabled)
   Active: deactivating (stop-sigterm) (Result: exit-code) since Thu 2023-01-26 18:27:16 EST; 19s ago
  Process: 92202 ExecStart=/bin/bash webcallserver start (code=exited, status=1/FAILURE)
Main PID: 92202 (code=exited, status=1/FAILURE)
    Tasks: 147 (limit: 14745)
   Memory: 821.8M
   CGroup: /system.slice/webcallserver.service
           └─92818 java -Xmx26G -Xms26G -Djava.net.preferIPv4Stack=true -Djava.rmi.server.hostname=localhost -XX:ErrorFile=/usr/local/FlashphonerWebCallServer/logs/erro

Jan 26 18:27:13 host sudo[93267]: pam_unix(sudo:session): session opened for user flashphoner by (uid=0)
Jan 26 18:27:13 host sudo[93267]: pam_unix(sudo:session): session closed for user flashphoner
Jan 26 18:27:15 host sudo[93287]:     root : TTY=unknown ; PWD=/usr/local/FlashphonerWebCallServer-5.2.1536/bin ; USER=flashphoner ; COMMAND=/usr/bin/echo -e [20
Jan 26 18:27:15 host sudo[93287]: pam_unix(sudo:session): session opened for user flashphoner by (uid=0)
Jan 26 18:27:15 host sudo[93287]: pam_unix(sudo:session): session closed for user flashphoner
Jan 26 18:27:16 host bash[92202]: ERROR: FlashphonerWebCallServer started, but is not healthy, please try to restart
Jan 26 18:27:16 host sudo[93300]:     root : TTY=unknown ; PWD=/usr/local/FlashphonerWebCallServer-5.2.1536/bin ; USER=flashphoner ; COMMAND=/usr/bin/echo -e [20
Jan 26 18:27:16 host sudo[93300]: pam_unix(sudo:session): session opened for user flashphoner by (uid=0)
Jan 26 18:27:16 host sudo[93300]: pam_unix(sudo:session): session closed for user flashphoner
Jan 26 18:27:16 host systemd[1]: webcallserver.service: Main process exited, code=exited, status=1/FAILURE

userhost:~$
userhost:~$ cat /usr/local/FlashphonerWebCallServer/logs/startup.log
[2023-01-26 18:26:53] INFO checkJavaOptions - Checking JVM options
openjdk 12.0.2 2019-07-16
OpenJDK Runtime Environment (build 12.0.2+10)
OpenJDK 64-Bit Server VM (build 12.0.2+10, mixed mode)
[2023-01-26 18:26:54] INFO startAsCurrentUser - Starting FlashphonerWebCallServer-5.2.1536-60e42aeadad5da676c3b60ab972d89a27a1861e0 on debian 10
[2023-01-26 18:26:55] INFO waitForHealth - Will wait for server response at least 10 seconds
[2023-01-26 18:26:56] INFO isServerHealthy - Server health check response code:
[2023-01-26 18:26:58] INFO isServerHealthy - Server health check response code:
[2023-01-26 18:27:01] INFO isServerHealthy - Server health check response code:
[2023-01-26 18:27:03] INFO isServerHealthy - Server health check response code:
[2023-01-26 18:27:05] INFO isServerHealthy - Server health check response code:
[2023-01-26 18:27:07] INFO isServerHealthy - Server health check response code:
[2023-01-26 18:27:09] INFO isServerHealthy - Server health check response code:
[2023-01-26 18:27:11] INFO isServerHealthy - Server health check response code:
[2023-01-26 18:27:13] INFO isServerHealthy - Server health check response code:
[2023-01-26 18:27:15] INFO isServerHealthy - Server health check response code:
[2023-01-26 18:27:16] ERROR start - FlashphonerWebCallServer started, but is not healthy, please try to restart
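
For readers reproducing this check by hand: the manual curl calls above use a 1-second timeout and return 000, which means no HTTP response arrived in that time. A minimal sketch of a slower polling loop is shown below. The function names `probe_code` and `wait_for_health` are hypothetical and not part of WCS, and the sketch assumes that a healthy server answers `/health-check` with HTTP 200.

```shell
#!/bin/sh
# Hypothetical helpers, not part of WCS.

# Print the HTTP status code for a URL. curl prints 000 when no HTTP
# response arrives (connection refused, timeout and so on).
probe_code() {
    curl --max-time 5 -s -o /dev/null -w '%{http_code}' "$1" || true
}

# wait_for_health URL [ATTEMPTS] [DELAY_SECONDS]
# Returns 0 on the first 200 response, 1 if every attempt fails.
# Assumption: HTTP 200 means "healthy".
wait_for_health() {
    wfh_url=$1
    wfh_attempts=${2:-10}
    wfh_delay=${3:-2}
    wfh_i=1
    while [ "$wfh_i" -le "$wfh_attempts" ]; do
        wfh_code=$(probe_code "$wfh_url")
        echo "attempt $wfh_i: response code ${wfh_code:-none}"
        if [ "$wfh_code" = "200" ]; then
            return 0
        fi
        sleep "$wfh_delay"
        wfh_i=$((wfh_i + 1))
    done
    return 1
}
```

It could be called as `wait_for_health http://localhost:8081/health-check 30 2`. A longer per-request timeout and more attempts help tell a slow start apart from a server that accepts the connection but never answers, which is what the verbose curl output above suggests.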
We tried running the set permissions script. It worked on some server instances but not on others, so it may not be the root cause.

Any idea what the cause could be?

Our license starts with DB9E7D76-F2C1-40CB-XXXX-XXXXXXXXXXXX

The logs/server_logs/flashphoner.log is attached (confidential info removed).

Regards
 

Max

Administrator
Staff member
Good day.
First, please update WCS to build 5.2.1537 which fixes some issues with starting WCS as a service. If this does not help, please provide SSH access to the server using this form.
 

Michael

Member
Hello

Thanks for your answer.

We are trying to update the servers, but the updater reports that the new version is not available:

Bash:
sudo /usr/local/FlashphonerWebCallServer/bin/webcallserver update
>>> New version available: 5.2.1537
>>> Your version: 5.2.1536
>>> Version 5.2.1537 is not available
Is there something we can do?

Regards
 

Michael

Member
Hello

After many attempts, the server was updated with this command:
Bash:
sudo /usr/local/FlashphonerWebCallServer/bin/webcallserver update
>>> New version available: 5.2.1537
>>> Your version: 5.2.1536
>>> Version 5.2.1537 is available, try to update
>>> Updating FlashphonerWebCallServer
>>> Downloading 5.2.1537 build
>>> FlashphonerWebCallServer updated to 5.2.1537
But now it won't start:
Bash:
sudo systemctl start webcallserver
Job for webcallserver.service failed because the control process exited with error code.
See "systemctl status webcallserver.service" and "journalctl -xe" for details.
Bash:
systemctl status webcallserver.service
● webcallserver.service - Flashphoner WebCallServer
   Loaded: loaded (/etc/systemd/system/webcallserver.service; disabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Fri 2023-01-27 08:28:37 EST; 6s ago
  Process: 82817 ExecStart=/bin/bash webcallserver start (code=exited, status=1/FAILURE)

I've cleared all logs and tried again. The issue seems to persist:
Bash:
cat startup.log
[2023-01-27 08:33:28] INFO checkJavaOptions - Checking JVM options
openjdk 12.0.2 2019-07-16
OpenJDK Runtime Environment (build 12.0.2+10)
OpenJDK 64-Bit Server VM (build 12.0.2+10, mixed mode)
[2023-01-27 08:33:29] INFO startAsCurrentUser - Starting FlashphonerWebCallServer-5.2.1537-681e412d07840af52fae1351dbb056fbc0265bfd on debian 10
[2023-01-27 08:33:31] INFO waitForHealth - Will wait for server response at least 10 seconds
[2023-01-27 08:33:32] INFO isServerHealthy - Server health check response code:
[2023-01-27 08:33:34] INFO isServerHealthy - Server health check response code:
[2023-01-27 08:33:36] INFO isServerHealthy - Server health check response code:
[2023-01-27 08:33:38] INFO isServerHealthy - Server health check response code:
[2023-01-27 08:33:40] INFO isServerHealthy - Server health check response code:
[2023-01-27 08:33:42] INFO isServerHealthy - Server health check response code:
[2023-01-27 08:33:44] INFO isServerHealthy - Server health check response code:
[2023-01-27 08:33:46] INFO isServerHealthy - Server health check response code:
[2023-01-27 08:33:48] INFO isServerHealthy - Server health check response code:
[2023-01-27 08:33:50] INFO isServerHealthy - Server health check response code:
[2023-01-27 08:33:51] ERROR start - FlashphonerWebCallServer started, but is not healthy, please try to restart

Maybe we need a complete uninstall followed by a reinstall?

Regards
 

Michael

Member
Hello

We made a fresh installation:
  • Deactivated the license
  • Ran ./uninstall.sh
  • Downloaded version 5.2.1537
  • Installed it
  • Activated the license

After this, the service still did not come up.

We then ran the following command on the fresh install:
Bash:
sudo ./webcallserver set-root-mode enable
After that, the instance came up, apparently without issues.

We repeated this process on all our server instances. For now, it seems to be working.

We will wait a few days, until we next need to restart, to check that the issue does not come back, and will post the results here.

If that is confirmed, maybe the update from 5.2.1140 to 5.2.1536 needs a fresh install instead of sudo ./webcallserver update.

Regards
 

Max

Administrator
Staff member
sudo ./webcallserver set-root-mode enable
This should be enough to fix the issue. There is no need to reinstall from scratch. It seems that WCS cannot correctly bind all the ports it needs on your servers, which is why we requested SSH access to check what is wrong. In any case, there should not be any binding issues in root mode.
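
A binding problem of this kind can be checked per port from the same `netstat` output used earlier in the thread. The sketch below is only an illustration: `port_is_listening` is a hypothetical helper, and the list of ports WCS needs is not given in this thread, so only 8081 is used in the example.

```shell
#!/bin/sh
# Hypothetical helper, not part of WCS.

# port_is_listening PORT
# Reads `netstat -nlt` style lines on stdin and succeeds if a TCP socket
# in LISTEN state has a local address ending in :PORT.
port_is_listening() {
    awk -v port="$1" '
        $1 ~ /^tcp/ && $6 == "LISTEN" {
            # The local address is the 4th column, e.g. 0.0.0.0:8081 or :::8081
            n = split($4, parts, ":")
            if (parts[n] == port) found = 1
        }
        END { exit !found }
    '
}
```

For example: `sudo netstat -nlt | port_is_listening 8081 && echo "8081 is bound"`. As the first post shows, a bound port alone does not prove the server is healthy, so this only helps to spot ports that are missing.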
 

Michael

Member
Hello

As I remember, we tried everything before the fresh install, including setting root mode.

Maybe something in the old installation had been changed to make it run as root, since the set-root-mode command was introduced in versions newer than 5.2.1140, which we were using. A Bash variable in bin/setenv.sh, maybe?
Bash:
WCS_NON_ROOT=false
I don't know what was different, but the older version ran for months with no port-binding issues.

For now, the fresh install is working as expected. We got through the weekend with no issues restarting the service when needed.
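
One way to compare an old and a new installation on this point is to read the assignment straight from the file. The sketch below is a generic helper, not a WCS tool: `read_setting` is a hypothetical name, and whether bin/setenv.sh in a given build actually defines `WCS_NON_ROOT` is only the guess made above.

```shell
#!/bin/sh
# Hypothetical helper, not part of WCS.

# read_setting FILE NAME
# Prints the value of the last uncommented NAME=value line in FILE.
# Fails if NAME is not assigned there or its value is empty.
read_setting() {
    sed -n "s/^[[:space:]]*$2=//p" "$1" | tail -n 1 | grep .
}
```

For example: `read_setting /usr/local/FlashphonerWebCallServer/bin/setenv.sh WCS_NON_ROOT || echo "not set"`. Running it against a backup of the old installation and against the fresh one would show whether this setting differs.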

I consider this topic solved.

Thanks for your attention.

Regards
 

Michael

Member
Good. Next time, please consider providing SSH access or a full server report, because we cannot reproduce the issue on a test server with the default settings.
Hello

Our company policies do not allow external access to our servers.

We prepared a full report, but we fixed the problem with a fresh install before sending it through your form.

We no longer have the report; it was deleted.

Regards
 