Java CPU Process stuck at 100%

Taylor

Member
Hello Flashphoner

We've been experiencing an odd issue with the WebCall Server where, randomly, the main Java process gets stuck at 100% CPU.

We are unable to reproduce this issue: it has only occurred on our production environment, has only happened 3 times in the past month, and occurs at random points without much explanation (this latest occurrence happened at 17:30 UTC, when there was no user activity).

Given that it's on production, and we cannot reproduce it on a test server, we are fairly limited in the access and details we can provide, but for now this is what we can share:



When: The latest issue occurred on 2025-07-24 between 17:30 UTC and 17:40 UTC:
This is when the CPU metrics jumped from 5% to 50% (which appears to coincide with the Java process running at 100% CPU)
Screenshot 2025-07-31 110639.png


Server Logs: There's not much to go on in the logs, we believe, but this entry stood out:
Code:
17:35:51,434 INFO  bstractNioWorkerPool - SSL-WS-BOSS-pool-18-thread-1
Workers size 4
Cemetery size 1
index:id:state:dead_for_ms
0:SSL-WS-pool-19-thread-1405:RUNNABLE:14507

17:35:51,465 INFO                Dumper - Thread-89086 Jstack execution..
17:35:52,154 INFO                Dumper - Thread-89086 Jstack is done with exit code 0. Location:/usr/local/FlashphonerWebCallServer/logs/CemeteryDump.jstack
17:35:52,154 INFO                Dumper - Thread-89086 Destroying process Process[pid=2928107, exitValue=0]
It doesn't seem like much, but it's the only unique log entry in the 7 days leading up to the error, and it occurred around the same time.
We are also getting the following ERROR logs on the server, but we don't believe they're related, because they occur frequently both before and after the CPU spike:
Code:
java.io.IOException: Broken pipe
        at java.base/sun.nio.ch.SocketDispatcher.write0(Native Method)
        at java.base/sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:62)
        at java.base/sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:137)
        ...
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
        at java.base/java.lang.Thread.run(Thread.java:1583)
Client Logs: There was no client activity in these logs around the time the CPU spiked:
  • Last Log was on: 2025-07-24 15:36 UTC
  • Next Log was on: 2025-07-24 19:36 UTC

AWS Properties:
We are using a t3a.medium instance to run the media server.

WCS Core Properties: These are the only changes we currently make to the WCS core properties (everything else is based on the settings in AMI 5.2.2105):
-Xms2g -Xmx2g
-XX:NewSize=256m

Further Info: We also don't believe we ever saw this CPU issue on past versions of the media server:
  • WCS 5.2.2071 - Between start of October 2024 and start of June 2025 (this was on an old Flashphoner AMI, so there have been a few changes since)
  • WCS 5.2.2247 - Between start of June 2025 and start of July 2025 (It was only up for about a month but there weren't any issues)
  • WCS 5.2.2269 - From the start of July 2025 to now. (As mentioned, the CPU issue has occurred 3 times across 6 servers.)


One thing we noticed is that while we've been using a script to change wcs-core.properties, the `-XX:+UseConcMarkSweepGC` flag hasn't been getting set, and in fact hasn't been since we started using WCS 5.2.2071. We don't believe it's the cause of the issue, but testing it would require deploying to production servers and waiting at least a month.
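For reference, a minimal sketch of what the relevant JVM-options lines in wcs-core.properties might look like with the missing flag restored, combining the two flags already quoted above with the CMS flag. This assumes the bundled JDK still supports CMS; note that CMS was removed in JDK 14, so on newer runtimes this flag would be rejected at startup:

```properties
# JVM options fragment (sketch) for wcs-core.properties:
# the two flags already in use, plus the CMS flag the script has been dropping.
# CMS was removed in JDK 14; do not add it on a newer JDK.
-Xms2g -Xmx2g
-XX:NewSize=256m
-XX:+UseConcMarkSweepGC
```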

We're sorry we cannot share more right now, given the circumstances of the issue. If needed, we can compile additional log data and send it to technical support, support@flashphoner.com.

Any help in understanding and fixing this issue would be greatly appreciated.

Cheers,
Taylor
 

Max

Administrator
Staff member
Hello

Please send a report via the private form

Make sure the report contains the latest CemeteryDump.jstack files. If they are missing, please attach them manually as a separate tar archive.

CPU troubleshooting:
1. Install htop using your package manager. For RedHat/CentOS use: yum install htop. For Debian/Ubuntu use: apt install htop.
2. Launch htop by typing: htop
3. Press Shift+H to show threads.
4. Press F5 to enable tree view.
5. Look for a thread that consumes a high amount of CPU; for example, a thread with PID 4988 consuming 99% CPU.
6. Capture a Java stack trace using the following command:
Code:
jstack `pidof java` > jstack.pid
7. Convert the thread PID to hexadecimal using: printf "0x%x\n" 4988. This will output: 0x137c
8. Search for this hex thread ID (e.g., 0x137c) inside the jstack.pid file to locate the corresponding thread's stack trace.
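Steps 6-8 above can be sketched as a small shell helper. The thread PID 4988 is just the example from step 5, and the helper name tid2hex is ours; in a HotSpot thread dump the hex thread ID appears in the nid= field:

```shell
# Convert a busy thread's PID (from htop) to the hex "nid" that
# jstack prints, then grep the dump for it.
tid2hex() {
    printf "0x%x\n" "$1"
}

tid2hex 4988        # prints 0x137c

# With a live WCS process you would then run (not executed here):
#   jstack "$(pidof java)" > jstack.pid
#   grep -n "nid=$(tid2hex 4988)" jstack.pid
```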

Please send us the following:
- Screenshot of htop with tree and threads view enabled, showing high CPU usage
- The jstack.pid file, which includes the hexadecimal Thread ID found in step 8.
 

Taylor

Member
Thank you Max.

I have sent off the data in the form. Hope it's helpful.

I forgot to note in the comment for the form data that while I was able to find the PID of Java and capture the stack trace to a file, I couldn't find the thread in the output (I tried both the decimal and hexadecimal variants).
 

Max

Administrator
Staff member
We checked the report. The thread serving secure websocket connections consumes 100% CPU.
Please add the configuration file thread_pools_config.json to the /usr/local/FlashphonerWebCallServer/conf folder with the following content:
Code:
{
    "ws_SERVER": {
        "nio": false
    }
}
This switches the websocket server from the NIO to the OIO approach and should reduce CPU consumption. Please restart WCS to apply the changes.
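A quick sketch of applying this. The path and JSON content are from the instructions above; CONF_DIR defaults to the current directory so the snippet can be tried safely off the server first, python3 is used only as a JSON syntax check, and the service name in the final comment is an assumption:

```shell
# Write thread_pools_config.json and sanity-check the JSON before restarting.
# On the server, set CONF_DIR=/usr/local/FlashphonerWebCallServer/conf first.
CONF_DIR=${CONF_DIR:-.}
cat > "$CONF_DIR/thread_pools_config.json" <<'EOF'
{
    "ws_SERVER": {
        "nio": false
    }
}
EOF
python3 -m json.tool "$CONF_DIR/thread_pools_config.json" >/dev/null && echo "JSON OK"
# Then restart WCS (e.g. sudo systemctl restart webcallserver) to apply.
```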
 