File descriptor count increasing

Michael

New Member
We are seeing a huge number of file descriptors opened by the webcallserver process, culminating in service failure after a few days of uptime. We have high limits of ~1,000,000, but we noticed that the webcallserver process dies when it reaches around 160,000 open FDs, which takes approximately 3-4 days. It looks like the FDs are not being properly closed.

Do you have any idea what might be causing such a gradual increase in open FDs?
 

Max

Administrator
Staff member
Combined with this post, the server load seems too high for its configuration. Ports, audio resamplers, file descriptors, etc. can leak if the resources associated with a publisher/subscriber are not released in time.
So we recommend that you perform server load testing as described here and check how many publishers/subscribers can be connected simultaneously.
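
To see which kind of resource is accumulating (sockets, pipes, regular files), it can help to group the process's open descriptors by what they point to. Below is a minimal sketch, assuming a Linux host with /proc mounted and permission to read the target process's fd directory (same user or root). The function names are illustrative and not part of WCS.

```python
import os
from collections import Counter


def classify_fd_target(target: str) -> str:
    """Map a /proc/<pid>/fd symlink target to a coarse category."""
    if target.startswith("socket:"):
        return "socket"
    if target.startswith("pipe:"):
        return "pipe"
    if target.startswith("anon_inode:"):
        return "anon_inode"  # epoll, eventfd, timerfd and similar
    if target.startswith("/"):
        return "file"
    return "other"


def count_fds_by_type(pid: int) -> Counter:
    """Count the open descriptors of a process, grouped by category."""
    fd_dir = f"/proc/{pid}/fd"
    counts = Counter()
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
        counts[classify_fd_target(target)] += 1
    return counts
```

Calling `count_fds_by_type(pid)` with the webcallserver PID every few minutes and comparing the results shows which category keeps growing. A steadily rising "socket" count would point to ports or connections that are not released, while a rising "file" count would point to files left open on disk.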
 

Michael

New Member
We have conducted a few more tests and followed some of the suggestions you sent by email to improve HLS playback time and reduce failures for some users, which might be related to this problem: https://forum.flashphoner.com/threads/hls-https-pool-30-thread-427-blocked.13288/

We are noticing that memory and FDs are leaking: the FD count and memory consumption grow indefinitely and are only freed after a service restart. We have conducted tests with 1, 10, 20 and 50 broadcasters, with no more than 3 subscribers per broadcaster, and in all cases the FD count and memory grew indefinitely.
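
To compare the test runs by a number rather than by eye, the growth can be expressed as a rate. The following is a rough sketch, assuming the FD count has been sampled periodically as (timestamp in seconds, count) pairs. It fits a least-squares line through the samples and extrapolates to a threshold such as the ~160,000 descriptors at which the process died earlier. The function names and the sample values in the usage note are illustrative, not measurements.

```python
from typing import Sequence, Tuple


def fd_growth_per_hour(samples: Sequence[Tuple[float, int]]) -> float:
    """Least-squares slope of FD count over time, in descriptors per hour."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_c = sum(c for _, c in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples must span more than one timestamp")
    cov = sum((t - mean_t) * (c - mean_c) for t, c in samples)
    return cov / var_t * 3600.0


def hours_until(limit: int, current: int, rate_per_hour: float) -> float:
    """Hours until `limit` is reached at a constant growth rate."""
    if rate_per_hour <= 0:
        return float("inf")  # flat or shrinking count never reaches the limit
    return (limit - current) / rate_per_hour
```

For example, with made-up samples `[(0, 1000), (3600, 3000), (7200, 5000)]` the rate is about 2,000 descriptors per hour, and `hours_until(160000, 5000, rate)` gives roughly 77.5 hours. If the rate scales with the number of broadcasters, that suggests a per-session leak. A rate that stays the same regardless of load suggests something periodic instead.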

Server specification:

2 x 2.3GHz Intel Xeon-Skylake (6140-G) - 72 cores
64GB RAM
Debian 9.2.0-64 Minimal
openjdk version "1.8.0_272"
OpenJDK Runtime Environment (build 1.8.0_272-8u272-b10-0+deb9u1-b10)
OpenJDK 64-Bit Server VM (build 25.272-b10, mixed mode)

wcs-core.properties:

-Xmx32G
-Xms32G
-XX:+UseCMSInitiatingOccupancyOnly
-XX:CMSInitiatingOccupancyFraction=70
-XX:+ExplicitGCInvokesConcurrent
-Dsun.rmi.dgc.client.gcInterval=36000000000
-Dsun.rmi.dgc.server.gcInterval=3600000000

Broadcaster: RTC via Browser
Subscribers: 66% RTC / 33% HLS

Could it be a garbage collector issue? What would be the appropriate configuration to use in this scenario?

Any other tips?
 

Max

Administrator
Staff member
Please update the JDK to 12 or 14 and set up ZGC as described here (the example uses a 24 GB heap; you should adapt it to 32 GB).
Also, please check the CPU load while testing. Read this article for details on which other metrics are important and how to monitor them.
Please also read this article about server performance testing.
 