Stun Receiver Buffer Exhausted

AlanM

Member
Recently we ran into an issue with our CDN 2.0 setup.
Sometimes when starting the stream, the server will begin dumping:
Code:
02:59:59,881 ERROR   StunDatagramSocket - Stun receiver udp/3xxxx Incoming buffer exhausted!
After this, incoming streams will not broadcast.

We are doing further investigation to try to find a reliable way to reproduce. We have observed it twice in the past week. Is this a known issue with a resolution? We will post more details as we continue our research.
 

Max

Administrator
Staff member
Hello.
One customer reported a similar issue, but we could not reproduce the problem in their case. If you find a reliable way to reproduce it, that will help us find a resolution.
Also, which WCS version are you using now? Does this problem occur on the latest version, 5.1.3578?
 

Max

Administrator
Staff member
It would also be great if you could reproduce it with client debug logs enabled.
 

AlanM

Member
We are currently on 5.1.3571.
Looking through our logs, we observed it happen 21 separate times in the past week.
 

Max

Administrator
Staff member
The message itself means that the server is not pulling incoming STUN packets from the buffer for some reason. By default, the STUN socket buffer size is 100:
Code:
stun_socket_buffer_size=100
This is small enough that the buffer exhausted message may appear occasionally. But if this message spams the log and no incoming streams are broadcasting any more, it means the server has stopped handling incoming packets entirely for some reason, such as a deadlock.
If you find a way to reproduce that (with debug logs enabled), we will find the reason.
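If only the occasional message is a concern, you can try raising the buffer in flashphoner.properties; the value below is just an illustrative example, and note that a larger buffer will not help if the server is deadlocked:
Code:
stun_socket_buffer_size=200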
 

SergeyP

Member
I had the same problem in 5.1.3529 and now I'm having it in 5.1.3575, but much worse.
We don't use CDN at all, so I assume it's not a CDN-related issue.
When this happens, the following line (or a similar one) appears in the logs dozens of times a second, for hours:
Code:
13:00:03,423 ERROR   StunDatagramSocket - Stun receiver udp/62694 Incoming buffer exhausted!
While this is happening, WCS itself doesn't do much - no streams are published or played.

While the investigation is in progress (I hope), could you just add a fail-safe mechanism to neutralize a faulty connection when this happens?
Also, is there a configuration I can use to minimize the chance of getting this problem?
 

Max

Administrator
Staff member
Also, is there a configuration I can use to minimize the chance of getting this problem?
To locate the problem, we need a description of reliable conditions to reproduce it and/or client debug logs from when the problem occurs. A thread dump shows what is (probably) happening, but not why.
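To capture one the next time the error spams, a minimal sketch, assuming WCS runs on the JVM and the JDK's jstack tool is installed; the process-match pattern below is a hypothetical placeholder, so adjust it to your actual WCS process name:
Code:
# Capture a thread dump of the running WCS Java process ("WebCallServer" is an assumed match pattern)
jstack $(pgrep -f WebCallServer) > wcs_thread_dump.txt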
 

Max

Administrator
Staff member
Hello, Alan, Sergey.
We found a possible deadlock when sending REST hooks to a custom backend server under high load. To prevent it, in the latest version 5.1.3592 we added a parameter to control the maximum backend server response delay, which defaults to 15 seconds:
Code:
rest_request_timeout=15
Setting this parameter to a lower value, 1 second for example, may help prevent the thread pool from filling up quickly (which can lead to a deadlock) when many clients connect to WCS simultaneously. Please try it.
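For example, a minimal flashphoner.properties change applying the suggested value (1 second is only the example discussed here; tune it to your backend's real response time):
Code:
rest_request_timeout=1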
 

SergeyP

Member
If I set rest_request_timeout to 1 second, what will happen if the backend doesn't answer within 1 second? Will publish and play requests succeed or fail in that case?

On another note, would it be better to have the possibility to disable calling the default backend servers entirely? It seems like a waste of time to call http://localhost:8081/apps/EchoApp at all.
 

Max

Administrator
Staff member
If I set rest_request_timeout to 1 second, what will happen if the backend doesn't answer within 1 second?
An error will be raised, and it will be handled according to the REST hook configuration. By default, the connection will be closed, and the publish or play request will fail.
would it be better to have the possibility to disable calling the default backend servers entirely?
If you use the default backend only, all REST hooks can be disabled with the option:
Code:
disable_rest_requests=true
 

AlanM

Member
We had this STUN buffer issue again today, running WCS version 5.1.3592-562aac10c6e8144f491da9a9393ff5b0f9be532c.
This was in our origin/edge setup. We were unable to start new broadcasts until we restarted the origin server, which resolved the issue.
 

Max

Administrator
Staff member
Hello.
In 5.1.3600 we added a parameter to adjust the maximum number of simultaneous REST connections (200 by default). Try reducing this value, for example:
Code:
rest_max_connections=20
It may help to avoid exhausting the WCS REST client thread pool and, therefore, to prevent deadlocks.
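Putting the settings from this thread together, a flashphoner.properties sketch of the suggested mitigations (the values are the examples discussed above, not tuned recommendations):
Code:
# Fail fast on a slow backend instead of filling the REST thread pool (example value)
rest_request_timeout=1
# Cap simultaneous REST connections (default is 200; example value)
rest_max_connections=20
# Alternative if only the default backend is used: disable REST hooks entirely
# disable_rest_requests=true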
 