How to run ASG on EC2 instance types larger than the recommended c4.large?

sangsoo

Member
Hello.

I am running Edge auto-scaling tests for each EC2 instance type.
1. Published an FHD CBR 2500 stream from OBS, and played it the specified number of times from the Edge without transcoding.
- I used "Load testing using WebRTC pulling".
2. Measured CPU usage on the Edge.
3. One control stream ("streamA") was played from the Edge for comparison (client2/examples/demo/streaming/player/player.html).

** Results
1. On the WCS-recommended c4.large, we applied an ASG scale-out policy based on 70% CPU usage.
At the 70% CPU threshold, I got the desired streaming quality and number of played streams.

I increased the EC2 instance size for larger-scale service.
I expected the number of streams to grow in proportion to the EC2 instance size.

2. Tested the same way on c4.xlarge; I got a different result than expected.
On c4.xlarge the control stream ("streamA") lagged at 50% CPU, and it was difficult to push CPU usage up to 70%.

3. Tested the same way on c4.2xlarge; likewise, the results differed from expectations.
c4.2xlarge struggled to stream at even 30% CPU.

As a result, the 70%-CPU scale-out policy could not be applied to c4.xlarge and c4.2xlarge.

** Questions
1. What scale-out policy should be used to run an ASG on EC2 instance types other than the recommended c4.large?
2. Should the ASG be configured with c4.large?
3. And which instance type do you recommend, c4.large or c5.large?

Thank you.
 
Last edited:

Max

Administrator
Staff member
Good day.
1. On the WCS-recommended c4.large, we applied an ASG scale-out policy based on 70% CPU usage.
At the 70% CPU threshold, I got the desired streaming quality and number of played streams.
2. Tested the same way on c4.xlarge; I got a different result than expected.
On c4.xlarge the control stream ("streamA") lagged at 50% CPU, and it was difficult to push CPU usage up to 70%.
3. Tested the same way on c4.2xlarge; likewise, the results differed from expectations.
c4.2xlarge struggled to stream at even 30% CPU.
How many subscribers did you emulate in each case?
Did you expand the media_port_from - media_port_to range for the test (the default range is only enough for about 200 publishers)?
Did you apply the server tuning recommendations from this page: adjust Java heap memory to 1/2 of RAM, set up ZGC, and raise the maximum open files limit?
Please provide the server statistics page output (http://server:8081/?action=stat) for every test case.
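As a quick illustration of the 1/2-RAM heap rule, a sketch (the 16 GB figure matches c5.2xlarge and is hard-coded here for illustration; on a real server, read the value from /proc/meminfo instead):

```shell
# Heap should be half of physical RAM. 16 GB is a c5.2xlarge example value.
RAM_GB=16
printf '%s\n' "-Xms$(( RAM_GB / 2 ))g -Xmx$(( RAM_GB / 2 ))g"
```

This prints the -Xms/-Xmx pair to put into the JVM options (here, -Xms8g -Xmx8g).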
1. What scale-out policy should be used to run an ASG on EC2 instance types other than the recommended c4.large?
The parameter depends on your test results. You are free to choose the threshold according to the test results.
2. Should the ASG be configured with c4.large?
c4.large is the minimum recommended configuration for production use. If this configuration works well in your tests, you can set up the ASG based on it.
3. And which instance type do you recommend, c4.large or c5.large?
We recommend testing your case and choosing the more suitable configuration based on the test results. This is the common recommendation: all AWS instance hardware is virtual, so only a test can show the real performance.
Also, please read this article. If you already have a big infrastructure deeply integrated with Amazon services, you have to choose AWS; but if not, a Google Cloud solution can cost almost half as much as AWS, including traffic cost.
 

sangsoo

Member
Thank you for the reply.

A test result table is attached. (The link will be deleted after 1 week.)
- https://docs.google.com/spreadsheets/d/1HKxciysIS6nHZtnD1YVPDck-reLO_san0yLjIdlYgDE/edit?usp=sharing
- Each spreadsheet cell shows the test condition and the highest CPU usage measured by AWS CloudWatch (max, 10 sec).
- The column value is the number of subscribers.

I could not yet clean up the http://server:8081/?action=stat output for all test cases.
It will take some time to prepare.

I set the following properties on all test servers:
- Expanded the range to media_port_from=30001, media_port_to=33000.
- The http://server:8081/?action=stat response shows ports_media_free=1499.
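A quick check that the reported free-port count matches the configured range; this assumes (my guess, not confirmed in the thread) that WCS counts one free media slot per pair of ports:

```shell
# 33000 - 30001 = 2999 ports in the expanded range; one slot per
# two ports (assumption) matches the observed ports_media_free=1499.
echo $(( (33000 - 30001) / 2 ))
```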

1. Adjusted Java heap memory to 1/2 of RAM.
2. ZGC (or CMS GC) tuning and the max open files limit were not adjusted after installation.
- This part is very suspicious.
The installation defaults are being used without adjustment. Is GC tuning necessary to solve this problem?
I read about this in the WCS documentation, but I did not know what to do (what values to set).
What can I do, and how (based on c4.2xlarge)?

If the settings are operating correctly, I will be glad to be able to suggest service sizes for each EC2 instance type.
And thank you for the GCP article link; I will refer to it.
 
Last edited:

Max

Administrator
Staff member
1. Adjusted Java heap memory to 1/2 of RAM.
2. ZGC (or CMS GC) tuning and the max open files limit were not adjusted after installation.
For high load, we recommend ZGC because it consumes less CPU compared to CMS.
The maximum open files limit should be set to 100000.
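One common way to make that limit persistent is via the system limits file; a sketch (your WCS startup script may also set ulimit directly, so verify the effective value with `ulimit -n` in the environment that launches WCS):

```
# /etc/security/limits.conf (illustrative fragment)
root soft nofile 100000
root hard nofile 100000
```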

Your results can be explained simply:
- on t2.small and c4.large servers, the bottleneck is the CPU;
- on c4.xlarge and c4.2xlarge servers, the bottleneck is channel bandwidth.

So you can set the load balancer threshold to 70% CPU load. Also, make sure the test Edge's channel bandwidth is sufficient for the subscriber count.
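That bandwidth check is simple arithmetic; with the figures from this thread (200 subscribers pulling a 2000 kbps stream):

```shell
# Required Edge egress = subscribers * per-stream bitrate
SUBSCRIBERS=200
BITRATE_KBPS=2000
echo "$(( SUBSCRIBERS * BITRATE_KBPS / 1000 )) Mbps"
```

This prints 400 Mbps; the estimate should be redone for every subscriber count you plan to test.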
 

sangsoo

Member
Hello, Max.
I have a question about your explanation.
According to the Amazon EC2 network performance cheat sheet, the channel bandwidth of c4.2xlarge ("High") is about 2 Gbps.
When I previously ran iperf on c4.large, 567 Mbps was measured, so the cheat sheet data seems reliable.
[Attachment: 1599312539825.png]

The results I tested on c4.2xlarge:
- https://docs.google.com/spreadsheets/d/1HKxciysIS6nHZtnD1YVPDck-reLO_san0yLjIdlYgDE/edit?usp=sharing
I published from OBS at 720p CBR (2000 kbps) and pulled 200 streams, so the bitrate used is about 400 Mbps.
400 Mbps versus 2 Gbps?

A channel bandwidth of 2 Gbps should be sufficient,
so I don't understand how channel bandwidth can be the bottleneck on c4.xlarge and c4.2xlarge servers.
Please explain more...
 
Last edited:

Max

Administrator
Staff member
so I don't understand how channel bandwidth can be the bottleneck on c4.xlarge and c4.2xlarge servers.
Please explain more...
Please set up bandwidth control on the Edge server you are testing:
Code:
outbound_video_rate_stat_send_interval=1
and play the control stream using the Media Devices example (Play section) as described here. You will see the effective bandwidth quality for the playing stream in the client browser.
For example, suppose the test occupies 400 Mbps on the server side, but the channel between the playing browser and the Edge server is only 100 Mbps. In this case, playback quality will drop while server resources are still sufficient.
 

sangsoo

Member
Thanks Max.
Sorry, but that is different from my question.
I am calculating how many published streams can be played on an EC2 instance (c4.2xlarge).
We have to decide how many media servers we will use for our service.

The stream published by OBS (to the c4.2xlarge) was pulled by 2 other EC2 servers.
- Load testing using WebRTC pulling

The important thing is Chrome on my MacBook.
I played one stream separately in advance, because it was difficult to judge the server's condition from the publish server's CPU usage alone.
If the pre-played stream lags or its picture quality deteriorates during the "pull stream" test, I conclude there is a problem with the server.
This starts to appear when the number of subscribers exceeds 200. At that point, the publish server's CPU (AWS CloudWatch) is at about 30%.
Of course, in this situation, the quality reported with outbound_video_rate_stat_send_interval=1 applied is "perfect".
[Attachment: 1599466150696.png]

And an additional question.
The open files values from ulimit -a measured on c4.2xlarge were soft (1024) and hard (4096).
Even after WCS changed the start value from 20000 to 100000, the test result was similar.
[Attachment: 1599466828207.png]
I keep looking for a solution. Please help.
 
Last edited:

Max

Administrator
Staff member
Please update to build 5.2.760; it contains a fix to reduce the CPU load average while publishing RTMP streams to the server.
If the result remains the same, please provide SSH access to the server you are testing using this link, and we will check.
 

sangsoo

Member
Thanks Max.
I tested on build 760. There was no difference in the test results.
Please tell me your public IP; I will register it for AWS SSH access.
I will provide access information for the mariotrans server.

There was one misunderstanding, now corrected:
in the previous test, the subscriber server (mariotrans) was c5.2xlarge, not c4.2xlarge.

I changed the test server roles a bit for SSH access.
And I adjusted the number of subscribers from 100 to 150.

*** What I want to know:
I am looking for a way to reach 50-70% CPU usage by increasing the number of pulled streams on c4.2xlarge or c5.2xlarge in my test environment.
[Attachment: 1599530838606.png]
 

Max

Administrator
Staff member
We checked the server.
It seems you used the CMS garbage collector during the test; a Full GC every 3-4 seconds makes the Java VM really slow.
Please install JDK 12 and set up the ZGC garbage collector as described here. The hugepages setup (p. 6) can be skipped for simplicity; in that case, omit the corresponding keys in the ZGC configuration:
Code:
-XX:+UnlockExperimentalVMOptions -XX:+UseZGC -Xms8g -Xmx8g
Then repeat the test.
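For reference, on a standard WCS install these JVM keys go into the server's JVM options file (wcs-core.properties, per the WCS documentation; adjust to your setup); a minimal sketch:

```
# wcs-core.properties fragment (illustrative; keep your existing keys intact)
-Xms8g
-Xmx8g
-XX:+UnlockExperimentalVMOptions
-XX:+UseZGC
# hugepages-related keys omitted, since the hugepages setup was skipped
```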
 

sangsoo

Member
Hello Max.
Thank you for check.

After installing OpenJDK 12, I configured ZGC:
-XX:+UnlockExperimentalVMOptions -XX:+UseZGC -Xms8g -Xmx8g
The gc-core-~.log is also output in ZGC log format.

But unfortunately, the results are not much different from before (the default GC).
The hugepages setup (p. 6) was skipped.
It's not easy; so difficult.
What is the problem?

Here is the AWS CloudWatch CPU usage graph (CPU max 36%).
- Pull count tested: 300 = 150 + 150

[Attachment: 1599551186944.png]
 

Max

Administrator
Staff member
We repeated your test using our own AWS EC2 instance as the pulling server. We pulled 400 streams. There is massive packet loss (21%) while playing the control stream.
[Attachment: 1599562008403.png]
This is a channel issue. Your screenshot above shows 2% packet loss, which is also too high.
Please check the channel bandwidth between your Mac and the mariotrans server using iperf as described here. Also, please open TCP ports from this range:
Code:
media_port_from        =30001
media_port_to          =33000
and switch the control stream playback to WebRTC over TCP
[Attachment: 1599562507663.png]
But if this is a server channel issue, that will not help.
In this case, we recommend choosing instances with guaranteed channel bandwidth (10 Gbps, for example) for large subscriber loads.
 

sangsoo

Member
Thank you so much for the help.
That's sad news.

I currently manage 27 WCS EC2 instances (18 in AWS Seoul, 9 in Tokyo).
Most of them were installed from the AWS Marketplace WCS image.
Each EC2 instance is started or stopped as needed.

Origin/Edge is configured with AWS Auto Scaling, so there are more real servers.
EC2 specs range from c4.large to c5.2xlarge.
More than half of these are c5.2xlarge.

This is what I first wondered about:
1. Can I configure an ASG with c5.2xlarge like c4.large?
2. If yes, what are the indicators (CPU metrics) for stable ASG operation?

Your answer is "Yes."
However, I had a hard time meeting the conditions for that "Yes":
I expanded the media port range, configured ZGC, adjusted open files, and played in TCP mode.
I tested on several servers in Seoul and Tokyo, but the result (CPU 35%) was the same.
I did a lot of tests.

AWS c5.2xlarge provides channel bandwidth of 2.5-10 Gbps.
When pulling 400 streams, 2 Mbps * 400 = 800 Mbps, which is well within c5.2xlarge capacity.
The iperf3 test (mariotrans => stfp8) measured 1 Gbps TCP and 2 Gbps UDP.

1. iperf3 -c stfp8 -p 5201 -t 60
[Attachment: 1599633360804.png]
2. iperf3 -c stfp8 -p 5201 -P 5 -u -b 400M -t 60
[Attachment: 1599633327456.png]
3. About 8 GByte (= 1.16 Gbps * 60 sec / 8 bits) of traffic (AWS CloudWatch) occurred during testing.
[Attachment: 1599632535899.png]
So I cautiously expect that the channel bandwidth of the server is not the problem.
The same is true for the browser.
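The figure in point 3 checks out as plain arithmetic:

```shell
# 1.16 Gbps sustained over the 60-second test window, in gigabytes
awk 'BEGIN { printf "%.1f GByte\n", 1.16 * 60 / 8 }'
```

This prints 8.7 GByte, close to the ~8 GByte reported by CloudWatch.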

In a similar (c5.2xlarge) environment, the results should be similar to mine.
It's an unreasonable request, but please check again on the Flashphoner demo server.

Please.
Thank you.
 
Last edited:

Max

Administrator
Staff member
This is what I first wondered about:
1. Can I configure an ASG with c5.2xlarge like c4.large?
2. If yes, what are the indicators (CPU metrics) for stable ASG operation?
1. Yes.
2. The same indicators as for c4.large servers.
The problem in your test is not the server-to-server channel; it is the server-to-Mac channel. Have you tested the channel from the mariotrans server to your Mac with iperf3?
For example, in our test with your server, we saw 10 times more packet loss than you did, because we played the control stream on a PC in Europe. And this affects the results.
You can load any server configuration up to 70% CPU, but then more subscribers (with not-so-good "last mile" channels) connect to a single server. So you have to resolve end-user channel issues: use a lower resolution, a lower bitrate, or transcoding.
In fact, your test results show the following: more c4.large Edge servers in the autoscaling group may be preferable to fewer c5.2xlarge servers, because subscribers are then distributed more evenly across the Edge servers.
 

sangsoo

Member
I got it; you are right.
There was almost no packet loss in the iperf3 test between servers,
but on my PC (today, Windows 10), packet loss varied with the iperf3 -b option. I didn't expect that at all.

By setting the bitrate in the Media Devices sample, I was able to get close to the desired result. CPU usage also rose above 60%, and the number of pulled streams increased.
I understand now how the bitrate setting reduces packet loss in the browser.

However, even when the number of WebRTC pulls was increased, the total outbound network traffic through WCS on c5.2xlarge was limited to a maximum of 3.8-4.0 GByte/min (450-500 Mbps). We plan to use this value as an indicator.
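For the record, the raw conversion of that ceiling from GByte/min to Mbps (before any protocol overhead, which will make the usable figure somewhat lower):

```shell
# 3.8 GByte per minute, expressed as megabits per second
awk 'BEGIN { printf "%.0f Mbps\n", 3.8 * 8 * 1000 / 60 }'
```

This prints 507 Mbps, consistent with the ~500 Mbps ceiling observed.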

It's all thanks to you. Thank you.
 
Last edited: