Video transcode comparison — Intel Vs AMD

Abhishek Singh
Mar 6, 2022

Video transcoding is an integral part of any Video On Demand (VOD) service. At Mobishaala, thousands of videos are uploaded every day. Before these videos can be made available for playback, they must first be converted into different formats, bitrates and resolutions such as 1080p, 720p or 360p. This process is called transcoding. It ensures smooth streaming across user devices with varying network speeds: based on the user's network speed, the video player automatically switches between video chunks of different quality.

You can find the original and up-to-date article on my blog: https://techkblog.com/video-transcode-comparison-intel-vs-amd/

Last year, we overhauled the video transcoding service on the Mobishaala platform to make it more efficient and to reduce the overall operating cost at the same time. This service is hosted on AWS and was implemented in a very crude form using multiple c5.4xlarge compute instances.

To improve the transcode service, we considered the following steps:

  • Firstly, redesign and optimise our existing transcode pipeline.
  • Secondly, compare and switch to cheaper instance options available on AWS.

So, in the latter part of this article, I present a video transcode comparison conducted on different AWS compute instance types: the C5 (Intel) vs C5a (AMD) series.

C5 and C5d instances feature either the 1st or 2nd generation Intel Xeon Platinum 8000 series processor (Skylake-SP or Cascade Lake) with a sustained all-core Turbo CPU clock speed of up to 3.6 GHz.

C5a instances, on the other hand, feature custom 2nd generation AMD EPYC 7002 series processors running at up to 3.3 GHz, built on a 7nm process node for increased efficiency. In addition, C5a instances deliver leading x86 price-performance through a combination of high-performance processing and 10% lower cost.

For all video transcode comparisons, the following video file specifications were considered:

1080p raw video file, captured from a video camera.

  • 1920×1080 resolution
  • Timecode, H.264, AAC, stereo channel

720p video file, captured from our live classroom recording.

  • 1280×720 resolution
  • H.264 encoded, AAC, stereo channel
  • Duration: 50 min 30 sec
  • File Size: 608.4 MB

1- Redesign and optimisation of video transcode pipeline:

Prior to the optimization, this service generated the transcoded videos at different bitrates one after another. For transcoding, we used FFmpeg, a well-known free and open-source tool that provides libraries for audio/video processing. Being a command-line tool, it is also easy to integrate with backend scripts.

Generation of different resolution videos (720p, 360p, 144p)

ffmpeg -i video.mp4 -r 24 -c:a aac -ac 2 -b:a 192k -ar 48000 -c:v libx264 -x264opts 'keyint=24:min-keyint=24:no-scenecut' -b:v 700k -maxrate 700k -bufsize 1000k -vf 'scale=trunc(oh*a/2)*2:720' ./screenshot/temp_720.mp4

ffmpeg -i video.mp4 -r 24 -c:a aac -ac 2 -b:a 64k -ar 22050 -c:v libx264 -x264opts 'keyint=24:min-keyint=24:no-scenecut' -b:v 400k -maxrate 400k -bufsize 400k -vf 'scale=trunc(oh*a/2)*2:360' ./screenshot/temp_360.mp4

ffmpeg -i video.mp4 -r 24 -c:a aac -ac 2 -b:a 64k -ar 22050 -c:v libx264 -x264opts 'keyint=24:min-keyint=24:no-scenecut' -b:v 100k -maxrate 100k -bufsize 150k -vf 'scale=trunc(oh*a/2)*2:144' ./screenshot/temp_144.mp4
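The three commands above differ only in a handful of parameters, so a backend script could drive them from a single table. The sketch below is an assumption about how that might look, not the script we actually ran: `build_cmd` and `print_all` are hypothetical helper names, and the script only prints the commands (a dry run) instead of invoking FFmpeg. The quoting is deliberately naive and would need hardening for file names that contain spaces.

```shell
#!/bin/sh
# Rendition table, one row per output:
# height  video_bitrate  bufsize  audio_bitrate  audio_sample_rate
# (values copied from the three sequential commands above)
RENDITIONS='720 700k 1000k 192k 48000
360 400k 400k 64k 22050
144 100k 150k 64k 22050'

# Hypothetical helper: prints one ffmpeg command line, does not run it.
# $1=input $2=height $3=video bitrate $4=bufsize $5=audio bitrate $6=audio rate
build_cmd() {
  printf '%s ' ffmpeg -i "$1" -r 24 \
    -c:a aac -ac 2 -b:a "$5" -ar "$6" \
    -c:v libx264 -x264opts "'keyint=24:min-keyint=24:no-scenecut'" \
    -b:v "$3" -maxrate "$3" -bufsize "$4" \
    -vf "'scale=trunc(oh*a/2)*2:$2'"
  printf '%s\n' "./screenshot/temp_$2.mp4"
}

# Hypothetical helper: prints the command for every row of the table,
# in order, which mirrors the sequential behaviour described above.
print_all() {
  printf '%s\n' "$RENDITIONS" | while read -r height vbr buf abr arate; do
    build_cmd video.mp4 "$height" "$vbr" "$buf" "$abr" "$arate"
  done
}

print_all
```

Piping the output of `print_all` into `sh` would execute the commands one after another, which is exactly the sequential pattern that the optimisation in this section moves away from.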

As mentioned earlier, we were using the c5.4xlarge instance type, which is based on the Intel Xeon Platinum 8000 series. It has the following configuration:

c5.4xlarge (Intel 1st or 2nd gen, 3.4 GHz)

16 vCPU, 32 GiB, up to 10 Gbps network bandwidth, 4750 Mbps EBS bandwidth

Intel Xeon Platinum 8000

Before optimization, transcode time on C5.4x instance (Intel Xeon Platinum 8000):

Transcode pipeline optimizations implemented:

  • Firstly, you may have noticed that we were generating the 3 resolution videos (720p, 360p, 144p) sequentially for each input video. The obvious step was to move to parallel transcoding as much as possible.
  • To further speed up the transcoding process, we tried a few FFmpeg tweaks:
      • Changed the preset to fast (default is medium).
      • Changed the constant rate factor (crf) to 20 (default is 23).
      • Kept the frame rate at 24.

While considering these optimisations, our main criteria were:

  • Firstly, there should not be any significant degradation in the quality of the generated video and audio files.
  • Also, the transcoded file size should not increase too much, although a smaller file size is always welcome.
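The file-size criterion above is easy to check mechanically. The following is a minimal sketch, assuming the old and new outputs for the same rendition are both on disk; `size_change_pct` is a hypothetical helper name, and the example paths are simply the output names used by the commands in this article.

```shell
#!/bin/sh
# Hypothetical helper: percentage change in size from a reference file ($1)
# to a candidate file ($2). Negative means the candidate is smaller.
size_change_pct() {
  ref=$(wc -c < "$1")
  new=$(wc -c < "$2")
  awk -v r="$ref" -v n="$new" 'BEGIN {
    if (r == 0) { print "n/a"; exit }   # avoid dividing by an empty reference
    printf "%.1f\n", (n - r) * 100 / r
  }'
}

# Example: compare the 720p output of the old and new pipelines,
# but only if both files actually exist.
if [ -f ./screenshot/temp_720.mp4 ] && [ -f ./x_720.mp4 ]; then
  size_change_pct ./screenshot/temp_720.mp4 ./x_720.mp4
fi
```

This only covers file size. Judging audio and video quality needs a separate comparison and is not covered by this sketch.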

Optimised transcode command:

1- First generate audio

ffmpeg -y -i video.mp4 -vn -ar 44100 -ac 2 -b:a 64k output.aac

2- Generate the required ABR video resolutions in parallel

ffmpeg -y -i video.mp4 -i output.aac -filter_complex "[0]split=3[v0][v1][v2];[v0]scale=trunc(oh*a/2)*2:144[low];[v1]scale=trunc(oh*a/2)*2:360[mid];[v2]scale=trunc(oh*a/2)*2:720[high]" \
-map '[high]' -map 1:a -c:a copy -c:v libx264 -x264opts 'keyint=24:min-keyint=24:no-scenecut' -r 24 -b:v 700k -maxrate 700k -bufsize 1000k -preset fast -crf 20 ./x_720.mp4 \
-map '[mid]' -map 1:a -c:a copy -c:v libx264 -x264opts 'keyint=24:min-keyint=24:no-scenecut' -r 24 -b:v 400k -maxrate 400k -bufsize 400k -preset fast -crf 20 ./x_360.mp4 \
-map '[low]' -map 1:a -c:a copy -c:v libx264 -x264opts 'keyint=24:min-keyint=24:no-scenecut' -r 24 -b:v 100k -maxrate 100k -bufsize 150k -preset fast -crf 20 ./x_144.mp4
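The core of the command above is the filter graph: `split` duplicates the decoded input once per rendition, and each copy is scaled to its own height, so the source is decoded only once. As a rough illustration of how that string is put together, here is a sketch that builds it for any list of heights. `build_filter` is a hypothetical helper, and it labels the outputs `out144`, `out360` and so on rather than the `low`, `mid` and `high` labels used above.

```shell
#!/bin/sh
# Hypothetical helper: build a -filter_complex string that splits input 0
# into one branch per requested height and scales each branch.
# Usage: build_filter 144 360 720
build_filter() {
  n=$#
  graph="[0]split=${n}"
  # One split output pad per rendition: [v0][v1]...
  i=0
  while [ "$i" -lt "$n" ]; do
    graph="${graph}[v${i}]"
    i=$((i + 1))
  done
  # One scale filter per rendition; the width is derived from the aspect
  # ratio and rounded down to an even number, as in the command above.
  i=0
  for h in "$@"; do
    graph="${graph};[v${i}]scale=trunc(oh*a/2)*2:${h}[out${h}]"
    i=$((i + 1))
  done
  printf '%s\n' "$graph"
}

build_filter 144 360 720
```

Each `[outN]` label would then be picked up by its own `-map` option, in the same way `[low]`, `[mid]` and `[high]` are mapped in the optimised command.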

After optimization, transcode time on C5.4x instance (Intel Xeon Platinum 8000):

Here is the comparison between the unoptimized and optimized transcode pipelines on the same c5.4xlarge (Intel) instance:

So, the optimized pipeline is already 36% to 38% faster than our original implementation, which is a huge improvement.

2- Cheaper AWS Options:

AWS also provides various other CPU instance types that are available at a much cheaper rate. Yes, I am referring to the AMD and Arm series. Since we faced compatibility issues with the Arm instance type, we could not include it in the comparison. However, we may revisit it in the future.

So are these cheaper instances really better or at least at par with Intel instances?

Although we tried different instance configurations, here I am showing the data for the c5a.4xlarge version for an apples-to-apples comparison. It has the following configuration:

c5a.4xlarge (AMD EPYC 3.3 GHz)

16 vCPU, 32 GiB, up to 10 Gbps bandwidth, up to 3170 Mbps EBS bandwidth

AMD 2nd gen EPYC 7002 series

Transcode time using C5a.4x instance (AMD EPYC processor):

Finally, here are the transcode time comparisons between AMD vs Intel vs Intel (unoptimized pipeline):

Conclusion

With C5.4x (Intel) instance type and optimized FFmpeg command:

With C5a.4x (AMD Epyc) instance type and optimized FFmpeg command:

  • Transcode is 1.7 to 1.8 times faster than the non-optimized execution on the c5.4xlarge instance.
  • Transcode is 9% to 15% faster than the optimised execution on the c5.4xlarge instance.
  • In addition, the C5a.4x instance type is available at almost half the rate of the C5.4x instance type.
  • C5.4x costs $0.68/hr vs $0.37/hr for C5a.4x.
  • As a result, just by switching to these cheaper instances, we are already saving around ₹19k to ₹20k per month at our current workload. These savings will grow as more transcode processing is done.
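To see how the hourly price and the speed-up combine, the cost of one transcode job on AMD can be expressed relative to the same job on Intel. This is a back-of-the-envelope sketch: it assumes that "9% to 15% faster" means 9% to 15% less wall-clock time per job, and that cost scales linearly with instance hours. `relative_cost` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: cost of a job on AMD relative to the same job on Intel.
# $1 = Intel price in $/hr, $2 = AMD price in $/hr,
# $3 = reduction in transcode time on AMD, in percent (an assumption about
#      what "N% faster" means in the list above).
relative_cost() {
  awk -v intel="$1" -v amd="$2" -v pct="$3" \
    'BEGIN { printf "%.2f\n", (amd / intel) * (1 - pct / 100) }'
}

relative_cost 0.68 0.37 9    # lower end of the measured range
relative_cost 0.68 0.37 15   # upper end of the measured range
```

With the prices listed above this prints 0.50 and 0.46, meaning that under these assumptions a job on C5a.4x costs roughly half of what it costs on C5.4x.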

In short, the AMD instances are slightly faster and, interestingly, cheaper at the same time!

Originally published at https://techkblog.com on March 6, 2022.
