Preset settings in x264: the quality and compression speed test

Why test the preset settings? First, presets are the x264 encoder settings that are used most often, and frequently the only ones used, especially by novice users. Second, if compressing a given video is going to take twelve hours instead of about two, it is worth knowing that the extra time pays off: that the resulting image quality is noticeably better and the time is not wasted. This matters all the more because the computer is so heavily loaded during compression that you can barely work on it.

Finally, I have never seen such a test carried out professionally, while other tests, e.g. comparisons of video codecs/encoders, are common and it is difficult to discover anything new in that area.

We will try to answer the following questions: Are the settings in the individual presets optimally selected? Do the slower presets actually increase image quality? Is there a preset that should be avoided?

 

Research environment

To make the results as reliable as possible, the tests were carried out on video samples that had never been compressed before. Any earlier compression, even a slight one, makes the encoder's work easier because some frequencies in such samples are already reduced. When compressing with a high bitrate and high-quality settings, the measurement results may then turn out to be unreliable.

The research was carried out on four uncompressed video clips (designed for testing) in Y4M (YUV4MPEG) format. This is a raw format, which means that each frame is stored as a sequence of pixels encoded in the YUV color space, i.e. in the way they are displayed on the screen (so no additional conversions are needed). I used video clips in HD resolution (three in 1080p and one in 720p) because sources in these resolutions are the ones most often compressed with x264.

 

Data and description of particular test video clips:

  • old_town_cross

duration: 500 frames
resolution: 1280×720 [720p]
frame rate: 50 fps progressive
format and color subsampling: YUV 4:2:0
camera: ARRI ArriFlex 765 System for 65mm
colorimetry: ITU-R BT.709

description: a bird’s-eye view of city panorama, many details on buildings, slow camera motion, large area of sky with barely visible clouds. Quite easy to encode.

 

  • crowd_run

duration: 500 frames
resolution: 1920×1080 [1080p]
frame rate: 50 fps progressive
format and color subsampling: YUV 4:2:0
camera: ARRI ArriFlex 765 System for 65mm
colorimetry: ITU-R BT.709

description: running crowd of competitors. Bottom part of the frame: very large amount of fast moving objects with many details, upper part: static background with complex texture in some parts (trees, grass, people). Quite difficult to compress but, at the same time, very interesting in terms of results analysis. It is the only video clip in which the difference between compression by particular presets was clearly visible (in the frame sections with static background).

 

  • riverbed

duration: 250 frames
resolution: 1920×1080 [1080p]
frame rate: 25 fps progressive
format and color subsampling: YUV 4:2:0
camera: Sony HDW-F900
colorimetry: ITU-R BT.709

description: rippling water through which the stony bottom is visible. Very difficult to compress. A large amount of fast-moving detail across the entire frame. Only the differences in sharpness between the presets are clearly visible.

 

  • pedestrian_area

duration: 375 frames
resolution: 1920×1080 [1080p]
frame rate: 25 fps progressive
format and color subsampling: YUV 4:2:0
camera: Sony HDW-F900
colorimetry: ITU-R BT.709

description: people walking on the sidewalk, static camera. Few details, large uniform sidewalk surfaces, and objects (walking people) that are not very complex and move moderately fast. Easy to compress.

 

Research method

  • Compression

I used the x264 encoder, version core:133 r2334 (64-bit), to compress the test samples. For the quality of the samples to be comparable, they must have the same bitrate. The "Constant Rate Factor" (CRF) mode therefore cannot be used, because the resulting bitrate of the samples would be impossible to predict. I used 2-pass compression and specified only the bitrate and the preset, with no other modifications (the profile and tune options were not set).

Example command lines:

x264_64bit --preset medium --pass 1 --bitrate 4000 -o pedestrian_area-4M-1pass.mkv pedestrian_area_1080p25.y4m
x264_64bit --preset medium --pass 2 --bitrate 4000 -o pedestrian_area-4M-medium.mkv pedestrian_area_1080p25.y4m
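The two command lines above have to be repeated for every preset, which is easy to script. The following is only a sketch of how such a batch could be generated, not the procedure actually used for this test. The function name `two_pass_commands` and the output naming scheme are assumptions modelled on the example above; only the options already shown there (`--preset`, `--pass`, `--bitrate`, `-o`) are used.

```python
# Preset names as they appear in the timing table later in this article.
PRESETS = ["ultrafast", "superfast", "veryfast", "faster", "fast",
           "medium", "slow", "slower", "veryslow", "placebo"]


def two_pass_commands(source, clip, bitrate_kbps, preset, encoder="x264_64bit"):
    """Build the pass-1 and pass-2 command lines for one preset.

    The file naming (clip-<N>M-<preset>.mkv) is a hypothetical scheme
    copied from the single example in the article.
    """
    label = f"{bitrate_kbps // 1000}M"

    def command(pass_number, output):
        return [encoder, "--preset", preset, "--pass", str(pass_number),
                "--bitrate", str(bitrate_kbps), "-o", output, source]

    first = command(1, f"{clip}-{label}-1pass.mkv")
    second = command(2, f"{clip}-{label}-{preset}.mkv")
    return first, second


if __name__ == "__main__":
    # Print the 20 command lines (2 passes x 10 presets) for one clip/bitrate.
    for preset in PRESETS:
        for cmd in two_pass_commands("pedestrian_area_1080p25.y4m",
                                     "pedestrian_area", 4000, preset):
            print(" ".join(cmd))
```

The printed lines are meant to be run one after another, since the second pass of each preset reads the statistics written by its first pass.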

Each sample for a planned bitrate was compressed with all presets in turn (i.e. ten times, not counting the first pass, which only generates statistics). The test clips were short, so in some cases the x264 encoder had trouble hitting the planned bitrate. In those cases I compressed the samples again (sometimes 3 or 4 times), manually adjusting the bitrate value to keep the bitrate differences between all the samples to a minimum.

Bitrates of the test samples and the maximum bitrate difference between presets:

crowd_run (8 Mb/s) -> 0.675%
crowd_run (10 Mb/s) -> 0.73%
crowd_run (12 Mb/s) -> 0.8%
crowd_run (14 Mb/s) -> 1.25%

riverbed (8 Mb/s) -> 1.125%
riverbed (10 Mb/s) -> 1.01%
riverbed (12 Mb/s) -> 1%
riverbed (15 Mb/s) -> 0.51%

old_town_cross (1 Mb/s) -> 1.9%
old_town_cross (2 Mb/s) -> 1.8%
old_town_cross (4 Mb/s) -> 1.67%
old_town_cross (6 Mb/s) -> 1.23%
old_town_cross (8 Mb/s) -> 1.06%

pedestrian_area (2 Mb/s) -> 1.5%
pedestrian_area (4 Mb/s) -> 0.75%

As you can see, the maximum bitrate deviation is 1.9%, which is small enough that its influence on the measurement results is negligible. Altogether I obtained 150 compressed test samples.
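The article does not state exactly how the percentage was computed. One plausible reading, used in the sketch below purely as an assumption, is the spread between the highest and lowest achieved bitrate among the presets, expressed relative to the planned bitrate. The achieved bitrates in the example are made-up numbers, not measurements from this test.

```python
def max_bitrate_deviation_percent(target_kbps, achieved_kbps):
    """Spread of achieved bitrates across presets, in percent of the target.

    Assumed definition: (max - min) / target * 100.
    """
    if not achieved_kbps:
        raise ValueError("at least one achieved bitrate is required")
    spread = max(achieved_kbps) - min(achieved_kbps)
    return 100.0 * spread / target_kbps


if __name__ == "__main__":
    # Hypothetical achieved bitrates (kb/s) for three presets at a 4000 kb/s target.
    achieved = [3985.0, 4000.0, 4025.0]
    print(f"{max_bitrate_deviation_percent(4000, achieved):.2f} %")
```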

  • Measurements

I planned to measure image quality using the following indexes:

- SSIM (Structural Similarity)
- PSNR (Peak signal-to-noise ratio)
- blurring/sharpness index (MSU blurring)
- blocking artifact index (MSU blocking)

Initially I planned to carry out the measurements with a program I know well, MSU Video Quality Measurement Tool (MSU VQMT). However, it turned out that the free version does not accept HD-resolution files as input. That option is available only in the paid PRO version (and its price is quite high). So I searched the Internet and found an alternative without such limits: Video Quality Measurement Tool (VQMT) 1.1. This program is more difficult to use: it has no graphical interface and has to be started from the command line. With a large number of test samples this is actually an advantage, because batch files can be used. The problem is that this program supports only the first two indexes, SSIM and PSNR. I had to measure the two remaining indexes in MSU VQMT, which is discussed later on.

I should also explain why I did not take a shortcut and use the SSIM and PSNR measurement built into the x264 encoder during compression. x264 is an encoder, not a measuring program, so the measuring algorithms implemented in it may be optimized for speed and thereby simplified. In any case, the results reported by x264 did not fully agree with those obtained with VQMT, so I decided to rely on the latter.

There were a great many measurement results, and copying them manually into Excel (for visualization) was quite laborious. I therefore wrote a small program that helped by extracting the results from the CSV files automatically. The rest of the work consisted of building the tables and graphs and formatting them in Excel.
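The helper program itself is not published here. As a rough illustration of the idea, the sketch below averages one metric column of a per-frame CSV file. The CSV layout (a header row with `frame` and `ssim` columns) is a hypothetical one chosen for the example; the real files written by the measuring tool may look different.

```python
import csv
import io


def average_metric(csv_text, column):
    """Return the mean of one numeric column in per-frame CSV results."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Skip rows where the column is missing or empty.
    values = [float(row[column]) for row in reader if row.get(column)]
    if not values:
        raise ValueError(f"no values found in column {column!r}")
    return sum(values) / len(values)


# Hypothetical per-frame results for one compressed sample.
SAMPLE_CSV = "frame,ssim\n0,0.90\n1,0.92\n2,0.94\n"

if __name__ == "__main__":
    print(f"average SSIM: {average_metric(SAMPLE_CSV, 'ssim'):.4f}")
```

Running such a function over all result files gives one average per sample, which is the form in which the results are plotted below.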

 

Results

I did not analyze how each index varies over the course of a clip. All the graphs show the average result for the whole clip compressed with a given preset. Compression time is presented on a logarithmic scale.

  • Problem with ultrafast preset

Most of the SSIM and PSNR graphs do not include the results for the fastest preset, ultrafast. Its results were so low compared with the other presets that including them squeezed the remaining results together and made the graph completely illegible. The same applies to the graphs for the other indexes: the ultrafast preset produces VERY strong blocking artifacts across the whole frame, even at high bitrates. The ultrafast results are therefore omitted from the main graphs and presented in additional graphs at a different scale. On the other hand, its result for the blurring/sharpness index was surprisingly high. The blurring index is calculated by analyzing contrast between groups of pixels. With the ultrafast preset the measurement is distorted by very pronounced blocking artifacts, which create high contrast between adjoining blocks. That is why the measured blurring index is so high even though the image produced by this preset is not sharp (it is not blurred either; it is just a cluster of large blocks). This index will therefore not be taken into account when analyzing the results for the ultrafast preset.

 

First clip: old_town_cross

Of all the clips tested, this is the only one in the lower 720p resolution.

old_town-SSIM

old_town-PSNR

As already mentioned, the blurring/sharpness and blocking index measurements were carried out in MSU VQMT, which accepts clips only up to 768×576 resolution (the maximum PAL resolution). I therefore first had to crop the examined clip to 768×576 using an AviSynth script (with the Crop command) and then test it in MSU VQMT. Both indexes are of the no-reference type, which means they do not require comparing frames of the tested sample with the original.

Because the blurring/sharpness and blocking indexes were not measured over the whole frame area, the results are not 100% reliable. However, additional measurements I carried out indicate that the results would not change significantly if the test were run on full frames: I cropped the clip frames at different random locations and the measurement results were almost the same. I therefore consider the results sufficiently trustworthy and present them in this article.
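Choosing random crop locations can be sketched as follows. This is an illustration only, not the script used for the test: the function names are made up for the example, and restricting the offsets to even values is my assumption, made so that the crop stays aligned with the 4:2:0 chroma planes. The generated line uses AviSynth's `Crop(left, top, width, height)` form.

```python
import random


def random_crop_offsets(frame_w, frame_h, crop_w=768, crop_h=576, rng=random):
    """Pick a random top-left corner for a crop window inside the frame.

    Offsets are kept even (assumption: 4:2:0 material, so chroma samples
    cover 2x2 luma pixels).
    """
    if crop_w > frame_w or crop_h > frame_h:
        raise ValueError("crop window is larger than the frame")
    left = 2 * rng.randint(0, (frame_w - crop_w) // 2)
    top = 2 * rng.randint(0, (frame_h - crop_h) // 2)
    return left, top


def avisynth_crop_line(left, top, crop_w=768, crop_h=576):
    """Format the offsets as an AviSynth Crop call."""
    return f"Crop({left}, {top}, {crop_w}, {crop_h})"


if __name__ == "__main__":
    # A few random crop windows for a 1280x720 frame.
    for _ in range(3):
        print(avisynth_crop_line(*random_crop_offsets(1280, 720)))
```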

Below: the location of the cropped area in the full frame:

old_town-crop(full)

old_town-blurring

old_town-blocking

Below: the same graph with increased scale where the differences between the results received for ultrafast preset and for the remaining presets can be noticed:

old_town-blocking_with_ultrafast

 

Second clip: crowd_run

crowd_run-SSIM

crowd_run-PSNR

The location of the cropped area in the full frame:

crowd_run-crop(full)

 

crowd_run-blurring

crowd_run-blocking

Once again the comparison of the extremely high blocking artifacts for ultrafast preset:

crowd_run-blocking_with_ultrafast

Note the faster preset, which stands out above the average for the second time. For the SSIM index, a clear increase appears only above the slow preset. For PSNR, however, the faster preset achieves practically the highest result. Where does this behavior of the faster preset come from?

  • Faster preset better than others… is that true?

To draw reasonable conclusions we have to set the measurements aside for a moment and evaluate the image visually. First, let's look at frame fragments from the samples with the lowest bitrate, 8 Mb/s, and try to judge their quality ourselves:

Ultrafast:

crowd_run-8M-ultrafast

Faster:

crowd_run-8M-faster

Slower:

crowd_run-8M-slower

I hope it is clearly visible that the faster preset gives a strongly blurred, not very detailed image, which would certainly be judged subjectively as lower quality than the image produced by the slower preset. The PSNR index measures only the errors (differences) introduced by compression. In a PSNR measurement, an image that is blurred and devoid of fine detail can differ less from the original than a sharp and detailed image. Why does this happen? Fine details produce higher frequencies than blurred, flat areas and therefore need more bitrate to be stored in the stream. If the bitrate is not high enough, quantization cuts off some of the higher frequencies and thereby distorts the fine details. These distorted details can produce larger differences between the compressed image and the original than removing the details entirely (which is what happens with the faster preset). As evidence, let's look at frame fragments magnified 2× (faster preset on the left, slower preset on the right):

crowd_run-8M-faster-(frag-2x)  crowd_run-8M-slower-(frag-2x)

In the fragment on the left, despite its lower complexity, the outlines of objects are preserved better, which may explain its high SSIM result (higher than, e.g., the medium preset). Of course, nobody compares the image pixel by pixel while watching a film, and if distortions like those on the right occur in dynamic scenes or frame areas, they are simply imperceptible. The image from the slower preset as a whole, however, is perceived as sharper and judged to be of better quality. This is clearly visible in the blurring index graph: the faster preset achieved the lowest result at both tested bitrates. Interestingly, the faster preset also had the highest (worst) result for blocking artifacts (excluding the ultrafast preset). Both of these measurements therefore contradict the results obtained with the PSNR and SSIM indexes.
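The PSNR effect described above, where removing detail scores better than keeping it in distorted form, can be reproduced on a toy signal. The sketch below uses the standard definition PSNR = 10·log10(peak² / MSE) on a made-up row of eight pixels. The "distorted" row is simply the detail shifted by one sample, which is an artificial stand-in and not a model of what x264's quantization really does.

```python
import math


def mse(reference, test):
    """Mean squared error between two equally long pixel sequences."""
    return sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)


def psnr(reference, test, peak=255):
    """Peak signal-to-noise ratio in dB (infinite for identical inputs)."""
    error = mse(reference, test)
    return math.inf if error == 0 else 10 * math.log10(peak ** 2 / error)


# Toy data, not taken from the test clips.
original = [0, 100] * 4    # fine detail: alternating dark and bright pixels
blurred = [50] * 8         # detail removed entirely (flat average)
distorted = [100, 0] * 4   # detail kept, but misplaced by one sample

if __name__ == "__main__":
    print(f"blurred:   MSE {mse(original, blurred):.0f}, PSNR {psnr(original, blurred):.2f} dB")
    print(f"distorted: MSE {mse(original, distorted):.0f}, PSNR {psnr(original, distorted):.2f} dB")
```

The flat row has a quarter of the squared error of the row with misplaced detail, so it gets the higher PSNR even though all the detail is gone.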

  • Measurements vs. subjectively perceived image quality

The creators of modern encoders, x264 in particular, are well aware of these relationships. They have therefore implemented many methods that improve the distribution of bits between more and less dynamic frame areas or whole scenes, which has a significant influence on the resulting image quality (MB-tree, RDO modes, Adaptive Quantization). Some of them, especially the RDO modes, are enabled or used more intensively in the slower presets, which paradoxically may lead to worse PSNR results and sometimes (much more rarely) worse SSIM results. This shows again that PSNR in particular is by no means an indicator of which image is perceived as being of better quality. In my opinion, then, using only the typical quality indexes and drawing conclusions from them alone is pointless. Unfortunately many tests, including professional ones, are carried out this way. I do not mean to question the encoder tests carried out so thoroughly by the MSU group. However, looking at how the x264 presets were configured there (among other things, the tune ssim parameter was enabled, which blurs the image in order to obtain a higher measurement result), I wonder why additional blurring/sharpness measurements were not carried out (especially as the MSU group's own measuring application provides this index). Perhaps it would have complicated the results too much. In any case, in this preset test the PSNR and SSIM indexes alone were not enough to obtain clear results. Coming back to the --tune ssim and --tune psnr settings: I experimented with them as well, but I noticed that with the slower presets they clearly worsen the image quality by disabling the RDO mode and thus blurring the image. The PSNR and SSIM graphs, however, then rose as expected (i.e. slower preset = higher index value).

 

To sum up, the "crowd_run" clip may be a good reference for scenes with a large number of small details moving quickly against an almost static background, with small flat areas (sky). Given that such scenes are quite common, the measurement results for this clip should be considered significant.

 

Third clip: riverbed

riverbed-SSIM

riverbed-PSNR

 

  • Does denoising of image change the results arrangement?

I noticed that this clip (like the others) contains a lot of noise/grain. I wanted to check whether removing it would change the ordering of the results between the presets, e.g. whether it would make compression easier and raise the results for one preset while doing the opposite for another. Therefore, in addition to the regular tests, I generated an extra sample with a denoised and slightly sharpened image. For this purpose I used AviSynth and the FFT3DFilter plugin with the following parameters:

FFT3DFilter(sigma=3, bt=3, sharpen=1.0)

As you might guess, this increased the sample's compressibility, which resulted in higher SSIM results: the denoised sample at 8 Mb/s comes close to the original sample at 10 Mb/s. However, the ordering of the results remained almost unchanged: the results for the denoised sample look almost like a copy of those for the sample with noise preserved, only shifted up in the graph. The only slight difference is that the presets from slower upward score slightly better than they did on the noisy sample (they drew level with the medium preset). After this short test I concluded that the efficiency of the individual presets depends only slightly on the amount of noise in the image. However, more tests are needed to answer this question more precisely.

  • Why are the results of high presets so low?

This is connected with the atypical character of the image in this clip. The higher presets enable by default modes (RDO modes 6-11, Trellis 2 quantization, Adaptive Quantization) that allocate bits between more and less dynamic/complex frame areas or between different frames, which usually improves image quality. Unfortunately, the "riverbed" clip has no more or less dynamic areas: the whole clip is quite uniform, very dynamic, and similar in texture throughout. The clip is also quite short, at 250 frames. These mechanisms therefore do not work well here, or may simply make wrong decisions. At the same time, image sharpness and detail (see the results below) are improved, which increases the demand for bitrate. Without the mechanisms listed above working efficiently, artifacts appear. In my opinion this is the reason for the low SSIM and PSNR results.

Let’s move on to the further measurements results.

The location of a cropped fragment in the full frame:

riverbed-crop(full)

riverbed-blurring

riverbed-blocking

Again, the same graph at a scale that shows the results for the ultrafast preset:

riverbed-blocking_with_ultrafast

Because of its high dynamics and large amount of detail, the "riverbed" clip is very difficult to evaluate with the naked eye. If we relied exclusively on the SSIM and PSNR measurements, we would conclude that the faster preset is the best way to compress this clip. However, the truth is the same as described for the previous clip: the faster preset and the presets below it blur image details very strongly (clearly visible in the blurring/sharpness graph), which gives surprisingly good SSIM and PSNR results.

Below are frame fragments magnified 2×, in which the issues described above are clearly visible.

The original frame:

a.riverbed-original-(frag-2x)

Faster – 8 Mb/s (the image is highly blurred and not very detailed but without visible artifacts. The objects outlines are well preserved):

b.riverbed-8M-faster-(frag-2x)

Veryslow – 8 Mb/s (the sharpness and complexity are higher but the compression artifacts are visible – they are marked with arrows):

c.riverbed-8M-veryslow-(frag-2x)-with-arrows

I have to point out, however, that these artifacts are not visible when the clip is watched normally, while the greater amount of detail and the improved sharpness are perceived as a clear quality improvement. If details are not important to us and we care only about preserving object outlines, the faster preset will work just fine for such a dynamic clip. We can also be more confident that blocking artifacts will not appear at low bitrates. In any case, it turns out again that sensible conclusions about image quality cannot be drawn without a closer analysis of indexes other than the basic SSIM and PSNR.

Summing up the measurement results for the "riverbed" clip, I think they should not be regarded as representative of typical movie clips, which rarely have this kind of dynamics, especially for their whole duration. However, the clip may be a good reference for very dynamic and detailed scenes.

 

Fourth clip: pedestrian_area

pedestrian_area-SSIM

pedestrian_area-PSNR

The location of the cropped area in the full frame:

pedestrian_area-crop(full)

pedestrian_area-blurring

pedestrian_area-blocking

Larger scale for ultrafast preset:

pedestrian_area-blocking_with_ultrafast

 

The results for the "pedestrian_area" clip can be a good benchmark for scenes that are not very dynamic: the camera is motionless, and the objects moving in front of it are big, not very detailed and often blurred (fast movement).

 

The comparison of 32 and 64-bit versions of x264 encoder

On this occasion I also carried out some additional compression measurements with the 32-bit and 64-bit versions of x264. I expected the 64-bit version to be a bit faster, but I did not expect it to give slightly better image quality as well. Yet the measurements speak for themselves. The differences are not significant: from 0.000038 to 0.001580 for the SSIM index and from 0.000683 to 0.073603 for the PSNR index. They cannot be noticed with the naked eye, but they give us an additional reason to use the 64-bit version.

The compression time for each preset is as follows:

preset       compression time [s]
             32-bit     64-bit
ultrafast      2.38       2.49
superfast      4.93       4.81
veryfast       8.49       8.33
faster        15.24      14.08
fast          30.16      28.31
medium        39.81      37.65
slow          89.29      85.62
slower       170.07     158.23
veryslow     352.11     333.33
placebo      925.93     833.33
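To make the timing table easier to read, the relative time saved by the 64-bit build can be computed per preset. The sketch below only restates the numbers from the table above; the function name `time_saved_percent` is introduced here for illustration.

```python
# (preset, 32-bit time [s], 64-bit time [s]) copied from the table above.
TIMINGS = [
    ("ultrafast", 2.38, 2.49),
    ("superfast", 4.93, 4.81),
    ("veryfast", 8.49, 8.33),
    ("faster", 15.24, 14.08),
    ("fast", 30.16, 28.31),
    ("medium", 39.81, 37.65),
    ("slow", 89.29, 85.62),
    ("slower", 170.07, 158.23),
    ("veryslow", 352.11, 333.33),
    ("placebo", 925.93, 833.33),
]


def time_saved_percent(t32, t64):
    """Time saved by the 64-bit build relative to the 32-bit build, in percent.

    A negative value means the 64-bit build was slower.
    """
    return 100.0 * (t32 - t64) / t32


if __name__ == "__main__":
    for preset, t32, t64 in TIMINGS:
        print(f"{preset:<10} {time_saved_percent(t32, t64):+6.2f} %")
```

Note that for ultrafast the value comes out negative, i.e. in this one row of the table the 64-bit build was slightly slower.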

 

32vs64bit-SSIM

32vs64bit-PSNR

 

Conclusions

  1. Increasing the bitrate of the compressed material slightly lengthens the compression time.
  2. The differences in image quality and compression time between the fast and medium presets are very minor (slightly in favour of medium, mainly in terms of greater image sharpness).
  3. The placebo preset, as its name suggests, very rarely provides any quality improvement (sometimes it even reduces quality relative to the veryslow preset!) and, given its extremely long compression time, should be used only by people for whom time does not matter.
  4. Taking all the results into account, if we have more time for compression and a sharp and detailed image is important to us, the slow and slower presets are, in my opinion, the best ones (with an emphasis on the latter). With the presets above them, the quality gain is disproportionate to the time spent.
  5. If compression speed (not image quality) is the priority, I suggest choosing the faster preset (surprisingly good SSIM and PSNR results) or fast. The presets below faster provide very low image quality, so I strongly advise against using them unless compressing at an extremely high bitrate, where the quality loss caused by a fast preset matters much less.
  6. The lower the bitrate, the more important the choice of preset: the dispersion of results increased, especially for SSIM and PSNR. In other words, the lower the bitrate we set, the higher the preset that should be used. At high bitrates, on the other hand, the presets gave results closer to one another (as measured with SSIM and PSNR).
    The blurring/sharpness index correlates poorly with stream bitrate, while it clearly increases with higher presets regardless of bitrate. A higher preset provides a sharper and more detailed image.
  7. The results for the blocking index are difficult to confirm with the naked eye (I skipped ultrafast here). There is also no clear correlation with the remaining indexes. In most cases the faster preset produced the most blocking artifacts (in spite of having the lowest image sharpness). I have to admit that I have no theory to explain this.
    At lower bitrates, higher presets may cause a slight increase in blocking artifacts. This was probably the result of insufficient bitrate for the sharper, more detailed image produced by the higher presets (however, this is not a rule).
  8. The ultrafast preset should be given a wide berth even if the fastest possible compression is important to us. Its measurement results were so poor that they did not fit in any graph at the chosen scale.
  9. The medium preset was chosen by the x264 creators as the default. Whether this was the right choice is debatable: in 3 out of 4 samples an evident "drop" can be seen in the SSIM and PSNR results for the medium preset. Using the slow preset provides a considerable increase in quality (at the expense of a slight slowdown in compression) as measured with all the indexes. I therefore recommend using slow as the default preset, as it is the best compromise between quality and compression time.

 


Published: 23 Mar 2014 / Category: image quality » measurements, tests and comparisons / 50,905 views

8 responses to “Preset settings in x264: the quality and compression speed test”

  1. Aneta writes:

    I see a lot of substantive knowledge here; a pity it is not in the original Polish

    • qbakos writes:

      Please click on the previous article: it is exactly the same text, only in Polish :)
      I will try to publish future articles in both language versions as well.

  2. robnitro writes:

    Sorry,
    I did not realize you speak english, instead I translated the article.
    Have you considered trying the intel haswell quick sync 264 on slowest setting to compare?
    In a test here, the author says it is as good as x264… but
    http://www.tetrachromesoftware.com/q264Test1Analysis/q264test_5.html

    I wonder if your blocking and blurring measurements would change this!

    • qbakos writes:

      I will try to perform some tests to compare q264 and x264 because this topic seems to be interesting. However, I need some time to acquaint myself closely with the q264 encoder. Therefore, please be patient.

      • robnitro writes:

        No rush, you are doing a great service!
        I notified that guy about your tests but no reply yet, it would be nice if he ran the blurring and blocking tests too.

        I use handbrake or vidcoder instead to try intel encoder.
        However the problem is that they don’t have 2 pass for intel. ABR is not the same quality as finding a specific CRF for your size as x264 does in 2 pass.

        So, what I did to do visual comparison:
        Find what size/bitrate I want.
        Then I ran intel QSV at balanced or highest quality setting with QP . If it was bigger than needed, I increase QP and try again until I get to the target bitrate

        I can do the videos for you if you want, but I don’t have haswell, I have sandy bridge 2500k which isn’t the latest qsv. I have noticed more blocking with qsv for my SB cpu.

  3. fachman writes:

    There is a major flaw in those tests. Faster presets have problems utilising all the cores of an Intel Core i7.
    With Ultrafast and Faster, barely half of the available CPU time is used for computation.
    That means you can find settings that use this free CPU time for better quality without increasing the encoding time much.
    On a slower CPU with fewer cores the presets are fine, but if you have something faster you need to find the optimum speed/quality settings yourself.

    • LOVE writes:

      hi! what settings do you recommend for having an I5 core? :)

      I really want to have a good quality video, I dont mind how much time it takes

  4. LOVE writes:

    hi!!!
    if you would recommend a preset for having a good quality only (not considering the file size and time), which is the best between PLACEBO, VERY SLOW, and SLOWER?

About the author

    My name is Jakub Kościelny; on forums I go by qbakos. I have been interested in video processing and compression techniques more or less since my university studies, so for quite a long time now.
    A few years ago I decided to combine this hobby with something more serious and started writing about it, doing research and taking measurements.
    On this site I have gathered all of that in what I hope is an accessible form and made it available to anyone interested in the subject.
    And along the way I have the motivation to keep looking for new article topics... and to explore the secrets of WordPress. You are welcome here :)