Authors:
(1) Joshua P. Ebenezer, Student Member, IEEE, Laboratory for Image and Video Engineering, The University of Texas at Austin, Austin, TX, 78712, USA, contributed equally to this work (e-mail: [email protected]);
(2) Zaixi Shang, Student Member, IEEE, Laboratory for Image and Video Engineering, The University of Texas at Austin, Austin, TX, 78712, USA, contributed equally to this work;
(3) Yixu Chen, Amazon Prime Video;
(4) Yongjun Wu, Amazon Prime Video;
(5) Hai Wei, Amazon Prime Video;
(6) Sriram Sethuraman, Amazon Prime Video;
(7) Alan C. Bovik, Fellow, IEEE, Laboratory for Image and Video Engineering, The University of Texas at Austin, Austin, TX, 78712, USA.
Abstract—We conducted a large-scale study of human perceptual quality judgments of High Dynamic Range (HDR) and Standard Dynamic Range (SDR) videos subjected to various levels of scaling and compression and viewed on three different display devices. HDR videos are able to present wider color gamuts, better contrasts, brighter whites, and darker blacks than SDR videos. While the conventional expectation is that HDR quality is better than SDR quality, we have found that subject preference for HDR versus SDR depends heavily on the display device, as well as on resolution scaling and bitrate. To study this question, we collected more than 23,000 quality ratings from 67 volunteers who watched 356 videos on OLED, QLED, and LCD televisions. Since it is of interest to be able to measure the quality of videos under these scenarios, e.g., to inform decisions regarding scaling, compression, and SDR versus HDR, we tested several well-known full-reference and no-reference video quality models on the new database. Towards advancing progress on this problem, we also developed a novel no-reference model called HDRPatchMAX, which uses both classical and bit-depth sensitive distortion statistics to predict quality more accurately than existing metrics.
Index Terms—High dynamic range, video quality assessment, video compression
I. INTRODUCTION
High Dynamic Range (HDR) videos utilize deeper bit-depths to represent brighter and darker luminances with wider color gamuts than Standard Dynamic Range (SDR) videos. To obtain the full benefits of HDR, however, a display must have the technology to accurately represent high contrasts and the extremes of the dynamic range. SDR videos are gamma-encoded using the power law described in BT.709 [1]. When shown on a TV, these values are decoded and adjusted to the TV’s display capabilities using a look-up table. Although SDR standards were originally created for cathode ray televisions having a maximum display brightness of 100 nits, modern display devices can use their entire brightness range (often much greater than 100 nits) to display SDR content, since the digital values of SDR content are relative and not scene-referred. The exact mapping between SDR digital values and the luminance that a television actually displays differs across products and is ordinarily proprietary.
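As an illustration of the gamma encoding mentioned above, the following is a minimal sketch of the BT.709 opto-electronic transfer function. The function names and the full-range 8-bit quantization are assumptions made for this example; broadcast SDR commonly uses a narrower code range, and the sketch says nothing about how any particular television decodes the signal.

```python
def bt709_oetf(linear_light: float) -> float:
    """Map normalized scene light in [0, 1] to a normalized BT.709 signal in [0, 1]."""
    if not 0.0 <= linear_light <= 1.0:
        raise ValueError("linear_light must lie in [0, 1]")
    # Linear segment near black, power-law segment elsewhere.
    if linear_light < 0.018:
        return 4.5 * linear_light
    return 1.099 * linear_light ** 0.45 - 0.099


def to_8bit_full_range(signal: float) -> int:
    """Quantize a normalized signal to an 8-bit code.

    Assumption for illustration: full-range coding (0 to 255).
    """
    return round(signal * 255)


if __name__ == "__main__":
    for light in (0.0, 0.018, 0.18, 0.5, 1.0):
        signal = bt709_oetf(light)
        print(f"light={light:.3f}  signal={signal:.4f}  code={to_8bit_full_range(signal)}")
```

The resulting code values are relative: they describe a fraction of whatever range the display chooses to use, not a luminance in nits.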
Videos following the HDR10 standard, on the other hand, are scene-referred and absolute. The PQ EOTF (used in the HDR10 standard) specifies the absolute luminance that the display must show for a particular digital value. If the absolute luminance value that is required to be shown is greater than the highest luminance value that the TV can display, a tone-mapping function called the Electrical-Electrical Transfer Function (EETF) is applied to the HDR content so that clipping does not occur at highlights and brightness values roll off smoothly at the peak. EETFs differ among televisions and are also usually proprietary.
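To make the notion of an absolute mapping concrete, the following is a minimal sketch of the PQ EOTF and its inverse using the constants published in SMPTE ST 2084. The function names are chosen for this example, and the input is assumed to be a signal already normalized to [0, 1]; converting 10-bit code values to that range (full or narrow range) is left out. The proprietary EETF applied by a television is not modeled.

```python
# PQ constants as defined in SMPTE ST 2084.
M1 = 2610 / 16384
M2 = 2523 / 4096 * 128
C1 = 3424 / 4096
C2 = 2413 / 4096 * 32
C3 = 2392 / 4096 * 32
PEAK_NITS = 10000.0  # luminance represented by a normalized signal of 1.0


def pq_eotf(signal: float) -> float:
    """Convert a normalized PQ signal in [0, 1] to absolute luminance in nits."""
    if not 0.0 <= signal <= 1.0:
        raise ValueError("signal must lie in [0, 1]")
    p = signal ** (1 / M2)
    return PEAK_NITS * (max(p - C1, 0.0) / (C2 - C3 * p)) ** (1 / M1)


def pq_inverse_eotf(nits: float) -> float:
    """Convert absolute luminance in nits to a normalized PQ signal in [0, 1]."""
    if not 0.0 <= nits <= PEAK_NITS:
        raise ValueError("nits must lie in [0, 10000]")
    y = (nits / PEAK_NITS) ** M1
    return ((C1 + C2 * y) / (1 + C3 * y)) ** M2


if __name__ == "__main__":
    for nits in (100.0, 200.0, 1000.0, 10000.0):
        print(f"{nits:7.0f} nits -> PQ signal {pq_inverse_eotf(nits):.4f}")
```

Unlike a relative SDR code value, a given PQ signal value corresponds to one specific luminance regardless of the display, which is why a display that cannot reach that luminance has to tone-map.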
Due to differences in how SDR and HDR signals are displayed on HDR-capable displays, the SDR version of a given content may have a higher average brightness than the HDR version, depending on how each is graded and displayed. For example, a content having a maximum brightness of 200 nits in HDR may be graded in SDR such that the digital value of the maximum brightness is 255. When the HDR and SDR versions are displayed on an HDR-capable device having a peak brightness of 1000 nits, the SDR version may present a peak brightness much larger than 200 nits, while the HDR version will be displayed with a peak brightness of 200 nits. The SDR version may therefore appear brighter, but it may also be washed out or oversaturated. Higher peak brightness or higher average brightness are not the only reasons why HDR content can be more appealing than SDR. Indeed, SDR videos may suffer from defects such as saturation, banding, and low contrast, which are less likely to occur in HDR videos.
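The 200-nit example above can be sketched with a toy display model. Everything here is an assumption for illustration only: real televisions use proprietary mappings, the SDR path is modeled as a pure 2.4 power law stretched over the display's full brightness range, and a hard clip stands in for the proprietary EETF on the HDR path.

```python
SDR_DISPLAY_GAMMA = 2.4  # assumed display gamma for the toy SDR model


def sdr_displayed_nits(code_8bit: int, display_peak_nits: float) -> float:
    """Toy relative model: the SDR code range is stretched over the display's range."""
    if not 0 <= code_8bit <= 255:
        raise ValueError("code_8bit must lie in [0, 255]")
    return display_peak_nits * (code_8bit / 255) ** SDR_DISPLAY_GAMMA


def hdr_displayed_nits(target_nits: float, display_peak_nits: float) -> float:
    """Toy absolute model: show the requested luminance, limited by the display peak.

    A hard clip is a hypothetical stand-in for a real, smoother EETF.
    """
    return min(target_nits, display_peak_nits)


if __name__ == "__main__":
    peak = 1000.0  # peak brightness of the HDR-capable display in the example
    print("SDR highlight (code 255):", sdr_displayed_nits(255, peak), "nits")
    print("HDR highlight (200 nits):", hdr_displayed_nits(200.0, peak), "nits")
```

Under these assumptions the same highlight is shown at the display's full peak in SDR but at 200 nits in HDR, which is the mismatch the paragraph describes.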
In addition to these differences, HDR videos use 10-bit representations (stored in 16 bits), while SDR videos use 8-bit representations. Uncompressed HDR videos therefore require twice as many bytes as SDR videos of the same content. Because of this, HDR videos may be more susceptible to compression artifacts.
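The storage arithmetic can be checked with a short calculation. The sketch below assumes uncompressed frames with 4:2:0 chroma subsampling and a 3840x2160 resolution; both are choices made for this example rather than details taken from the study, and compressed bitstream sizes depend on the encoder settings instead.

```python
def raw_frame_bytes(width: int, height: int, bytes_per_sample: int,
                    chroma_samples_per_luma: float = 0.5) -> int:
    """Size in bytes of one uncompressed frame.

    chroma_samples_per_luma = 0.5 corresponds to 4:2:0 subsampling
    (two chroma planes, each a quarter of the luma plane).
    """
    luma_samples = width * height
    chroma_samples = int(luma_samples * chroma_samples_per_luma)
    return (luma_samples + chroma_samples) * bytes_per_sample


if __name__ == "__main__":
    width, height = 3840, 2160  # assumed resolution for illustration
    sdr = raw_frame_bytes(width, height, bytes_per_sample=1)  # 8-bit samples
    hdr = raw_frame_bytes(width, height, bytes_per_sample=2)  # 10-bit samples in 16-bit words
    print(f"SDR frame: {sdr} bytes")
    print(f"HDR frame: {hdr} bytes ({hdr / sdr:.0f}x)")
```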
The tradeoffs between compression, contrast representation, color representation, and brightness make the perceptual assessment of HDR and SDR video quality both content-dependent and display-dependent. Towards better understanding these tradeoffs, we have conducted a detailed subjective and objective assessment of HDR and SDR videos having the same contents. For the subjective study, we recruited 67 participants who viewed and rated the qualities of 356 HDR and SDR videos of 25 source contents, which were processed by various combinations of scaling and compression using the x265 encoder. We also evaluate objective full-reference (FR) and no-reference (NR) video quality assessment (VQA) models on the new subjective database. Finally, we present the design of a new NR VQA model for the task of predicting the quality of both HDR and SDR videos.
This paper is under CC 4.0 license.