Joint Quality Assessment and Example-Guided Image Processing by Disentangling Picture Appearance from Content

Published 20 Apr 2024 in eess.IV and cs.CV (arXiv:2404.13484v1)

Abstract: The deep learning revolution has strongly impacted low-level image processing tasks such as style/domain transfer, enhancement/restoration, and visual quality assessment. Although these tasks are often treated separately, they share a common theme: understanding, editing, or enhancing the appearance of input images without modifying the underlying content. We leverage this observation to develop a novel disentangled representation learning method that decomposes inputs into content and appearance features. The model is trained in a self-supervised manner, and we use the learned features to develop a new quality prediction model named DisQUE. Through extensive evaluations, we demonstrate that DisQUE achieves state-of-the-art accuracy across quality prediction tasks and distortion types. Moreover, we show that the same features can be used for image processing tasks such as HDR tone mapping, where the desired output characteristics are tuned using example input-output pairs.
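To make the core idea concrete, below is a minimal PyTorch sketch of the kind of content/appearance disentanglement and example-guided appearance transfer the abstract describes. All of it is assumed for illustration: the module names, feature sizes, AdaIN-style decoder, appearance-shift rule, and quality head are plausible stand-ins, not the paper's actual architecture or training procedure.

```python
# Hypothetical sketch only -- not the authors' DisQUE implementation.
import torch
import torch.nn as nn

class DisentangledEncoder(nn.Module):
    """Splits an image into content (spatial) and appearance (global) features."""
    def __init__(self, feat_dim: int = 128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Content keeps spatial structure; appearance is pooled into a global code.
        self.content_head = nn.Conv2d(128, feat_dim, 1)
        self.appearance_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(128, feat_dim),
        )

    def forward(self, x: torch.Tensor):
        h = self.backbone(x)
        return self.content_head(h), self.appearance_head(h)

class AppearanceDecoder(nn.Module):
    """Renders content features under a given appearance code (AdaIN-style)."""
    def __init__(self, feat_dim: int = 128):
        super().__init__()
        self.affine = nn.Linear(feat_dim, 2 * feat_dim)  # per-channel scale/shift
        self.render = nn.Sequential(
            nn.Upsample(scale_factor=2), nn.Conv2d(feat_dim, 64, 3, padding=1),
            nn.ReLU(),
            nn.Upsample(scale_factor=2), nn.Conv2d(64, 3, 3, padding=1),
        )

    def forward(self, content: torch.Tensor, appearance: torch.Tensor):
        scale, shift = self.affine(appearance).chunk(2, dim=1)
        h = content * (1 + scale[:, :, None, None]) + shift[:, :, None, None]
        return self.render(h)

enc, dec = DisentangledEncoder(), AppearanceDecoder()
hdr = torch.randn(1, 3, 256, 256)               # stand-in for an input frame
ex_in, ex_out = torch.randn(2, 1, 3, 256, 256)  # example input/output pair

c_hdr, a_hdr = enc(hdr)
_, a_in = enc(ex_in)
_, a_out = enc(ex_out)

# Steer the appearance code by the example pair's appearance shift -- one
# plausible reading of "tuned using example input-output pairs".
a_target = a_hdr + (a_out - a_in)
tone_mapped = dec(c_hdr, a_target)              # (1, 3, 256, 256)

# A DisQUE-style quality predictor would regress scores from the learned
# features; a minimal stand-in:
quality_head = nn.Linear(128, 1)
score = quality_head(a_hdr)
```

The design intuition sketched here is that editing only the global appearance code while holding the spatial content features fixed is what lets one representation serve both quality prediction (appearance degradations move the code) and example-guided processing (the example pair supplies a target appearance).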
