Structured Reinforcement Learning for Media Streaming at the Wireless Edge

Published 10 Apr 2024 in eess.SY, cs.AI, cs.LG, and cs.SY (arXiv:2404.07315v2)

Abstract: Media streaming is the dominant application over wireless edge (access) networks. The increasing softwarization of such networks has led to efforts at intelligent control, wherein application-specific actions may be dynamically taken to enhance the user experience. The goal of this work is to develop and demonstrate learning-based policies for optimal decision making to determine which clients to dynamically prioritize in a video streaming setting. We formulate the policy design question as a constrained Markov decision problem (CMDP), and observe that by using a Lagrangian relaxation we can decompose it into single-client problems. Further, the optimal policy takes a threshold form in the video buffer length, which enables us to design an efficient constrained reinforcement learning (CRL) algorithm to learn it. Specifically, we show that a natural policy gradient (NPG)-based algorithm derived using the structure of our problem converges to the globally optimal policy. We then develop a simulation environment for training, and a real-world intelligent controller attached to a WiFi access point for evaluation. We empirically show that the structured learning approach enables fast learning. Furthermore, such a structured policy can be easily deployed due to its low computational complexity, with policy execution taking only about 15 μs. Using YouTube streaming experiments in a resource-constrained scenario, we demonstrate that the CRL approach can increase quality of experience (QoE) by over 30%.
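The two structural ideas in the abstract, Lagrangian decomposition into single-client problems and a threshold policy in buffer length, can be illustrated on a small toy model. This is a minimal sketch, not the authors' algorithm. It assumes a hypothetical buffer model: the capacity `B_MAX`, the per-step download probabilities in `Client`, the discount `GAMMA`, and the shared prioritization `budget` are all invented for illustration. It also replaces the paper's NPG-based learner with exhaustive search over thresholds under a known model, and updates the multiplier λ by projected subgradient ascent. Restricting each client to threshold policies follows the abstract's structural result, which is assumed (not proven) to carry over to this toy model. All function names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

# All model parameters below are hypothetical, chosen only for illustration.
GAMMA = 0.9  # discount factor
B_MAX = 10   # buffer capacity, in video chunks


@dataclass(frozen=True)
class Client:
    """Per-step probabilities of downloading 0, 1, or 2 chunks (hypothetical)."""
    arrivals_normal: tuple[float, float, float]
    arrivals_prio: tuple[float, float, float]


def threshold_action(buffer: int, threshold: int) -> int:
    """Threshold policy: prioritize (action 1) iff the buffer is below the threshold."""
    return 1 if buffer < threshold else 0


def evaluate(client: Client, threshold: int, iters: int = 500) -> tuple[float, float]:
    """Return (stall rate, prioritization rate) of a threshold policy.

    Both are discounted sums normalized by (1 - GAMMA), starting from an empty
    buffer, computed by iterative policy evaluation on the toy model.
    """
    stall = [0.0] * (B_MAX + 1)
    prio = [0.0] * (B_MAX + 1)
    for _ in range(iters):
        new_stall, new_prio = [], []
        for b in range(B_MAX + 1):
            a = threshold_action(b, threshold)
            probs = client.arrivals_prio if a else client.arrivals_normal
            next_stall = next_prio = 0.0
            for k, p in enumerate(probs):
                nb = min(max(b - 1, 0) + k, B_MAX)  # play one chunk, then receive k
                next_stall += p * stall[nb]
                next_prio += p * prio[nb]
            # An empty buffer means playback stalls during this step.
            new_stall.append((1.0 if b == 0 else 0.0) + GAMMA * next_stall)
            new_prio.append(a + GAMMA * next_prio)
        stall, prio = new_stall, new_prio
    return (1 - GAMMA) * stall[0], (1 - GAMMA) * prio[0]


def best_threshold(table: list[tuple[float, float]], lam: float) -> int:
    """Single-client subproblem: minimize stall + lam * prioritization over thresholds."""
    return min(range(len(table)), key=lambda t: table[t][0] + lam * table[t][1])


def dual_ascent(
    clients: list[Client], budget: float, steps: int = 300, lr: float = 0.5
) -> tuple[float, list[int]]:
    """Projected subgradient ascent on the multiplier of a shared prioritization budget."""
    # Thresholds 0..B_MAX+1 span "never prioritize" through "always prioritize".
    tables = [[evaluate(c, t) for t in range(B_MAX + 2)] for c in clients]
    lam = 0.0
    for _ in range(steps):
        # Given lam, each client solves its own subproblem independently.
        chosen = [best_threshold(tab, lam) for tab in tables]
        usage = sum(tab[t][1] for tab, t in zip(tables, chosen))
        lam = max(0.0, lam + lr * (usage - budget))
    return lam, [best_threshold(tab, lam) for tab in tables]


if __name__ == "__main__":
    clients = [
        Client(arrivals_normal=(0.5, 0.3, 0.2), arrivals_prio=(0.1, 0.3, 0.6)),
        Client(arrivals_normal=(0.7, 0.2, 0.1), arrivals_prio=(0.2, 0.4, 0.4)),
    ]
    lam, thresholds = dual_ascent(clients, budget=0.5)
    print(f"lambda = {lam:.3f}, thresholds = {thresholds}")
```

Only λ couples the clients: once it is fixed, each client's threshold is chosen without looking at the others, which is the decomposition the abstract describes. Because thresholds are discrete, the resulting deterministic policies may land slightly off the budget. An exactly constraint-satisfying CMDP policy may need to randomize between two neighboring thresholds.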
