Comparative Analysis of TD-Based Reinforcement Learning Algorithms

Ahmet KALA; Süleyman UZUN; Cem ÖZKURT

doi:-

The Academic Perspective Procedia publishes Academic Platform symposiums papers as three volumes in a year. DOI number is given to all of our papers.
Publisher : Academic Perspective

Journal DOI : 10.33793/acperpro
Journal eISSN : 2667-5862

Year :2025, Volume 6, Issue 2, Pages: 197-207

06.01.2025

Comparative Analysis of TD-Based Reinforcement Learning Algorithms

Ahmet KALA; Süleyman UZUN; Cem ÖZKURT

https://doi.org/-

432

153

Abstract

Reinforcement Learning has emerged as a fundamental framework in artificial intelligence, enabling agents to optimize decision-making strategies through interaction with their environment. Among RL techniques, Temporal-Difference (TD) learning stands out due to its efficiency in updating value functions incrementally, combining the strengths of Monte Carlo methods and Dynamic Programming. This study focuses on analyzing and comparing the performance of TD(0), State Action Reward State Action (SARSA), and Expected SARSA algorithms in various reinforcement learning scenarios. By conducting experiments in dynamic environments such as the Sliding Block and Windy Gridworld problems, we evaluate the adaptability, stability, and efficiency of these TD-based methods. The results demonstrate that Expected SARSA exhibits superior stability and learning performance compared to SARSA and Q-learning, particularly in high-variance environments. Our findings provide valuable insights into the effectiveness of TD-based algorithms and contribute to the ongoing development of reinforcement learning strategies for complex decision-making tasks.

Keywords: Reinforcement Learning (RL), Temporal-Difference learning (TD), State Action Reward State Action (SARSA), Monte Carlo methods

References

[1] Sutton RS, Barto AG. (2018). Reinforcement Learning: An Introduction. MIT Press. Cambridge: Massachusetts; 2018.

[2] Bertsekas DP. Reinforcement Learning and Optimal Control. Athena Scientific. Belmont: Massachusetts; 2019.

[3] Yu C, Liu J, Nemati S, Yin G. Reinforcement learning in healthcare: a survey. ACM Computing Surveys 2021; 55(1) :1-36.

[4] Arakawa R and Shiba S. Exploration of reinforcement learning for event camera using car-like robots. 2020.

[5] Luong N, Hoang D, Gong S, Niyato D, Wang P, Liang Y et al. Applications of deep reinforcement learning in communications and networking: a survey. IEEE Communications Surveys & Tutorials 2019; 21(4): 3133-3174.

[6] Lei L, Tan Y, Zheng K, Liu S, Zhang K, Shen X. Deep reinforcement learning for autonomous internet of things: model, applications and challenges. IEEE Communications Surveys & Tutorials 2020; 22(3): 1722-1760.

[7] Sutton, RS. Learning to predict by the methods of temporal differences. Machine Learning 1988; 3(1): 9-44.

[8] Tesauro G. Temporal difference learning and TD-Gammon. *Communications of the ACM 1995;38(3): 58-68.

[9] Tsitsiklis JN, Van Roy B. An analysis of temporal-difference learning with function approximation. IEEE Transactions on Automatic Control 1997; 42(5): 674-690.

[10] Szepesvári C. Algorithms for Reinforcement Learning. Synthesis Lectures on Artificial Intelligence and Machine Learning 2010; 4(1): 1-103.

[11] Amiranashvili A, Dosovitskiy A, Koltun V, Brox T. Td or not td: analyzing the role of temporal differencing in deep reinforcement learning. 2018.

[12] Asis, KD and Sutton, RS. Per-decision multi-step temporal difference learning with control variates.2018. https://doi.org/10.48550/arxiv.1807.01830.

[13] Silver D, Sutton RS, Müller M. Sample-based learning methods for online planning. Journal of Machine Learning Research 2008; 9: 1937-1959.

[14] Van Seijen H, Van Hasselt H, Whiteson S, Wiering M. A theoretical and empirical analysis of Expected SARSA. IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning 2009; 177-184.

[15] Ganger M, Duryea E, Hu W. Double sarsa and double expected sarsa with shallow and deep learning. Journal of Data Analysis and Information Processing 2016; 04(04): 159-176.

[16] Jiang H, Gui R, Chen Z, Wu L, Dang J, & Zhou J. An improved sarsa(λ) reinforcement learning algorithm for wireless communication systems. IEEE Access 2019; 7: 115418-115427.

[17] Lin B, Han L, Xiang C, Liu H, Ma T. A real-time energy management strategy for off-road hybrid electric vehicles based on the expected sarsa. Proceedings of the Institution of Mechanical Engineers, Part D: Journal of Automobile Engineering 2022; 237(2-3): 362-380.

[18] Asis K, Hernandez-Garcia J, Holland G, Sutton R. Multi-step reinforcement learning: a unifying algorithm. 2017.

[19] Moradi Maryamnegari H, Frego M, Peer A. Model predictive control-based reinforcement learning using expected sarsa. IEEE Access 2022; 10: 81177-81191.

[20] Doya, K. Temporal difference learning in continuous time and space. Neural Computation 2000; 12(1): 219-245.

[21] Konda VR, Tsitsiklis JN. Actor-critic algorithms. Advances in Neural Information Processing Systems 2000; 12: 1008-1014.

Cite

@article{acperproISITES2025ID36, author={KALA, Ahmet and UZUN, Süleyman and ÖZKURT, Cem}, title={Comparative Analysis of TD-Based Reinforcement Learning Algorithms}, journal={Academic Perspective Procedia}, eissn={2667-5862}, volume={6}, year=2025, pages={197-207}}
KALA, A. , UZUN, . , ÖZKURT, .. (2025). Comparative Analysis of TD-Based Reinforcement Learning Algorithms. Academic Perspective Procedia, 6 (2), 197-207. DOI: -
%0 Academic Perspective Procedia (ACPERPRO) Comparative Analysis of TD-Based Reinforcement Learning Algorithms% A Ahmet KALA , Süleyman UZUN , Cem ÖZKURT% T Comparative Analysis of TD-Based Reinforcement Learning Algorithms% D 1/6/2025% J Academic Perspective Procedia (ACPERPRO)% P 197-207% V 6% N 2% R doi: -% U -

[ 0 ]

Full Paper