The Academic Perspective Procedia publishes Academic Platform symposiums papers as three volumes in a year. DOI number is given to all of our papers.
Publisher : Academic Perspective
Journal DOI : 10.33793/acperpro
Journal eISSN : 2667-5862
[1] Sutton RS, Barto AG. (2018). Reinforcement Learning: An Introduction. MIT Press. Cambridge: Massachusetts; 2018.
[2] Bertsekas DP. Reinforcement Learning and Optimal Control. Athena Scientific. Belmont: Massachusetts; 2019.
[3] Yu C, Liu J, Nemati S, Yin G. Reinforcement learning in healthcare: a survey. ACM Computing Surveys 2021; 55(1) :1-36.
[4] Arakawa R and Shiba S. Exploration of reinforcement learning for event camera using car-like robots. 2020.
[5] Luong N, Hoang D, Gong S, Niyato D, Wang P, Liang Y et al. Applications of deep reinforcement learning in communications and networking: a survey. IEEE Communications Surveys & Tutorials 2019; 21(4): 3133-3174.
[6] Lei L, Tan Y, Zheng K, Liu S, Zhang K, Shen X. Deep reinforcement learning for autonomous internet of things: model, applications and challenges. IEEE Communications Surveys & Tutorials 2020; 22(3): 1722-1760.
[7] Sutton, RS. Learning to predict by the methods of temporal differences. Machine Learning 1988; 3(1): 9-44.
[8] Tesauro G. Temporal difference learning and TD-Gammon. *Communications of the ACM 1995;38(3): 58-68.
[9] Tsitsiklis JN, Van Roy B. An analysis of temporal-difference learning with function approximation. IEEE Transactions on Automatic Control 1997; 42(5): 674-690.
[10] Szepesvári C. Algorithms for Reinforcement Learning. Synthesis Lectures on Artificial Intelligence and Machine Learning 2010; 4(1): 1-103.
[11] Amiranashvili A, Dosovitskiy A, Koltun V, Brox T. Td or not td: analyzing the role of temporal differencing in deep reinforcement learning. 2018.
[12] Asis, KD and Sutton, RS. Per-decision multi-step temporal difference learning with control variates.2018. https://doi.org/10.48550/arxiv.1807.01830.
[13] Silver D, Sutton RS, Müller M. Sample-based learning methods for online planning. Journal of Machine Learning Research 2008; 9: 1937-1959.
[14] Van Seijen H, Van Hasselt H, Whiteson S, Wiering M. A theoretical and empirical analysis of Expected SARSA. IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning 2009; 177-184.
[15] Ganger M, Duryea E, Hu W. Double sarsa and double expected sarsa with shallow and deep learning. Journal of Data Analysis and Information Processing 2016; 04(04): 159-176.
[16] Jiang H, Gui R, Chen Z, Wu L, Dang J, & Zhou J. An improved sarsa(λ) reinforcement learning algorithm for wireless communication systems. IEEE Access 2019; 7: 115418-115427.
[17] Lin B, Han L, Xiang C, Liu H, Ma T. A real-time energy management strategy for off-road hybrid electric vehicles based on the expected sarsa. Proceedings of the Institution of Mechanical Engineers, Part D: Journal of Automobile Engineering 2022; 237(2-3): 362-380.
[18] Asis K, Hernandez-Garcia J, Holland G, Sutton R. Multi-step reinforcement learning: a unifying algorithm. 2017.
[19] Moradi Maryamnegari H, Frego M, Peer A. Model predictive control-based reinforcement learning using expected sarsa. IEEE Access 2022; 10: 81177-81191.
[20] Doya, K. Temporal difference learning in continuous time and space. Neural Computation 2000; 12(1): 219-245.
[21] Konda VR, Tsitsiklis JN. Actor-critic algorithms. Advances in Neural Information Processing Systems 2000; 12: 1008-1014.