The Academic Perspective Procedia publishes Academic Platform symposiums papers as three volumes in a year. DOI number is given to all of our papers.
Publisher : Academic Perspective
Journal DOI : 10.33793/acperpro
Journal eISSN : 2667-5862
[1] Lillicrap T., Hunt J., Pritzel A., Heess N., Erez T., Tassa Y. et al.. Continuous control with deep reinforcement learning. 2015. https://doi.org/10.48550/arxiv.1509.02971
[2] Haarnoja T., Zhou A., Hartikainen K., Tucker G., Ha S., Tan J. et al.. Soft actor-critic algorithms and applications. 2018. https://doi.org/10.48550/arxiv.1812.05905
[3] Wang R., Foster D., & Kakade S.. What are the statistical limits of offline rl with linear function approximation?. 2020. https://doi.org/10.48550/arxiv.2010.11895
[4] Fujimoto S., Meger D., & Precup D.. Off-policy deep reinforcement learning without exploration. 2018. https://doi.org/10.48550/arxiv.1812.02900
[5] Silver D., Schrittwieser J., Simonyan K., Antonoglou I., Huang A., Guez A. et al.. Mastering the game of go without human knowledge. Nature 2017;550(7676):354-359. https://doi.org/10.1038/nature24270
[6] Mandyam A., Jones A., Laudański K., & Engelhardt B.. Nested policy reinforcement learning. 2021. https://doi.org/10.48550/arxiv.2110.02879
[7] Fujimoto S., Conti E., Ghavamzadeh M., & Pineau J.. Benchmarking batch deep reinforcement learning algorithms. 2019. https://doi.org/10.48550/arxiv.1910.01708
[8] Mnih V., Kavukcuoglu K., Silver D., Rusu A., Veness J., Bellemare M. et al.. Human-level control through deep reinforcement learning. Nature 2015;518(7540):529-533. https://doi.org/10.1038/nature14236
[9] Chen Z.. A unified lyapunov framework for finite-sample analysis of reinforcement learning algorithms. ACM SIGMETRICS Performance Evaluation Review 2022;50(3):12-15. https://doi.org/10.1145/3579342.3579346
[10] Shi L., Li S., Cao L, Long Y., & Pan G.. Tbq(σ): improving efficiency of trace utilization for off-policy reinforcement learning. 2019. https://doi.org/10.48550/arxiv.1905.07237
[11] Kumar A., Fu J., Tucker G., & Levine S.. Stabilizing off-policy q-learning via bootstrapping error reduction. 2019. https://doi.org/10.48550/arxiv.1906.00949
[12] Touati A., Zhang A., Pineau J., & Vincent P.. Stable policy optimization via off-policy divergence regularization. 2020. https://doi.org/10.48550/arxiv.2003.04108
[13] Imani E., Graves E., & White M.. An off-policy policy gradient theorem using emphatic weightings. 2018. https://doi.org/10.48550/arxiv.1811.09013
[14] Munos R., Stepleton T., Harutyunyan A., & Bellemare M.. Safe and efficient off-policy reinforcement learning. 2016. https://doi.org/10.48550/arxiv.1606.02647
[15] Gu S., Lillicrap T., Ghahramani Z., Turner R., & Levine S.. Q-prop: sample-efficient policy gradient with an off-policy critic. 2016. https://doi.org/10.48550/arxiv.1611.02247
[16] Kallus N. and Uehara M.. Intrinsically efficient, stable, and bounded off-policy evaluation for reinforcement learning. 2019. https://doi.org/10.48550/arxiv.1906.03735
[17] Tokdar S. and Kass R.. Importance sampling: a review. WIREs Computational Statistics 2009;2(1):54-60. https://doi.org/10.1002/wics.56
[18] Yu T., Lu L., & Li J.. A weight-bounded importance sampling method for variance reduction. International Journal for Uncertainty Quantification 2019;9(3):311-319. https://doi.org/10.1615/int.j.uncertaintyquantification.2019029511
[19] Liu Y., Bacon P., & Brunskill E.. Understanding the curse of horizon in off-policy evaluation via conditional importance sampling. 2019. https://doi.org/10.48550/arxiv.1910.06508