Csaba Szepesvári


Csaba Szepesvári [1],

a Hungarian computer scientist with research interests in applications of statistical techniques in AI and in reinforcement learning [2]. Csaba Szepesvári worked at the Computer and Automation Research Institute of the Hungarian Academy of Sciences, and is a professor at the Department of Computing Science, University of Alberta, and principal investigator of the RLAI [3] group, currently on leave at DeepMind.

UCT

In 2006, along with Levente Kocsis, Csaba Szepesvári introduced UCT (Upper Confidence bounds applied to Trees), a new algorithm that applies bandit ideas to guide Monte-Carlo planning [4]. UCT accelerated the Monte-Carlo revolution in computer Go [5] and other domains.
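The bandit idea behind UCT can be illustrated with a UCB1-style selection rule: at each tree node, pick the child that maximizes its mean reward plus an exploration bonus that shrinks as the child is visited more often. The following is a minimal sketch under stated assumptions, not the formulation from the paper: the function names, the `(total_reward, visits)` representation of a child, rewards in the range 0 to 1, and the default exploration constant `c = sqrt(2)` are all illustrative choices.

```python
import math


def ucb1_score(total_reward, visits, parent_visits, c=math.sqrt(2)):
    """Mean reward plus an exploration bonus (hypothetical helper).

    Unvisited children score infinity so that each is tried at least once.
    """
    if visits == 0:
        return math.inf
    mean = total_reward / visits
    bonus = c * math.sqrt(math.log(parent_visits) / visits)
    return mean + bonus


def select_child(children, c=math.sqrt(2)):
    """Return the index of the child with the highest UCB1-style score.

    children: list of (total_reward, visits) pairs (an assumed layout).
    The parent's visit count is taken as the sum of the children's visits.
    """
    parent_visits = sum(visits for _, visits in children)
    scores = [ucb1_score(r, v, parent_visits, c) for r, v in children]
    return max(range(len(children)), key=scores.__getitem__)
```

With `c = 0` the rule is purely greedy on the mean reward; a larger `c` shifts visits toward rarely tried children. In a full Monte-Carlo tree search this selection step would be repeated from the root down to a leaf, followed by a playout and a backup of the result along the path.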

Selected Publications

[6] [7]

1994 …

Csaba Szepesvári, László Balázs, András Lőrincz (1994). Topology learning solved by extended objects: a neural network model. pdf
Csaba Szepesvári (1998). Reinforcement Learning: Theory and Practice. In Proceedings of the 2nd Slovak Conference on Artificial Neural Networks, zipped ps

2005 …

Levente Kocsis, Csaba Szepesvári, Mark Winands (2005). RSPSA: Enhanced Parameter Optimization in Games. Advances in Computer Games 11, pdf
Levente Kocsis, Csaba Szepesvári (2006). Universal Parameter Optimisation in Games Based on SPSA. Machine Learning, Special Issue on Machine Learning and Games, Vol. 63, No. 3
Levente Kocsis, Csaba Szepesvári (2006). Bandit based Monte-Carlo Planning. ECML-06, LNCS/LNAI 4212, pp. 282-293. introducing UCT, pdf
Levente Kocsis, Csaba Szepesvári, Jan Willemson (2006). Improved Monte-Carlo Search. pdf
András György, Levente Kocsis, Ivett Szabó, Csaba Szepesvári (2007). Continuous Time Associative Bandit Problems. IJCAI-07, pp. 830-835. pdf
Jean-Yves Audibert, Rémi Munos, Csaba Szepesvári (2007). Tuning Bandit Algorithms in Stochastic Environments. pdf
Richard Sutton, Csaba Szepesvári, Hamid Reza Maei (2008). A Convergent O(n) Algorithm for Off-policy Temporal-difference Learning with Linear Function Approximation. NIPS 2008, pdf
Rémi Munos, Csaba Szepesvári (2008). Finite time bounds for sampling based fitted value iteration. Journal of Machine Learning Research, 9:815-857, 2008. pdf, pdf
Hamid Reza Maei, Csaba Szepesvári, Shalabh Bhatnagar, Doina Precup, David Silver, Richard Sutton (2009). Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation. NIPS 2009, pdf
Richard Sutton, Hamid Reza Maei, Doina Precup, Shalabh Bhatnagar, David Silver, Csaba Szepesvári, Eric Wiewiora (2009). Fast Gradient-Descent Methods for Temporal-Difference Learning with Linear Function Approximation. ICML 2009
Jean-Yves Audibert, Rémi Munos, Csaba Szepesvári (2009). Exploration-exploitation trade-off using variance estimates in multi-armed bandits. Theoretical Computer Science, Vol. 410, pdf

2010 …

Csaba Szepesvári (2010). Algorithms for Reinforcement Learning. Morgan & Claypool
István Szita, Csaba Szepesvári (2010). Model-based reinforcement learning with nearly tight exploration complexity bounds. ICML 2010
Hamid Reza Maei, Csaba Szepesvári, Shalabh Bhatnagar, Richard Sutton (2010). Toward Off-Policy Learning Control with Function Approximation. ICML 2010, pdf
István Szita, Csaba Szepesvári (2011). Agnostic KWIK learning and efficient approximate reinforcement learning. Journal of Machine Learning Research - Proceedings Track 19
Sylvain Gelly, Marc Schoenauer, Michèle Sebag, Olivier Teytaud, Levente Kocsis, David Silver, Csaba Szepesvári (2012). The Grand Challenge of Computer Go: Monte Carlo Tree Search and Extensions. Communications of the ACM, Vol. 55, No. 3, pdf preprint
Mahdi Milani Fard, Joelle Pineau, Csaba Szepesvári (2012). PAC-Bayesian Policy Evaluation for Reinforcement Learning. arXiv:1202.3717

2015 …

Tor Lattimore, Csaba Szepesvári (2017). The End of Optimism? An Asymptotic Analysis of Finite-Armed Linear Bandits. AISTATS, pdf
Tor Lattimore, Csaba Szepesvári (2018). Cleaning up the neighborhood: A full classification for adversarial partial monitoring. arXiv:1805.09247
Tor Lattimore, Csaba Szepesvári (2019). Bandit Algorithms. Cambridge University Press (draft), pdf

External Links

Homepage of Csaba Szepesvári from University of Alberta
Bandit Algorithms
Csaba Szepesvari - Google Scholar Citations
Csaba Szepesvári, PhD. Senior Research Scientist from Hungarian Academy of Sciences
Introduction to Reinforcement Learning, videolecture by Csaba Szepesvári, 2008

References

[1] Homepage of Csaba Szepesvári
[2] Research Interests of Csaba Szepesvári
[3] Reinforcement Learning and Artificial Intelligence (RLAI)
[4] Levente Kocsis, Csaba Szepesvári (2006). Bandit based Monte-Carlo Planning
[5] Sylvain Gelly, Marc Schoenauer, Michèle Sebag, Olivier Teytaud, Levente Kocsis, David Silver, Csaba Szepesvári (2012). The Grand Challenge of Computer Go: Monte Carlo Tree Search and Extensions. Communications of the ACM, Vol. 55, No. 3, pdf preprint
[6] Publications of Csaba Szepesvári
[7] dblp: Csaba Szepesvári
