Reinforcement Learning Explained by Microsoft on edX
I recently completed Reinforcement Learning Explained by Microsoft on edX (DAT257x). Overall I had a very good learning experience. Here I would like to share what I learned and offer some comments about the course.
Reinforcement learning (RL) is a branch of machine learning that trains an agent to achieve a certain goal by interacting with an environment. It is the technology behind AlphaGo, whose achievements are fascinating and gained a lot of coverage in the mass media. During this winter break, I finally found some time to learn something about RL. There are many learning materials online, but as for a well-structured, freely available MOOC with practice problems on this subject, there are not many options. DAT257x is the best I have found so far (thumbs up for Microsoft and edX).

The labs (programming practice) use Python and Jupyter notebooks, and they are the part of the course I like best. You can clone them on Azure Notebooks, but at the time I found that service very glitchy (it is a good idea, though, and I hope it improves as it matures out of beta), so I would recommend doing the labs in your local environment. The backbone of the code is provided, and the key parts are left for students to fill in. The problems are not very hard, and hints are dropped here and there (sometimes in the multiple-choice questions).

The course is organized into seven modules, each of which focuses on a main topic of RL. The lecture videos are usually a couple of minutes long and very concise, and references are provided for further study. This kind of presentation is very convenient for competent self-learners; personally, I enjoyed it. I think anyone with some background in supervised learning and probability should feel comfortable taking this course. Sometimes I had to read Sutton's book as a supplement; the course lists it as a reference and to a large extent follows it.
Even though this course does not require a project, it is always helpful to apply what you have learned to one. I did a project using Policy Gradient methods to study cancer cell invasion. I put it on GitHub, and you can also play with the Jupyter notebook on mybinder. If you find it interesting, let me know what you think about it.
There seems to be some slight inconsistency between modules, each of which is presented by a different lecturer; for example, the deep learning framework Chainer is used for DQN in an early lab, while CNTK is used in the later labs. Sometimes it even feels like the lecturer is just reading the slides; when this happens, it is important to assimilate the material from other resources. Occasionally, a concept was not delivered clearly. I was confused about the "fake label" used when training a neural network for the Policy Gradient method, and after Googling around, it felt even more like a mystery. It turns out that it does not have to be interpreted in the sense of supervised learning: the sampled action merely plays the role of a label, and the return weights the corresponding loss. I plan to write an article explaining this in detail later. In addition, some important topics are omitted or not treated explicitly by the course, such as off-policy methods and planning.
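To make that concrete, here is a minimal sketch of the idea (my own toy example, not the course's lab code; the linear policy and every name in it are assumptions): for a softmax policy, the REINFORCE update looks exactly like the gradient of a classification cross-entropy loss in which the sampled action serves as the "fake label", except that each step's gradient is weighted by the return.

```python
import numpy as np

def softmax(logits):
    z = np.exp(logits - logits.max())
    return z / z.sum()

# Hypothetical linear softmax policy: logits = W @ state.
n_actions, n_features = 2, 4
W = np.zeros((n_actions, n_features))

def reinforce_update(states, actions, returns, lr=0.01):
    """One REINFORCE update over an episode.

    With the sampled action a_t as the "label", the gradient of the
    cross-entropy loss w.r.t. the logits is (probs - one_hot(a_t)),
    the same as in supervised classification. REINFORCE just weights
    each step's gradient by the return G_t, so the "label" is
    reinforced more, less, or even negatively, depending on how the
    episode went.
    """
    global W
    for s, a, G in zip(states, actions, returns):
        probs = softmax(W @ s)
        one_hot = np.zeros(n_actions)
        one_hot[a] = 1.0
        # Gradient ascent on G * log pi(a|s).
        W -= lr * G * np.outer(probs - one_hot, s)
```

Seen this way, there is nothing mysterious about it: the label is not ground truth, it is simply whatever action happened to be sampled, and the return decides how strongly (and in which direction) that choice is reinforced.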
Lastly, I want to share how I feel about the field of RL in general. It is indeed a fascinating area. Even though it gained a lot of attention only recently because of AlphaGo, many of its algorithms and ideas are not new; the success of AlphaGo is really a combination of older techniques. Iteration and approximation are recurring themes in solving RL problems, and many ideas that stem from simple bandit problems carry over to more complicated settings (see the small sketch at the end of this post). Since the problem usually boils down to finding a value function or a policy function, neural networks, as universal function approximators, come to the rescue and show their strength. There are many open problems in both practice and theory; for a mathematician, convergence proofs of the important algorithms are what to go after in order to justify their use. Despite increasingly sophisticated algorithms and amazing achievements like AlphaGo, the field still seems far from the artificial intelligence most of us dream of. I am eager to see a new and more powerful paradigm emerge in the future, and I hope to be part of the process.
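As the small illustration of that bandit-to-RL carryover promised above (again my own toy example, with made-up names and parameters), the incremental estimate used by an ε-greedy bandit agent, Q ← Q + α(r − Q), is the same iterate-and-approximate pattern that TD learning and Q-learning extend with a bootstrapped target:

```python
import random

def run_bandit(true_means, steps=10_000, eps=0.1, alpha=0.1):
    """Epsilon-greedy agent on a Gaussian multi-armed bandit."""
    n = len(true_means)
    Q = [0.0] * n  # value estimates, refined iteratively
    for _ in range(steps):
        # Explore with probability eps; otherwise act greedily.
        if random.random() < eps:
            a = random.randrange(n)
        else:
            a = max(range(n), key=lambda i: Q[i])
        reward = random.gauss(true_means[a], 1.0)
        # Incremental update: nudge the estimate toward the sample.
        # TD methods reuse this same form, with a bootstrapped target
        # (r + gamma * max_a' Q(s', a')) in place of the raw reward.
        Q[a] += alpha * (reward - Q[a])
    return Q

print(run_bandit([0.2, 0.5, 0.9]))  # estimates should approach the true means
```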