Our paper entitled ‘Lookahead Pathology in Monte-Carlo Tree Search’ has been accepted to ICAPS, a CORE A* conference in automated planning and scheduling, with 20% acceptance rate. This is my first big-deal publication, achieved after a very long process.

The technical result

In symbolic AI, search is an important formulation of intelligence. One type of search that is heavily explored is adversarial search, meaning search for a move (or in fact a whole plan) in a game between multiple players. Among advances in adversarial search, AlphaGo is perhaps the most famous. AlphaGo is a game-playing system developed by Deepmind around 2015 that defeated the champion in the game of Go.

Among many technicalities in AlphaGo, Monte Carlo Tree Search (MCTS) is the main search algorithm. From the name, we can tell that this algorithm is a combination between Monte Carlo sampling (from Statistics) and Tree Search (from symbolic AI). In particular, the MCTS is evoked whenever a game player needs to make a move. It starts with equal scores for all the legal moves in the subtree rooted under the current game state. After that, as long as time allows, it repeatedly do the following: (1) recursively selecting a child state to explore based on some policy until getting to a new state, (2) estimating the utility of the new state based on some heuristic function, and (3) using the estimated utility to update the scores of the states along the explored path. For decision making, the move with the highest score is chosen to play.

Due to the its widespread use in various applications, it is important to have a good understanding of MCTS. It’s just like: before selling self-driving cars, the manufacturer must know exactly how it works in various conditions. For MCTS, we found that it actually has a failure mode: lookahead pathology. Intuitively, lookahead means ‘planning ahead’ what will happen in the future after an action is done. For Chess, you lookahead by imagining your chance of winning after a potential move. For driving, you lookahead by imagining what will happen on the road if you, say, increase the speed by 5 miles per hour. The alternative of lookahead is static evaluation, or ‘do it when you feel like it’. Lookahead is intuitively important. However, if you imagine too much, you will overthink and hurt your productivity. Similarly, due to the nature of MCTS, sometimes it hurts to lookahead too far.

As such, we are the first one in the literature to provide evidence for the existence of such pathological behavior in MCTS. We do that in two ways. Firstly, we provide mathematical proof for the existence of lookahead pathology in certain settings. Secondly, we explore a much wider range of settings empirically by simulations. To do that, we created a novel simulated environment, which is a probabilistic parametrized model for all possible games in this universe. Then, we measure the effect of lookahead on MCTS’ performance across the grid of values of the model’s parametrization. The result is that: MCTS does exhibit such a pathology in a wide range of settings.

Implications? If you want to use MCTS, especially the pure MCTS instantiation that we studied (i.e., UCT), be careful of having it overthink!

The journey

I started this project in my freshman year at Fulbright when Dr. Ramanujan offered to advise me on an independent research course in AI. Back then, Fulbright did not have enough CS courses for me to take, but more importantly, I am very curious about the project, so I said yes. Little did I know, that was the beginning of a 4-year project that only ends a few months after I graduated from Fulbright.

me giving a cute presentation to my STEM friends at Fulbright about the project

It is worth noting that the 4 years could be shortened into 1 year if we wanted to. I could have worked with 100% of my focus like a PhD student and my advisor could have pushed me to work like that. But we just did it one step at a time: having me learning the search algorithms, running some exploratory experiments, progressing through Covid, submitting to a few conferences, getting rejections and feedback, etc.

This project was one of the nicest things happened to me during my undergrad. Thanks to it, I got two value things – learning and connection with my advisor.

For learning:

  • In this very project, I first learned about various engineering skills like git and github, virtualenv (with pipenv), running batch jobs with SLURM, spawning subprocess, etc. These skills prove to be more and more important as I progress into my CS research career.
  • Knowledge-wise, this project allows me to learn about search algorithm even before learning about AI. That helps me develop an appreciation for symbolic AI very early in my research journey. Now I am still looking for interesting problem where planning techniques could be used to improve auto-regressive LLMs capabilities.
  • At the same time, this project is where I first learn to be a scholar. My advisor showed me how to ask the interesting questions, leveraging mathematical analysis when needed, as well as story-telling in technical papers, and crafting responses to reviewers. Now doing my PhD, I often find myself channeling Dr. Ramanujan a lot.

Beside learning new things, it has also been very beneficial for me to work closely with a researcher because they could recommend me for further opportunities. Dr. Ramanujan wrote recommendation letters that help me land the REU at the National University of Singapore (which was also my first time being abroad), and later get a PhD offer in the US.

If you are a CS undergrad interested in research, you may wonder how to go about doing research. I think it is pretty similar to how you find your industry internships. You need to spend time to do look for opportunities. It is easier to ask people who you know for a position, especially the professors in your school. And the circle of people you know can only offer you a limited search of problems to work on, which might eventually be something not interesting to you at all. And the boss might not be the one you enjoy working with. But that’s how it is. You don’t have to stick to the research field you explored during undergrad, or the profess you work with during that time. In fact, working with various advisors in various subfields, regardless of how much you like them, is a valuable experience for you moving forward.