Ask Effective Thesis Alumni Q&A session with Stephen Casper

Stephen (Cas) Casper is a PhD student working on technical AI safety research at MIT. In this interview Cas covers topics such as his view of the AI safety research landscape, the advice he'd give his past self for his undergraduate thesis, how to prepare for graduate school, and the most valuable research skills he thinks an early-career researcher can develop.

This is an edited version of an interview our community manager Aleksandrina held with Cas for our community platform. If you want to hear about other interviews with early-career researchers and Effective Thesis alumni, apply to our online community platform and explore our upcoming events!

If you're interested in the topics Cas discusses in this interview, you can find his research here. If you want to work on these topics yourself, explore our profiles on AI safety and governance, and consider applying for our coaching!

Hi Cas, thanks for joining us! 
Could you tell us about yourself and what research you’re currently working on?

I'm a second year PhD student at MIT, advised by Dylan Hadfield-Menell, and I'm in the algorithmic alignment lab. The main thing I work on is technical AI safety research. I'm someone who codes every day and works with neural networks a lot. That, in combination with my interests and the types of projects I've had opportunities to get involved in, has led me to focus specifically on diagnostics, adversarial vulnerabilities, interpretability and similar topics related to deep neural networks. The things that I'm working on right now all involve attacking networks and interpreting them by attacking them.
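For readers who want a concrete (if highly simplified) picture of what "attacking" a network can look like, here is a minimal sketch of one standard attack, the fast gradient sign method (FGSM), applied to a tiny toy classifier. The model, data and numbers are placeholders purely for illustration, and this is not necessarily the method used in Cas's own work.

```python
# Minimal sketch of one standard attack (FGSM): perturb an input in the
# direction that most increases the model's loss. The tiny untrained model
# and random input below are placeholders purely for illustration.
import torch
import torch.nn as nn

torch.manual_seed(0)

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(1, 10, requires_grad=True)  # a single input example
y = torch.tensor([0])                       # its (assumed) true label
epsilon = 0.1                               # perturbation budget

loss = loss_fn(model(x), y)
loss.backward()

# FGSM step: move each input feature by epsilon in the sign of the loss gradient.
x_adv = (x + epsilon * x.grad.sign()).detach()

print("clean prediction:      ", model(x).argmax(dim=1).item())
print("adversarial prediction:", model(x_adv).argmax(dim=1).item())
```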

I think this is an abundantly interesting area to work in, and I enjoy it a lot. It's quickly evolving, like many other areas in machine learning, and this type of work also has quite a lot of real-world importance and safety implications. I think AI safety agendas usually hinge in some way on having useful diagnostic and debugging tools that can empower humans to exercise the control they want over AI systems.

And what's the research landscape you’re working in like?

You'll see lots of people propose their own models of what the AI safety research landscape looks like. For what it's worth, here's mine, which might be a little helpful for someone who wants to think more about exactly what's going on in the space and where their interests and skills fit best. 

The way I see it, there are four clusters. For clusters that are next to each other, I think there’s a lot of potential work that can be done at the intersection. But for clusters that are across from each other (e.g. ‘conceptual’ and ‘applications’), while the connections may still exist, they're more sparse.


One of these clusters is the deep learning one – this is where I and most grad students in CS programmes are. People here code and work with neural networks every day, on various topics and with various tools related to understanding, evaluating and making AI systems safer.

Another cluster is the conceptual one, which is where most of the high-level ideas and new paradigms are being developed and where much of the vocabulary-shaping work in AI safety is being done. Lots of people here don't necessarily have CS degrees or code every day. People in this cluster might have a strong math or philosophy background (or sometimes CS), and like to think about questions from a theoretical or philosophical perspective, or work on alignment or agendas for alignment at a high level. A lot of ground in this area is being broken on the Alignment Forum at the moment.

Another cluster is governance and society. This is probably the cluster I'm least qualified to talk about, but I think it might be one of the most neglected and most important types of work to be doing. There are a lot of challenges when it comes to how society handles AI systems and what actually happens with them. So thinking about general ways to positively shape the ecosystem of governance, research, development, and deployment of AI systems is very valuable – and something we will need much more of in the future.

Last but not least, there are applications. This is where you'll see people doing immediately relevant work that often addresses near-term concerns with AI systems. From a lot of perspectives this is really useful; it's intrinsically valuable to solve important problems such as making self-driving cars or recommender systems go well. These are pretty big and important challenges. But from a meta perspective too, if we solve any of these problems, we will almost certainly learn a bunch of technical and governance strategies that will lead to broader, useful ripples in how we approach AI safety overall. So this is how I think about the AI safety space.

Your undergrad topic fits into this landscape, right? Can you share more about what your thesis topic was?

Yeah, I've been working on topics related to technical AI safety for over four years. And during this period, I connected with Effective Thesis during my undergraduate thesis. I was already interested in interpretability, attacks, and robustness, and I got interested in working on adversarial reinforcement learning for my thesis. 

Reinforcement learning is a paradigm in machine learning in which agents learn to achieve a goal via a formalised process of trial and error – for example, lots of machine learning systems are trained and tested on playing things like video games.

Adversarial reinforcement learning focuses on systems being trained in contexts that are engineered to make them fail. I did research looking at multi-agent environments, and specifically at the dynamics involved when one agent learns a behaviour that makes another agent fail – so think of something like a two-player video game.
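To give a rough, concrete flavour of that dynamic, here is a minimal sketch (not the setup from the thesis): a frozen "victim" policy plays a simple two-action game against an adversary that learns, with a basic REINFORCE-style update, to choose whichever action makes the victim lose most often. Everything here – the game, the numbers, the learning rule – is an illustrative placeholder.

```python
# Toy sketch of an adversarial two-agent dynamic: a frozen "victim" policy
# plays matching pennies against an adversary that learns, via a simple
# REINFORCE-style update, to pick actions that make the victim lose.
import numpy as np

rng = np.random.default_rng(0)

# Frozen victim: a fixed, slightly biased distribution over two actions.
victim_probs = np.array([0.6, 0.4])

# Adversary: a softmax policy over two actions, parameterised by logits.
logits = np.zeros(2)
lr = 0.1

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

for step in range(5000):
    probs = softmax(logits)
    a_adv = rng.choice(2, p=probs)          # adversary's action
    a_vic = rng.choice(2, p=victim_probs)   # victim's action

    # Matching pennies: the victim wins when the actions match,
    # so the adversary is rewarded (+1) when they differ.
    adv_reward = 1.0 if a_adv != a_vic else -1.0

    # REINFORCE gradient for a softmax policy: (one_hot(action) - probs) * reward.
    grad = -probs
    grad[a_adv] += 1.0
    logits += lr * adv_reward * grad

# The adversary should end up favouring the action the victim plays least often.
print("victim policy:   ", victim_probs)
print("adversary policy:", softmax(logits).round(3))
```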

I chose my topic partly as a follow-up to some earlier research that had been done in the AI safety space. I think there's something to be said for building on other people's work for a thesis topic – in my case, it meant I had some people to reach out to, which turned out to be useful.

How did Effective Thesis support you?

I think getting feedback on ideas, experiments and drafts was one of the most valuable things when I was writing my thesis. So for me the most valuable thing Effective Thesis did was connecting me to a few different people to talk to. And after I talked to these people, I was like, ‘hey, can I follow up with you sometime when I'm a little bit further in this process?’ or ‘could I send a research proposal to you in a google doc to get your comments?’ Being able to get this kind of feedback is the point of having an advisor/supervisor, but receiving feedback from a larger range of people with different areas of expertise can also be really useful.

What problems did you encounter when working on your thesis?

The main pitfall I encountered was persisting for a long time with experiments that weren't working. Most people doing AI research can probably relate in some way. If you go and read papers from the literature you'll get the impression that things work a lot more easily and often than they do. That's because there's a strong selection effect: when experiments don't work, you don't write the paper. So at first I was just setting experiments up the way I wanted, pressing 'run,' and expecting them to work. I've come to realise this almost never works.

I think one useful piece of advice for avoiding this is to make the absolute simplest possible version of your first experiment, then gradually, step by step, attempt things that are more complicated and interesting until you are running full-scale experiments that you can write a paper about. I cannot emphasise enough how much time, effort and stress is saved by beginning with the simplest possible thing and finding the right approach to get things going.

What skills did you find particularly valuable during your thesis, perhaps that you wish you’d developed sooner?

I think doing a project like a thesis decomposes into four distinct parts. A big one – maybe even the most important, but also perhaps the easiest to overlook – is knowing what questions to work on in the first place. This isn't something you can straightforwardly go and practice, but it's a very useful research skill.

The second is having the right approach to learning as much background as you can in the area you're working in – the skills involved in effective reading, annotation and good note-taking. The third is actually conducting the work – in my case, the experiments. And the fourth is the process of writing. I think all of these are essential components of the research pipeline, and all of them are independently difficult.

I'm tempted to say: ‘well, I wish I could have done all of them generically better, by having more knowledge and experience.’ In a way, I think this is the best and most complete answer.

Is there any advice on building those skills that you’d give to your past self?

In the past few years, I have come to appreciate much more than I would have predicted the value of reading a lot and taking notes on what you read, and developing a deep level of familiarity with the area that you are working in. I think this kind of reading is indispensable when it comes to knowing how to approach things with your research and knowing what is neglected and needed. 

So for example, if I'm interested in something – say adversarial reinforcement learning like I was for my thesis – something that would have been very useful is to go to the academic literature and find all the papers I possibly can and spend a few weeks or a few months reading and taking notes on maybe 50 papers.

Maybe the most ambitious version of this would be to do a set of deep dives into reading about a closely connected set of topics in the research literature, taking notes on them, and then writing an extended research proposal or high quality paper surveying the space, and especially surveying what is missing from the space. Sometimes this kind of reading is even better if you can couple it with a project. So if you have to do a course project for a particular class, using that as an excuse to do a deep dive into an area could be really valuable.

I didn't do any particularly ambitious versions of this before I started graduate school. But I recently worked on a survey paper related to interpretability with some other coauthors. And now that I'm on the other side of this project I can say there's nothing else I've ever done that compares to writing a survey paper. Nothing has compared in terms of what I've learned at an object level, how many interesting, critical new perspectives I’ve taken away that are shaping my future work, and how many ideas I have for projects that are just kind of missing from the space. I don't think I've ever had any particularly good research ideas that haven't followed from reading very deeply on the topic I'm interested in, and for me, nothing compares to how strongly doing this has shaped my personal agenda for what I want to work on.

Is there any other advice that you would give your younger self?

I think there are several key pillars that support you in building skills and knowledge as a researcher. I'm thinking of: your network; the projects you work on; what you read; your classes; and the mentorship you receive. I think these are five things it's useful to pay attention to and try to make work for your development as much as possible.

Once you're in grad school, it's really important to have – not a particularly well-formed agenda, but the beginnings of one: a set of topics to really attach yourself to and make a lot of progress on over your tenure as a graduate student. So taking the right classes, talking to the right people, working on the right projects (including a project of your own, like a thesis), and reading deeply are all important for this reason too.

What advice do you have about choosing and applying to a grad school programme?

It's potentially pretty useful to figure out whether this is the kind of thing you want to be doing for a number of years. That means having a good amount of familiarity with the kinds of topics you want to work on, and with what it's really like to do that type of work. Having a good thesis project under your belt is one of the most valuable things for this. I think another very valuable option is doing an independent literature review or research proposal as an undergraduate, as I described earlier.

I think doing an independent literature review is also one of the most valuable things that can prepare you for grad school interviews. It shows the potential and ability to work on projects, because you've done it in the past, provides evidence of your perseverance, and suggests that you’ll have a deep understanding of the area and good takes on what to work on.

In terms of selecting grad programmes to apply to – this is a fairly generic piece of advice, but I think trying to work at particular schools is not nearly as important as applying to places where you would have a very good personal fit with the lab, both with the person you'd be advised by and with the other people you would work with.

I think when it comes to grad school applications, people tend to underestimate how much they will interact and collaborate with their fellow advisees. Advisors are usually pretty busy, so the majority of the interactions you have in your lab group are going to be with the people you sit next to. So when you search for grad programmes to apply to, I would put very little emphasis on the programme itself, much more emphasis on the specific professors within it, and approximately equal emphasis on the individual people inside particular labs.

Can people reach out to you?

Yeah, I like to talk to people about specific ideas and papers I’m currently working on, or what I might work on next! I think this applies to many other researchers, especially if they're grad students and not professors who have a million things to do. Just cold emailing people and asking them to Zoom is a good way to get new ideas and get a sense of what it's like to have your boots on the ground doing this type of work. So people should feel free to reach out to me!

Where next?

Keep exploring our other services and content

Our recommended research directions

Explore areas where we think further research could have a particularly positive impact on the world.

Human-aligned artificial intelligence

Artificial intelligence will likely become an increasingly powerful force shaping humanity’s future, so research to align it with human values is vital. Read our research direction profile to learn more. 

Mechanistic interpretability

Mechanistic interpretability is a subfield of AI interpretability that could let us interpret what goals a neural network may have learned.


Apply for coaching

Want to work on this research direction? Apply for coaching to receive personalised guidance.