Human-aligned artificial intelligence
Why is this a pressing problem?
Artificial intelligence is becoming increasingly powerful. AI systems can solve college-level maths problems, beat champion human players at multiple games and generate high-quality images. They can be used in many ways that could help humanity, for example by identifying cases of human trafficking, predicting earthquakes, helping with medical diagnosis and speeding up scientific discovery.
The AI systems described above are all ‘narrow’: they are powerful in specific domains, but they can’t do most tasks that humans can. Nonetheless, narrow AI systems present serious risks as well as benefits. They can be designed to cause enormous harm – lethal autonomous weapons are one example – or they can be intentionally misused or have harmful unintended effects, for example due to algorithmic bias.
AI is also quickly becoming more general. One example is large language models (LLMs): AI systems that can perform a wide range of language tasks, including unexpected ones such as writing code and translating between languages. You could try using ChatGPT to get a sense of current large language model capabilities.
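A minimal sketch of what trying this programmatically might look like, assuming you have the openai Python package (v1+) installed and an API key set in your environment – neither is covered in this profile, and ChatGPT’s web interface illustrates the same point without any code:

```python
# Sketch: ask a large language model to translate a phrase and write a tiny
# piece of code in one prompt. Assumes the `openai` package is installed and
# the OPENAI_API_KEY environment variable is set (assumptions, not part of
# this profile). The model name is illustrative; any available chat model works.
from openai import OpenAI

client = OpenAI()  # reads the API key from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative choice
    messages=[
        {
            "role": "user",
            "content": (
                "Translate 'knowledge is power' into French, "
                "then write a Python one-liner that reverses a string."
            ),
        }
    ],
)

print(response.choices[0].message.content)
```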
It seems likely that at some point, ‘transformative AI’ will be developed. This phrase refers to AI that ‘precipitates a transition comparable to (or more significant than) the agricultural or industrial revolution.’ One way this could happen is if researchers develop ‘artificial general intelligence’ (AGI): AI that is at least as capable as humans across most domains. AGI could radically transform the world for the better and help tackle humanity’s most important problems. However, it could also do enormous harm, even threatening our survival, if it doesn’t act in alignment with human interests.
Work on making sure transformative AI is beneficial to humanity seems very pressing. Multiple forecasts suggest that transformative AI is likely within the next few decades, if not sooner. A majority of experts surveyed in 2022 believed there was at least a 5% chance of AI leading to extinction or similarly bad outcomes, and nearly half (48%) believed there was at least a 10% chance.
Working on preventing these outcomes also seems very neglected – 80,000 Hours estimates that 1,000 times more money is being spent on speeding up the development of transformative AI than on reducing its risks. Technical research to ensure AI systems are aligned with human values and benefit humanity therefore seems highly important.
-
Organisations
DeepMind, a research lab developing artificial general intelligence. The organisation as a whole focuses on building more capable systems, but it has teams dedicated to AI safety.
MIRI, a non-profit studying the mathematical underpinnings of artificial intelligence.
Redwood Research, an organisation conducting applied AI alignment research.
OpenAI, an AI research and deployment company developing artificial general intelligence. They have teams focused on AI safety; see the discussion of safety teams at OpenAI in this podcast episode.
Anthropic, an AI safety company focused on empirical research.
Alignment Research Center, an organisation attempting to produce alignment strategies that could be adopted in industry today.
The Cooperative AI Foundation, an organisation supporting research that will improve the cooperative intelligence of advanced AI.
The Center for AI Safety, a nonprofit doing technical research and field-building.
Center on Long-Term Risk, a research institute aiming to address worst-case risks from the development and deployment of advanced AI systems.
Academic research groups
Some academic research groups working on technical AI safety research are:
The Center for Human-Compatible Artificial Intelligence, a research group based at UC Berkeley and led by Stuart Russell.
Jacob Steinhardt’s research group at UC Berkeley.
David Krueger’s research group at the University of Cambridge.
The Algorithmic Alignment Group, led by Dylan Hadfield-Menell at MIT.
The Future of Humanity Institute, a multidisciplinary research institute at the University of Oxford.
The Foundations of Cooperative AI Lab at Carnegie Mellon University.
The Alignment of Complex Systems research group at Charles University, Prague.
Stanford Center for AI Safety, led by Clark Barrett.
-
Learn more about the importance of this research direction:
The Case for Taking AI Seriously as a Threat to Humanity – Vox
AI Safety from First Principles – Richard Ngo
Why AI alignment could be hard with modern deep learning – Ajeya Cotra
The alignment problem from a deep learning perspective – Richard Ngo, Lawrence Chan & Sören Mindermann
AI could defeat all of us combined – Holden Karnofsky gives an argument for why AI ‘only’ as intelligent as humans could pose an existential risk.
Preventing an AI-related catastrophe – 80,000 Hours
What could an AI-caused catastrophe actually look like? – 80,000 Hours
Explore the links below for overviews of research in this area:
An annotated bibliography of recommended reading materials from CHAI.
An interactive research map from the Future of Life Institute, setting out the technical research threads that could help build safe AI.
Jacob Steinhardt’s AI Alignment research overview
Online courses
AGI Safety Fundamentals online curriculum on technical AI alignment – Richard Ngo
ML Safety Course – Dan Hendrycks at the Center for AI Safety
Get 1:1 advice on working on this research direction
Apply for our coaching and we can connect you with researchers already working in this space, who can help you refine your research ideas.
Apply for 80,000 Hours coaching.
Apply for AI Safety Support career coaching.
Research fellowships, internships and programmes
If you’re interested in a programme that isn’t currently accepting applications, you can sign up for our newsletter to hear when it opens.
Summer research schools
The CERI AI Fundamentals programme (technical track) is aimed at helping participants with a maths, CS or other mathematical science background gain an introduction to AI alignment research.
The CHERI summer research program is for students who want to work on the mitigation of global catastrophic risks.
The SERI Machine Learning Alignment Theory Scholars Program offers an introduction to the field of AI alignment and networking opportunities.
The Center on Long-Term Risk’s summer fellowship is for researchers who want to work on research questions relevant to reducing suffering in the long-term future.
The PIBBSS summer research fellowship is for researchers studying complex and intelligent behaviour in natural and social systems, who want to apply their expertise to AI alignment and governance.
The Human-aligned AI Summer School (EA Prague) is a series of discussions, workshops and talks aimed at current and aspiring researchers working in ML/AI and other disciplines who want to apply their expertise to AI alignment.
Other fellowships, internships and programmes
AGI Safety Fundamentals is held at the University of Cambridge and virtually, and is most useful to those with technical backgrounds who are interested in working on beneficial AI.
The CAIS philosophy fellowship is a research fellowship aimed at clarifying risks from advanced AI systems, for philosophy PhD students or graduates.
The CHAI (Center for Human-Compatible Artificial Intelligence) research fellowship is for researchers who have or are about to obtain a PhD in computer science, statistics, mathematics or theoretical economics.
The CHAI internship, during which aspiring researchers work on a project under the supervision of a mentor.
AI Safety Camp, which connects participants with a mentor with whom they collaborate on open AI alignment problems during intensive co-working sprints.
AI Risk for Computer Scientists, a four-day series of workshops run by MIRI.
The OpenAI Residency, a pathway to a full-time role at OpenAI for researchers and engineers who don’t currently focus on artificial intelligence.
Refine, an incubator to help independent researchers build original research agendas related to AI safety.
Lists of resources for getting started
Resources I send to AI researchers about AI safety – Vael Gates
AI safety starter pack – Marius Hobbhahn
Victoria Krakovna’s list of AI alignment resources
AI alignment career advice
Career review of an ML PhD – 80,000 Hours
General advice for transitioning into Theoretical AI Safety – EA Forum
Getting into grad school
Applying for Grad School: Q&A Panel from AI Safety Support
Getting into CS grad school in the USA – Mark Corner
Doing independent research
Alignment research exercises – Richard Ngo
Other advice
How I think students should orient to AI safety – Buck Shlegeris
7 traps that (we think) new alignment researchers often fall into – Akash
Interviews with AI safety researchers from 80,000 Hours
ML engineers Catherine Olsson and Daniel Ziegler on fast paths to becoming a machine learning alignment researcher.
Dario Amodei on how to become an AI researcher.
Miles Brundage on how to become an AI strategist.
Jan Leike on how to become a machine learning alignment researcher.
Find supervisors, courses and funding
If you’re considering a PhD, as well as looking at the academic research groups we list above, see the computer science PhD programs listed here.
Find community
Join the Future of Life Institute’s AI Existential Safety Community to apply for mini-grants, connect with other researchers and hear about conferences and other events.
This AI Safety reading group meets fortnightly online.
The AI Safety Accountability Programme is a Slack group for people who are interested in working on AI safety in the future and want to stay motivated while pursuing their goals.
Newsletters
Alignment Newsletter – Rohin Shah
ChinAI newsletter – Jeff Ding
ML Safety Newsletter – Dan Hendrycks
Import AI – Jack Clark
Contributors: This profile was last updated 24/01/2023. Thanks to Tomáš Gavenčiak for originally writing this profile. Thanks to Jan Kirchner, Neel Nanda, Rohin Shah, Martin Soto and Dan Hendrycks for helpful feedback on parts of this profile. All mistakes remain our own. Learn more about how we create our profiles.