Human-aligned artificial intelligence

How can we ensure artificial intelligence systems act in accordance with human values?

This profile is tailored towards students studying computer science, maths, philosophy and ethics, and psychology and cognitive sciences. However, we expect there to be valuable open research questions that could be pursued by students in other disciplines.

Why is this a pressing problem?

Artificial intelligence is becoming increasingly powerful. AI systems can solve college-level maths problems, beat champion human players at multiple games and generate high-quality images. They can be used in many ways that could help humanity, for example by identifying cases of human trafficking, predicting earthquakes, helping with medical diagnosis and speeding up scientific discovery.

The AI systems described above are all ‘narrow’: they are powerful in specific domains, but they can’t do most tasks that humans can. Nonetheless, narrow AI systems present serious risks as well as benefits. They can be designed to cause enormous harm – lethal autonomous weapons are one example – or they can be intentionally misused or have harmful unintended effects, for example due to algorithmic bias.

AI is also quickly becoming more general. One example is large language models, or LLMs: AI systems that can perform a wide range of language tasks, including unexpected ones such as writing code or translating between languages. You could try using ChatGPT to get a sense of current large language model capabilities.

It seems likely that at some point, ‘transformative AI’ will be developed. This phrase refers to AI that ‘precipitates a transition comparable to (or more significant than) the agricultural or industrial revolution.’ One way this could happen is if researchers develop ‘artificial general intelligence’ (AGI): AI that is at least as capable as humans across most domains. AGI could radically transform the world for the better and help tackle humanity’s most important problems. However, it could also do enormous harm, even threatening our survival, if it doesn’t act in alignment with human interests.

Work on making sure transformative AI is beneficial to humanity seems very pressing. Multiple predictions (see here, here and here) suggest that transformative AI is likely within the next few decades, if not sooner. A majority of experts surveyed in 2022 believed there was at least a 5% chance of AI leading to extinction or similarly bad outcomes, while a near majority (48%) believed there was at least a 10% chance.

Working on preventing these outcomes also seems very neglected – 80,000 Hours estimates that around 1,000 times more money is being spent on speeding up the development of transformative AI than on reducing its risks. Technical research to ensure AI systems are aligned with human values and benefit humanity therefore seems highly important.

Contributors: This profile was last updated 24/01/2023. Thanks to Tomáš Gavenčiak for originally writing this profile. Thanks to Jan Kirchner, Neel Nanda, Rohin Shah, Martin Soto and Dan Hendrycks for helpful feedback on parts of this profile. All mistakes remain our own. Learn more about how we create our profiles.
