Governance of artificial intelligence
How should we govern the development and deployment of AI?

Interested in working on this research direction? Join our newsletters and coaching waitlist


Want more context on this profile? Explore a map of all our profiles →

This profile is tailored towards students studying communications, media and marketing; computer science; economics; history; law; political science; psychology and cognitive sciences; philosophy and ethics; and sociology. We expect there to be valuable open research questions that could be pursued by students in other disciplines too.

Why is this a pressing problem?

Artificial intelligence is becoming increasingly powerful. AI systems can solve college-level maths problems, beat champion human players at multiple games and generate high-quality images. They can be used in many ways that could help humanity, for example by identifying cases of human trafficking, predicting earthquakes, helping with medical diagnosis and speeding up scientific discovery.

The AI systems described above are all ‘narrow’: they are powerful in specific domains, but they can’t do most tasks that humans can. Nonetheless, narrow AI systems present serious risks as well as benefits. They can be designed to cause enormous harm – lethal autonomous weapons are one example – or they can be intentionally misused or have harmful unintended effects, for example due to algorithmic bias.

It seems likely that at some point, ‘transformative AI’ will be developed. This phrase refers to AI that ‘precipitates a transition comparable to (or more significant than) the agricultural or industrial revolution.’ One way this could happen is if researchers develop ‘artificial general intelligence’ (AGI): AI that is at least as intelligent as humans across all domains. AGI could radically transform the world for the better and help tackle humanity’s most important problems. However, it could also do enormous harm, even threatening our survival, if it doesn’t act in alignment with human interests.

Work on making sure transformative AI is beneficial to humanity seems very pressing. Multiple predictions (see here, here and here) suggest that transformative AI is likely within the next few decades, if not sooner. A majority of experts surveyed in 2022 believed there was at least a 5% chance of AI leading to extinction or similarly bad outcomes, and nearly half (48%) believed there was at least a 10% chance. Working on preventing these outcomes also seems very neglected: 80,000 Hours estimates that roughly 1,000 times more money is being spent on speeding up the development of transformative AI than on reducing its risks.

AI governance research is one way the development and use of AI could be guided towards more beneficial outcomes. This is research that aims to understand and develop ‘local and global norms, policies, laws, processes, politics and institutions (not just governments) that will affect social outcomes from the development and deployment of AI systems.’ It can include high-level questions such as how soon AGI will be developed, how it will affect the geopolitical landscape, and what ideal AI governance would look like. It can also include researching the possible impacts of AI on specific areas such as employment, wealth equality and cybersecurity, and developing specific solutions – such as lab policies to incentivise responsible research practices.

For more information, watch the conference talk below, in which Allan Dafoe discusses the space of AI governance.

Explore existing research and get more context


To get more context on this research direction, we recommend starting with the resources linked on this page (e.g. Useful concepts and framings, Mapping the space, Reading lists and research papers) and, if interested, applying to an online course on the topic (e.g. the AI Safety Fundamentals: Governance track), where you can learn more about this space in a group with other like-minded people.

Below we introduce some useful concepts and framings to help you get up to speed with this research direction. Note that this content is most relevant to the AI paradigm that currently has the most traction: deep learning.

One useful framing is Racing through a minefield: the AI deployment problem (2022) by Holden Karnofsky, which discusses various types of actions and interventions we might want to take to ensure the safe development and deployment of AI.

There are also several useful concepts that might help you grasp the literature in this field better:

AI triad: for many policy and governance purposes, AI can be thought of as a combination of three inputs: data, compute and algorithms. Current frontier Large Language Models (LLMs) have gained much of their capability by scaling up the amount of compute, data and model size.
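The relationship between these inputs can be made concrete with a rough rule of thumb from the scaling-laws literature: total training compute in FLOPs is often approximated as C ≈ 6 × N × D, where N is the parameter count and D the number of training tokens. A minimal sketch, using illustrative (not real) model figures:

```python
# Rough training-compute estimate using the common C ≈ 6·N·D approximation
# (FLOPs ≈ 6 × parameters × training tokens). Model figures below are
# illustrative assumptions, not numbers from any specific real model.

def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate total training compute in FLOPs."""
    return 6 * n_params * n_tokens

# A hypothetical 70B-parameter model trained on 1.4T tokens:
flops = training_flops(70e9, 1.4e12)
print(f"{flops:.2e} FLOPs")  # roughly 5.9e+23 FLOPs
```

This kind of back-of-the-envelope estimate is one reason compute is seen as a measurable, governable input to AI development.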

Frontier/Advanced AI: there is no agreed definition for this, but we can think of frontier AI in terms of the model’s output: its capabilities. For example, specific measured capabilities comparable to the best existing models (such as various SOTA benchmarks), or the extent to which the model is able to match or exceed people’s performance in most jobs. Alternatively, we can define frontier AI in terms of its inputs: the amount of data and compute used to train the model. For example, the 2023 US Executive Order focuses specifically on models that were “trained using a quantity of computing power greater than 10^26 integer or floating-point operations, or using primarily biological sequence data and using a quantity of computing power greater than 10^23 integer or floating-point operations” and uses the phrase “dual-use foundation model” to mean an “AI model that is trained on broad data; generally uses self-supervision; contains at least tens of billions of parameters; is applicable across a wide range of contexts; and that exhibits, or could be easily modified to exhibit, high levels of performance at tasks that pose a serious risk to security, national economic security, national public health or safety, or any combination of those matters.” This is similar to the approach taken in Frontier AI Regulation: Managing Emerging Risks to Public Safety (2023; see its Appendix A for further discussion of definitions).
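To illustrate the input-based definition, here is a minimal sketch of how a check against the Executive Order’s general 10^26-operation threshold might look, reusing the common C ≈ 6·N·D compute approximation. The model sizes are hypothetical, and real reporting rules involve far more nuance than a single comparison:

```python
# Sketch: does a planned training run cross the 2023 US Executive Order's
# general 1e26-operation reporting threshold? Uses the rough 6·N·D
# approximation for training compute; all model figures are hypothetical.

EO_GENERAL_THRESHOLD_FLOPS = 1e26

def crosses_threshold(n_params: float, n_tokens: float) -> bool:
    """Estimate training compute and compare it against the threshold."""
    estimated_flops = 6 * n_params * n_tokens
    return estimated_flops >= EO_GENERAL_THRESHOLD_FLOPS

print(crosses_threshold(70e9, 1.4e12))  # False: ~5.9e23 FLOPs
print(crosses_threshold(2e12, 10e12))   # True: 1.2e26 FLOPs
```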

Types of extreme risks: many people distinguish two broad categories of risk: Misuse and Misalignment. Misuse refers to models being used by malicious actors (individuals, groups or state actors) to cause harm, for example by creating bioweapons, mounting cyberattacks or spreading misinformation. Misalignment refers to models acting against the interests and intentions of their designers and users (usually after escaping human control), which can happen for a variety of reasons, including failure to encode human values correctly into the system, specification gaming, evolution of goals, distributional shift and instrumental tendencies towards power-seeking.

Some authors add further categories, such as Structural risks, which depend on how an AI system interacts with larger social, political and economic forces in society (Zwetsloot and Dafoe, 2019), or risks from models incompetently performing desired tasks (Raji et al., 2022a). Others distinguish AI race risks, where competitive pressure pushes nations and corporations to rush the development of AIs and cede control to AI systems, and Organizational risks, where organizations developing and deploying advanced AIs could suffer catastrophic accidents: AIs accidentally leaked to the public or stolen by malicious actors, underinvestment in safety research, a lack of understanding of how to reliably improve AI safety faster than general AI capabilities, or the suppression of internal concerns about AI risks. Yet other authors have attempted an “exhaustive taxonomy based on accountability: whose actions led to the risk, were they unified, and were they deliberate? Such a taxonomy may be helpful because it is closely tied to the important questions of where to look for emerging risks and what kinds of policy interventions might be effective. This taxonomy in particular surfaces risks arising from unanticipated interactions of many AI systems, as well as risks from deliberate misuse, for which combined technical and policy solutions are needed.” There are also many less extreme but still important risks, such as risks to equity and civil rights, privacy, or economic competition and power concentration.

Dangerous capabilities: as large general-purpose models scale, they gain emergent capabilities that we are currently unable to predict. Some of these might be dangerous and might significantly increase some of the risks mentioned above (especially misuse and misalignment). Examples of such capabilities are cyber offence, deception, self-proliferation, long-horizon planning, biohazard construction and collusion. It is therefore useful to map and define these dangerous capabilities, create ways to measure them, and design regulatory procedures to ensure that models displaying them are not deployed. For more context and a list of dangerous capabilities, see Model evaluation for extreme risks (2023).
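As a toy illustration of how such a pre-deployment gate might work: evaluation suites produce scores per capability, and a policy blocks deployment when any score exceeds its limit. The capability names echo the examples above, but the scores and thresholds here are entirely hypothetical:

```python
# Minimal sketch of a pre-deployment "dangerous capability" gate, loosely in
# the spirit of Model evaluation for extreme risks (2023). Capability names,
# scores and thresholds are hypothetical, for illustration only.

DANGER_THRESHOLDS = {
    "cyber_offence": 0.2,
    "self_proliferation": 0.1,
    "long_horizon_planning": 0.5,
}

def deployment_blockers(eval_scores: dict[str, float]) -> list[str]:
    """Return the capabilities whose measured score exceeds its threshold."""
    return [cap for cap, limit in DANGER_THRESHOLDS.items()
            if eval_scores.get(cap, 0.0) > limit]

# Example: a model that scores high on self-proliferation evals
flagged = deployment_blockers({"cyber_offence": 0.05, "self_proliferation": 0.3})
print(flagged)  # ['self_proliferation'] → deployment would be blocked
```

The hard part in practice is not this bookkeeping but designing evaluations whose scores actually track the underlying capability.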


Different intervention points: there are many points of intervention that useful policies could target, ranging from mapping and influencing the ecosystem as a whole (e.g. forecasting future developments and their speed, regulating the availability of chips and other resources for building models), to the point before a model is trained (e.g. licensing of models exceeding some size; guidelines for training), to the training itself (e.g. mapping and auditing for dangerous capabilities), to the deployment phase (e.g. regulating access and ongoing monitoring for dangerous capabilities and model misuse). See Frontier AI Regulation: Managing Emerging Risks to Public Safety (2023) for many ideas for potential interventions, and 12 tentative ideas for US AI policy (2023) for others.

Levels of coordination: some proposed policies could be implemented by letting individual labs self-regulate. A more coordinated approach, which also makes new kinds of policy possible, is regulation at the national level. However, many people think some form of international cooperation on the development and deployment of AI may be the safer option (and it is sometimes claimed to be necessary to reduce certain types of risk, e.g. defusing an arms race between states). There have been several proposals for International Institutions for Advanced AI (and here). On the applied side, an open letter has advocated for an international AI treaty, and the first signs of international cooperation seem to be materializing via the Bletchley Declaration as well as the UN’s high-level advisory board on AI.

Some important concepts for understanding the technical landscape: backpropagation, neural networks, gradient descent, model size, loss functions (e.g. next-token prediction), architectures, transformers and so on. It is often important to understand at least the basics of the technology in order to design good regulatory mechanisms.
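To give a feel for two of these concepts, here is a toy example of gradient descent minimising a simple loss function, L(w) = (w − 3)². Real neural-network training applies the same idea across billions of parameters, with the gradients computed by backpropagation:

```python
# Toy gradient descent: minimise the loss L(w) = (w - 3)^2 by repeatedly
# stepping against the gradient dL/dw = 2(w - 3). This is the core update
# rule behind neural-network training, shown here on a single parameter.

def grad(w: float) -> float:
    """Gradient of the loss L(w) = (w - 3)^2."""
    return 2 * (w - 3)

w = 0.0            # initial parameter value
learning_rate = 0.1
for _ in range(100):
    w -= learning_rate * grad(w)  # step downhill on the loss surface

print(round(w, 4))  # converges towards 3.0, the minimum of the loss
```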

Various people have tried to map the space of AI governance and the discussion around AI safety. These resources can give you a more comprehensive picture of the field and of the positions people take on this issue.


Overview of types of research and work being done: 

  • The longtermist AI governance landscape: a basic overview (2022): useful contributions to AI governance can be made at various levels of applicability, ranging from foundational strategy research to applied policy advocacy and implementation. You can also contribute at the meta level through field-building. Read more, including examples, in the linked post.
  • Career Resources on AI Strategy Research by AI Safety Fundamentals (2022) lists various types of research one can do to contribute to AI governance (especially strategy) research, including monitoring the current state of affairs, examining history, assessing feasibility, technical forecasting and assessing risks. Read more, including examples, in the linked post.


Overview of arguments people hold towards the issue:

  • AI Risk Discussions: an interactive summary of interviews with 97 AI researchers, conducted by Dr. Vael Gates
  • This online curriculum on AI governance is a great place to start if you want to learn more about this area and find relevant papers.

Other useful reading lists:

Find a thesis topic

If you’re interested in working on this research direction, below are some ideas on what would be valuable to explore further. If you want help refining your research ideas, apply for our coaching!

  • “I think it would be interesting to try to develop a list of influential/common memes about AI, which are prevalent in different communities. (Examples: “Data is the new oil,” in certain policy communities, and “paperclippers,” in the EA x-risk community.) Then I think it’d also be interesting to ask whether any of these memes might be especially misleading or detrimental. This project could help people to better understand the worldviews of different communities – and, I think, more importantly, to help people understand what kinds of communication/meme-pushing around AI governance might be most useful.” (Ben Garfinkel – Some AI Governance Research Ideas)

80,000 Hours writes that ‘there are few AI policy practitioners with a technical AI background, leaving this perspective neglected.’

Mapping the technical possibilities of AI and assessing AI progress could be valuable – for more discussion and more specific questions, see Allan Dafoe’s research agenda.

Other questions could come from this list from 80,000 Hours and these agendas from AI Impacts.

  • “If we assume that AI software is similar to other software, what can we infer from observing contemporary software development? [concrete] For instance, is progress in software performance generally smooth or jumpy? What is the distribution? What are typical degrees of concentration among developers? What are typical modes of competition? How far ahead does the leading team tend to be to their competitors? How often does the lead change? How much does a lead in a subsystem produce a lead overall? How much do non-software factors influence who has the lead? How likely is a large player like Google—with its pre-existing infrastructure—to be the frontrunner in a random new area that they decide to compete in?” (AI Impacts)
  • “How likely is it that AI systems will make it possible to cheaply, efficiently and reliably find vulnerabilities in computer systems with more skill than humans? What kinds of indicators might provide updates on this front? What measures could state or non-state actors take to prevent this coming about, and/or mitigate potential negative effects?” (80,000 Hours)
  • “Compute is a very promising node for AI governance. Why? Powerful AI systems in the near term are likely to need massive amounts of compute, especially if the scaling hypothesis proves correct. Furthermore, compute seems more easily governable than other inputs to AI systems…Should governments set up such funds? Seeing as they are likely to be set up, how should they be designed?” (Markus Anderljung  – Some AI Governance Research Ideas)

There are a lot of questions you can draw from Allan Dafoe’s AI governance research agenda. For example, you could explore the impact of exacerbated inequality and job displacement on trends such as liberalism, democracy and globalisation; to what extent countries are able to internalise the returns on their AI investments; or whether talent inevitably gravitates towards and benefits the existing leaders in AI (e.g. Silicon Valley).


Another set of questions can be found in Economics and AI Risk: Research Agenda and Overview by Charlotte Siegmann (2023), featuring questions such as:

  • “When exactly should society ban certain types of AI development or deployment, at least temporarily? When should it be open-sourced or closed?”
  • “What is the ideal organizational structure for developing Transformative AI?”
  • “Previous dual-use technology market failures and solutions: What can be learned from them, and how were they fixed?”

A set of open questions and introduction to the field can also be found in The Economics of Artificial Intelligence: An Agenda book.


Case studies on the development and governance of other transformative technologies can act as partial analogies for the development of AGI.

Possible questions include:

  • “One set of possibilities for avoiding an AI arms race is the use of third party standards, verification, enforcement, and control…What are the prospects that great powers would give up sufficient power to a global inspection agency or governing body? What possible scenarios, agreements, tools, or actions could make that more plausible? What do we know about how to build government that is robust against sliding into totalitarianism and other malignant forms? What can we learn from similar historical episodes, such as the failure of the Acheson-Lilienthal Report and Baruch Plan, the success of arms control efforts that led towards the 1972 Anti-Ballistic Missile (ABM) Treaty,  and episodes of attempted state formation?” (Allan Dafoe’s research agenda)
  • “This project explores the impact of US nuclear strategists on nuclear strategy in the early Cold War. What types of experts provided advice on US nuclear strategy? How and in what ways did they affect state policy making on nuclear weapons from 1945 through to the end of the 1950s (and possibly beyond)? How could they have had a larger impact?” (Waqar Zaidi – Some AI Governance Research Ideas)
  • “History of existential risk concerns around nanotechnology: How did the community of people worried about nanotech go about communicating this risk, trying to address it, and so on? Are there any obvious mistakes that the AI risk community ought to learn from?; How common was it for people in the futurist community to believe extinction from nanotech was a major near-term risk? If it was common, what led them to believe this? Was the belief reasonable given the available evidence? If not, is it possible that the modern futurist community has made some similar mistakes when thinking about AI?” (Ben Garfinkel – Some AI Governance Research Ideas)

Examples of research exploring historical events to inform AI governance:

Possible questions include:

  • “Will EU regulation diffuse globally via the so-called “Brussels effect” (Bradford, 2020), or will there be a global race to the bottom with regards to minimum safety standards (Askell et al., 2019; Smuha, 2019)?” (Legal Priorities research agenda)
  • “How should the scope of AI safety regulations be defined (Schuett, 2019)? Do we need new regulatory instruments (Clark & Hadfield, 2018)? How can compliance be monitored and enforced? Is there a need for stronger forms of supervision (Bostrom, 2019; Garfinkel, 2018)? If so, would they violate civil rights and liberties?” (Legal Priorities research agenda)
  • “Is there a need to legally restrict certain types of scientific knowledge to prevent malevolent actors from gaining control over potentially dangerous AI technologies (Bostrom, 2017; Ovadya & Whittlestone, 2019; Shevlane & Dafoe, 2020; Whittlestone & Ovadya, 2020)? If so, how could this be done most effectively? To what extent is restricting scientific knowledge consistent with the relevant provisions of constitutional law?” (Legal Priorities research agenda)

See the Legal Priorities research agenda for more context and further questions. A survey of research questions for robust and beneficial AI from the Future of Life Institute also contains many questions law students could explore.

It may be valuable to explore topics such as “How can we avoid a dangerous arms race to develop powerful AI systems? How can the benefits of advanced AI systems be widely distributed? How open should AI research be?”

There are a lot of questions you can draw from Allan Dafoe’s AI governance research agenda. You could explore the potential impact of exacerbated inequality and job displacement on trends such as liberalism, democracy, and globalisation; who should be leading on AI governance; how substantial an advantage China has – as compared with other advanced developed (mostly liberal democratic) countries – in its ability to channel its large economy, collect and share citizen data, and exclude competitors; and how a dangerous arms race to develop AI could be prevented or ended.

Other potential questions include:

  • “Taking into account likely types and applications of autonomous weapons, what are the likely effects on global peace and security?” (80,000 Hours)
  • “What plausible paths exist towards limiting or halting the development and/or deployment of autonomous weapons? Is limiting development desirable on the whole? Does it carry too much risk of pushing development underground or toward less socially-responsible parties?” (80,000 Hours)
  • “How might AI alter power dynamics among relevant actors in the international arena? (great and rising powers, developed countries, developing countries, corporations, international organizations, militant groups, other non-state actors, decentralized networks and movements, individuals, and others).” (Center for a New American Security)

These are some of the questions suggested in the Effective Altruism Psychology Lab’s research agenda. They suggest you reach out if you’re interested in pursuing research on any of these topics.



Possible questions include:

  • “How do AI researchers think about risks from advanced AGI? How concerned are AI researchers about risks from uncontrolled AGI, and why? How do AI researchers’ views differ from those of other populations? Are they more or less concerned?” (Effective Altruism Psychology Lab)
  • “How can we spread norms in favour of careful, robust testing and other safety measures in machine learning? What can we learn from other engineering disciplines with strict standards, such as aerospace engineering?” (Technical AI Safety Research outside of AI)
  • “How can we best increase communication and coordination within the AI safety community? What are the major constraints that safety faces on sharing information (in particular ones which other fields don’t face), and how can we overcome them?” (Technical AI Safety Research outside of AI)
  • “What capabilities might AI systems one day have, and what would be some possible social consequences? For example, what would happen if an AI system could generate language and images designed to persuade particular sets of people of particular propositions?” (80,000 Hours)

Further resources

Podcasts

How can AI governance research have a positive impact?

If you’re interested in a programme that isn’t currently accepting applications, you can sign up for our newsletter to hear when it opens:

This database of EA-relevant US policy fellowships may be useful for finding further opportunities to gain experience.

Our funding database can help you find potential sources of funding if you’re a PhD student interested in this research direction.

Sign up for our Effective Thesis newsletter to hear about opportunities such as funding, internships and research roles.

Other newsletters that are useful for keeping up with AI governance and advancements in AI are:

Contributors

This profile was last updated 8/11/2023. Thanks to Rick Korzekwa, Jenny Xiao and Lennart Heim for helpful feedback on the previous version. All mistakes remain our own. Learn more about how we create our profiles.

Subscribe to the Topic Discovery Digest

Subscribe to our Topic Discovery Digest to find thesis topics, tools and resources that can help you significantly improve the world.

Where next?

Keep exploring our other services and content