What Is AI Alignment?

Artificial intelligence is advancing faster than ever before. The growth of AI-based startups has accelerated 14 times since 2020, and AI has reached nearly every corner of our lives, from content creation to fintech to healthcare.

However, the problem with AI is that it often doesn’t do exactly what we ask of it. It is frequently a black box that interprets our queries in its own way, and sometimes that can be quite dangerous.

AI alignment is a new field that studies how to make AI systems meet the needs and expectations of humans. Its goal is to prevent a future where AI acts contrary to the interests of particular individuals, groups, or society as a whole.

In this post, we will share insights into what AI alignment is and how it works.

The definition of AI alignment

AI alignment is an area of science that studies ways to make AI align with our goals and prevent it from going out of control. In other words, AI alignment studies how to create robust and human-friendly AI.

OpenAI, the company that created GPT-4, one of the most advanced artificial intelligence systems in the world, claims to pay close attention to the alignment and compliance of its invention.

“Our alignment research aims to make artificial general intelligence (AGI) aligned with human values and follow human intent. […] Our main goal is to push current alignment ideas as far as possible, and to understand and document precisely how they can succeed or why they will fail. We believe that even without fundamentally new alignment ideas, we can likely build sufficiently aligned AI systems to substantially advance alignment research itself.” ― OpenAI

ChatGPT, their chatbot built on top of GPT-3.5, is now accessible to the public. It processes millions of requests every day, and those interactions can be used to improve it. As the model learns from large amounts of data, it might become more intelligent and autonomous over time. The chatbot can perform many different tasks, from writing essays to answering questions to generating code, and it performs them well. All of these tasks relate to natural language processing and text generation, and most scientists point out that the machine doesn’t actually understand human language. Even so, it is the closest we have come so far to artificial general intelligence.

Many people have pointed out the drawbacks of the system. For example, the chatbot often gives answers that contain made-up information and doesn’t warn the user that the information is false.

“Right now, ChatGPT is just a tech demo, a research experiment. Less clear is how it might be used, beyond the dire predictions about what sectors its technology might upend. In almost every case, the AI appeared to possess both knowledge and the means to express it. But when pressed—and the chat interface makes it easy to do so—the bot almost always had to admit that it was just making things up.” — Ian Bogost, The Atlantic

Today it would be premature to worry that the chatbot has a mind of its own and will plot against you to make you fail your English lit essay. However, in the future, if the machine becomes more advanced, we might want to start worrying.

Artificial intelligence is also a vital part of the self-driving technology used in cars and other vehicles. Complex systems dictate the machine’s behavior on the road, both in standard everyday situations and in critical ones. Right now, cars with self-driving features, such as those made by Tesla or Cadillac, require drivers to stay attentive and ready to take over so that they can prevent a critical situation by taking action. After all, humans are much better at orienting themselves in unpredictable situations than any AI is. But as AI becomes more and more capable, it might take over more and more responsibilities. What if one day your car decides to crash into a tree because it doesn’t want to drive you to work every morning anymore?

These scenarios might seem a bit sci-fi, a bit exaggerated. But there’s no guarantee that they won’t happen. For AI to be safe for humanity, it needs to embody certain values and ethical principles, such as those in Isaac Asimov’s Three Laws of Robotics: protect humans, obey them, and preserve itself.

Stuart Russell, a professor at the Center for Human-Compatible Artificial Intelligence at UC Berkeley, coined the term “value alignment problem”. It refers to the problem of making sure that the AI we create shares our values.

According to him, there are several goals connected to AI alignment that need to be addressed:

  • Ensure that intelligent systems are safe and can be corrected by humans in case of mistakes.
  • Guarantee that we can ascertain human objectives as intelligent systems become more advanced than humans.
  • Build a more accurate value alignment between AI systems and humans in order to avoid significant downside risks.

Right now, he leads a cross-university project concerned with AI alignment. Similar problems are explored at Google Brain, a research group working on functional and ethical AI systems.

Why is AI alignment important?

AI alignment is an important field because as AI systems become more powerful, they may begin to act in ways that are not aligned with our goals. For example, an AI system that is designed to optimize a particular metric, such as profit or efficiency, may end up causing harm to humans or the environment if it is not also designed to take into account other values, such as safety or sustainability.
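The point about optimizing a single metric can be illustrated with a toy sketch. Everything here is hypothetical: the plan names, the numbers, and the harm_weight parameter are assumptions made up for illustration, not a real alignment method. The sketch only shows that an optimizer told to maximize profit alone may pick a harmful plan, while the same optimizer picks a different plan once harm is part of its objective.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    name: str
    profit: float  # the metric the system is told to optimize
    harm: float    # a side effect the profit metric does not see


# Hypothetical candidate plans with made-up numbers.
PLANS = [
    Plan("cut safety checks", profit=120.0, harm=9.0),
    Plan("automate paperwork", profit=90.0, harm=0.5),
    Plan("do nothing", profit=0.0, harm=0.0),
]


def choose(plans, harm_weight=0.0):
    """Pick the plan with the highest profit minus weighted harm."""
    return max(plans, key=lambda p: p.profit - harm_weight * p.harm)


# With harm_weight=0 the objective is profit only.
profit_only = choose(PLANS)
# With a positive harm_weight, harm counts against a plan.
with_safety = choose(PLANS, harm_weight=10.0)

print("Profit only:", profit_only.name)
print("Profit and safety:", with_safety.name)
```

In practice the hard part is the one this sketch assumes away: knowing which harms to measure and how much weight to give them.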

In his book Superintelligence, Nick Bostrom, a Swedish philosopher and Oxford professor, explores some of the main issues connected to the emergence of AI systems that are as intelligent as us, or more so, and aren’t aligned with our goals. He argues that imbuing AI with human values is hard because those values are hard to formulate unambiguously. Moreover, human society has no consistent view of what a “good” or “friendly” AI would be.

For example, he writes that if any single government secretly developed a superintelligent system, we would all be in trouble, because it would embody the values of that particular group. An alternative approach is to make the project multicultural and develop it through worldwide collaboration.

Industries that should start investing in AI alignment right now

There are several areas where AI alignment is particularly important.

Autonomous vehicles

One area that needs more investment in AI alignment research is the development of autonomous vehicles. Autonomous vehicles have the potential to greatly improve safety and efficiency on our roads, but they also raise important ethical questions. For example, how should an autonomous vehicle make decisions in the case of an unavoidable accident, where both options may result in harm to humans? AI alignment is the field that aims to address such controversial issues.


Healthcare

Another area where AI alignment is important is the development of AI systems for healthcare. AI systems can help to improve diagnosis and treatment, but they must be designed in a way that takes into account the unique ethical considerations of the healthcare field, such as patient privacy and autonomy.


National security and defense

AI alignment is also important in the development of AI systems for national security and defense. AI systems have the potential to greatly enhance our military capabilities, but they must be designed in a way that aligns with our values and goals as a society, such as avoiding civilian casualties and preventing unnecessary escalation of conflict.


Conclusion

AI alignment is an important field that seeks to ensure that AI systems are built on principles that protect humans and keep them safe. As we continue to develop increasingly advanced AI systems, it’s important that we also consider their ethical implications and work to ensure that they align with our values as a society.
