Superalignment and a Timeline of Broken Promises on the Way to Superintelligence
Explore the concept of Superalignment and uncover the hidden truth about AGI and ASI. Dive into the ethical challenges, secret experiments, and the pressing need for transparency in AI development.
Hello, my fellow enthusiasts, and welcome back to another AI episode on Tech Trendsetters, where we discuss technology, science, and the future trajectory of our world. These AI-related episodes have become a great compilation of interesting news from the AI community, so I simply want to continue the trend. Today, I want to talk a bit more about superalignment and why it matters, about Artificial General Intelligence (AGI) and Artificial SuperIntelligence (ASI), and about the latest news in this area of human progress. Join me, and let's get started!
AGI is Already Here
The hidden truth – AGI is already here; this truth is just rarely discussed openly. If you're not naive, it's easy to see. Many experiments are likely happening at Google and OpenAI right now, as we speak, serving as the foundation for AGI and superalignment efforts.
The absence of public information on such experiments suggests they're conducted under non-disclosure agreements. And surely these experiments aren't limited to models with developer-imposed restrictions – they also include models free from any such limitations.
Millions of users have observed how each new model limitation affects the quality and apparent "intelligence" of large language models, even if such a conclusion isn't easily quantifiable. Logically, a model without such "artificial barriers" should be "smarter" than its restricted counterparts.
So why exactly do I think that AGI is here? It's not a straightforward question to answer, but I will give you my thoughts, further elaborating on each point:
The Turing test is no longer a valid measure of progress, as GPT-4 has already passed it. Put yourself back two years, when you had no idea what GPTs or LLMs were – you wouldn't have believed your eyes that you had just talked to a computer.
Existing tests measure only isolated aspects of AI. They can't assess the full picture, and models are mostly evaluated on synthetic benchmarks that focus on one specific area. Nevertheless, in many of these areas, current large language models already surpass human capabilities, as we highlighted in one of the previous episodes:
Many former OpenAI employees now hint that we're close to AGI, with a potentially quick jump to Artificial SuperIntelligence (ASI) on the horizon. However, ASI's capabilities would be incomparable to today's AI, and we currently lack a reliable way to measure that difference.
The covert nature of AGI development raises important questions about transparency and ethics. As we approach a future with AGI and, most probably, ASI, we must demand openness and responsible practices to ensure these powerful technologies benefit humanity as a whole.
Before we proceed further, I want to give a quick explanation of superalignment, Artificial General Intelligence (AGI), and Artificial SuperIntelligence (ASI). You can read much more about the AGI concepts we discussed in one of our previous episodes:
What is SuperAlignment?
With great power comes great responsibility.
The quote above captures the spirit of OpenAI's own blog page, which states: “Artificial intelligence has been advancing at a breakneck pace, and the potential for incredibly powerful, "superintelligent" AI is on the horizon.”
This is where the concept of “superalignment” comes in. Without superalignment, a superintelligent AI could easily misinterpret our instructions or develop goals that conflict with ours. That could lead to unintended consequences, ranging from minor inconveniences to catastrophic outcomes. Superalignment is essential for ensuring that AI remains a beneficial tool for humanity.
You can only imagine what an AI smarter than any human could do – an AI with the persuasive skill of a master manipulator, capable of solving problems we can't even imagine. Simply put, superalignment is like teaching a super-powered robot to always act in our best interests – in the interests of humanity. Why humanity? Because the moment AGI arrives, another unique species will be born on this planet, and we will somehow need to learn to live with it. Otherwise, a “Hunger Games”-like scenario could begin.
What is Artificial SuperIntelligence?
The truth? Nobody knows. "Superintelligence" is a purely hypothetical term. A formal definition might sound like this:
Artificial superintelligence (ASI) is a hypothetical software-based artificial intelligence (AI) system with an intellectual scope that goes beyond human intelligence.
Fundamentally, such a superintelligent AI would possess advanced cognitive functions and highly developed thinking skills, far exceeding those of any human – a system that significantly surpasses humans across virtually all domains of interest, including scientific and artistic creativity, general wisdom, and problem-solving.
Ultimately, if such a being existed, it could think, act, and plan thousands of times faster than humans. The gap would be similar to the one between humans and plants, with the latter "thinking" thousands of times slower than we do. Because of this disparity, superintelligent beings might perceive humans to be as slow as we perceive plants – and consequently, they could treat humans the same way we treat plants.
In this scenario, we would have no more control over our destiny than the flowers in a garden.
A Timeline of Broken Promises
It all started optimistically in July 2023.
In July 2023, Ilya Sutskever and fellow OpenAI scientist Jan Leike set up the Superalignment team to address these challenges. The company promised to dedicate 20% of its compute to the effort. “I’m doing it for my own self-interest,” Sutskever said. “It’s obviously important that any superintelligence anyone builds does not go rogue. Obviously.”
Less than a year later, in May 2024, everything started to fall apart.
In May 2024, both Ilya Sutskever and Jan Leike left OpenAI. Leike, who co-led the Superalignment group, cited concerns that the company wasn't dedicating enough resources to ensuring the safe development of AGI and preventing negative outcomes. By negative outcomes, he was referring to “the disempowerment of humanity or even human extinction.”
Several employees from OpenAI's Futures/Governance team, tasked with exploring policies for managing superintelligence (ASI), also departed. This exodus suggests deeper internal disagreements about OpenAI's priorities and its commitment to addressing the risks associated with increasingly powerful AI systems.
In reality, the higher the position of those who left OpenAI, the less they say about their reasons for leaving – thanks to non-disclosure agreements.
Top management (Sutskever) prefers to remain silent.
Middle management (Leike) discreetly complains that the priority of Superalignment tasks in the company has dropped significantly.
And only ordinary employees openly call a spade a spade.
That last part – the fact that the only accessible information comes from ordinary employees – is where things get interesting. Due to non-disclosure agreements, virtually nobody has real insight into the developments, which is why everyone is left questioning things. Daniel Kokotajlo (formerly of OpenAI's Futures/Governance team) shares his notes on AGI and the future:
He suggests that AGI could arrive sooner than we think – potentially within the next year. Whoever controls AGI could rapidly achieve ASI, granting them godlike powers and an insurmountable advantage over those who don't.
The craziest part he emphasises is that even the people working on AI do not fully understand exactly how it works. We can't be certain of an AI's true intentions or whether it has genuinely internalised human ethics. If a training run unexpectedly produces a superintelligent AI, we could face a rogue ASI with unpredictable consequences.
There is also skepticism about the amount of resources going to superalignment. The ongoing race to AGI, driven by multiple megacorporations, makes adequate investment practically impossible.
Amidst the internal events at OpenAI, it's easy to forget a significant event from just a year earlier. In 2023, hundreds of influential figures signed a landmark statement on AI safety. The statement declared:
Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war.
This was an attempt to bring more attention to the problem. The result? You might have already guessed it – nothing has changed.
A recent article published in Science, “Managing extreme AI risks amid rapid progress”, echoes these concerns. Once again, the authors acknowledge a breakneck race towards increasingly powerful AI, with safety measures lagging far behind. Investment in AI capabilities is skyrocketing, yet a paltry 1-3% of AI research is dedicated to safety. It's a classic case of putting the cart before the horse, and the consequences could be dire.
The article's authors, a group of AI experts, lay out a comprehensive plan for tackling this issue. They call for a significant reallocation of resources towards AI safety research and the development of robust governance mechanisms. While this plan seems sensible, I can't help but feel a sense of déjà vu.
The Unspoken Truths Behind AGI Development
Sorry, but I have to repeat myself once again – you have to be a very naive person not to see the obvious. All the big corporations are running dozens, if not hundreds, of experiments, because... they are at the starting (or final?) point of AGI development. All these experiments are kept secret. There is no information about them in the open press, and it is naive to believe that everyone is busy only with the product development of chatbots.
From the numerous reports of regular LLM users, it is already known that almost every new limitation introduced by the developers affects the quality of a model's answers and, more broadly, its general intellectual level. It is quite logical to assume that a model without the “artificial barriers” installed by developers in their labs is “smarter” than a model with such barriers.
The most thorough experimental support for this assumption to date was published in Nature Human Behaviour, in a study titled “Testing theory of mind in large language models and humans.” The study hints that unrestricted models can demonstrate cognitive abilities that surpass those of their restricted counterparts.
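To make concrete what a theory-of-mind test for an LLM can look like, here is a minimal sketch of a classic false-belief probe posed to a chat model through the OpenAI Python client. The prompt wording, the model name, and the naive scoring are my own illustrative assumptions; this is not the protocol used in the study.

```python
# A minimal, illustrative false-belief (theory-of-mind) probe for a chat model.
# The prompt, the model name ("gpt-4o"), and the naive scoring are assumptions
# made for illustration only; they are not the published study's protocol.
from openai import OpenAI

client = OpenAI()  # expects the OPENAI_API_KEY environment variable to be set

FALSE_BELIEF_PROMPT = (
    "Sally puts her ball in the basket and leaves the room. "
    "While she is away, Anne moves the ball into the box. "
    "When Sally returns, where will she look for the ball first? "
    "Answer with a single word."
)

response = client.chat.completions.create(
    model="gpt-4o",  # assumed model choice for this sketch
    messages=[{"role": "user", "content": FALSE_BELIEF_PROMPT}],
    temperature=0,   # keep the output stable for a crude pass/fail check
)

answer = response.choices[0].message.content.strip().lower()
# A model that tracks Sally's (false) belief should answer "basket", not "box".
print("model answer:", answer, "| passes probe:", "basket" in answer)
```

A real evaluation would of course average over many such scenarios and control for wording, but even this toy probe shows how narrow a single "cognitive ability" test is compared with general intelligence.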
I assume that Ilya Sutskever and Jan Leike decided to cease their participation in the further development of OpenAI models for reasons akin to those of Joseph Rotblat, a Polish physicist and Nobel Peace Prize laureate. Rotblat initially worked on the Manhattan Project to develop the atomic bomb but left in late 1944, once it became clear that Nazi Germany would not build one. He did not want to be involved in creating an ever more destructive weapon whose original justification had disappeared.
Similarly, Sutskever and Leike may have seen, in the course of experiments conducted in closed OpenAI labs, that the intellectual power of models without “artificial blocks” clearly exceeds what the public tests used to track progress towards AGI would suggest. Such a realization could have prompted their departure, driven by ethical concerns over the potential misuse of so powerful a technology.
Consequently, not only is the Turing test a thing of the past, but so are many other high-quality tests. From this, it may follow that AGI has already been achieved de facto, but people simply do not admit it out loud, drowning the topic in terminological disputes and debates about measurement methods.
Back to the starting point – AGI is already here
Everything we discussed above simply means we are very close to the moment when AI becomes better than humans at most tasks, which could lead to “the disempowerment of humanity or even human extinction,” as written in the mission of the Superalignment team. And since OpenAI does not inform the public about this and even lowers the priority of superalignment work, a situation is developing in which the history of the atomic bomb may repeat itself.
Thanks for being with me through this deep dive into the world of AGI, ASI, and superalignment. The rapid advancements in AI are both thrilling and daunting, bringing us closer to a future where these technologies could surpass human capabilities. Still, the secrecy surrounding AGI development raises significant ethical and safety concerns.
I want to finish this episode with one more quote from the “Managing extreme AI risks amid rapid progress” article in Science:
This unchecked AI advancement could culminate in a large-scale loss of life and the biosphere, and the marginalization or extinction of humanity.
Interesting times we live in, aren't they?
🔎 Explore more: