Quick Notes on OpenAI's New o1 Model
A quick look at OpenAI's new o1 model. We'll explore its improved math and coding skills, potential impacts, and some concerns. Is this AI truly thinking, or just really good at faking it?
Hey there, fellow tech enthusiasts and AI divers! Welcome back to another episode of Tech Trendsetters – your front-row seat to the future of technology, science, and artificial intelligence. So, you might know that we talk about AI from time to time here, and that's exactly the topic for our short discussion today.
I know you've probably seen plenty of AI announcements about the new o1 OpenAI model that all look similar, but I've been looking into it too, and something got me thinking. I've jotted down some quick notes – the interesting bits, the potential impacts, and yeah, a few concerns too.
On September 12, 2024, OpenAI made a significant announcement that sent ripples through the tech industry. They unveiled o1, their first AI model with advanced reasoning capabilities.
Today, we will discuss why I think this isn't just another incremental step in AI development, but rather represents a substantial leap forward. In short: OpenAI o1 demonstrates performance metrics that are truly remarkable: it's five times more proficient at solving mathematical problems and six times more effective at writing code compared to its predecessor, GPT-4o.
The Path to AI Supremacy
As we dive into OpenAI's vision for the future of artificial intelligence, it's crucial to understand how they're conceptualizing the journey towards increasingly sophisticated AI systems. OpenAI has introduced a five-tier system to track its progress towards developing artificial general intelligence (AGI), a type of AI that can perform tasks like a human without specialized training. We had a more detailed discussion about AGI in an earlier episode:
This framework provides a roadmap not just for OpenAI's internal development, but also offers a lens through which we can view the entire field's progression. Let's break down these levels and examine what each stage represents:
Level 1: Chatbots, AI with conversational language
At this level, we find AI systems capable of engaging in human-like conversations. This is where current technologies like ChatGPT reside. While impressive, OpenAI considers this just the starting point.
Level 2: Reasoners, human-level problem solving ← we are here
The next step involves AI that can solve basic problems at a level comparable to a human with a PhD, but without access to external tools. This represents a significant leap in cognitive capabilities.
Level 3: Agents, systems that can take actions
At this stage, AI systems would be able to operate independently over extended periods, taking actions on behalf of users. This implies a level of autonomy and decision-making far beyond current capabilities.
Level 4: Innovators, AI that can aid in invention
Here, we're looking at AI systems capable of generating new ideas and innovations without relying solely on existing knowledge. This level of creativity and original thinking would be truly revolutionary.
Level 5: Organizations, AI that can do the work of an organization
The pinnacle of this framework envisions AI systems that can effectively perform the work of entire organizations, potentially outperforming human-run entities in efficiency and capability.
What's particularly noteworthy is OpenAI's current self-assessment. Despite the impressive capabilities of models like GPT-4, the company places its current technology squarely at Level 1. However, they believe they're on the cusp of reaching Level 2 with developments like o1.
I started this episode with OpenAI's framework because I believe it provides crucial context for grasping the true significance of o1's capabilities. You might look at a fivefold improvement in mathematical problem-solving or a sixfold boost in code writing and think, "Okay, that's nice, but is it really groundbreaking?" But when I view these advancements through the lens of this five-tier system, I see something far more profound. To me, this isn't just about incremental improvements – it's potentially signaling a monumental shift from Level 1 to Level 2. We're not just talking about AI getting a bit smarter or faster; we're witnessing a fundamental change in the very nature of AI capabilities. It's like watching a child suddenly grasp abstract reasoning. In other words: it’s huge.
OpenAI’s o1 – First AI Model with Reasoning Capabilities
According to OpenAI, o1 responds to queries and handles complex tasks at the level of a PhD in the exact sciences. It's also tailored for competitive programming, math Olympiads, the hard sciences, and even philosophy.
The developers explained that o1 doesn't simply assemble an answer from words in a dataset – it reasons through a problem step by step before answering, much like a human would.
The True Significance of o1 in AI Reasoning
Now, you might be wondering, "What's the big deal about o1?" Let me break it down for you in terms that I hope will resonate with both tech enthusiasts and business leaders alike.
Here's the crux of the matter: models like GPT, Llama, or Claude face a growing risk of error with each generated token, thanks to a process called autoregression – every new token is conditioned on everything generated so far, so small mistakes compound. But o1? It's a little bit different.
Imagine having an internal fact-checker, constantly reviewing and adjusting its own work. That's essentially what o1 does. It "checks" its reasoning at each step, steering its internal state in the right direction. This is crucial for tackling complex tasks that demand extended chains of logic.
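To see why unchecked autoregression is such a problem for long reasoning chains, here's a toy back-of-the-envelope calculation. The numbers are my own and purely illustrative, not OpenAI's figures: if each token is correct with some probability p, the odds of a completely error-free chain decay exponentially with length.

```python
# Toy illustration of compounding autoregressive error (my own sketch,
# not OpenAI's math). Assume each token is independently correct with
# probability p; an n-token chain is then error-free with probability p**n.
p = 0.999  # assumed per-token correctness, purely illustrative

for n in (100, 1_000, 10_000):
    print(f"{n:>6} tokens -> P(error-free chain) = {p**n:.6f}")

# ~0.904792, ~0.367695, ~0.000045: long unchecked chains almost
# inevitably go off the rails, which is why checking each step matters.
```

Even with a generous 99.9% per-token accuracy, a 10,000-token chain of reasoning is virtually guaranteed to contain an error somewhere – unless something is correcting course along the way.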
Now, I want to be clear: o1 isn't necessarily superior to GPT in general text generation. The knowledge embedded in its transformer architecture remains largely the same. But where o1 truly shines is in logic, programming, and mathematics. Why? It all comes down to training methodology.
OpenAI trained o1 by generating countless chains of reasoning and then applying reinforcement learning to those chains that led to correct answers. Think of it as giving the AI a treat for "good" reasoning. In fields like math and programming, where correct answers can be predetermined, this training can be scaled to millions of iterations.
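To make the idea concrete, here's a deliberately tiny sketch of that training loop – my own illustration of "reinforce the chains that verify," not OpenAI's actual pipeline. The "policy" samples candidate reasoning strategies, and only strategies that reach the correct, automatically checkable answer get reinforced:

```python
# Toy sketch of reinforcement learning from verifiable rewards (my own
# illustration, not OpenAI's pipeline). The grader is a one-line check,
# so this loop can scale without any human in the loop.
import random

problems = [(3, 4), (10, 7), (6, 6)]           # task: compute a + b
strategies = {"add": lambda a, b: a + b,       # a correct reasoning chain
              "mul": lambda a, b: a * b}       # a flawed reasoning chain
weights = {"add": 1.0, "mul": 1.0}             # the "policy" starts uniform

for step in range(200):
    a, b = random.choice(problems)
    name = random.choices(list(weights), weights=list(weights.values()))[0]
    answer = strategies[name](a, b)
    reward = 1.0 if answer == a + b else 0.0   # automated, verifiable check
    weights[name] += 0.1 * reward              # reinforce chains that verify

print(weights)  # "add" ends up heavily favored, with no human grading involved
```

The point of the toy: because the grader is a simple equality check, you can run millions of these iterations for pennies – exactly the property that poetry lacks.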
But here's the kicker: you can't apply this same approach to tasks like writing poetry. Why? Because evaluating the quality of a poem isn't something we can automate easily. It would require manual checks, which are slow, expensive, and limited by human resources.
This distinction is crucial for understanding where AI is headed. Machine learning excels in areas where training can be automated and scaled massively. It's cost-effective and fast. Manual training, on the other hand, is slow and hits a ceiling due to human limitations. That's why recent versions of transformers, while impressive, have shown relatively modest improvements in general-purpose tasks.
The million-dollar question now is: Will enhancing an AI's logical and mathematical abilities lead to emergent improvements in other areas? If human history is any indicator, there's a good chance it might. Just as advancements in logic and mathematics have driven progress across various fields of human endeavor, we might see similar ripple effects in AI development.
As I see it, the real impact of this approach will unfold in the coming years. OpenAI and other players in the field will continue collecting data and refining this architecture. We're at the beginning of a new chapter in AI development, and I, for one, am excited to see how it unfolds.
From Research Labs to Real-World Impact
Just three days after OpenAI showed off o1, people started doing some pretty crazy stuff with it. It's as if we collectively held our breath on September 12th, and by September 15th, the floodgates of innovation had burst open. Let me walk you through some of the most jaw-dropping developments that crossed my desk in those first 72 hours.
First up, we have a story that, frankly, made me do a double-take. NASA research scientist Kyle Kabasares reported that OpenAI's ChatGPT o1 neural network wrote the code for his PhD thesis in just one hour, spread across six queries. Now, let that sink in for a moment. This isn't just any piece of code we're talking about – it's PhD-level research that Kabasares and a group of authors had previously spent 10 months developing.
"After about 6 prompts, the current version of ChatGPT o1 produced a working version of the code described in the methods section of my thesis," Kabasares stated. He went on to clarify, "I want to emphasize that while the skeleton code emulates what my code does, the neural network used its own synthetic data that it was asked to create, not the real astronomical data that will be used in the actual work."
What strikes me about this is not just the speed – though that's impressive enough – but the depth of understanding o1 demonstrated. It didn't just regurgitate information; it comprehended the complex methods described in the thesis and translated them into functional code. This level of scientific comprehension and practical application is, in my view, a significant leap forward.
But the surprises didn't stop there. AI enthusiast Maxim Lott reported another feat that caught my attention: o1 passed the authoritative Norwegian Mensa IQ test with a score of 120 points. Now, I've seen AI models tackle IQ tests before, but what's noteworthy here is that o1, even in its preview version, outperformed not just its AI competitors but also a significant portion of the human population on this test.
What's more, one detail about o1 particularly intrigues me: its reported capacity to keep improving with continued training. An OpenAI employee suggested that o1 should retake the test in a month to see the difference. This capacity for ongoing improvement opens up a whole new realm of possibilities – and questions.
These developments, happening in just the first 72 hours after o1's release, paint a picture of an AI system that's not just incrementally better, but potentially transformative.
The Flip Side: Risks and Concerns
So, we've talked about all the cool stuff o1 can do. But here's the thing – with great power comes great responsibility, right? And it looks like OpenAI is taking that responsibility pretty seriously.
OpenAI's internal assessment labeled o1 as having a "medium" risk level for chemical, biological, radiological, and nuclear weapons. Now, "medium" might not sound too bad, but get this: it's the highest risk level OpenAI has ever assigned to one of their models. What that means is they think there's a bigger chance than ever that AI could be misused to help develop such weapons. That's not something to take lightly.
Now, OpenAI isn't just sitting back. Mira Murati, their Chief Technology Officer, says they're being super careful about releasing o1 because it's so powerful. They've had experts from all sorts of scientific fields testing it, looking for weak spots. Murati says the new models are actually a lot safer than the old ones.
But here's the thing that's got me thinking: we're in this race to make AI smarter and more capable, right? Are we really ready to handle what we're creating? I won't dive into this topic much here, but you can find more info in one of our previous episodes about AI alignment:
TL;DR
If you scrolled straight down to this point or just want the quick rundown on OpenAI's new o1 model, here's what you need to know:
Main feature: New models can solve more complex problems in science and programming, but need more time to answer;
During training, they refine their thinking processes, try different strategies, and recognize their mistakes;
Developers say "future versions might think for hours, days, even weeks" – longer thinking time could mean better answers;
Current o1 thinks for a few seconds;
Preview model integrated into the ChatGPT chatbot and the API (a minimal API sketch follows this list);
Lightweight o1-mini model released, focused on specific programming tasks;
Available to ChatGPT Plus and Team users; o1-mini might be included in free version;
Usage limits: 30 messages/week for o1-preview, 50 for o1-mini;
Not a full replacement for gpt-4o; excels only at specific tasks that require reasoning, like calculations and coding;
Performs at gold medalist level in math and programming Olympiads; comparable to a doctoral student in complex physics tasks;
The model has no internet connection and can't search for information;
Preview version may have bugs; full model exists but still being tested;
Prompts can be simple – o1 "gets it" without needing detailed explanations.
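Since the preview is already exposed through the API (per the bullet above), here's a minimal sketch of calling it with OpenAI's official Python SDK. This follows the publicly documented chat-completions interface at launch, so treat the details as subject to change:

```python
# Minimal example of querying the preview model via OpenAI's Python SDK.
# Model names per the launch announcement; limits and parameters may change.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="o1-preview",  # or "o1-mini" for the lightweight coding variant
    messages=[
        # Prompts can stay simple -- no elaborate system prompt needed
        # (at launch, the o1 models didn't accept system messages at all).
        {"role": "user", "content": "Prove that the square root of 2 is irrational."}
    ],
)
print(response.choices[0].message.content)
```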
Now, before we wrap up, let's tackle the elephant in the room. I know some of you out there might be tempted to poke holes in o1's abilities with some basic math problems. But hold your horses!
Terence Tao, one of the brightest mathematical minds walking the planet today, also took o1 for a spin. He believes that, although o1 cannot yet generate its own conceptual ideas, the model may need only one or two more iterations of improvement (plus integration with other tools, such as computer algebra packages and proof assistants) to reach the level of a "competent graduate student" mathematician.
Now, you might be wondering what kind of brain-teasers Tao threw at o1. They were really tricky problems, almost impossible to solve with the help of earlier models. To be honest, I barely understood the task myself – but hey, here’s an example below!
So, while o1 might not be the next Einstein (yet), it's definitely flexing some impressive mental muscles. Food for thought, right?
Remember, the current model is just a preview. The full model is coming, so stay tuned for more developments! Also, I can't help but chuckle at the naming schemes companies use for their models. o1 is better than 4o, which is better than 3.5? Looks like they just asked an AI how to name a model. Until next time!
Explore more: