ChatGPT O3 Preview Announcement

85 points by franze 6 months ago

jsheard 6 months ago

Since OpenAI bills you for tokens expended on chain-of-thought, I assume this "deliberative alignment" has the funny side-effect of making you pay for the time the model spends ruminating on whether it should refuse to do what you asked it to do. Talk about adding insult to injury.
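
To put rough numbers on that, here is a minimal sketch of how hidden reasoning tokens would add to a bill. It assumes reasoning tokens are charged at the same rate as visible output tokens. The function name `request_cost` and every price and token count below are hypothetical placeholders, not OpenAI's actual rates.

```python
def request_cost(
    prompt_tokens: int,
    reasoning_tokens: int,
    visible_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Estimate the cost of one request in dollars.

    Assumption: chain-of-thought (reasoning) tokens are billed like
    ordinary output tokens, even though the user never sees them.
    """
    billed_output = reasoning_tokens + visible_tokens
    total = (
        prompt_tokens * input_price_per_million
        + billed_output * output_price_per_million
    )
    return total / 1_000_000


# Hypothetical request: the model deliberates for 5,000 tokens and then
# emits a 50-token refusal. The prices are made up for illustration.
cost = request_cost(
    prompt_tokens=1_000,
    reasoning_tokens=5_000,
    visible_tokens=50,
    input_price_per_million=10.0,
    output_price_per_million=40.0,
)
print(f"${cost:.3f}")
```

Under these made-up numbers, almost all of the charge comes from the deliberation rather than from the short refusal the user actually receives.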

xxpor 6 months ago

Maybe the most convincing argument yet that this is similar to how humans think.
- skinner_ 6 months ago
  
  Okay, but is this similar to how humans bill you?
  
  PhunkyPhil 6 months ago
  
  Yes, a lawyer could refuse your case after a consultation and still bill you for the time spent considering it.
  
  stogot 6 months ago
  
  How often does this happen? I’ve consulted lawyers for an hour who refused to support my case, and was billed nothing.
  
  danpalmer 6 months ago
  
  At the lowest end of the industry it’s common to get free consultations because it’s too much expense for individuals otherwise, but in corporate law or higher end services I believe it’s much less common.
  
  grapesodaaaaa 6 months ago
  
  I’ve been to a lawyer once, and this happened. I got billed and told to go to a different lawyer.
  
  uncomplexity_ 6 months ago
  
  and sometimes they charge $1k per hour and still don't have a solution after that hour
  
  batmaniam 6 months ago
  
  The point of technology is to make our lives easier and cheaper. There's no point in having ChatGPT charge this way other than to make its capitalist owners more money.
  Otherwise the self-serve machines at McDonald's should cost the company the same as an employee with a fair salary. Mass assembly lines building cars and machines should cost the same as 100+ workers, including the costs of health benefits and breaks.
  
  e-clinton 6 months ago
  
  I’d imagine the point is for you to pay for resources used… considering your prompt still costs money.
whamlastxmas 6 months ago

I don’t really see this any differently from an API billing you per call and still billing you for calls where you pass in wrong credentials and get unauthorized error responses.
- Incipient 6 months ago
  
  I don't think I've ever seen an API bill for incorrect credentials, or even anything that returns a non-200?
486sx33 6 months ago

So… hypothetically, I spent $5 on tokens to figure out the answer, then got told that my answer is blocked!
rmbyrro 6 months ago

using the service is optional and they're clear about how pricing works, though
one can always refuse to use it

jordanmorgan10 6 months ago

This video format is killing me. Would love to keep up with the announcement but I have no desire for 10-20 minute videos.

layer8 6 months ago

AI video summary at your service:
In this video, Sam Altman and team introduce OpenAI o3 and o3-mini, showcasing their advanced reasoning capabilities and performance benchmarks compared to earlier models. The session highlights the models' robust accuracy in coding and mathematical tasks, as well as new safety testing initiatives involving community participation. The o3-mini is noted for its cost-efficient performance, while safety strategies are enhanced through deliberative alignment.
## Key Points
### Introduction to o3 and o3-mini models
During the final day of their 12-day event, OpenAI announces two new models: o3 and o3-mini. These models are positioned as advancements in AI reasoning capabilities, following the success of the earlier o1 model.
### Performance benchmarks
The o3 model achieves significant improvements in capabilities, scoring 71.7% on software benchmarks and reaching near-expert levels on competitive math exams. In comparison to o1, it shows over 20% improvement in coding tasks.
### Safety testing initiatives
OpenAI emphasizes safety testing for the new models, opening access for researchers to facilitate public testing. The goal is to ensure models are safe for general use while their capabilities are further validated.
### Introduction of o3-mini
The o3-mini is presented as a cost-efficient alternative to o3, also equipped with strong reasoning powers, allowing adjustable thinking time to optimize performance based on user needs.
### New safety techniques: deliberative alignment
A new safety training technique, deliberative alignment, uses reasoning capabilities of the models to create better safety benchmarks, enhancing the ability to accurately reject unsafe prompts.
### Future developments and availability
The video concludes with information on how to apply for early access to test these models with expectations of full public availability in early 2025, alongside a call for safety researchers to contribute.
- uncomplexity_ 6 months ago
  
  the mini models are the most exciting to me. they seem to hit the balance of general utility and cost.
  people are fond of comparing large bleeding edge models, but when we compare the small ones to say gpt3.5, it is still astonishing.
  to me the smaller models being the balance of intelligence and cost is the best indicator of what the general population can use and afford.
  the larger models end up being used mostly by individuals, teams, and orgs who can afford to pay for them and learn how to use them.
  keep in mind that in this day and age there are still a lot of people who don't know these things exist. it's just outside of their reality. then there are those who have only the slightest idea about it, but won't commit the time and money needed to learn it and fully experience it.
  this is like the new generational transfer of wealth and information. even if you're an individual or a small team, with enough levers you can use these technologies to your advantage, and with more levers (like yc) you can enter the battlefield and compete with existing companies.
- in-pursuit 6 months ago
  
  Can this “expert in math” do binary addition? o1 falls apart after 10 bits, which is easily memorizable.
  
  fny 6 months ago
  
  Can this "expert in math" write a function that performs binary addition?
  
  in-pursuit 6 months ago
  
  The issue isn’t performing the specific addition. Rather, you’re asking o1 to take n-bits of data and combine them according to some set of rules. Isn’t that what these models are supposed to excel at, following instructions? Binary addition is interesting because the memorization space grows at 2^n, which is impossible to memorize for moderate values of n.
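
For reference, the "set of rules" in question is small. Below is a minimal sketch of schoolbook ripple-carry addition on bit strings; the function name `add_binary` and the string representation are assumptions made for illustration, not anything from the announcement. It makes one pass over the bits, so the work grows linearly with n, while the number of possible n-bit input pairs grows exponentially.

```python
def add_binary(a: str, b: str) -> str:
    """Add two non-negative binary strings with a ripple-carry pass."""
    result = []
    carry = 0
    i, j = len(a) - 1, len(b) - 1

    # Walk both strings from the least significant bit to the most.
    while i >= 0 or j >= 0 or carry:
        bit_a = int(a[i]) if i >= 0 else 0
        bit_b = int(b[j]) if j >= 0 else 0
        total = bit_a + bit_b + carry
        result.append(str(total % 2))  # sum bit for this position
        carry = total // 2             # carry into the next position
        i -= 1
        j -= 1

    # Bits were collected least significant first, so reverse them.
    return "".join(reversed(result)) or "0"


print(add_binary("1011", "110"))  # 11 + 6 = 17
```

The same three rules (add two bits, add the carry, pass the carry along) apply at every position, which is why bit width should not matter to anything that is actually following the procedure.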
  
  uncomplexity_ 6 months ago
  
  this is the way.
  its internal mechanism is still statistical prediction of text in units of tokens. that math seemed lucid enough to be able to use functions.
  
  fny 6 months ago
  
  It’s like everyone suddenly forgot you can’t do O(n^2) compute in O(n) time.
  
  in-pursuit 6 months ago
  
  Binary addition is O(n)
  
  fny 6 months ago
  
  I meant this in the general case, not specifically binary addition. Also, returning a token by ChatGPT is technically an O(1) operation, so the same principle applies. Returning a computation answer of O(n_required_tokens) cannot be delivered in O(1) time without some sort of caching.
- karimdaghari 6 months ago
  
  It's a bit ironic that it got the availability date wrong, no?
  
  layer8 6 months ago
  
  It’s typical I’d say. I corrected it now.
  (It originally said “early 2024” instead of “early 2025”. The YouTube transcript says “around the end of January”.)
muratsu 6 months ago

I strongly agree. I stopped checking their page for news after day 2 due to video only announcements.

transcriptase 6 months ago

Safety: not from causing you physical harm or risking property damage, since there are no guardrails against it confidently walking you through a flawed breadboard circuit that will cause components to explode or immediately catch fire. Then it will say:

“You’re absolutely right. We shouldn’t have run 120v through that capacitor rated for 10v. Here’s an updated design that fixes the issue”, then proceed to explain the same thing with a resistor placed at random.

No, we mean safety from our LLM saying something that pisses off a politician, or even worse… hurting someone’s /feelings/.

binary132 6 months ago

It is more accurate to say they are thinking in terms of “brand safety” or liability, that is, being safe from being blamed for making a mean robot.
- sourcepluck 6 months ago
  
  Well, if one of OpenAI's LLMs started quoting Mein Kampf in its answers to people, would you laugh, or would you not laugh? That's the question.
  
  binary132 6 months ago
  
  your fallacy is: reductio ad hitlerum
  
  sourcepluck 6 months ago
  
  > Reductio ad Hitlerum (Latin for "reduction to Hitler"), also known as playing the Nazi card, is an attempt to invalidate someone else's argument on the basis that the same idea was promoted or practised by Adolf Hitler or the Nazi Party.
  I can't tell if you're being serious, or joking, or what. In any case, there's the actual definition, and I don't see how it applies whatsoever. I was not disagreeing with you in the first place? Perhaps there's simply a misunderstanding.
  reductio ad hitlerum =/= "mentioning hitler in any way"
  
  binary132 6 months ago
  
  Ok. Usually if someone suggests that you might laugh at a Mein Kampf quote in response to your argument, they are not doing it in good faith.

faizshah 6 months ago

Did I miss it or did they say whether o3 is going to be available under ChatGPT Plus or Pro?

luke-stanley 6 months ago

I watched the live stream and don't recall any mention of it. It's not at that stage!

luke-stanley 6 months ago

The title is very wrong, "ChatGPT O3" is not a thing, OpenAI's new o3 model is not even demoed in ChatGPT, they don't call it "O3". The subtitle at the top is "o3 preview & call for safety researchers". The web page title is "12 Days of OpenAI".

byyoung3 6 months ago

Summary here https://wandb.ai/byyoung3/ml-news/reports/OpenAI-Introduces-...

daft_pink 6 months ago

We’d really like to use this today.

throwaway314155 6 months ago

    def should_upvote(headline: str) -> bool:
        if "o3" in headline.lower():
            return True
        return False

Seriously though, is there anything new here? Also why the need for the editorialized headline? The article is titled "12 Days of OpenAI", not "ChatGPT O3 Preview Announcement" (which frankly makes it sound like it's about to be available to the public, which it isn't).

layer8 6 months ago
```
   return "o3" in headline.lower()
```
would have done the job.
- throwaway314155 6 months ago
  
  This is more painful than any downvote. Have an upvote.
  
  mercer 6 months ago
  
  These threads are my favorite kind of pedantic :)