numlocked 2 days ago

Hi folks -- I'm Chris from OpenRouter. This one hurts. We're back, but our database was down for about 45 minutes, which caused user and credit lookups to fail and took down the API. We are investigating why, and of course we are going to look into improving durability so this failure mode can't happen again. We will share a post-mortem on the site when we have finished our investigation. I'm sorry to our users who count on us.

vintagedave 2 days ago

One of OpenRouter's main points is that it allows you to bypass individual AI vendors' downtimes. I was considering using it for an uptime-critical project of mine.

The post-mortem will be worth watching.

  • SamLeBarbare 2 days ago

    OpenRouter: eliminating Single Points of Failure… by introducing a beautifully centralized one.

    • lordofgibbons 2 days ago

      Their uptime is still infinitely better than any single provider though.

      • sokoloff 2 days ago

        infinitely?

        • phh 2 days ago

          Well in FP4

  • drclegg 2 days ago

    To be fair, it is still useful on this front; it's much faster than waiting for requests to fail and falling back to a backup yourself.

    You still need another backup provider or two for cases like this though.
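
    A minimal sketch of that kind of client-side fallback, assuming each provider is wrapped in a plain callable that takes a prompt and returns text. The names here (`complete_with_fallback`, `router_down`, `backup_ok`) are hypothetical stand-ins, not any real provider's API:

```python
from typing import Callable, Sequence, Tuple


class AllProvidersFailed(Exception):
    """Raised when every provider in the list has failed."""


def complete_with_fallback(
    prompt: str,
    providers: Sequence[Tuple[str, Callable[[str], str]]],
) -> Tuple[str, str]:
    """Try each (name, call) pair in order; return the first success."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # broad on purpose: any failure triggers fallback
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


# Hypothetical stand-ins for real provider clients.
def router_down(prompt: str) -> str:
    raise ConnectionError("service unavailable")


def backup_ok(prompt: str) -> str:
    return f"echo: {prompt}"


if __name__ == "__main__":
    used, text = complete_with_fallback(
        "hi", [("router", router_down), ("backup", backup_ok)]
    )
    print(used, text)
```

    In practice each callable would also need its own timeout, otherwise a hanging primary delays the fallback.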

  • logicchains 2 days ago

    >One of OpenRouter's main points is that it allows you to bypass individual AI vendors' downtimes.

    Only if you're using a model hosted by multiple providers (e.g. an open model).

    • gkbrk 2 days ago

      Nope, for closed models too. Claude for example has multiple providers they work with. Google Vertex, Amazon Bedrock and Anthropic themselves all provide inference for Claude.

      The vast majority of models on OpenRouter (both closed and open) have multiple providers.

      • simianwords 2 days ago

        Interesting. I would think they would safeguard core IP from competitors.

      • OJFord 2 days ago

        Also you might be fine with routing to a different model.

gitmagic 2 days ago

Been down for ~50 minutes now and there's no information other than the automated notice on their status page.

  • euazOn 2 days ago

    FYI, they (oddly enough) communicate mostly through Discord, and they said at 10:30am UTC that they were investigating the issue, 13 minutes after the first user reports.

  • rozenmd 2 days ago

    Frankly I prefer that to a green tick and "All Systems Operational"

    • baq 2 days ago

      yellow: "volcano has erupted under the datacenter and it's being flooded with lava. engineers are investigating"

      red: "datacenter has been subject to multiple nuclear strikes. next update in 30 min"

    • euazOn 2 days ago

      Could that be due to contractual clauses for uptime in SLAs?

    • gitmagic 2 days ago

      True, that happens far too often.

blitzar 2 days ago

Can someone power it off and back on again please?

lvl155 2 days ago

How can a router be down this long? I would have to reconsider using them moving forward.

jug 2 days ago

Should be coming up now.

rvz 2 days ago

Looking forward to the postmortem.