alecco 1 day ago

PSA: Don't use OpenRouter for DeepSeek V4, as it messes up your caching. Use the DeepSeek API directly and you'll get 2x to 3x more cached tokens.

numlocked 1 day ago

Can you share more? I'm with OpenRouter and we would love to address this! I don't believe we see this in our own testing, but I will share this feedback and dig in.

  • bwfan123 8 hours ago

    Here is some data from my experience using DeepSeek V4 Flash both directly and via OpenRouter.

    Directly: 135M input tokens - $0.57 (134M cached)

    Via OpenRouter: 6M tokens - $0.81 (caching stats and input/output split not reported)

    Caching is a huge win when using DeepSeek directly.
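For scale, the two totals above can be reduced to an effective price per million tokens. This is a rough sketch using only the figures quoted in this comment. The OpenRouter total covers an unreported mix of input and output tokens, so the comparison is indicative at best. The function name is made up for illustration.

```python
def usd_per_million(total_usd: float, tokens_millions: float) -> float:
    """Effective price in USD per one million tokens."""
    return total_usd / tokens_millions

# Figures quoted in the comment above.
direct = usd_per_million(0.57, 135)  # DeepSeek API: 135M input tokens
via_or = usd_per_million(0.81, 6)    # OpenRouter: 6M tokens, split not reported
cached_share = 134 / 135             # share of direct input tokens that were cached

print(f"direct:     ${direct:.4f} per 1M tokens")
print(f"openrouter: ${via_or:.4f} per 1M tokens")
print(f"cached share (direct): {cached_share:.1%}")
```

The gap in per-token price is consistent with almost all direct input tokens being billed at the cached rate, though the two workloads may not be comparable.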

  • alecco 6 hours ago

    Just try it. In a case last week the difference was ~3x, and I tried multiple providers: deepseek, gmicloud/fp8, novita/fp8, and another one I can't remember. It was a large job where at least the first two-thirds of each prompt was exactly the same (literally a static string).

    Then I read somewhere (I think on X) that OpenRouter adds stuff and breaks caching (telemetry? headers? can't remember). So I stopped the job, switched to actual DeepSeek provider, and voilà, caching 3x more tokens per request (on average).
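One way to check this on your own job is to compare the cached share of prompt tokens per request across providers. A minimal sketch, assuming the response's usage object exposes `prompt_cache_hit_tokens` and `prompt_cache_miss_tokens` (the field names DeepSeek documents for its context caching; other providers may report cached tokens differently or not at all). The helper name and the sample numbers are hypothetical, not measurements from this thread.

```python
def cache_hit_ratio(usage: dict) -> float:
    """Fraction of prompt tokens served from cache for one response.

    Assumes DeepSeek-style usage fields; missing fields count as zero.
    """
    hit = usage.get("prompt_cache_hit_tokens", 0)
    miss = usage.get("prompt_cache_miss_tokens", 0)
    total = hit + miss
    return hit / total if total else 0.0

# Made-up usage payloads: one request that hit the shared prefix, one that did not.
sample_usages = [
    {"prompt_cache_hit_tokens": 2048, "prompt_cache_miss_tokens": 1024},
    {"prompt_cache_hit_tokens": 0, "prompt_cache_miss_tokens": 3072},
]

ratios = [cache_hit_ratio(u) for u in sample_usages]
average = sum(ratios) / len(ratios)
print(f"per-request hit ratios: {ratios}, average: {average:.2f}")
```

With a prompt whose first two-thirds is a static string, a per-request ratio well below that level would suggest the shared prefix is not being reused.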

    • alecco 2 hours ago

      > switched to actual DeepSeek provider

      I meant actual DeepSeek API.

  • phainopepla2 6 hours ago

    I am experiencing this using Opencode. Caching works fine via the DeepSeek API but not so well via OpenRouter.

jaggs 6 hours ago

Yes, I definitely noticed a problem with OpenRouter and DeepSeek V4 Pro. It's much more expensive.

SV_BubbleTime 21 hours ago

When you say Deepseek API, you mean servers in China? Or is it a copy of the model operated and run by OpenRouter?