
Hey HN,

Solo dev here. I've been using the OpenAI and Anthropic APIs quite a bit for various projects, and like many others, I started getting pretty concerned about the monthly bills racking up. It felt like I was spending more time manually tweaking prompts or guessing which model (GPT-3.5, GPT-4o, Haiku, Sonnet, Opus?) was just good enough for a task to save a few cents than actually building.

So I built CostLens (https://costlens.dev/) to automate that process. It's essentially an SDK wrapper around the official openai and anthropic Node.js libraries that proxies requests through the CostLens service (Python support is next on the list).

The setup is designed to be minimal: you install the costlens package and wrap your existing client instance:

    const openai = costlens.wrapOpenAI(new OpenAI());

Or for Anthropic:

    const anthropic = costlens.wrapAnthropic(new Anthropic());
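For a fuller picture, here's roughly what a wrapped call looks like end to end. This is a sketch: the default import of costlens is my assumption, and everything past the wrap is just the official OpenAI SDK surface.

    import OpenAI from "openai";
    import costlens from "costlens"; // import style is an assumption

    // Wrap the official client once; every call below is proxied
    // through the CostLens service before reaching OpenAI.
    const openai = costlens.wrapOpenAI(new OpenAI());

    // Used exactly like the vanilla SDK -- no per-call changes.
    const completion = await openai.chat.completions.create({
      model: "gpt-4o",
      messages: [{ role: "user", content: "Summarize this thread in one line." }],
    });

    console.log(completion.choices[0].message.content);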

After that, you use the client exactly as you normally would, but the requests go through the CostLens service first. The idea is to automatically apply cost-saving techniques without needing code changes for every API call:

Smart Model Routing: For requests where you specify a high-end model (like gpt-4o or claude-3-opus), CostLens can analyze the request and route it to a cheaper, faster model (like gpt-3.5-turbo or claude-3-haiku) if it determines the task doesn't need the extra power. This is configurable; there's a hypothetical config sketch after this list.

Prompt Optimization: It tries to automatically reduce the number of tokens sent in the prompt and history, cutting down payload size while aiming to keep the context intact.

Caching: Identical requests made close together can return a cached response, saving the cost and latency of hitting the LLM again.
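To make the routing, trimming, and caching knobs concrete, here's what configuring them at wrap time might look like. To be clear, every option name below is invented for illustration and is not necessarily CostLens's actual API; the real settings live in the dashboard and docs.

    // Hypothetical options -- names are illustrative, not CostLens's real API.
    const openai = costlens.wrapOpenAI(new OpenAI(), {
      routing: {
        enabled: true,
        // Only allow downgrades you've explicitly approved.
        allowDowngrades: { "gpt-4o": ["gpt-3.5-turbo"] },
      },
      optimization: {
        trimHistory: true, // drop stale turns to cut prompt tokens
      },
      caching: {
        enabled: true,
        ttlSeconds: 300, // identical requests within 5 minutes hit the cache
      },
    });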

Everything is routed through a central service that applies these rules and then forwards the request to the actual OpenAI/Anthropic API. There's also a dashboard (https://costlens.dev/) where you can get analytics on your usage, see cost breakdowns, track individual prompts, and configure the routing/optimization settings.
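As a quick sanity check of the caching behavior, something like this should show the second call returning noticeably faster (assuming caching is enabled; the exact cache-hit semantics are in the docs):

    // Fire the identical request twice and compare wall-clock time.
    const ask = () =>
      openai.chat.completions.create({
        model: "gpt-3.5-turbo",
        messages: [{ role: "user", content: "What is a monad?" }],
      });

    let t0 = Date.now();
    await ask();
    console.log(`first call:  ${Date.now() - t0}ms`); // hits the LLM

    t0 = Date.now();
    await ask();
    console.log(`second call: ${Date.now() - t0}ms`); // should hit the cache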

It's still quite new (just launched!), so I'm keen to get feedback from the HN community, especially on:

Does the proxy approach feel right? Any major concerns about latency, or about sending prompts/API keys through a third-party service like this? (Security and privacy are obviously critical.)

Are the current optimization/routing strategies useful? What else would you want to see (e.g., request batching, support for more providers, finer-grained routing rules)?

How important is having a dashboard vs. just the core SDK savings?

There's a free tier generous enough to handle a decent volume of requests, so you can hopefully see if it actually makes a dent in your costs before paying anything. Docs are here: https://costlens.dev/docs

Appreciate any thoughts or feedback you have! Thanks.