Show HN: Superglue – open source API connector that writes its own code

github.com

156 points by adinagoerres 17 hours ago

Hi HN, we're Stefan and Adina, and we're building superglue (https://superglue.cloud). superglue allows you to connect to any API/data source and get the data you want in the format you need. It's an open-source proxy server that sits between you and your target APIs, so you can easily deploy it into your own infra.

If you’re spending a lot of time writing code connecting to weird APIs, fumbling with custom fields in foreign language ERPs, mapping JSONs, extracting data from compressed CSVs sitting on FTP servers, and making sure your integrations don’t break when something unexpected comes through, superglue might be for you.

Here's how it works: You define your desired data schema and provide basic instructions about an API endpoint (like "get all issues from Jira"). superglue then does the following:

- Automatically generates the API configuration by analyzing API docs.

- Handles pagination, authentication, and error retries.

- Transforms response data into the exact schema you want using JSONata expressions.

- Validates that all data coming through follows that schema, and fixes transformations when they break.

We built this after noticing how much of our team's time was spent building and maintaining data integration code. Our approach is a bit different from other solutions out there because we (1) use LLMs to generate mapping code, so you can basically build your own universal API with the exact fields that you need, and (2) validate that what you get is what you're supposed to get, with the ability to "self-heal" if anything goes wrong.

You can run superglue yourself (https://github.com/superglue-ai/superglue - license is GPL), or you can use our hosted version (https://app.superglue.cloud) and our TS SDK (npm i @superglue/client).
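
To make that concrete, here's roughly what a call looks like with the TS client. Treat it as an illustrative sketch: the `SuperglueClient` class, the `call()` method, and the option names below are assumptions made for this example, not the documented API.

```typescript
// Illustrative sketch only: the client class, method, and option names here
// are assumptions, not the documented @superglue/client API.
import { SuperglueClient } from "@superglue/client";

const client = new SuperglueClient({ apiKey: process.env.SUPERGLUE_API_KEY! });

// A plain-language instruction plus the JSON Schema you want back; superglue
// generates the API config and the JSONata mapping behind the scenes.
const result = await client.call({
  url: "https://your-company.atlassian.net",
  instruction: "get all issues from Jira",
  responseSchema: {
    type: "object",
    properties: {
      issues: {
        type: "array",
        items: {
          type: "object",
          properties: {
            id: { type: "string" },
            title: { type: "string" },
            status: { type: "string" },
          },
          required: ["id", "title"],
        },
      },
    },
  },
});

console.log(result.data); // already validated against responseSchema
```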

Here's a quick demo: https://www.youtube.com/watch?v=A1gv6P-fas4. You can also try out the Jira and Shopify demos on our website (https://superglue.cloud).

Excited to share superglue with everyone here. It's early, so you'll probably find bugs, but we'd love to get your thoughts and see if others find this approach useful!

dantodor 11 hours ago

Once you have the OpenAPI specs, you can build an MCP server on top of that. Automatically. You don't have them? There's MITM2Swagger[0] that will do its best to infer them. You'll probably need some manual adjustments, but still. And MCP servers can now be integrated with any LLM, not only Anthropic's. While I appreciate your approach, how do you fight the MCPs?

[0] https://github.com/alufers/mitmproxy2swagger

  • nbbaier 4 hours ago

    > Once you have the OpenAPI specs, you can build an MCP server on top of that. Automatically.

    What's the route to doing this automatically? Is there some tool for doing this?

  • babyshake 8 hours ago

    And with MCP, the idea is that the agent can translate a natural language instruction into the specific API request and then translate the schema into whatever structured output format you want? Is there anything that Superglue does that you wouldn't get somewhat out of the box using agents and MCP? I'm not too familiar with MCP, so I'm still trying to understand how it compares to this sort of thing.

    • sfaist 5 hours ago

      My personal understanding of MCP (anyone feel free to correct me here) is that it's basically a standardized interface for tool use. So if you as an API provider (e.g. Stripe) want agents to connect to your API, you can offer an MCP server that serves as a middleman between you and the agent. We fundamentally also serve as a middleman, but not (primarily, yet) for agents; rather for normal (non-AI) applications that would otherwise need to use the REST/SOAP/whatever API with a bunch of integration code. Also, MCP does not do any data transformation; that would be on the agent to do.

  • sfaist 10 hours ago

    Thanks for sharing! We're taking a bit of a different angle here. The APIs we're looking at are not the ones that websites are using, but rather the ones you would typically integrate with when thinking about data integrations. Also, while you could use superglue as an MCP server, the use cases we see right now are less in the AI/agent world and more in the workflow/ETL/onboarding world.

    That being said, the mitmproxy2swagger approach is really really cool as an alternative to mindless scraping.

DaiPlusPlus 16 hours ago

> Automatically generates the API configuration by analyzing API docs.

The problem with a lot of (most?) integration work is that often there simply aren't any API docs, or the docs are outdated/obsolete (because they were written by hand in an MS Word doc and never kept up to date), or sometimes there isn't an API in the first place (cf. screen-scraping, but also exfiltration via other means). Are these scenarios you expect or hope to accommodate?

  • sfaist 16 hours ago

    You can give it any context you have, worst case in text form, and the LLM will try to figure it out, call different endpoints, etc. Recently someone mentioned to me the intern test by Hamel Husain: if an average college student can succeed with the given input (with a lot of trying and time), then LLMs should be able to do it too. So that's the bar we're aiming for.

    No API at all is out of scope for now; there are other tools that are better suited for that.

promocha 15 hours ago

Really nice idea and product. Does it update and cache the changed schema for the target API? For example, an app makes frequent GET calls to retrieve a list of houses, but the API changes to a new schema: would superglue figure that out at runtime, or does it update the schema for the target API regularly based on their API docs (assuming they have them)?

  • sfaist 15 hours ago

    Yes, it does update and cache the changed schema for the target API, at runtime. The way it works is that every time you make a call to superglue, we get the data from the source and apply the JSONata (that's very fast). We then validate the result against the JSON schema that you gave us. If it doesn't match, e.g. because the source changed or a required field is missing, we rerun the JSONata generation and try to fix it.

    I guess you could run the API regularly just to make sure the mapping is still up to date and there are no delays when you actually need the data, depending on how often the API changes.
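
    In rough TypeScript, that loop looks something like the sketch below, using the jsonata and ajv npm packages. The `regenerateMapping` function is a hypothetical stand-in for the LLM-based mapping generation, not the actual internals.

    ```typescript
    // Sketch of the fetch -> transform -> validate -> regenerate flow described
    // above, using the jsonata and ajv npm packages. regenerateMapping() is a
    // hypothetical stand-in for the LLM-based mapping generation.
    import jsonata from "jsonata";
    import Ajv from "ajv";

    const ajv = new Ajv();

    async function callWithSelfHealing(
      fetchSource: () => Promise<unknown>,        // pull raw data from the API
      mapping: string,                            // cached JSONata expression
      schema: object,                             // JSON Schema the caller expects
      regenerateMapping: (sample: unknown, schema: object) => Promise<string>
    ): Promise<unknown> {
      const raw = await fetchSource();
      const validate = ajv.compile(schema);

      // Fast path: apply the cached mapping and validate the result.
      let transformed = await jsonata(mapping).evaluate(raw);
      if (validate(transformed)) return transformed;

      // Source changed or a required field is missing: regenerate and retry once.
      const healed = await regenerateMapping(raw, schema);
      transformed = await jsonata(healed).evaluate(raw);
      if (!validate(transformed)) {
        throw new Error(`mapping still invalid: ${ajv.errorsText(validate.errors)}`);
      }
      return transformed;
    }
    ```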

codenote 3 hours ago

I have the same issue. I'm curious to see how it's implemented, so I'll take a look at the source code.

ijustlovemath 9 hours ago

What are the limitations on usage? What's the approximate usage percentage, say, per kilotoken of context? Is there a point at which users are not allowed to query (profitably)?

Re: open source, what's your general attitude/commitment towards the community? Is it more like SQLite (no contributions accepted), or more like Rust (let's get everyone involved)?

  • sfaist 8 hours ago

    If you're self-hosting, you can bring your own model and there are no limitations. For the hosted version, we currently do custom pricing agreements with our customers using this in prod, and keep it free for hobbyists within fair-use limits. We still need to figure out what the boundaries will be, tbh.

    On your open source question, we accept contributions from non-team-members and have done so in the past, particularly on bugs or new features on the backend.

rahul_agarwal 12 hours ago

Really cool project! I'm very bullish on LLMs for structured data.

Curious: why did you decide to open source? It's neat to see a lot of new YC open-source companies, and I'm wondering why you thought open-sourcing superglue was strategically advantageous.

  • sfaist 11 hours ago

    Thanks! The primary reason is that we want folks to be able to run this locally and contribute to the project / fix issues as they come up. That's much harder when you have a black-box tool and have to rely on our small team for support.

    • babyshake 8 hours ago

      Because of the GNU General Public License, is any project/startup that makes use of Superglue required to open source all their code under the same license? I'm not a license/copyright expert, so I'm a bit fuzzy on how this is supposed to work.

      • sfaist 5 hours ago

        Not quite. The server runs standalone, so you can use it just as you would use Linux as part of your project, without affecting your own code. The client libraries that become part of your code are MIT-licensed. The reason we made this decision is to prevent AWS & co from copying all of our code without contributing to the project.

edunteman 8 hours ago

This was one of my favorite demos I've seen live at YC - congrats on the launch!

gatienboquet 7 hours ago

I want this with scanned PDFs as input. It would solve my issues.

m0rde 16 hours ago

Great idea, congrats. Can you speak a bit about the validation piece? Were LLM hallucinations an issue that required this? Are you using some kind of structured output feature?

  • sfaist 16 hours ago

    Sure! We use structured output for the endpoint, but not for the JSONata, since it's hard to actually describe as a format. There are three big levers for accuracy / reducing hallucinations:

    1. Direct validation: we apply the JSONata that is generated and check whether it really produces what we want (we have the schema, after all). This way we can catch errors as they come up.

    2. Using a reasoning model: by switching to o3-mini, we were able to drastically improve the correctness of the JSONata. It takes a bit longer, but better to wait a bit than get incorrect mappings.

    3. Using a confidence score: still in development, but sometimes there are multiple options for mapping something (e.g. three types of prices in the source, but you only want one. Which one?). So we're working on showing the user how "certain" we are that a mapping is correct.
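
    To make the direct-validation and price-ambiguity points concrete, here's a toy example with the jsonata npm package; the data shapes and the expression are invented for illustration.

    ```typescript
    // A generated mapping is just a JSONata string, so it can be applied to a
    // sample response and checked on the spot. Data shapes here are invented.
    import jsonata from "jsonata";

    const sample = {
      products: [
        { sku: "A1", list_price: 20, discounted_price: 18, msrp: 25 },
        { sku: "B2", list_price: 9, discounted_price: 9, msrp: 12 },
      ],
    };

    // One of several plausible mappings: is "price" the list price, the
    // discounted price, or the MSRP? That ambiguity is what the confidence
    // score is meant to surface.
    const mapping = jsonata('products.{ "id": sku, "price": discounted_price }');

    const result = await mapping.evaluate(sample);
    // [ { id: 'A1', price: 18 }, { id: 'B2', price: 9 } ]
    ```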

hoerzu 17 hours ago

Love it! Is there also a possibility of alarms if the schema changes?

  • cpursley 11 hours ago

    An alarm? What about self-healing? That would be neato.

    • sfaist 11 hours ago

      Self-healing is already a feature :)

  • sfaist 16 hours ago

    Working on it... ping me if you have a use case in mind and I can set it up for you.

npollock 16 hours ago

I'd love something like this that runs as a browser agent, allowing me to extract structured data from (whitelisted) websites using natural language queries.

asdev 15 hours ago

why would I use this when I can just add API docs to my LLM context and have it generate the integration code?

  • sfaist 15 hours ago

    Depends on your use case:

    - This abstracts away a lot of the complexity, including pagination and format conversion, and has integrated logging and schema validation.

    - This is self-healing, so when data comes through that you have never seen before, or if the API changes, it is a lot less likely to break.

    - If you need to integrate a lot of APIs, or if you have multiple apps needing access to these APIs, it is much easier to set up here than writing 1000s of lines of integration code.

    If none of this is important / applies to you and the generated code works well, then you could also just do that.

AvImd 13 hours ago

Access to XMLHttpRequest at 'https://graphql.superglue.cloud/' from origin 'https://app.superglue.cloud' has been blocked by CORS policy: No 'Access-Control-Allow-Origin' header is present on the requested resource.

  • sfaist 13 hours ago

    Thanks for flagging this. Odd. Did this happen on the website or in the actual app? It might be a server overload, looking at our logs.

tsvoboda 9 hours ago

this is dope, i hate maintaining custom integration code

dboreham 15 hours ago

Doesn't someone own a trademark in that general area?

tayloramurphy 15 hours ago

Does this have any connection to the previous "Supaglue" startup [0]? Similar problem space, slightly different/pre-llm solution.

[0] https://docs.supaglue.com/

  • cpursley 11 hours ago

    ooof, I assumed this was the same company.