Launch HN: JumpWire (YC W22) – Easily encrypt customer data in your databases

109 points by hexedpackets a year ago

Hi HN! We are William and Ryan, co-founders of JumpWire (https://jumpwire.ai), a security tool that encrypts columns of sensitive data stored in databases, in a way that works automatically with any backend application.

We've built startups (Ryan was Dir. of Engineering at N26) and worked in big tech (William was an SRE at Spotify), and at every company we saw the same pattern of data spreading out of control. It felt like a dirty secret: hundreds of employees are granted access to customer PII through internal systems. Engineers responsible for securing data end up in a race against a growing list of SaaS and internal tools in use across the organization, and fall back on bad access-management workflows. Trying to secure data by controlling access is a risky proposition on its own; data leaks due to compromised access have become an all-too-regular occurrence, and the Uber contractor breach in September is a recent example.

Companies that outgrow the access control approach typically do one of two things. Either developers have to write custom logic into all of their applications to encrypt/decrypt data, or they partition the data by putting some fields in a data vault and others in the main database. Both options are costly in terms of implementation work and ongoing maintenance. We’ve seen entire teams dedicated to just maintaining ETL pipelines for scrubbing PII into secondary databases!

JumpWire automates the encryption of data by identifying fields that contain sensitive information in databases and APIs. We do this without developers needing to modify their applications or manage access control rules. You define policies that determine how data should be handled—for a ‘user’ record, this might mean that the email address, name, and birthday are labeled as ‘PII’ and encrypted while signup date and favorite type of cheese are not.

JumpWire is a transparent proxy between applications and a database. The application connects using the same library it would use to connect directly to the database, and JumpWire intercepts and inspects queries before forwarding them on. Based on policies you define, individual fields can be encrypted/decrypted, nulled out, or audited as the requests and responses flow through the proxy. These policies are designed to be granular and map to table-specific schemas. For example, a policy might say to encrypt all PII, and the users table has a schema marking email address as PII. Different access controls can be applied to allow a subset of applications to bypass the policies where needed.
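A minimal Python sketch of the field-level policy idea, assuming a hypothetical in-memory schema (`SCHEMAS`) and policy table (`POLICIES`). `fake_encrypt` is a placeholder rather than real encryption, and none of these names come from JumpWire itself:

```python
from typing import Any, Callable, Dict, Optional

# Hypothetical schema: table -> column -> data label, mirroring the "users" example.
SCHEMAS: Dict[str, Dict[str, str]] = {
    "users": {"email": "PII", "name": "PII", "birthday": "PII"},
}

# Hypothetical policy set: data label -> action the proxy should take.
POLICIES: Dict[str, str] = {"PII": "encrypt"}


def fake_encrypt(value: Any) -> str:
    # Placeholder only: a real proxy would call an authenticated cipher such as AES-GCM here.
    return f"enc({value})"


ACTIONS: Dict[str, Callable[[Any], Any]] = {
    "encrypt": fake_encrypt,
    "null": lambda value: None,  # "nulled out" fields are replaced entirely
}


def apply_policies(table: str, row: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `row` with every labeled column passed through its policy action."""
    labels = SCHEMAS.get(table, {})
    result: Dict[str, Any] = {}
    for column, value in row.items():
        label: Optional[str] = labels.get(column)
        action = POLICIES.get(label) if label is not None else None
        result[column] = ACTIONS[action](value) if action is not None else value
    return result


row = {
    "email": "ada@example.com",
    "name": "Ada",
    "birthday": "1815-12-10",
    "signup_date": "2022-01-05",
    "favorite_cheese": "gouda",
}
print(apply_policies("users", row))
```

Here `signup_date` and `favorite_cheese` pass through untouched because they carry no label. Since actions are looked up per label rather than per column, changing what happens to "PII" is a one-line policy change that does not touch the schema or the application.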

Because our proxies implement the underlying database protocols, application code or clients do not have to be changed to work with JumpWire.
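To give a feel for what "implementing the database protocol" involves, here is a small Python sketch that frames and parses a PostgreSQL simple-query (`Q`) message: a one-byte type, a big-endian Int32 length that counts itself but not the type byte, and a null-terminated SQL string. It covers only this one message type; startup, authentication, and the extended query protocol (Parse/Bind/Execute) are omitted, and it is an illustration, not JumpWire's code:

```python
import struct
from typing import Tuple


def build_query_message(sql: str) -> bytes:
    """Frame `sql` as a PostgreSQL frontend Query message."""
    payload = sql.encode("utf-8") + b"\x00"  # query text is null-terminated
    return b"Q" + struct.pack("!I", len(payload) + 4) + payload  # length includes itself


def read_message(buf: bytes) -> Tuple[bytes, bytes, bytes]:
    """Split one typed message off `buf`; return (type byte, payload, remaining bytes)."""
    if len(buf) < 5:
        raise ValueError("incomplete message header")
    (length,) = struct.unpack("!I", buf[1:5])
    end = 1 + length  # type byte + the bytes counted by the length field
    if length < 4 or len(buf) < end:
        raise ValueError("incomplete or malformed message")
    return buf[:1], buf[5:end], buf[end:]


def extract_sql(buf: bytes) -> str:
    """Return the SQL text of a Query message, i.e. what a proxy would inspect."""
    msg_type, payload, _rest = read_message(buf)
    if msg_type != b"Q" or not payload.endswith(b"\x00"):
        raise ValueError("not a simple Query message")
    return payload[:-1].decode("utf-8")


msg = build_query_message("SELECT email FROM users")
print(extract_sql(msg))
```

Because the client library already speaks this framing, a proxy that reads and re-emits the same bytes can sit in the middle without any client changes.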

The product is built to be self-hosted. The main component, our proxy engine, is run on your network as a cluster of Docker containers. The web interface is run by us by default but is also available to self-host. Our engine uses your own AWS KMS or HashiCorp Vault installation to store sensitive configuration data, such as database credentials and encryption keys. This ensures that confidential data is never transmitted across the Internet, and you remain in full control of the data infrastructure and keys. We do have a hosted Vault option as well to make it easy to get started or try things out in a staging environment.

Our database proxy supports PostgreSQL and MySQL/MariaDB, and DynamoDB is in beta. We also have an API proxy in early alpha that uses OpenAPI specs instead of connecting directly to a DB.

We actually built similar (but half-baked) versions of this at startups we were part of (a neobank and payment API), but it was always part of backend application code. We realized it could be abstracted out of the application entirely, and integrated via configuration instead. This would be easier to maintain, since application code wouldn’t need updating each time data policies changed. However, building it was never feasible at these other companies, because it was too remote from their core products. So we decided to start JumpWire to do it.

We have a free-as-in-beer version of our product available to use in small environments. After that we charge a monthly subscription fee based on the number of databases or APIs configured.

We’re still early and would love to hear what you think about what we’re building, as well as any general thoughts on data security. Thanks!

angryasian a year ago

I've worked with systems like this in the past. It eventually becomes a huge burden when you have teams like marketing, analytics, etc. that need access to the raw data, and you end up having to store all of it somewhere else unencrypted.

  • hexedpackets a year ago

    Yeah, the mix of permissions can definitely be a big pain. We're building with that in mind - policy exceptions can be set so that specific groups of applications get the raw data when querying. All of the policies stack too; one common setup is to encrypt by default, then allow some specific tool to get raw data but audit the queries it's doing.

    • angryasian a year ago

      I guess, but usually it's the hot databases that are encrypted. Once data moves to the data lake/warehouse, it's all unencrypted. I think it really comes down to what kind of data you're working with.

      LastPass was hacked twice this year. How many data breaches are there on a regular basis? The reality is weighing the cost and trouble against the likelihood of a breach. As much as everyone wants to say they care about users' data, the reality in most companies is very different.

  • wdb a year ago

    But is marketing and analytics even allowed to use PII if the user didn’t give explicit consent for such usage? Would be cool if such a system could have checks for these things
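The stacking hexedpackets describes in the reply above (encrypt by default, exempt one tool, audit that tool's queries instead) can be sketched in Python. The `Policy` shape, the action names, and the `analytics` classification are hypothetical, chosen only to show how include and exclude lists combine; they are not JumpWire's configuration format:

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Policy:
    name: str
    action: str  # hypothetical action names, e.g. "encrypt" or "audit"
    include: Optional[FrozenSet[str]] = None  # None: applies to every classification
    exclude: FrozenSet[str] = frozenset()  # classifications exempted from this policy


def effective_actions(policies: List[Policy], classification: str) -> List[str]:
    """Collect, in order, the actions that apply to clients in `classification`."""
    actions: List[str] = []
    for policy in policies:
        if classification in policy.exclude:
            continue
        if policy.include is not None and classification not in policy.include:
            continue
        actions.append(policy.action)
    return actions


policies = [
    # Encrypt PII by default, but exempt one analytics tool...
    Policy("encrypt-pii", "encrypt", exclude=frozenset({"analytics"})),
    # ...and audit every query that tool makes instead.
    Policy("audit-analytics", "audit", include=frozenset({"analytics"})),
]
print(effective_actions(policies, "backend"))    # ['encrypt']
print(effective_actions(policies, "analytics"))  # ['audit']
```

Keeping exceptions as separate, composable policies (rather than editing the default) means the raw-data grant for one tool is always paired with its audit rule and easy to review.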

gerad a year ago

Have you thought about solving the problem from a different direction? Providing a read-only, sanitized clone of the database that can be accessed outside of the core application code?

Seems like that could kill more birds with the same stone?

  • hexedpackets a year ago

    We have thought about that! It's a nice approach for some use cases, but having just a read-only copy ends up being pretty limiting. Often people using internal tools (particularly customer success) need to modify some fields in a record but shouldn't have unrestricted access to everything. We've found that being able to protect specific fields instead of the entire database gives a lot more flexibility.

  • acrefoot a year ago

    Tonic.ai seemed to fit that bill, but we ended up rolling our own ETL job due to cost concerns and a security preference for a simple-to-audit tool. Tonic.ai does it on the fly, which was merely a nice-to-have for this use case.

  • pistoriusp a year ago

    That's exactly what we're doing at https://www.snaplet.dev! I would love to chat with the founders about offering generated, production-accurate snapshots for users of their proxy to code against.

    • debussyman a year ago

      Happy to chat anytime! You can reach me by email (ryan at [ourdomain]) or book directly on my cal - https://calendly.com/ryan-jump/yc-founder-meeting

      We've peeked at Snaplet in the past, and :heart: your design aesthetic

      • pistoriusp a year ago

        Thanks! I'm just about to catch a flight, but will reach out when I get back to the land of the living!

    • nogenhat a year ago

      Are you looking for investment? Happy to close the deal; I love the idea and trust the RedwoodJS co-founders ;-)

      • pistoriusp a year ago

        Unfortunately not anymore, we have a great set of people backing us, but thanks for the vote of confidence.

        As an aside: Not exactly sure why the parent is getting down voted.

        • dang a year ago

          It's common for startups to hijack competitors' launch threads. Some readers find that distasteful; perhaps that's why there were downvotes.

          I'm not saying that your post was such a hijack, but it's difficult to interpret these things accurately, so any post of this kind will always land on a spectrum of responses.

          • pistoriusp a year ago

            Indeed, but IMHO they're very different products if you understand the positioning, which I guess is hard to gauge.

        • hodgesrm a year ago

          Yeah, that's weird. I just upvoted you to compensate.

hangonhn a year ago

So if the fields are encrypted by the proxy on the way to the DB, how do queries and indices work since it would be pretty much invisible to the DB and the query planner? Thanks!

I really like the approach you are taking since it could be a quick drop-in deployment that solves a huge problem for us.

  • hexedpackets a year ago

    Glad to hear you like our approach! We haven't fully solved indexing/complex querying yet. We have two modes we can operate in - directly encrypting in the database, or doing just-in-time encryption as the query results come back. When encrypting directly in the database most queries other than direct comparison won't work. We have some early work started on using both homomorphic encryption [1] and format-preserving masking which opens up the ability to use other query operations.

    With JIT response encryption none of that is an issue, the database still has the raw data but applications are protected. The downside is it can be slow for large amounts of data.

    [1] https://en.wikipedia.org/wiki/Homomorphic_encryption

    • mike_d a year ago

      So basically you break relational databases and turn them into fancy key value stores?

yardstick a year ago

How easy is it to rotate encryption keys in the event of a compromise? E.g., a key was accidentally included in a log file, so the data encrypted with that key now needs to be re-encrypted with a new key.

  • hexedpackets a year ago

    A manual rotation is one click on the web page, and we can automatically rotate on a schedule to limit the scope of a compromise if a key gets leaked. Full rekeying is Coming Soon™ - fields encrypted with JumpWire have some metadata about which key is used which makes it easier to find rows that need to be re-encrypted, but the end to end process isn't launched yet.
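Because each encrypted field records which key it was written under, full rekeying reduces to "find rows tagged with an old key, decrypt, re-encrypt under the current key." A minimal Python sketch, assuming a hypothetical `(row id, key tag, ciphertext)` layout and caller-supplied `encrypt`/`decrypt` functions; the stub cipher in the demo only tags bytes and is not encryption:

```python
from typing import Callable, Iterable, List, Tuple

Row = Tuple[int, str, bytes]  # hypothetical layout: (row id, key tag, ciphertext)
Cipher = Callable[[str, bytes], bytes]  # (key tag, input bytes) -> output bytes


def stale_rows(rows: Iterable[Row], current_tag: str) -> List[Row]:
    """Rows whose key-tag metadata shows they were written under an older key."""
    return [row for row in rows if row[1] != current_tag]


def rekey(rows: Iterable[Row], current_tag: str, decrypt: Cipher, encrypt: Cipher) -> List[Row]:
    """Re-encrypt stale rows under `current_tag`; rows already on it pass through."""
    result: List[Row] = []
    for row_id, tag, ciphertext in rows:
        if tag != current_tag:
            # Requires the old key to still be available at this point.
            ciphertext = encrypt(current_tag, decrypt(tag, ciphertext))
        result.append((row_id, current_tag, ciphertext))
    return result


# Stub "cipher" for the demo only: it tags bytes with the key name and is NOT encryption.
def stub_encrypt(tag: str, data: bytes) -> bytes:
    return tag.encode() + b"|" + data


def stub_decrypt(tag: str, data: bytes) -> bytes:
    return data[len(tag) + 1:]


rows = [(1, "k1", stub_encrypt("k1", b"ada@example.com")),
        (2, "k2", stub_encrypt("k2", b"bob@example.com"))]
print([row_id for row_id, _, _ in stale_rows(rows, "k2")])  # [1]
rotated = rekey(rows, "k2", stub_decrypt, stub_encrypt)
```

The ordering matters after a leak: the compromised key has to stay usable until every row tagged with it has been re-encrypted, otherwise those rows become unreadable.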

bironran a year ago

How did you solve range queries? Prefix/suffix queries? Index performance? Aggregation on database end?

  • hexedpackets a year ago

    The short answer is we haven't fully solved it yet. We have two modes we can operate in - directly encrypting in the database, or doing just-in-time encryption as the query results come back. For the former most queries other than direct comparison won't work - we have some early work started on using both homomorphic encryption [1] and format-preserving masking to help there.

    With JIT response encryption none of that is an issue, but it can be slow for large amounts of data. Any kind of big-data analytics will be a poor fit for JumpWire right now.

    [1] https://en.wikipedia.org/wiki/Homomorphic_encryption

    • bironran a year ago

      yeah, FHE isn't, yet, something that can be used in a busy production env. At best I'd say it's a specialized tool, though in my mind it's a toy solution - can work for n=1, possibly for n<(low numbers) but not for large N.

      • hexedpackets a year ago

        Totally agree. We're likely going to implement a partially homomorphic solution that allows for some specific queries. We aren't trying to build a general purpose computational environment as the use cases we support don't require arbitrary computation. The data people encrypt with JumpWire is pretty much all strings and the queries on them are mainly doing some sort of substring matching (mainly prefix/suffix).
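One common way to support prefix matching on encrypted strings without homomorphic encryption is a keyed-hash "blind index": store an HMAC of each prefix next to the ciphertext, then look up the HMAC of the search term. This is a swapped-in technique, not the partially homomorphic scheme described above, and it leaks which rows share a prefix. A minimal Python sketch with hypothetical names:

```python
import hashlib
import hmac
from typing import Dict, Set


def _token(key: bytes, text: str) -> str:
    # Keyed hash, so the index can't be rebuilt or queried by anyone without `key`.
    return hmac.new(key, text.lower().encode("utf-8"), hashlib.sha256).hexdigest()


def build_prefix_index(key: bytes, records: Dict[int, str], min_len: int = 3) -> Dict[str, Set[int]]:
    """Map HMAC(prefix) -> ids of records whose value starts with that prefix."""
    index: Dict[str, Set[int]] = {}
    for record_id, value in records.items():
        for end in range(min_len, len(value) + 1):
            index.setdefault(_token(key, value[:end]), set()).add(record_id)
    return index


def prefix_search(key: bytes, index: Dict[str, Set[int]], prefix: str) -> Set[int]:
    """Ids whose value starts with `prefix`; prefixes shorter than min_len never match."""
    return set(index.get(_token(key, prefix), set()))


key = b"\x00" * 32  # demo key only; a real deployment would use a random, managed key
emails = {1: "alice@example.com", 2: "alfred@example.org", 3: "bob@example.com"}
idx = build_prefix_index(key, emails)
print(prefix_search(key, idx, "ali"))  # {1}
print(prefix_search(key, idx, "al"))   # set(), shorter than min_len
```

Suffix search works the same way over the reversed string. The index grows linearly with value length per row, which is tolerable for short PII fields like emails but not for free text.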

javaunsafe2019 a year ago

Idk, since you lose the index and several operators, your database might become useless and this solution would not scale well.

From my experience, you're better off having dedicated views for different stakeholders, which solves your problem without those downsides.

  • llamaLord a year ago

    I think you're looking at this from the wrong level. This product kinda assumes that field level encryption is desired for certain protected fields and works from there.

    That may not be a correct assumption for ALL systems, but it's a safe assumption for A LOT of systems.

    OP this is seriously cool, nicely done.

    • javaunsafe2019 a year ago

      Maybe you did not get my point of a dedicated view/interface/bucket …

  • hexedpackets a year ago

    It depends on the use case. In our experience, it's been rare that queries on PII need to do anything more complex than substring matching (which we're working on support for). We're definitely not trying to be able to encrypt every column, just to make some common workflows around PII and PHI a lot easier.

    Custom views can help, but it does mean you're dealing with access controls directly in the database which can be hard to manage. And the database is fully exposed through backups or engineers with server access.

  • atonse a year ago

    You don’t lose the ability to search exact values if you use convergent encryption and the same word encrypts to the same ciphertext.

    • sk5t a year ago

      This is generally considered a bad thing.
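Both sides of this exchange come down to determinism. The Python sketch below uses an HMAC as a stand-in for a deterministic scheme (it is a one-way token, not convergent encryption itself, and the names are hypothetical): equal plaintexts produce equal tokens, which enables exact-match queries but also reveals how often each value repeats:

```python
import hashlib
import hmac
from collections import Counter
from typing import List

KEY = b"\x01" * 32  # demo key only


def deterministic_token(value: str) -> str:
    # Same input -> same output, which is what makes exact-match lookups possible.
    return hmac.new(KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()


cities = ["NYC", "NYC", "SF", "NYC", "LA"]
stored: List[str] = [deterministic_token(c) for c in cities]

# Exact-match search works: tokenize the search term and compare tokens.
matches = [i for i, t in enumerate(stored) if t == deterministic_token("SF")]
print(matches)  # [2]

# ...but the value distribution leaks: anyone who sees the column can count repeats.
print(Counter(stored).most_common(1)[0][1])  # 3
```

For low-cardinality columns, that frequency leak is often enough to guess the plaintext outright, which is why deterministic schemes are usually reserved for high-entropy values.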

juliennakache a year ago

I really like the idea of dynamic masking. However, I wonder how well it works in practice. I was actually assessing one of your competitors (www.satoricyber.com) and found an easy way to work around the masking: I was able to access essentially any masked data using not-so-advanced SQL functions. Do you have a publicly available test suite for your proxy that people and security researchers can review? Also, do you have a bug bounty program and/or a clear disclosure policy for when a vulnerability is found?

  • debussyman a year ago

    Interesting to hear your work on Satori, thanks for sharing! Curious if you've done the same analysis for Immuta?

    We haven't set up a public test suite or bug bounty program yet, but will look into this, it makes a lot of sense.

ianpurton a year ago

So I've seen something like this before i.e. https://github.com/cossacklabs/acra

So for me, everything has to be infrastructure as code. I don't want to log into a UI and start configuring connections etc.

Also, I'm not keen on giving you production access to my databases, but maybe I misunderstood your implementation.

So I like the idea of a docker container that does this as a proxy.

It's a tough market you're going into, $395 per database is a big ask.

  • debussyman a year ago

    Acra does offer similar functionality to JumpWire!

    We don't have production access to your databases; it's a pretty fundamental part of our value prop. Database credentials can be stored in your own secret store (e.g. HashiCorp Vault) and are loaded directly from there by the proxy. And if you are concerned about the UI harvesting credentials as they are being entered, you can self-host the web app as well for full isolation.

    We are also expanding our IaC support, many of the configurations in our product can be defined as YAML in a git repository with webhooks. For deployment, we provide helm charts [1] and terraform modules [2] to include in your existing ci/cd pipeline.

    [1] https://charts.jumpwire.ai [2] https://github.com/jumpwire-ai/infrastructure-tools/tree/mai...

spak9 a year ago

FYI, I think there may be a typo on your `https://jumpwire.ai/pricing` page on the `How are keys handled?`

``` How are keys handled? We generate unqique encryption keys for every account and store them in a secure secrets manager. Subkeys are routinely created and rotated from the master key. For additional security, we support user provided keys on our Team and Enterprise plan. ```

`unqique` --> `unique`

  • hexedpackets a year ago

    Thanks for letting us know, should be fixed in a minute!

    • lorrit a year ago

      Another typo on the main page: "quickly secure your most valuable asset - you’re data." -> "your data"

      Looks like a great product!

brap a year ago

Cool product. Just curious, is there no existing encryption at the DB level? I would expect modern DBs to be able to do that.

  • hexedpackets a year ago

    Thanks! Some databases have encryption support but it is either coarse (row-level encryption is offered in a few databases for example) or it's a low level construct that becomes really complex to integrate - especially if you want to seamlessly decrypt some data. They're often only available in enterprise versions (MongoDB and MySQL do this).

    pgcrypto, mentioned below, is a good example. It's a great extension that works really well, and if you're only using PostgreSQL you could build a lot of the functionality of JumpWire using it. But it requires a lot of engineering work to fit into your application. Having the basic encryption functions only gets you part of the way to a full solution; the rest is aligning those with high-level policies and keeping them up to date as data schemas change.

lobal a year ago

Nice! I've been using a plugin [1] for Prisma that does something similar, but this sounds much more comprehensive.

[1] https://github.com/47ng/prisma-field-encryption

  • hexedpackets a year ago

    Thanks! The plugin is pretty nice if you're sticking to just Prisma for your backends. Always happy to chat about your use case or give a demo of how JumpWire compares if you're interested.

atonse a year ago

Do you guys post any details about the storage format?

Like if I had the encryption key and any salt etc, can I decrypt it without your product?

Also how much has the encrypted format been vetted?

I saw your example and the last name seemed to be massive even compared to using something like KMS.

  • hexedpackets a year ago

    We still need to add the format to our docs, but it's essentially:

    prefix + base64(len_encode(metadata) + len_encode(key_tag) + aes_encrypt(data))

    So definitely possible to decrypt it without JumpWire, if you have the keys. There are some pieces of metadata we add in that we could make optional if you want to reduce the resulting ciphertext size. That metadata adds a few extra bytes, but it doesn't grow with the data size.

    • atonse a year ago

      Thank you – I would recommend writing up a page with all the details on your docs because that would appease a whole lot of people that would be your target customer (like myself)

      Although I might be biased cuz I'm a founder from a tech background so I want those details, but even with those details, I'm one of your target market but my worry with these kinds of products tends to be more about things like:

      - am I adding an unreliable piece of infra to my stack? this is going to be a critical gatekeeper, so if it fails, it's not just like my DB being down: as the only method of decrypting my data, could it fail in a way that results in permanent data loss (where I can't decrypt some subset of the data)?

      - if I had to yank this out, what's the process? will I be stuck?

      - what are the chances of us doing something stupid and locking OURSELVES out of our own data? what guardrails are available there?

      - what is the key management story? (which answers a lot of the above questions)

      - is this roll-your-own crypto (not just which algorithm, but how the messages are constructed, etc) or something standard and vetted? Because there's no secret sauce to be had there, it's more in making all those OTHER elements easier for me.
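The storage format hexedpackets gives above can be sketched in Python to show why it is decryptable without JumpWire once you hold the key. The prefix string and the 2-byte big-endian length encoding are assumptions (the thread specifies neither), and `ciphertext` stands in for the output of `aes_encrypt`, which is not implemented here:

```python
import base64
import struct
from typing import Tuple

PREFIX = "jw1:"  # hypothetical marker; the real prefix isn't given in the thread


def len_encode(data: bytes) -> bytes:
    """Assumed length encoding: 2-byte big-endian length, then the raw bytes."""
    if len(data) > 0xFFFF:
        raise ValueError("field too long for a 2-byte length")
    return struct.pack("!H", len(data)) + data


def _len_decode(buf: bytes, offset: int) -> Tuple[bytes, int]:
    """Read one length-prefixed field at `offset`; return (field, next offset)."""
    if offset + 2 > len(buf):
        raise ValueError("truncated length header")
    (size,) = struct.unpack_from("!H", buf, offset)
    start, end = offset + 2, offset + 2 + size
    if end > len(buf):
        raise ValueError("truncated field")
    return buf[start:end], end


def pack(metadata: bytes, key_tag: bytes, ciphertext: bytes) -> str:
    """prefix + base64(len_encode(metadata) + len_encode(key_tag) + ciphertext)."""
    body = len_encode(metadata) + len_encode(key_tag) + ciphertext
    return PREFIX + base64.b64encode(body).decode("ascii")


def unpack(value: str) -> Tuple[bytes, bytes, bytes]:
    """Inverse of `pack`: recover metadata, key tag, and the raw ciphertext."""
    if not value.startswith(PREFIX):
        raise ValueError("not an encrypted field")
    body = base64.b64decode(value[len(PREFIX):], validate=True)
    metadata, offset = _len_decode(body, 0)
    key_tag, offset = _len_decode(body, offset)
    return metadata, key_tag, body[offset:]
```

With the key tag recovered, you would fetch the matching key from KMS or Vault and run AES decryption on the remaining bytes. The fixed overhead is the prefix, two length headers, the metadata, and the key tag; base64 then expands the whole body by about a third, which may account for part of the size difference noticed earlier in this thread.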

lyime a year ago

Congrats on the launch! Interesting product.

How are updates handled, if I’m hosting the container in my cloud? How should I plan for troubleshooting if there are incidents involving JumpWire?

  • debussyman a year ago

    We tag releases for the container which gives you flexibility to manage updates on your deployment schedule. In a production setup, our proxy engine automatically clusters across multiple nodes, so that rolling updates minimize downtime.

    Policies are cluster aware, so that individual policies can be pinned to a particular cluster.

    For troubleshooting, our engine publishes events that you can ship into your observability or monitoring stack (datadog/statsd, prometheus, cloudwatch) so any degradation can be handled by an IR process. And we support our customers with quick responses on shared slack channels directly with their engineering teams.

paulgb a year ago

Congrats on the launch! This sounds pretty cool.

Did you have to get into the weeds of the wire protocols that Postgres/Mysql use? What was that like?

  • debussyman a year ago

    Indeed we did get into the weeds. PostgreSQL was fairly straightforward, MySQL was a big challenge. Interestingly the hard parts are supporting the large variety of authentication handshakes that MySQL/Maria supports, not the queries themselves. This is the fun part of our job! ;)

    Also critical is ensuring encryption occurs within the database transaction, so that data doesn't leak into write-ahead logs or change data capture streams. Since we manage keys/rotation this takes some careful logic in our engine.

cloudfalcon a year ago

What's the risk to your business of other data security companies (like BigID) offering this kind of functionality?

  • debussyman a year ago

    We see BigID and others in the data governance space focusing on cataloging schemas and identifying risks around access to data that violates policies. In cases where remediation requires a technical change, such as tokenizing data before sending to a third-party API, JumpWire offers a solution that doesn't require engineering to re-architect their systems.

    Of course BigID could build their own technical controls for customers to install, but I'm seeing more partnerships happening in the space - Cyera and Wiz recently announced a tighter product integration [1].

    There are also problems with offering a solution as SaaS. We believe a proxy must run in our customers' network for low latency, as well as the added security of data isolated to a VPC.

    [1] https://www.prnewswire.com/news-releases/cyera-and-wiz-partn...

acrefoot a year ago

Any comparisons to https://www.tonic.ai?

> Based on policies you define, individual fields can be encrypted/decrypted...

Are the policies something like "retool" gets tokenized or faked data back, and the main app gets everything? Or is it more granular even within the main app? Like, can I teach JumpWire about my app's users and our AuthZ ruleset?

> or they partition the data by putting some fields in a data vault and others in the main database

I was considering using VGS to tokenize sensitive data, but I prefer self-hosted and reasonably auditable code for such sensitive systems. Is that the case here?

> We’ve seen entire teams dedicated to just maintaining ETL pipelines for scrubbing PII into secondary databases!

I do this to make staging environments more realistic, which makes them double as debugging tools for production when you can't give engineers any sort of direct production access. We whitelist non-sensitive fields (most importantly foreign keys), and fill in the rest with faked data. The app looks like production, but as if all the users were bots saying nonsense to each other. At my scale (50 person company), it works reasonably well enough with just me maintaining it.

  • hexedpackets a year ago

    Tonic is awesome! We think of synthetic data/differential privacy as a different use case - trying to replicate data across scoped environments while preserving certain properties or distributions of the entire data set. There is a security/privacy component from scrubbing the data, but the original data source is unmodified, and that's where we feel risk lies. And the desired outcome isn't to add security but to produce a data set that "looks like" the original well enough for testing/modeling/analytics.

    > Are the policies something like "retool" gets tokenized or faked data back, and the main app gets everything?

    Yep, that's exactly right. Application credentials are grouped under classifications, and policies can be included/excluded across classifications. We aren't passing authz through JumpWire but for something like Retool you can configure it to connect through different proxies for different users.

    > I prefer self-hosted and reasonably auditable code for such sensitive systems. Is that the case here?

    Exactly. The engine which interacts with your data is almost always self-hosted, and the web app also can be if needed.

    > At my scale (50 person company), it works reasonably well enough with just me maintaining it.

    Makes sense! No reason to add more tools to your stack yet if the custom process isn't too burdensome.
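acrefoot's allowlist approach can be sketched in Python. The column names and fake-value rules are hypothetical (a real job could use a faker library and keep values format-plausible), but the core idea fits in a few lines: copy allowlisted columns such as keys verbatim so joins still work, and replace everything else:

```python
import random
import string
from typing import Any, Dict, FrozenSet

# Hypothetical allowlist: non-sensitive columns, most importantly the keys used for joins.
ALLOWED: FrozenSet[str] = frozenset({"id", "account_id", "created_at"})


def _fake_text(rng: random.Random, length: int) -> str:
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(max(length, 1)))


def scrub_row(row: Dict[str, Any], rng: random.Random,
              allowed: FrozenSet[str] = ALLOWED) -> Dict[str, Any]:
    """Copy allowlisted columns verbatim and replace everything else with fake values."""
    scrubbed: Dict[str, Any] = {}
    for column, value in row.items():
        if column in allowed or value is None:
            scrubbed[column] = value
        elif isinstance(value, bool):  # checked before int, since bool is an int subclass
            scrubbed[column] = rng.random() < 0.5
        elif isinstance(value, int):
            scrubbed[column] = rng.randint(0, 10**6)
        elif isinstance(value, str):
            scrubbed[column] = _fake_text(rng, len(value))
        else:
            scrubbed[column] = None  # unknown types are dropped rather than risk leaking them
    return scrubbed


rng = random.Random(42)  # seeded so staging refreshes are reproducible
print(scrub_row({"id": 7, "account_id": 3, "email": "ada@example.com", "age": 36}, rng))
```

An allowlist fails closed: a newly added sensitive column is faked by default instead of being copied into staging until someone remembers to exclude it.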

dbochman a year ago

Dang this sounds awesome, really dig that clients won’t require changes to play nice

aharm a year ago

This sounds great, but I’d really prefer a fully-hosted solution. Do you offer one?

  • debussyman a year ago

    We can launch the engine into a VPC we manage that is co-located in your region/AZ and peer the networks, instead of offering a traditional multi-tenant hosted solution.

    But we try _really_ hard to ensure your data is never exposed to the Internet. And we do everything we can to limit our ability to read your data, either through self-hosting or ensuring you own the keys.

danbmil99 a year ago

Any plans to support mongodb?

ajnene a year ago

Amazing work guys! Excited to integrate this to shore up our security practices

danielmarkbruce a year ago

Looks great. How do you guys compare to something like Voltage?

  • hexedpackets a year ago

    The value prop is definitely very similar. I'm not as familiar with Voltage as I am with other solutions, but my understanding is that it requires either using the Voltage database driver (JDBC/ODBC in particular) or an HTTP API.

    With JumpWire, all of the work happens in an engine proxy that works directly with the database protocols. That makes the integration simpler: any language and connector can be used by just changing the hostname and auth. The downside is it's harder for us to add new databases; Voltage's approach definitely wins out there.
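As a concrete illustration of "just changing the hostname and auth", here is a Python sketch that rewrites a PostgreSQL connection URL to point at a proxy while keeping the database name and query parameters. The proxy hostname and credentials are hypothetical:

```python
from urllib.parse import quote, urlsplit, urlunsplit


def via_proxy(dsn: str, proxy_host: str, proxy_port: int, user: str, password: str) -> str:
    """Return `dsn` with only the host, port, and credentials swapped for the proxy's."""
    parts = urlsplit(dsn)
    # Percent-encode credentials so characters like '@' or ':' don't break the URL.
    netloc = f"{quote(user, safe='')}:{quote(password, safe='')}@{proxy_host}:{proxy_port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


print(via_proxy("postgresql://app:secret@db.internal:5432/shop?sslmode=require",
                "jumpwire.internal", 5432, "app-proxy", "p@ss"))
# postgresql://app-proxy:p%40ss@jumpwire.internal:5432/shop?sslmode=require
```

Everything the driver needs beyond the endpoint (database name, TLS mode) stays the same, which is the property that lets any client library work unchanged.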

nox7777 a year ago

amazing! we've been looking for smth like this! just registered via website

liushh a year ago

Great work guys! Looking forward to integrating with JumpWire!

trafnar a year ago

I suppose your company in theory could read all the incoming data? Could engineers at my company decrypt the data? Or are the keys not available to us?

I suppose it's more about ensuring the data sitting around in the DB isn't exposed to random employees or hackers, yeah?

  • hexedpackets a year ago

    Our engine is self-hosted, so all of the data is kept local to your network and we can't read any of it. Concerns about data access and query latency are the two biggest reasons we decided to take the self-hosted approach.

    Whether engineers can access the keys and decrypt data depends on your setup. The engine can use either AWS KMS or Vault for top-level key management, so if an engineer has full permissions over those then they could get the keys out. We can also host the keys in our infrastructure and sync them over to the engine if you're comfortable with that tradeoff.

  • iLoveOncall a year ago

    Maybe read the post before commenting? They answer all your points in it.

    The proxy layer is self-hosted, the UI can be self-hosted and the keys are your own AWS KMS keys.

valenterry a year ago

What I'd like to have is an app that allows me to easily select specific vocab that I want to learn (with flashcards).

Essentially, I would pick "cooking" and get a list of vocabulary, sorted by usage/importance that contains all the words that I need for "cooking" such as tools, ingredients, techniques and so on.

Or the same for traveling, hiking, cycling, ordering in a restaurant, buying a house, ...

That would be super useful.