onderkalaci a year ago

The PG community had a similar patch, which got reverted from PG 15 at the last minute: https://www.depesz.com/2022/04/06/waiting-for-postgresql-15-...

  • kiwicopple a year ago

    this would be our preference. we'll try to support this for the next commitfest, and if it gets merged then we will deprecate our extension in favour of the native solution.

    • mkurz a year ago

      Is there a chance it gets committed during the current commitfest? See https://commitfest.postgresql.org/42/4086/

      This year's feature freeze will be on April 8th (https://www.postgresql.org/message-id/flat/9fbe60ec-fd1b-6ee...), so if it does not get committed within the next two and a half weeks it will miss this year's Postgres release in September...

      • kiwicopple a year ago

        I'll flag it with the team this week. I'm not sure what the blocker was previously, but it might just be a matter of submitting the patch again (with minor changes) so that it's "in" for commitfest, with someone willing to own the work over the next few months.

        • mkurz a year ago

          I think the things that needed to be fixed from last year are already committed (more general stuff not directly related to the JSON patches). Also, according to this message, at least parts of the JSON patches should be committed "...in the next few days..."; however, that was two weeks ago: https://www.postgresql.org/message-id/454db29b-7d81-c97a-bc1...

          I am a bit worried that, even though the patches seem to be "stable", they will miss the deadline... (since I would need those features as well)

          • kiwicopple a year ago

            I don't know any of the people involved in this patch, so I've sent it to Alexander Korotkov to get his opinion. I'll let you know his response after he has a chance to look at it.

            • kiwicopple a year ago

              Alexander's response:

              > This is a very long story, starting from 2017. This patch should finally be committed. Some preliminary infrastructure already landed in PostgreSQL 16. Regarding SQL/JSON itself, I doubt it will be committed to PostgreSQL 16, because feature freeze is coming soon. It's likely to be postponed to PostgreSQL 17.

              > Regarding replacement for pg_jsonschema, I don't think it will be a good replacement. Yes, one can construct a jsonpath expression which checks if particular items have particular data types. But I doubt that is nearly as convenient as jsonschema.

              It looks like there would still be some benefit to pg_jsonschema, unless the community decided they wanted to support jsonschema validation. We could propose this, but I don't think it would arrive in pg core any time soon.
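              For illustration, a jsonpath type check of the kind Alexander describes can be written against stock Postgres using jsonb_path_exists (a sketch; it checks one field's type, which is exactly why it doesn't scale to full schema validation):

              ```sql
              -- Does "age" exist and hold a number? Built-in jsonpath, no extension needed.
              SELECT jsonb_path_exists(
                  '{"name": "ada", "age": 36}'::jsonb,
                  '$.age ? (@.type() == "number")'
              );
              -- true here; if "age" were a string, the filter would reject it
              ```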

    • nextaccountic a year ago

      That JSON_TABLE feature looks pretty useful [0], but it seems complementary to pg_jsonschema. Can it actually be used to validate a json prior to inserting it into the database?

      [0] For my use case there is a problem: I'm representing a sum type (like Rust enums or Haskell ADTs) in SQL tables. Often you have a "tag" field that specifies which variant the data encodes, and each variant has its own properties. When representing this as a table, you usually add the fields of all variants as columns, setting non-null only the fields belonging to each row's variant. The reason I insert JSONs into the database is really just to represent sum types in a better way.
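      To make the table-based encoding concrete, here's a sketch (all names hypothetical) of a sum type flattened into one wide table, with a CHECK constraint enforcing that only the active variant's columns are non-null:

      ```sql
      CREATE TABLE shape (
          id     bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
          tag    text NOT NULL,   -- which variant this row encodes
          radius numeric,         -- "circle" variant only
          width  numeric,         -- "rect" variant only
          height numeric,         -- "rect" variant only
          CHECK (
              (tag = 'circle' AND radius IS NOT NULL AND width IS NULL AND height IS NULL)
           OR (tag = 'rect'   AND radius IS NULL AND width IS NOT NULL AND height IS NOT NULL)
          )
      );
      ```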

    • debugdog a year ago

      Well written article! Also very glad to hear about your plan to support the native implementation. For all of our projects, when we're integrating external services we usually keep the relevant original JSON responses in a jsonb column as a kind of backup. Alongside that, we extract the data we'll be using into queryable columns. Being able to use those "dumps" directly would be a nice thing to have.

  • nextaccountic a year ago

    How can this be used to validate JSON? (and prevent invalid json from being inserted into the database in the first place). I don't see "jsonschema" mentioned in that post.

    Perhaps Postgres could support jsonschema validation natively.

boomskats a year ago

This is long overdue and elegantly done. Great work!

On a personal note, it's great to finally see Neon & Supabase playing with each other. Much more interesting to me than Hasura.

  • endisneigh a year ago

    There’s nothing stopping you from using this with Hasura. Don’t see the relevance.

  • tough a year ago

    The power of FOSS

    • thangngoc89 a year ago

      Hasura is also FOSS

      • zwily a year ago

        Hasura is open core. Given the massive price increase they just did for their hosted version, I’d expect more and more future features will not land in core at all, to push people to pay.

        • klabb3 a year ago

          It’s already quite bad unfortunately. Support for both read replicas and metrics(!) is missing from the open source version. They have a Prometheus exporter, but not on the free tier, afaik.

          I was also concerned by reports of memory consumption/leak issues, as I’m planning to have lots of subscriptions. I don’t know Haskell well enough, but from the outside it does match the symptoms of having dug themselves into an architectural complexity hole, which I assume is much harder to navigate with unorthodox tooling and practices.

          I'm still rooting for Hasura; I like their way of making things dumb and simple, without too much technophilia for its own sake. They genuinely want you to focus on your business problems instead of blasting you with novel and overengineered concepts, like others do.

          • tango12 a year ago

            (from Hasura)

            Scaling subscriptions is hard, but we work with our users/customers at scale to make sure settings are tweaked correctly.

            We have users running 100k-1M concurrent users in production for live-event type platforms. It's not completely trivial to benchmark and set up, because query patterns, streaming vs live queries, etc. have an impact, but it works very reliably. No missing events, no problems disconnecting/reconnecting, no need for sticky sessions, and so on.

            An initial POC benchmark [1] should be a quick affair so if you're trying it out and run into any problems, please hit me up! Email on my bio.

            [1]: https://github.com/hasura/graphql-bench

            • klabb3 a year ago

              Thanks, that looks reassuring. I will share my own benchmarks too if I get around to it.

          • zwily a year ago

            Yeah, I love the product. I’ve been using it a few years in a fairly complicated internal tool used by a dozen people, and it’s been amazing. But I’m pretty discouraged by what I see as its future.

        • tough a year ago

          I have not run Supabase on my own infra, but AFAIK they let you do so, and they're quite good citizens, publishing all the tools they build on top of pg with sane licenses, as far as I remember.

          I love Supabase. Neon is newer but a great alternative for hosted serverless databases; they also shipped a great staging-db-for-PRs feature when they launched, which we integrated at work quite early while it was still in beta, and it saved us a lot of headaches when introducing new features that touched the database.

        • tango12 a year ago

          Hasura CEO here. Totally hear you.

          We're working on a new serverless infrastructure layer that'll make the pricing better for users compared to a DIY API server or to a self-hosted Hasura.

          It's a significant engineering lift on our side - at its core we're engineering Hasura to achieve 90%+ infrastructure utilization (no cold-start, sub-millisecond auto-scaling), and that's what will allow us to do this.

          Not there yet, but we'll be demo-ing and talking about the engineering at HasuraCon in June!

Pxtl a year ago

I've tried to use JSON Schema, and the big gap I couldn't figure out was how to handle polymorphism.

That is, if I have a case where object

    {
        "Foo": {
            "Type":"Fred",
            "Bar":1,
            "Baz":2,
            "Quux":2
        }
    }
and the object

    {
        "Foo": {
            "Type":"Waldo",
            "Bar":1,
            "Corge":2,
            "Xyzzy":7
        }
    }
are both valid, without just allowing any object members or allowing the union of their members.

I did a hack by multiplexing the types into a child-object, but that was ugly and clumsy.

In XSD or any statically-typed programming language I could handle this trivially using types and polymorphism, because "Fred" and "Waldo" would be different types.

But I can't figure out how to do that in JSON Schema.

  • mullsork a year ago

    IIRC that's what oneOf is for. i.e. a discriminated union / sum type. My experience with oneOf is that tooling support for it is terrible.
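    For the two objects in the parent comment, a oneOf schema with a constant "Type" discriminator might look roughly like this (a sketch in draft-07 style):

    ```json
    {
      "type": "object",
      "properties": {
        "Foo": {
          "oneOf": [
            {
              "type": "object",
              "properties": {
                "Type": { "const": "Fred" },
                "Bar": { "type": "integer" },
                "Baz": { "type": "integer" },
                "Quux": { "type": "integer" }
              },
              "required": ["Type", "Bar", "Baz", "Quux"],
              "additionalProperties": false
            },
            {
              "type": "object",
              "properties": {
                "Type": { "const": "Waldo" },
                "Bar": { "type": "integer" },
                "Corge": { "type": "integer" },
                "Xyzzy": { "type": "integer" }
              },
              "required": ["Type", "Bar", "Corge", "Xyzzy"],
              "additionalProperties": false
            }
          ]
        }
      },
      "required": ["Foo"]
    }
    ```

    The "const" discriminator makes validators pick exactly one branch, and "additionalProperties": false prevents the union-of-all-members problem.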

    • Pxtl a year ago

      Ah, I didn't realize you could use oneOf with refs or objects, I've only ever seen it used with primitives. Thanks!

mehdix a year ago

The article is also a joy to read. Well done.

hhthrowaway1230 a year ago

Ah yeah, super nice. Always wanted schema support in Postgres. Love to see them hammering away on top of its solid foundation.

Kydlaw a year ago

> 2022-08-19

Has anyone tried it? Any feedback on it?

  • wendyshu a year ago

    It works just fine but it will only give a boolean value valid/invalid, it won't tell you why it's invalid.
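    For context, typical usage is a CHECK constraint along these lines (function name per the extension's README; treat the details as a sketch):

    ```sql
    -- pg_jsonschema: a failing row raises a generic check_violation,
    -- with no pointer to which part of the document was invalid.
    CREATE TABLE customer (
        id   serial PRIMARY KEY,
        meta jsonb,
        CHECK (jsonb_matches_schema(
            '{"type": "object", "properties": {"tags": {"type": "array"}}}',
            meta
        ))
    );
    ```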

korijn a year ago

So how do you apply this in the real world when dealing with schema versions and database migrations?

  • dtech a year ago

    You could add a schema_version column and use one constraint per schema version. You'd only need this per major version; if a schema change is backwards compatible there's no problem.
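    A sketch of that approach (hypothetical table/column names, assuming pg_jsonschema's check function):

    ```sql
    ALTER TABLE events ADD COLUMN schema_version int NOT NULL DEFAULT 1;

    -- Each row is validated against the schema version it declares.
    ALTER TABLE events ADD CONSTRAINT payload_valid CHECK (
        (schema_version = 1 AND jsonb_matches_schema('...v1 schema here...', payload))
     OR (schema_version = 2 AND jsonb_matches_schema('...v2 schema here...', payload))
    );
    ```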

  • willmeyers a year ago

    Add a new constraint that supports the new schema version, migrate json columns to conform to new schema as need be, and then drop old constraint. This assumes your schema is backwards compatible. If it isn't I would probably not use a DB constraint and just handle everything on the application layer.
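    Sketched as a migration (hypothetical names, assuming pg_jsonschema's check function; NOT VALID lets the new constraint be added before old rows are fixed up):

    ```sql
    -- 1. Add the new-schema constraint, skipping validation of existing rows.
    ALTER TABLE docs ADD CONSTRAINT body_matches_v2 CHECK (
        jsonb_matches_schema('...v2 schema here...', body)
    ) NOT VALID;

    -- 2. Migrate existing rows to conform to the new schema as needed.
    UPDATE docs
    SET body = jsonb_set(body, '{version}', '2')
    WHERE body->>'version' = '1';

    -- 3. Check all rows now pass, then retire the old constraint.
    ALTER TABLE docs VALIDATE CONSTRAINT body_matches_v2;
    ALTER TABLE docs DROP CONSTRAINT body_matches_v1;
    ```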

    • alexvoda a year ago

      If the schema is not backwards compatible it either means you will have to do a data migration or it means you want to keep track of the schema version along with the data. In both cases it is much easier if the json columns are split off into a separate table.

    • korijn a year ago

      Thanks, that makes sense.

anonu a year ago

Well written article, and it looks like a great extension! However, my only issue with JSON Schema is that it becomes unwieldy once your JSON objects get too big: too many keys or too many items in an array, for example. If you are looking to find "where" the issue is in your JSON object, most schema validators don't provide enough guidance; they just say "something is wrong", as this one appears to do.

marwis a year ago

The error detail from the example is pretty awful and missing any useful information. Is it a limitation of the jsonschema library?

naedish a year ago

This is interesting. Would be curious to see if this can replace pydantic for specific cases.

  • kkirsche a year ago

    With pydantic’s funding, it wouldn’t surprise me if they enter this space themselves (the company side not the existing python library side)

    Funding Ref: https://pydantic.dev/announcement/

rapfaria a year ago

Wouldn't you wanna save the json even if the structure is incorrect?

  • rco8786 a year ago

    If you want that behavior, just don't use json schema.

roenxi a year ago

I can see some technical advantages to supporting JSON schema directly; but I suspect most people will be using this extension because SQL is really ugly and they don't want to use it to set up their schemas in pg proper.

It says a lot about how weak the SQL syntax is. An extension to replace CREATE TABLE with a JSON schema construct would be wildly popular.