The CUE Data Constraint Language

91 points by devj 6 years ago

Huge congrats to mpvl@ for the final release of CUE on github.

mpvl@ is the main author of borgcfg and one of the two designers of BCL (borg config language). I was maintaining borgcfg 2017-2019, and naturally had occasions to chat with him. I was fortunate to learn of the existence of CUE a while ago, and was convinced that its theoretical foundation is solid. That foundation is simply not there in most config languages. CUE and Kustomize appear to me to be the two major new ideas for addressing the flaws in BCL and other template/inheritance-based config languages.

I am very eager to see how the community and industry react to CUE.

purpleidea 6 years ago

CUE itself is rather cool. The fact that people think kube and templating YAML is the answer is very uncool. CUE is an okay solution around a bad problem.
Best to avoid this kube+YAML nightmare and go with something with a solid core design. No the Borg paper had no real convince logic in it.
- justicezyx 6 years ago
  
  > No the Borg paper had no real convince logic in it.
  What does this mean?
  Are you saying that Borg paper [1] has some arguments for using configuration?
  It sounds like you have a good insight around the config mess in infrastructure management.
  [1] https://ai.google/research/pubs/pub43438

dmitshur 6 years ago

It’s interesting for me to see a project made mostly by 1 person that happens to be employed by Google described as “by Google”. [1] Not that there’s no truth to it, just that the reality is much more nuanced than a simple phrase can capture.

[1] https://github.com/cuelang/cue/graphs/contributors

gravypod 6 years ago

It looks like Google owns the project and that, to contribute, you have to turn over copyrights [0]. I think the ownership makes it "by Google" more than employing the main dev.
[0] - https://cue.googlesource.com/cue/+/HEAD/doc/contribute.md#st...
- jsolson 6 years ago
  
  That's common to most projects released by Googlers. Options are either getting the IP officially assigned over to yourself or releasing it under a Google copyright with a CLA: https://opensource.google.com/docs/creating/
  The latter is lower friction, in my opinion.
  This is especially common for "personal" projects that people end up working on or using on the clock.
  I don't know if this was the case with Cue specifically (this is the first I've heard of the project, but less boilerplate in my JSON has some appeal).
- magicalist 6 years ago
  
  > that, to contribute, you have to turn over copyrights
  *grant a copyright license
  https://cla.developers.google.com/about/google-individual

leshow 6 years ago

I like dhall better than this, https://dhall-lang.org/

svnpenn 6 years ago
I just can't understand how a sane person would use syntax like this:
```
    { home       = "/home/bill"
    , privateKey = "/home/bill/id_ed25519"
    , publicKey  = "/home/blil/id_ed25519.pub"
    }
```
and call it "human friendly" with a straight face. just use the normal syntax:
```
    {
       home = "/home/bill",
       privateKey = "/home/bill/id_ed25519",
       publicKey  = "/home/blil/id_ed25519.pub"
    }
```
advocates will say, "well you can delete the last line without it breaking". ok. but what about the first line? you've just moved the problem. it's frustrating because this is a solved problem. either remove the comma altogether (YAML) or allow a trailing comma (Python). Don't do this weird leading comma stuff. It's just living in denial.
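For illustration, a minimal Python sketch of the trailing-comma option. The dictionary only loosely mirrors the record above; the variable name `config` is made up for the example.

```python
# Python accepts a comma after the last entry, so adding or removing
# any line (first, middle, or last) never touches its neighbours.
config = {
    "home": "/home/bill",
    "privateKey": "/home/bill/id_ed25519",
    "publicKey": "/home/bill/id_ed25519.pub",  # trailing comma is legal here
}
```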
- swsieber 6 years ago
  
  Hmm... the advocates I've run into say that you can add a new line without modifying the old last line. And that one makes sense, since you tend to (or at least I tend to) add things at the end.
  
  saagarjha 6 years ago
  
  That's why every well-designed language lets you put an extra comma on the last line.
- throwawayjava 6 years ago
  
  Really? They seem equally readable to me.
  I can't tell why a sane person would care much one way or the other, other than the way sane people care about vim vs emacs or tabs vs spaces or Kirk vs Picard...
AzzieElbab 6 years ago

At the very least, dhall is typed
- cwp 6 years ago
  
  So is CUE. See the section on types and values here - https://github.com/cuelang/cue/blob/master/doc/tutorial/basi...
nikolay 6 years ago

How about we see which one gets adopted more? I guarantee you, Cue will rule. Dhall is too weird to become mainstream.
- kreetx 6 years ago
  
  The counter could be that we don't want anything from the big corp either.
  But regardless, would you think anything new will become mainstream in this space? We already have json, yaml, perhaps a few more. It appears that the new ones start paying off when the software they configure goes over some pretty high complexity threshold.
  
  nikolay 6 years ago
  
  So, you don't use Go, Kubernetes, and a bunch of other projects, just because they come from Google?
  
  crdrost 6 years ago
  
  I mean there would be better reasons to not use Go or Kubernetes than just that they're Google tech. Both can add tremendous overheads to otherwise simple prototypes. Here, a Google aversion can be more pronounced because the thing is not obviously a bad choice.
  With Cue, I personally think it is uglier than TOML and Dhall and only slightly less reader-ambiguous than YAML. But it’s like game theory: you have these strategies you add in the middle which come with half the costs and half the benefits and so they aren't strongly dominated by any of the other options and they get their fair share of the Nash equilibrium because they occupy that middle ground. So if one cannot say “this is clearly a wrong choice” one can nevertheless default to “this is a possible choice but I just have a lot of options and I can choose to be more arbitrary about what's important to me.”
  At one point in the early 00s I needed a shared hosting provider for a small site, and all of them looked unbelievably the same. I adopted the first narrowing criterion of “Reject any hosting provider which has a name of two words mashed together in PascalCase”, which IIRC cleared away well over half of the competition. There's nothing wrong with them, I just had a lot of options so I could afford to say “this looks so ugly and repetitive to me and I don't want to do it.”
  That cleared away a lot but not enough to narrow it down to a few options, so I then arbitrarily limited it again, “no free hosting providers whose Contact Us links contain stock photography of a solitary white woman wearing a headset.” (Like she can be in a group, she can be a minority race, she can be holding a phone, she can be an actual employee of the given company, it could be somewhere else where it is not a link you click on, like a splash page—any of those is fine, just stop trying to suggest “we are trendy and support diversity but not so much diversity that you worry your precious white man head about being routed to a call center.”) That one was actually unreasonably effective and got it down another 80-90% or so to a manageable handful.
  When you have a lot of middling options that cannot strictly be eliminated for technical merit the soft things are valuable for winnowing.
  
  zeliard 6 years ago
  
  >With Cue, I personally think it is uglier than TOML and Dhall and only slightly less reader-ambiguous than YAML
  I think you're missing the main point of these languages.
  YAML, TOML, JSON are all basically the same thing modulo syntactic differences, which are largely a matter of taste.
  Dhall, Cue, Jsonnet, BCL, etc. are in a different league - they allow you to express some computations with data in your config. It helps you to eliminate boilerplate and duplication in your config, avoid copy-paste mistakes, make configs more concise, express some complex abstractions specific to your domain. They're miniature programming languages for your configuration data.
  If all you have is a few dozen key/value pairs to specify, it doesn't really matter which you choose, just a matter of taste. But as your config files suddenly span thousands of LoC, duplicate the same things over and over, need to configure dozens of different things (e.g. with jsonnet out of a single jsonnet source you can generate multiple configs for basically anything json/yaml/ini driven), get edited by multiple teams and you start looking for better ways to structure this mess, that's when these more advanced config languages come in to rescue your sanity.
  
  nikolay 6 years ago
  
  After I dug more into Cue, it's actually super complicated and ugly. Jsonnet is definitely cleaner and easier to learn if TOML is too static for you.
  It's nice to have type safety, but the way it's implemented in Cue is unintuitive and too complex to be practical.
  The approach of JSON (borrowed from XML) to have a separate schema file is better and you can use it to just validate.
  
  ithkuil 6 years ago
  
  It might be unintuitive (depending on your background) but I wonder where you see the complexity. It's actually one of the simplest approaches to this problem space I've seen lately. I'm honestly curious about what your concerns with it are.
  
  nikolay 6 years ago
  
  I am using Jsonnet daily. And I thought Jsonnet was weird - it took me a while to figure it out, and many of the developers on the team struggle with it. But most developers will have no clue with Cue - I guarantee you!
  It is not intuitive and a lot of things need to be memorized, looked up, etc.
  No wonder both Jsonnet and Cue arrive from Google!
  
  ithkuil 6 years ago
  
  It's indeed a hard problem in language design: the amount of investment people are willing to make depends on so many factors, like pre-existing adoption/popularity, and how often you need to use it to get some other work done.
  Contrast it with the programming language you use at your day job. Chances are that you invested many hours in order to learn it properly and probably you don't think you have necessarily exhausted the things to learn about it. It also probably has many quirks and emergent complexity that wasn't obvious to its designers or resulted from trade-offs.
  Yet we put up with all this because, well because we have no choice. When you need to get something done that needs writing some code, you need to enter the realm of programming languages and people build their careers around mastering them.
  The problem around configurations is often underrated.
  Solutions to the problem of not even wanting to learn a new thing range from "It's just configuration after all, why can't it be just some basic structures and if you need more you're clearly doing something wrong" to "well just stick your $favourite_language as a template engine and you can emit whatever you want".
  Jsonnet strikes a nice balance and I like it. It's easy to grasp for those who know some functional programming languages and, being untyped, it lets you gloss over a sizeable amount of the learning curve. But those shortcuts don't come for free. I use jsonnet at my day job and the general feeling is that things do get complex quite quickly and it's hard to know where values come from. I always thought that with some effort we could add some nice tooling to help with all of that, but cue caught my eye and I'm willing to give it a shot.
  
  nikolay 6 years ago
  
  My issues with Jsonnet are:
  - weird std lib, which doesn't follow other popular std libs - for example, JavaScript;
  - the inability to have dynamic import paths - the author cites some security reasons, but I don't buy them;
  - no wildcard imports!
  - lazy evaluation is a paradigm many struggle with;
  - redundant 'local' keyword - Bash-like, should have used 'var' or 'let' instead;
  - slow!
  - `importstr` does not follow Linux conventions about trailing newline;
  - no type safety!
  - could borrow some operators from Cue!
  - no native support for YAML and TOML!
  And many more!
  
  ithkuil 6 years ago
  
  All valid points.
  Fwiw, in https://github.com/bitnami/kubecfg we have added native functions for yaml.
  
  kreetx 6 years ago
  
  I do, but these are more complex things where they actually add some type of value (even if it's "this is safe to use because it will be around forever since this big company has vested a lot into it"). But cue is a config language which in itself is a niche area. Also, it is much simpler to rewrite configs from one language to the other, so it's safer to choose the "weird" one.

thdxr 6 years ago

We've been using Google's jsonnet with k8s and it's been a life saver. I think it works pretty well, so I'm wondering what gaps they're trying to address with Cue.

wstrange 6 years ago

This issue is a good comparison of the two approaches:
https://github.com/cuelang/cue/issues/33

oweiler 6 years ago

The JSON sugar is way too much for me. See folding as an example

https://github.com/cuelang/cue/blob/master/doc/tutorial/basi...

nikolay 6 years ago

True. I'd rather have dots instead of spaces.
- oweiler 6 years ago
  
  That would at least be somewhat intuitive.
- dalore 6 years ago
  
  Exactly, now how would you do a key that contained spaces?
  
  ithkuil 6 years ago
  
  How would you do a key that contains a dot? :-)
  
  pas 6 years ago
  
  Quotes? Escape the space?
svnpenn 6 years ago
Whoa yeah agree, as YAML would give you something completely different:
```
    {
       "outer middle inner": 3
    }
```
tln 6 years ago

TOML has a similar feature.
CUE appears absolutely stuffed with features. If there are any misfeatures, apart from having so many features, they don't jump out at me.
- oweiler 6 years ago
  
  For me having too many features IS a misfeature.

jefftk 6 years ago

I found paging through the tutorial a helpful way to understand what it does: https://cue.googlesource.com/cue/+/HEAD/doc/tutorial/basics/...

parhamn 6 years ago

There are quite a few of these things (Dhall, Jsonnet, etc.), and I've always thought it would be impossible to get a large enough team to learn them.

At a previous organization I worked at, I wrote a Kubernetes manager (docker build + kube config generation + blue/green deploy) that was essentially TOML + Jinja (we were a python shop). A django + pg + redis app would look like [1].

[1] https://gist.github.com/pnegahdar/1e90f42c1686009e1ff9392b79...

There were definitely some limitations to doing this (some good, like reduced kubernetes surface) but in the end I think we got a large percent of the team writing kubernetes configs through cargo-culting while having generally sane defaults, environment isolation, etc.

I wonder how organizations get people to use these things in a broad way? I'd still be hesitant to write what was described above in something like jsonnet even though it would be much more semantically correct -- I can't imagine getting anyone outside of the ops team to use it!

Generally though, TOML works really well with templating languages (jinja, golang templates, etc) if you include some helper funcs. You can even pass the toml to itself on rendering til you hit a fixed point for references and such.
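
A rough sketch of the "render until a fixed point" idea, under stated assumptions: a plain dict stands in for the parsed TOML, the standard library's `string.Template` stands in for Jinja, and the function name `render_to_fixed_point` and all keys are hypothetical.

```python
from string import Template


def render_to_fixed_point(config, max_rounds=10):
    """Substitute ${key} references among the values until nothing changes."""
    current = dict(config)
    for _ in range(max_rounds):
        # safe_substitute leaves unknown placeholders untouched instead of raising.
        rendered = {
            key: Template(value).safe_substitute(current)
            for key, value in current.items()
        }
        if rendered == current:
            return rendered  # fixed point reached
        current = rendered
    raise ValueError("values did not reach a fixed point")


# Illustrative settings: db_host refers to namespace, which refers to app and env.
settings = {
    "app": "shop",
    "env": "staging",
    "namespace": "${app}-${env}",
    "db_host": "pg.${namespace}.svc",
}
resolved = render_to_fixed_point(settings)
```

Nested references need more than one pass here, which is why the loop repeats until a pass changes nothing.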

wodenokoto 6 years ago

Which of all the links will show a simple example of why one wants to store configurations in a constraint language?

deckar01 6 years ago

I think it is mostly about being able to declare the config schema in the same language. You can declare a base config file with types instead of values that has to be extended. The parser will validate the types and insert defaults when it is merging the configs. It also has some nice features to avoid duplication and boilerplate.
References: https://github.com/cuelang/cue/blob/master/doc/tutorial/basi...
Disjunctions: https://github.com/cuelang/cue/blob/master/doc/tutorial/basi...
Templates: https://github.com/cuelang/cue/blob/master/doc/tutorial/basi...
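A toy Python sketch of that idea, not CUE's implementation: a base mapping holds either a type (the field must be supplied) or a concrete value (a default), and merging validates and fills in defaults. The function name `merge_with_schema` and the field names are made up for illustration.

```python
def merge_with_schema(schema, overrides):
    """Validate overrides against a base of types (required) and values (defaults)."""
    result = {}
    for key, spec in schema.items():
        if isinstance(spec, type):
            # A bare type means: the field must be supplied with this type.
            if key not in overrides:
                raise KeyError(f"missing required field: {key}")
            value = overrides[key]
            expected = spec
        else:
            # A concrete value acts as a default; its type is the expected type.
            value = overrides.get(key, spec)
            expected = type(spec)
        if not isinstance(value, expected):
            raise TypeError(f"{key} must be of type {expected.__name__}")
        result[key] = value
    return result


base = {"name": str, "replicas": 1, "port": int}
service = merge_with_schema(base, {"name": "web", "port": 8080})
```

In this sketch `replicas` falls back to its default of 1, while passing `"8080"` as a string for `port` raises a `TypeError`.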
blankaccount 6 years ago

I also flicked through to find a little example use case and solution, but was disappointed.

sansnomme 6 years ago

At this rate we might as well just use Prolog for configuration.

nikolay 6 years ago

You don't recall Marelle? https://news.ycombinator.com/item?id=6701362

mickeypi 6 years ago

Unfortunately it inherits JSON’s lack of support for dates.

ithkuil 6 years ago

It has a way to define types with regexps, so you can get a decent approximation of date support.
I wish there was a way to implement more complex validation rules with custom code, although I'm not sure which language that would better be.
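To make the gap concrete, a small Python sketch (not CUE): a regex can check the shape of a date, but ruling out impossible dates takes custom code. The pattern and function names are assumptions for the example.

```python
import re
from datetime import date

# Shape-only check: four digits, two digits, two digits.
ISO_DATE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def looks_like_date(text):
    """Regex approximation: accepts anything shaped like YYYY-MM-DD."""
    return ISO_DATE.fullmatch(text) is not None


def is_real_date(text):
    """Custom validation: also rejects dates that do not exist."""
    if not looks_like_date(text):
        return False
    try:
        date.fromisoformat(text)
    except ValueError:
        return False
    return True
```

With these definitions `looks_like_date("2019-02-30")` is true while `is_real_date("2019-02-30")` is false.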
tlb 6 years ago

What’s the use case for dates in a config language?
crontab is an example that comes to mind, but it uses its own system of date-time patterns. Javascript Date objects wouldn't be suitable.
- jsnell 6 years ago
  
  Date arithmetic, parsing and formatting are very useful in config languages. Maybe your config needs to express that when a program is run, the input should be read from a log file from two days ago named like /foo/bar/2019/07/02/access.log. That requires knowing what the current time is, computing what the time was two days ago, and formatting it in an arbitrary custom format (in this case YYYY/MM/DD).
  That doesn't necessarily mean that the language needs dates as a first class data type with a literal syntax etc.
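The three steps described above (take the current date, subtract two days, format the result), sketched in Python with only the standard library. The function name `log_path` is hypothetical; the path layout is the one from the example.

```python
from datetime import date, timedelta


def log_path(today, days_back=2):
    """Build the path of the access log written `days_back` days before `today`."""
    target = today - timedelta(days=days_back)
    # %Y/%m/%d yields the zero-padded YYYY/MM/DD layout.
    return target.strftime("/foo/bar/%Y/%m/%d/access.log")


# A real config run would pass date.today(); a fixed date keeps the example reproducible.
example = log_path(date(2019, 7, 4))
```

Nothing here needs a date literal in the config language itself, only date functions it can call.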
skybrian 6 years ago

Cue hasn't reached 1.0 yet, so maybe it will be added? I don't see an issue for it, but you could ask.

AlphaSite 6 years ago

This seems like one of those languages which is designed to be easy to write and easy to parse, but readability appears to have been left by the wayside.

ithkuil 6 years ago

One of the nice things is that all the merging rules are commutative, associative and idempotent. Not subtly changing the semantics when you move fields around (willingly or unexpectedly during e.g. a git merge operation) is a huge plus for me. From the lexical point of view, I don't see the usual red flags that plague yaml (where v1.0 is a string but 1.0 is a number).
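
A toy Python sketch of a merge with those three properties. It is only an illustration on concrete values, not CUE's actual unification (which also handles types and constraints); the function name `unify` and the sample data are assumptions.

```python
def unify(a, b):
    """Merge two values: dicts merge field by field, equal values agree, anything else conflicts."""
    if isinstance(a, dict) and isinstance(b, dict):
        merged = dict(a)
        for key, value in b.items():
            merged[key] = unify(merged[key], value) if key in merged else value
        return merged
    if a == b:
        return a
    raise ValueError(f"conflicting values: {a!r} vs {b!r}")


x = {"svc": {"port": 8080}}
y = {"svc": {"replicas": 3}}
z = {"svc": {"port": 8080}, "env": "prod"}

assert unify(x, y) == unify(y, x)                      # commutative
assert unify(unify(x, y), z) == unify(x, unify(y, z))  # associative
assert unify(x, x) == x                                # idempotent
```

Because a conflict is an error rather than a "last one wins" override, reordering the inputs cannot silently change the result.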

ytklx 6 years ago

Shameless plug: A JSON-like configuration language with considerably less features, no magic and no ambiguity: https://github.com/yuce/jacl

redder2 6 years ago

I like YAML and TOML for configs. Never liked JSON at all. There are some nice sugars when it comes to the quotes and commas that annoy me in JSON, but it seems way too complicated and filled with tons of features to learn when there are established config languages that work and have a huge feature set you just do not need to use at all, like YAML.

Also, it looks ugly having even the root in curlies. I have not looked at everything, and I am sure Google has its use for this, but I think it's too much for just configs. And if someone can point out why they do not just use YAML, I would be interested.

ithkuil 6 years ago

If you can get away with just a static config file then pick toml or whatever and live a happy life (less so with yaml because it has some traps like when type autodetection fails spectacularly, try "a: yes").
Sometimes you might have a legitimate reason to build complex systems out of configs (cloud formation templates, kubernetes, terraform). In that case you might need a bridle to enforce some structure, to keep common things common, and to ensure that variants (such as testing and staging environments) can be expressed in a robust way.

syn0byte 6 years ago

We need to stop using the phrase "don't reinvent the wheel" and start using the phrase "don't reinvent the hammer" CS/IT seems to have some weird solipsistic fetish for tooling in some dogmatic quest for abstraction.

I'm sure this is great for someone somewhere doing something but I just see another turtle in the stack...

root_axis 6 years ago

So don't use it? Why do so many people on this site complain about people making stuff? It's really odd. Just don't use it.
- stuffbyspencer 6 years ago
  
  I agree with this. It's sorta strange to me that the above poster is aware enough to mention that this "might be useful to someone", but then dogs on it.
  If you don't have a need for it, perhaps consider it is not for you.
wsc981 6 years ago

I don't see the value either, but I guess perhaps some people will find a purpose for this tool and regardless some developer might have had some fun working on this, perhaps learned a bit from working on this project.

nikolay 6 years ago

Another configuration language, which, I think, is pretty clean and down to Earth: https://github.com/zaphar/ucg

nikolay 6 years ago

Ewww. Raw strings are yucky! "_|_" is just as terrible.

unixhero 6 years ago

Does this have any relation with .cue sheets used to describe data structures within .bin/.iso file dumps of data CDs, audio CDs, VCDs, SVCD, DVDs and Blu-ray's???

unixhero 6 years ago

Edit: No it does not. The cue format I am referring to here was published by CDRWIN developers. Later also used by ExactAudioCopy, Ahead Nero and all other relevant CD-R burning software at the time, and currently as well.

gaze 6 years ago

What is it and what does it do? Examples for how you'd apply this in production? Anything?

billfruit 6 years ago

I wish something similar were available for binary data; I'm not aware of any.

avmich 6 years ago

How does it compare with ProtoBuffer?

rockwotj 6 years ago

Protobufs are a data serialization format, mostly for taking typed data and being able to parse it from and write it to bytes.
This is something that could generate protobuf values, as you can have expressions and such. Really it does more than protobuf and doesn't handle serialization. Its main use seems to be as a config language.

iddan 6 years ago

I needed this for so long. Looks