svat 5 years ago

Starting in about the 11th century, philosophers in India of the Navya-Nyāya school adopted a similar artificial dialect of Sanskrit, to make their arguments precise. Of course there was no computer implementation backing the formalization, but the goal was similar: to have a language that remains a subset of natural language and (somewhat) intelligible to a typical educated person, yet is a formal language with unambiguous meanings. A good description of this language is in a couple of papers by Ganeri [1].

For example (taken from [2]), instead of saying the straightforward “Caitra goes to the village”, if you wished to be really precise you might say “There is an activity which leads to a connection-activity which has as Agent no one other than Caitra, specified by singularity, [which] is taking place in the present and which has as Object something not different from ‘village’.” (Sounds a bit less unnatural and also a bit less confusing in Sanskrit.)

> [Attempto Controlled English] can serve as knowledge representation…

It is not clear to me when this project started, but there was a 1985 article pointing out the Navya-Nyāya/“Shastric Sanskrit” language as an example of something that is both (somewhat) a natural language and (somewhat) usable for knowledge representation [2]. In its own way that article became, in popular culture in some circles in India, the source of various unfounded memes about Sanskrit being good for computers, or more absurd claims (https://news.ycombinator.com/item?id=14295285).

[1]: http://www.columbia.edu/itc/mealac/pollock/sks/papers/Ganeri... (can't find a link to the second paper now)

[2]: https://www.aaai.org/ojs/index.php/aimagazine/article/view/4...

  • vinceguidry 5 years ago

    I believe Latin was still being used for theological discourse in Europe well into the 1700s for that same reason: precision. Something nice about a dead language is that the meanings of the words don't change out from under you.

    • schoen 5 years ago

      Latin is still perfectly capable of every kind of ambiguity that other natural languages have. For example, I once wrote the sentence

      Quisque aliquid habet quod occultet

      for a t-shirt.

      While the intended meaning is 'everybody has something to hide', in a different context one could imagine that the subject of "occultet" is someone else previously referred to. For example, if we had just been talking about Moxie Marlinspike, we could conceivably read this sentence as 'everybody has something for him [Moxie] to hide'. (Like, all of us users out here have got different things that Moxie can help each of us to protect.)

      There's also the famous joke "malo malo malo malo" ('I would rather be in an apple tree than a bad man in adversity'). I'm sure we can proliferate examples of ambiguous Latin to match every other natural language.

      A cool disambiguation feature in Latin is the distinction between the possessive pronouns "eius" and "suus", where "suus" is used when referring to possessions of the grammatical subject of the sentence and "eius" when referring to someone else's possessions. While English can specify the former ("his/her/its own"), it doesn't have a straightforward way to show that the possessor is not the subject of the sentence.

      You can see the contrast between eius and suus in the text of the Magnificat

      https://en.wikipedia.org/wiki/Magnificat#Text

      where "ancillae suae" ('his handmaiden') occurs in a sentence whose grammatical subject is God, but "nomen eius" ('his name') in a sentence whose grammatical subject is the name. And sure enough, there is an actual disambiguation between the subject of a sentence and someone else later on:

      Suscepit Israel, puerum suum, recordatus misericordiae suae, sicut locutus est ad patres nostros, Abraham et semini eius in saecula.

      He [God] has taken up Israel, his [God's] servant, remembering his [God's] mercy, as he [God] said to our ancestors, Abraham and his [Abraham's] seed forever.

      In this case "his mercy" refers to God's mercy while "his seed" refers to Abraham's seed, but there is no referential ambiguity in the Latin, because one is "misericordiae suae" and the other is "semini eius".

      • svat 5 years ago

        That's a great point and very interesting comment. I imagine one of the reasons for this formalized (or if you don't like it, over-precise) language to arise and take hold in the Indian setting may have been the fact that Sanskrit too is capable of an amazing amount of ambiguity, in multiple ways:

        1. Creative re-interpretation/re-analysis: A lot of commentators have re-interpreted existing verses to derive different meanings, and this is very possible to do. To pick a simple example, there is Kālidāsa's verse “…jagataḥ pitarau vande pārvatīparameśvarau” which clearly is a prayer to Pārvatī and Parameśvara (Shiva). But pārvatī-parameśvarau could be re-analyzed as pārvatīpa-rameśvarau (Pārvatī's and Ramā's (Lakshmi's) husbands), i.e. a prayer to Shiva and Vishnu. Similarly there are Shiva-para and Vishnu-para interpretations of various verses/works, people have written spiritual commentaries on love poetry, etc. So if anything straightforward you say can be interpreted to have any meaning whatsoever (exaggerating a bit) by a sufficiently clever commentator, you had better be careful :)

        2. Happening naturally: Poets have used this too: what appears to be the same word turns up in different settings. For example, here's a prayer that millions of people recite, to Ganesha: “agajānana-padmārkaṃ gajānanam aharniśam / anekadantam bhaktānām ekadantam upāsmahe” — here the first line has the well-known word “gajānana” (the one with an elephant face), but it also starts with “agajānana” which appears to be the opposite (not an elephant face?). Actually the simple compound “agajānana-padmārkaṃ” turns out to be made from: a-ga=mountain (that which does not move), thus agajā=daughter of the mountain (Pārvatī), agajā-ānana=Pārvatī's face, agajānana-padma=the lotus of Pārvatī's face, and the whole word agajānana-padma-arka=the sun to the lotus that is the face of Pārvatī (the sun of course being what makes a lotus bloom), thus it's a simple adjective describing Ganesha (namely that he makes his mother's face bloom with joy). And this is a perfectly straightforward usage of language that most educated readers will simply understand and find unremarkable, not a trick. At most a pleasing coincidence that the same syllables repeat (known as “yamaka”). In the second half, “ekadantam” refers to Ganesha having one tusk, but the “anekadantam” that it starts with is not the opposite of that but simply “anekadam taṃ” (him, who gives many things).

        3. Used intentionally: At the extreme, poets have used ambiguity in the above ways, as well as puns (śleṣa, words with multiple meanings like kara=hand/doer/tax), to compose poems that have multiple meanings, including entire continuous works of poetry that tell two stories at once (each stanza being interpretable in two ways, and in one instance even up to six ways). There's a book about this called Extreme Poetry (https://cup.columbia.edu/book/extreme-poetry/9780231151603 — unfortunately for the lover of literature, this is a product of modern academia so heavy on theory and light on examples, but worth a look nevertheless). There's even a verse that consists of the syllable yā repeated 32 times (yāyāyāyā...) (https://www.scribd.com/document/6591853/The-Wonder-That-is-S...) which is not just “yeah, yeah” but is intended by the author to mean something. :-)

        • schoen 5 years ago

          > straightforward usage of language that most educated readers will simply understand and find unremarkable, not a trick

          Do you think those readers would recognize the specific words, or that they would successfully parse the words in context at first glance using their language ability?

          > There's even a verse that consists of the syllable yā repeated 32 times

          Wow! It seems like this tradition or at least possibility is shared between Sanskrit and Chinese.

          https://en.wikipedia.org/wiki/Lion-Eating_Poet_in_the_Stone_...

          In general, a lot of the wordplay techniques and genres that are mentioned in the Extreme Poetry book you linked to are also practiced in similar forms in modern English (and to some extent French due to the Oulipo), but many of them were only invented or popularized during the 20th century, so I imagine some of these Sanskrit wordplay traditions are dramatically older.

          https://en.wikipedia.org/wiki/Constrained_writing

          • svat 5 years ago

            I meant that they would successfully parse the words in context (and recognize the specific words and meanings), using their language ability. To use an example from an interview with a modern master of Sanskrit constrained writing (http://www.indictoday.com/interviews/citrakavya-the-wonder-p...), in an English sentence like “She is his panicky Hispanic friend”, the sound “hispanic” may recur when spoken (depending on the speaker's accent I guess) but most listeners would successfully parse it anyway. (I think there is some “lookahead” in the way we parse things... even if “She is Hispanic” and “She is his panicky…” begin with the same set of sounds, when listening to the latter sentence the listener would quickly backtrack and latch on to the correct understanding, which includes unconsciously re-analyzing the earlier words.)

            The difference I was suggesting is that such instances are either rare or awkward in English. But in Sanskrit they are common (helped by poets' enormous skill over centuries, honed in a highly language-focused tradition) even in popular works, and feel natural/elegant.

      • linguistbreaker 5 years ago

        When it comes to ambiguity in languages, linguistics has a distinction between low-context and high-context languages, which can also be related to the concept of entropy. Latin is often used as an example of a low-context language because so much is explicit in the surface representation. Japanese is often cited as a high-context language, meaning that a good deal of context is needed to properly interpret a sentence. This is loosely analogous to Java (low-context) vs. a Groovy DSL (high-context).

        https://en.wikipedia.org/wiki/High-context_and_low-context_c...

      • schoen 5 years ago

        A famous ambiguity built into Latin grammar is the objective vs. subjective genitive.

        https://en.wikipedia.org/wiki/Genitive_case#Functions

        People would usually mention that "amor Dei" ("love of God") could mean love toward God, or love that God has toward someone else.

        Similarly you could have "odium brassicae" ("hatred of cabbage") which could mean a person's attitude toward cabbage, or perhaps a cabbage's attitude toward something or someone else.

        Latin obviously has other ways to be more specific ("odium, quod brassica in te fert" 'hatred that cabbage bears against you' or something) but if you just use a genitive to express an attitudinal relation, it's going to be grammatically ambiguous which direction that attitude flows!

      • simonebrunozzi 5 years ago

        You remind me of the famous "I vitelli dei romani sono belli" [0], which:

        1) Is correct Latin

        2) Sounds Italian, not Latin

        3) In Italian it would mean "God's veals are beautiful"

        4) In Latin it actually means "Go, Vitellio, to the sound of war (made by) the Roman god"

        [0]: https://it.wikipedia.org/wiki/I_vitelli_dei_romani_sono_bell... (didn't find an English version of it)

        • lone-commenter 5 years ago

          3) Actually it means "Romans' calves are beautiful".

          • schoen 5 years ago

            4) The "Vitelli" in Latin would be the vocative of the name "Vitellius" rather than "Vitellio".

            Cool sentence -- I had never heard of it before!

            The Latin etymological equivalent of the Italian sentence would be "illi vitelli de illis Romanis sunt belli", which uses "de" in a way that's unidiomatic for ancient Latin (where it only means something like "from", not "of" in the sense of "belonging to").

          • simonebrunozzi 5 years ago

            Oh my, you are right! Somehow I guess I got confused when trying to explain.

    • jquinby 5 years ago

      Indeed, Ecclesiastical Latin is still the official language of the Catholic Church, inasmuch as documents like papal encyclicals are published in Latin first, before translation into other languages.

      https://en.wikipedia.org/wiki/Ecclesiastical_Latin#Current_u...

      • schoen 5 years ago

        However, I don't think the people using Latin for these purposes really think that logical precision is an important reason to do so. The Vatican Latinist I briefly studied with, Reginald Foster, doesn't seem to think that way -- he regularly made fun of people who acted as if "Latin came down from heaven in a gold box".

        You might say that the use of Latin has a benefit of precision for the Catholic Church because there are some familiar theological and ecclesiological terms available whose meaning should be clear and which can be matched up with similar vocabulary in older Christian texts. Still, these texts are not using some kind of formal logic, and they don't do a more careful job of avoiding ambiguities than lawyers drafting legislation or contracts in modern languages do.

        • posterboy 5 years ago

          then these texts must be really horrible to read and no less bug-ridden than any reasonably sized code-base, except that any bug is swiftly explained away as a feature by the commanding authority. Whether "bear arms" or "it fell like scales from his eyes", this stuff doesn't age well, which is pretty much the point.

          To be fair, though: the latter bit is from Greek; but why not tomatoes, huh?

  • elboru 5 years ago

    It would be great to use that kind of language in software development; I think it would avoid a lot of bugs and discussions between QAs and developers.

    • empath75 5 years ago

      All higher level programming languages attempt to do this more or less.

    • avmich 5 years ago

      A modern attempt on an unambiguous language is Lojban.

grenoire 5 years ago

The meat and potatoes of course is the parser created for this language: https://github.com/Attempto/APE

This is the primary benefit of heavily structuring the sentences. It effectively turns into a programming language with its own form of definitions and statements. My question is: can we make it Turing-complete by means of recursive sentences? Maybe by using sentences that redefine proper nouns?

  • posterboy 5 years ago

    you are being nothing but evil

  • Stevvo 5 years ago

    Possibly, but then it would lose the benefit of being understood by any reader of English; recursion requires a little mental gymnastics.

mmastrac 5 years ago

Has anyone attempto'd to combine this with state machines in a way that you could specify rules in an English-like way, with a computer checking that you've resolved all ambiguities?

For example, a system that starts and stops based on observing parts of another system, but also allowing a user to input explicit start/stop commands would require additional rules:

  The system is stopped if the subsystem is in state A.
  The system is started if the subsystem is not in state A.
  The system is stopped if the user sets the stop flag.
  The system is started if the user unsets the stop flag.

  Error: ambiguous state if subsystem not in state A and user sets the stop flag.

There's some real value here in allowing non-programmers to work through all the edge cases of a system, while simultaneously adding tools to convert standard English to ACE (e.g. identifying ambiguous English and asking them to rewrite pronouns or split sentences up).

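
A checker like the one imagined above can be sketched mechanically: enumerate every assignment of the flags and report any assignment where two rules demand different states. A minimal Python sketch (the rule encoding and flag names are invented for illustration; this is not ACE):

```python
from itertools import product

# Each rule maps a condition over named boolean flags to a desired
# system state. The encoding and names are invented for this sketch.
rules = [
    ({"subsystem_in_A": True},  "stopped"),
    ({"subsystem_in_A": False}, "started"),
    ({"stop_flag": True},       "stopped"),
    ({"stop_flag": False},      "started"),
]

def conflicts(rules, flags=("subsystem_in_A", "stop_flag")):
    """Enumerate every flag assignment; report those where the
    matching rules demand contradictory states."""
    bad = []
    for values in product([False, True], repeat=len(flags)):
        world = dict(zip(flags, values))
        states = {s for cond, s in rules
                  if all(world[k] == v for k, v in cond.items())}
        if len(states) > 1:
            bad.append(world)
    return bad

for world in conflicts(rules):
    print("Error: ambiguous state for", world)
```

This flags exactly the case described above (subsystem not in state A while the user sets the stop flag) and its mirror image.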
  • johnfactorial 5 years ago

    Have you seen Cucumber? https://cucumber.io/docs It's a system that attempts to enable specifications for a software product to be written in a way that can be parsed by an automated testing framework.

    In theory, you carefully say what you want the software to do in given scenarios, human developers understand the spec as-is, and automated tests can read and execute the spec as-is as a test.

    I have no professional experience with it because all the docs make it look like something I would run from: lots of in-code scaffolding and talking like a computer just to get the same job done. But maybe you'll like it.

    • rgoulter 5 years ago

      FWIW, the "specifications can be tested" is nothing magical. It binds statements of the Cucumber spec to snippets of code using regular expressions. (The effect is about the same as titles given to blocks of code in Ruby's RSpec).
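
      That regex binding is simple to sketch: a decorator registers a pattern per step, and captured groups become arguments to the bound function. A toy Python version (the step text and names are invented; this is not Cucumber's actual API):

```python
import re

STEPS = []  # (compiled pattern, handler) pairs

def step(pattern):
    """Register a handler for step text matching `pattern`."""
    def register(fn):
        STEPS.append((re.compile(pattern), fn))
        return fn
    return register

@step(r"the user deposits (\d+) dollars")
def deposit(state, amount):
    # Captured groups arrive as strings, so convert before adding.
    state["balance"] = state.get("balance", 0) + int(amount)

def run(line, state):
    """Dispatch one plain-English step line to its bound handler."""
    for pattern, fn in STEPS:
        m = pattern.fullmatch(line)
        if m:
            fn(state, *m.groups())
            return
    raise ValueError(f"no step matches: {line!r}")

state = {}
run("the user deposits 30 dollars", state)
print(state)  # {'balance': 30}
```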

      The "it's not code" part of Cucumber is that the test document is something that a non-programmer won't freak out at manipulating. (But a developer would still have to make sure the code snippets bind to the statements, so...).

      Still, I like it. I really like the suggested tactics discussed in Specification By Example https://gojko.net/books/specification-by-example/

    • hinkley 5 years ago

      I have found that test suites solve this problem about 98% of the time and are easier to teach people to debug.

      That last 2% is pretty damned awkward though. When you have 2 preconditions your tests go cartesian, you have to pick a dominant one. It's always a matter of one sucking less than the other, but that situation isn't stable. It tends to flip as bugs are identified or requirements shift.

      Thing is, if it's 3 concerns, and definitely by 4, you're probably due for a refactor anyway, instead of reaching for Cucumber or a similar tool.

  • schoen 5 years ago

    It seems like you could perhaps write a translator from ACE to TLA+ or another specification language

    https://en.wikipedia.org/wiki/TLA%2B

    and formally verify properties (but perhaps that's something that the ACE developers already do).

    • mmastrac 5 years ago

      That would be very interesting. I couldn't find any references to TLA+ and ACE together, so it might be a novel area of research.

deckar01 5 years ago

This reminds me of when my team started code reviewing feature documentation. The text was going to have to be read hundreds of times and understanding it would be critical for QA to function properly. A common problem we ran into was sentences that were hard to understand without reading them multiple times due to small connective parts of speech being omitted. It intuitively felt wrong, but it was not immediately obvious why. After researching it a little every case seemed to boil down to resolving ambiguity. English is filled with ambiguous words that can be multiple parts of speech. These seemingly unnecessary pieces of syntax are actually really efficient at hinting what part of speech will come next so that the context is immediately understood.

mxcrossb 5 years ago

Wikipedia articles like this bug me. It is a well-written introduction to the project, but why is it on Wikipedia? Put that on your own website! I wish more people would follow Wikipedia’s very clean and easy style.

This is of course why the article can’t follow Wikipedia’s citation rules. And also all the references just link to articles by one group, so the article can’t give you any real context. Is this a serious, notable work? Or is someone just boosting their search rankings with a Wikipedia link?

  • throwaway744678 5 years ago

    This would have been cooler if they had written the page in the language itself (going meta, here)

    • avmich 5 years ago

      More appropriate would be to have English Wikipedia translated wholesale to ACE. In that one the article about ACE would be naturally in ACE.

    • garmaine 5 years ago

      Heh, it actually bugged me while reading that it wasn’t written in ACE.

  • chadlavi 5 years ago

    It seems like the latter, though if so, it's surprising to me that it hasn't been removed. Wikipedia's editors are pretty brutal about the notability threshold.

    • pessimizer 5 years ago

      Only with things in which editors exist who feel confident enough (sometimes wrongly) to judge their notability. You can completely make up pages describing concepts in obscure fields, as long as you can generate a citation or three - bonus points if the papers cited are unintelligible to all but a few dozen people in the world, and don't even mention the concept.

glenneroo 5 years ago

Finally I understand why all those years of Google searches for documentation about ACE (ADAPTIVE Communication Environment)[1] have been so painful. To make matters worse, versioning was very close - Attempto is at version 6.7 while the programming ACE is at 6.5.

[1]: http://download.dre.vanderbilt.edu/

  • avmich 5 years ago

    A friend of mine, who worked with Verilog a lot, was expressing some anger that the .v extension on the Internet (GitHub?) is mostly associated with Coq, not Verilog. Oh well...

dlojudice 5 years ago

As I'm getting older I feel the need to make the code as readable as possible [1] to a point where non-programmers would be able to read it, especially in complex business rules environments (ex: finance, insurance, health, etc).

The quote "the limits of my language are the limits of my world" [2] shows itself when we try expressing these business rules in high-level programming languages like C, C#, Javascript. There are (significant) parts of my code where I wish not to be limited by the syntax of the language.

DSLs help, but they usually take a disproportionate effort to implement.

It would be great to see more natural language embedded in our day-to-day programming languages.

[1] https://twitter.com/ianmiell/status/1144154072217522176?s=19 [2] https://www.quora.com/What-did-Ludwig-Wittgenstein-mean-by-t...

parentheses 5 years ago

"A girl had no name"

Jaqen H'ghar spoke ACE

Koshkin 5 years ago

The idea is similar to one behind the Structured English QUEry Language we all know and love.

  • auggierose 5 years ago

    Only in as much that the idea is similar to most programming languages ...

rthomas6 5 years ago

Can you define new nouns and verbs? AKA objects and functions? Imagine being able to have the specification, documentation, and actual code implementation all be the same thing!

  • dragonwriter 5 years ago

    > Can you define new nouns and verbs?

    There is a defined syntax for variables, but they are expressly restricted to nouns.

airstrike 5 years ago

To be honest, I'm a bit disappointed this article isn't written in ACE

Or maybe "To be honest, I do not like that the Wikipedia editors chose not to write this article in ACE"

  • TremendousJudge 5 years ago

    Isn't it? The "Overview" section doesn't read like standard English at all. I thought it might be ACE.

    • pierrec 5 years ago

      It's not. Take this sentence: "ACE construction rules require that each noun be introduced by a determiner (a, every, no, some, at least 5, ...)."

      The sentence does not follow the rule it describes, since "rules" does not have a determiner. It might be worth a try, but I suspect ACE would make the article awkward and harder to read.
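
      As a toy illustration of checking that rule, one could scan for nouns not directly preceded by a determiner (the word lists here are invented; the real APE parser uses a full grammar and lexicon, including multi-word determiners like "at least 5"):

```python
# Toy word lists; a real checker would need APE's lexicon and grammar.
DETERMINERS = {"a", "an", "every", "no", "some", "the"}
NOUNS = {"customer", "card", "rules", "noun"}

def violations(sentence):
    """Return nouns that are not directly preceded by a determiner."""
    words = sentence.lower().rstrip(".").split()
    return [w for i, w in enumerate(words)
            if w in NOUNS and (i == 0 or words[i - 1] not in DETERMINERS)]

print(violations("A customer inserts a card."))  # []
print(violations("Rules require a noun."))       # ['rules']
```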

tunesmith 5 years ago

If fact databases were written in this language, wouldn't it make NLP much easier? To allow machines to more easily query fact databases and infer new facts?

  • capableweb 5 years ago

    It's basically what Datomic (and other Datalog-inspired tools) do; check out https://docs.datomic.com/on-prem/architecture.html

    • avmich 5 years ago

      From what I see, Datomic isn't written in anything resembling natural language - not anywhere near ACE.

      Can you give good examples to the contrary?

      • capableweb 5 years ago

        I was replying specifically to the "easily query fact databases" part and less to the general post about ACE. I could imagine, though, that you could take ACE syntax and transform it into Datalog/Datomic queries quite easily.
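
        As a very rough sketch of that idea, a single fixed ACE-like sentence shape could be rewritten as a Datalog-style fact (this toy handles only "X verbs a Y." sentences and is nothing like the real APE translation):

```python
import re

# Toy fragment: "X verbs a Y." -> verbs(x, y).
PATTERN = re.compile(r"(\w+) (\w+s) a (\w+)\.")

def to_fact(sentence):
    """Rewrite one toy ACE-like sentence as a Datalog-style fact."""
    m = PATTERN.fullmatch(sentence)
    if not m:
        raise ValueError("sentence outside the toy fragment")
    subj, verb, obj = m.groups()
    return f"{verb}({subj.lower()}, {obj.lower()})."

print(to_fact("John owns a car."))  # owns(john, car).
```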

droithomme 5 years ago

This seems similar to Applescript. Someone wants to make a precise language that looks like English and can be read and written by non-programmers. Ultimately you get the reading part, sort of, but writing is harder. You end up with very subtle and precise rules in the grammar which are not as easily inferred correctly as with many common programming languages. Just writing sentences that you think should be correct won't work. You need to fully understand the programming language. So you're back at requiring programmers to work with it.

  • iainmerrick 5 years ago

    It might be great to have tool-assisted writing, where you write naturally and the computer says “do you mean: [pedantic unambiguous version]”.

    The main risk would be people blindly accepting corrections that actually change the meaning -- just like any other autocorrection system. I’m not sure how to mitigate that.

joaojeronimo 5 years ago

Newspeak!

  • trhway 5 years ago

    Actually, legalese comes to mind. Therein and hereto ... I mean, it has the same purported goal: to disambiguate and precisely specify. Because of the complexity of the result, it is hard to say whether that goal is achieved or not :)