What does the ??!??! operator do in C?

636 points by isomorph 3 years ago

susam 3 years ago

I learnt C, more than 20 years ago, from the book The C Programming Language written by Brian W. Kernighan and Dennis M. Ritchie, also known as K&R. I read the book almost cover to cover all the way from the preface at the beginning to its three appendices at the end while solving all the exercises that each chapter presented. As someone who knew very little about programming languages back then, this book was formative in my journey of becoming a programmer.

Appendix A (Reference Manual) of the book broadened my outlook on programming languages by providing me a glimpse of what goes into formally specifying a programming language. Section A.12 (Preprocessing) of this appendix specifies trigraph sequences. Quoting from the section:

> Preprocessing itself takes place in several logically successive phases that may, in a particular implementation, be condensed.

> 1. First, trigraph sequences as described in Par.A.12.1 are replaced by their equivalents. Should the operating system environment require it, newline characters are introduced between the lines of the source file.

Section A.12.1 (Trigraph Sequences) then describes trigraph sequences in more detail. Quoting from it:

> The character set of C source programs is contained within seven-bit ASCII, but is a superset of the ISO 646-1983 Invariant Code Set. In order to enable programs to be represented in the reduced set, all occurrences of the following trigraph sequences are replaced by the corresponding single character. This replacement occurs before any other processing.

  ??=  #
  ??/  \
  ??'  ^
  ??(  [
  ??)  ]
  ??!  |
  ??<  {
  ??>  }
  ??-  ~

> No other such replacements occur.

> Trigraph sequences are new with the ANSI standard.

vonwoodson 3 years ago

To be fair, and definitely a part of its appeal, the K&R is only 312 pages long. It covers the language and most of the standard library you’ll need.
As opposed to say, “Learn You a Haskell for Great Good! A Beginner's Guide” which is 881 pages and doesn’t even moderately cover the prelude.
Anyway, C is an amazing language and I keep a K&R on my phone as a pdf
- 1-more 3 years ago
  
  My copy of Learn You a Haskell ends on page 360 and the index is another 16 pages. Is this some weirdly shaped PDF of it with tiny phone-sized pages?
  
  sli 3 years ago
  
  According to its listing on the No Starch website[0], the PDF is (currently) 400 pages. I'm not a fan of that particular Haskell book but it's in no way even close to 881 pages.
  [0]: https://nostarch.com/lyah.htm
  
  Kalq 3 years ago
  
  Why don't you like that book and which Haskell book would you recommend then?
  
  bbarnett 3 years ago
  
  That's 800 pages double sided! Simply because the back side is blank, is no reason to nitpick.
  
  tremon 3 years ago
  
  I don't get it. Wouldn't 800 pages double-sided require a PDF containing 1600 pages?
  
  stonemetal12 3 years ago
  
  400 single sided printed pages. If you are trying to inflate the page count that is 800 pages, but every other page is blank.
  
  bbarnett 3 years ago
  
  Sorry, been looking at new cars, and salespeople have been explaining new car, extended warranties to me.
  
  vonwoodson 3 years ago
  
  Could be, now that you mention it. Which, now, makes me wonder if my K&R is even shorter.
  
  DonHopkins 3 years ago
  
  There's one thing that's usually true about every well used copy of K&R, and that's that it always opens up easily and by default to the page with the huge operator precedence order and associativity table, because that's the page everyone needs to refer to the most often, which is usually bookmarked, but in an old copy doesn't even need to be.
  That says something about the C programming language design, that I as a deeply stack based FORTH programmer and explicitly parenthetical LISP programmer find horrible.
  
  dahfizz 3 years ago
  
  Just use parentheses if you're doing many operations on one line? This is common sense in any programming language.
  I've been programming in C for years and never had an issue with operator precedence.
  
  falcrist 3 years ago
  
  There's a lot of exaggeration that goes on with certain fans of postfix systems when they talk about infix systems.
  For example: I really like HP calculators, so I'm in several facebook groups for fans of RPN/RPL and HP specifically. Sometimes a few of them go way too far out of their way to try to demonstrate how inferior algebraic systems must be.
  For the record, my copy of K&R wants to open into either section 7.6 or appendix B. No idea what this says about me, though.
  
  michaelcampbell 3 years ago
  
  Indeed, different strokes for different uses. I "grokked" RPN pretty early in my career and use it for all my calculator stuff... only. Infix for programming languages is equally comfortable for me. I still have to think quite hard to read prefix LISP notation. I get it, it's just not internalized like the other 2.
  
  mrguyorama 3 years ago
  
  I don't know what it is with C programmers and programming "terseness". It hasn't been the 60s since the 60s, and you have gobs of memory available, your source code can have syntactic sugar for the purpose of readability and the world won't end.
  
  dahfizz 3 years ago
  
  Can you give an example of syntactic sugar in a modern language that makes order of operations a non-issue?
  
  1-more 3 years ago
  
  These aren't syntactic sugar, but formatters. I rather like Ormolu for Haskell and elm-format for Elm. I occasionally type a bunch of parentheses so that I'll for sure have the right order of operations. Then the format on save removes the redundant ones. It's delightful. The typecheckers and tendency to wrap primitives in a semantically significant constructor help with that.
  
  rdlw 3 years ago
  
  Parentheses
  
  dahfizz 3 years ago
  
  Yeah, that's why I said
  > Just use parentheses
  Not sure what that has to do with memory usage..?
  
  rdlw 3 years ago
  
  Just to clarify, I'm not the original person you replied to, and we all agree about using parentheses.
  mrguyorama just implied that the only reason you would check the operator precedence chart would be to shave a few bytes off the size of your source code, which has not been a reasonable reason to do anything for many decades, and yet C programmers seem to like to do it anyway.
  
  coldpie 3 years ago
  
  I agree, but I'm still cracking open that page when I'm reading someone else's code. I guess you work solo most of the time?
  
  dahfizz 3 years ago
  
  Nope, I work on a team. I guess we all have similar instincts about what is readable.
  
  DonHopkins 3 years ago
  
  It's not that YOU have any issues with it -- because you're perfect.
  https://www.youtube.com/watch?v=fKHaNIEa6kA
  It's about the poor people who read your code that relies on both you and them having perfectly memorized every single little detail of operator precedence and associativity, instead of simply and consistently using parentheses.
  Quick without looking: can you tell me what the precedence and associativity of the ternary ?: operator are?
  The designer of PHP got it wrong (which isn't surprising given his proudly self proclaimed contempt towards computer science and incompetence at parser writing), but then millions of PHP programmers also learned it the wrong way.
  https://en.wikiquote.org/wiki/Rasmus_Lerdorf
  Do you really want any of those people who were corrupted by PHP messing around with your code, if you relied on it being one way, and they assume it works the other way?
  It's not that you can't tell what it actually does, it's that you can't tell what the person who wrote it actually meant, which is more important than what it actually does, especially when it has bugs.
  Don't do many operations on one line, AND do use parentheses, AND do use indentation, with no exceptions except for very simple expressions. Take every opportunity to use line breaks and vertical alignment to make symmetry and repetition and nesting visually obvious, like:
    float distance =
        sqrt(
            (x * x) +
            (y * y));
  Redundant parens, plus breaking expressions into multiple lines and indenting according to depth, unambiguously express programmer INTENT, so the reader doesn't need to wonder if the person who wrote it had a clue or was just showboating.
  Just use parentheses, and put a comment on it, sailor.
  https://wellcomecollection.org/works/m33njwx3/items
  My copy of The Little Schemer won't open to page 13 because of the jelly stains.
  https://vpb.smallyu.net/[Type]%20books/The%20Little%20Scheme...
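  For anyone who did have to look: a small sketch of the `?:` case, assuming nothing beyond standard C. The operator groups right to left, so the unparenthesized form below means the same as the first explicit version, while the left-to-right grouping gives a different answer. The function names are made up for illustration.

  ```c
  #include <assert.h>
  #include <stdio.h>

  /* No parentheses: relies on ?: grouping right to left. */
  int pick_implicit(int a, int b)
  {
      return a ? 2 : b ? 3 : 4;
  }

  /* The grouping C actually uses, spelled out. */
  int pick_right(int a, int b)
  {
      return a ? 2 : (b ? 3 : 4);
  }

  /* The other grouping, which a reader might wrongly assume. */
  int pick_left(int a, int b)
  {
      return (a ? 2 : b) ? 3 : 4;
  }

  int main(void)
  {
      assert(pick_implicit(1, 0) == pick_right(1, 0));
      printf("%d %d %d\n",
             pick_implicit(1, 0), pick_right(1, 0), pick_left(1, 0));
      /* prints: 2 2 3 */
      return 0;
  }
  ```

  With the parentheses written out, nobody has to guess which of the two the author meant.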
  
  dahfizz 3 years ago
  
  What is with the gay sailor condom ad? You're being completely ridiculous.
  I'm glad you enjoy Forth so much, I guess. I'm sure postfix will catch on any day now.
  
  benj111 3 years ago
  
  Isn't that what brackets are for?
  Well that's what I do. If you're having to look it up to write it, you're going to have to look it up to read it again down the line.
  
  belter 3 years ago
  
  Can I up vote for first paragraph, and down vote the second? :-)
  
  zh3 3 years ago
  
  Just checked my first edition K&R (copyright date 1978). The last page number is 228 (end of index), after which there is a single tear-out page for other "High Quality C and Unix system titles" from Prentice Hall. There's also a front section of about 10 pages in roman numerals (2 pages of preface starting with 'ix'), so about 240 pages total.
  Page 1 starts: "Chapter 0: Introduction".
  
  checkyoursudo 3 years ago
  
  Page 1? What, it's not zero indexed?
  
  DonHopkins 3 years ago
  
  It looks like it's 1-indexed, but it core dumps when you get to the last page.
- metafunctor 3 years ago
  
  My Second Edition K&R (purchased in the mid-90s) is only 272 pages, including the index.
- pjmlp 3 years ago
  
  Which is why most workloads bring POSIX along for the ride as a means to make anything actually useful.
  
  dahfizz 3 years ago
  
  POSIX is an OS API. You're complaining that C interacts with the operating system to do useful work? What language do you use that can do useful work without interacting with the OS?
  
  pjmlp 3 years ago
  
  POSIX is the part of the C standard library in UNIX, that should have been part of ISO C as well.
  It wasn't, so any C application that is more than a toy hello world with stdio, pings back into POSIX for any kind of meaningful work, that wants to stay cross platform.
  Basically it is the C runtime library that wasn't part of ISO.
  I use JVM, .NET, Web and C++, not caring if the runtimes are bare metal or running on top of an OS, type 1 hypervisor, or whatever.
  
  salawat 3 years ago
  
  >I use JVM, .NET, Web and C++, not caring if the runtimes are bare metal or running on top of an OS, type 1 hypervisor, or whatever.
  If you're downloading a JVM binary, you're missing out on the build step. It's C dependent, friend. How do you think that VM interfaces with the OS? Go on. Try it. ldd the java executable.
  It's libc all the way down. C itself is a sort of "VM" specification utilized to create the tools to run the tools to build the tools that make other high level languages possible.
  Unless you create something entirely custom in platform specific assembly, you're running on C at some level.
  
  arinlen 3 years ago
  
  > POSIX is the part of the C standard library in UNIX, that should have been part of ISO C as well.
  I'm not sure what you're trying to say. The Portable Operating System Interface (POSIX) is specified in an ISO standard, and basically specifies what a UNIX operating system's programmable interfaces are.
  https://en.wikipedia.org/wiki/POSIX
  POSIX also specifies stuff like "awk must be made available". Is that what you think the C programming language specifies?
  
  dahfizz 3 years ago
  
  I don't think you really understand what POSIX is.
  POSIX is an IEEE standard (example [1]). POSIX defines the Operating System API. You can see the C implementation of this API here[2].
  > so any C application that is more than a toy hello world with stdio, pings back into POSIX for any kind of meaningful work
  Simply calling printf relies on writing to a file descriptor. A "Hello world" application on Linux uses POSIX. ANY hello world application uses POSIX. Even your Java Hello world app will call into the POSIX APIs. `System.out.println` isn't magic. It calls into the C POSIX implementation.
  If you want to do anything in any language (write to files, create threads, allocate memory, network communication), you need to go through the OS. POSIX is what defines that OS interface.
  > I use JVM, .NET, Web and C++, not caring if the runtimes are bare metal or running on top of an OS, type 1 hypervisor, or whatever.
  So you use POSIX, you just don't think about it.
  [1] https://standards.ieee.org/ieee/1003.1/7700/
  [2] https://en.wikipedia.org/wiki/C_POSIX_library
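  A minimal sketch of that layering, assuming a POSIX system (it will not build where `<unistd.h>` is missing). `printf` and `fflush` are ISO C; `write` and `STDOUT_FILENO` come from POSIX. The helper name `say` is made up.

  ```c
  #include <assert.h>
  #include <stdio.h>
  #include <string.h>
  #include <unistd.h>   /* POSIX header, not part of ISO C */

  /* Write msg straight to a file descriptor, bypassing stdio buffering.
     Returns the number of bytes written, or -1 on error. */
  ssize_t say(int fd, const char *msg)
  {
      return write(fd, msg, strlen(msg));
  }

  int main(void)
  {
      printf("hello from ISO C stdio\n");
      fflush(stdout);   /* empty stdio's buffer before using the descriptor directly */

      ssize_t written = say(STDOUT_FILENO, "hello from POSIX write()\n");
      assert(written > 0);
      return 0;
  }
  ```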
- wiseowise 3 years ago
  
  Comparing C to Haskell is like comparing a razor to a laser scalpel.
  
  bryanrasmussen 3 years ago
  
  And not just any razor, a really useful one!
  
  pjmlp 3 years ago
  
  One of those that always cuts the user no matter how carefully they try to get hold of it.
  
  benj111 3 years ago
  
  No it's a tool.
  I suppose you could compare it to a table saw. C is one without a guard or any other safety measures, so you need to be careful not to cut your fingers off. More modern languages have the guard and brake etc.
  For general use you probably do want all the safety bits, but occasionally it is useful to be able to take it off to do a weird cut on a weird bit of wood.
  None of that necessarily means you will cut your fingers off though.
unwind 3 years ago

I learnt it a few years before that, and I remember how reading K&R once I got it felt like having someone turn UP the lights, open the blinds, wash the windows and basically TURN UP THE SUN compared to things I read before. So much clarity.
Number of times I've seen trigraphs in "real code": still zero. I hope it's the same for you.
- moomin 3 years ago
  
  I read it at a similar time, and I remember that feeling well. However, if you revisit it with a critical eye, you find a hundred places where a result isn’t checked, bounds aren’t checked, memory is leaked and so on.
  All of this was pretty much fine in the context in which it was written, but these days bullet-proofing things is pretty much mandatory and K&R’s elegance disappears in the face of such challenges.
  
  unwind 3 years ago
  
  Well yeah, but there is a difference between teaching the mechanics of a language, and teaching how to write safe, correct and secure code in that language.
  Perhaps that mindset is part of what made C survive for so long and in such diverse roles.
usr1106 3 years ago

In 1990 IBM donated a 9370 computer to our university. The default code page for German EBCDIC did not support square brackets.
I don't remember whether trigraphs were not supported by the compiler at the time or whether we just wanted to avoid completely unreadable code. Not being experienced in VM/370 administration, we spent weeks modifying the system to use some international EBCDIC code page.
The system never saw much use, everybody preferred Unix workstations where programming in C was a natural thing.
kjs3 3 years ago

I, too, learned C by reading K&R cover to cover and solving all the exercises (in front of a Sun 3/160 running SunOS 3.5-ish). Even back in those ancient days, it was obvious trigraphs were evil and should have been banished to a special place in hell.
Taniwha 3 years ago

This is because IBM 029 card punches don't support these characters right?
- aidenn0 3 years ago
  
  I thought it was because of international character sets that lacked the punctuation of EBCDIC CP37 or ASCII.
  [edit]
  For example ISO/IEC 646 is ascii with punctuation replaced by other characters.
  
  Taniwha 3 years ago
  
  EBCDIC is very much IBM - on our old Burroughs machine we used to have to use cent signs (and something else that I forget) for square brackets and a 3-hole multipunch for ';'
  
  sargstuff 3 years ago
  
  That still leaves the issue of how to print something that doesn't exist as a physical character on an old-style typeface (way pre-dot-matrix / laser printer stuff), aka substituting 3 characters is way more informative than a blank space.
  
  aidenn0 3 years ago
  
  My (possibly wrong) understanding is that trigraphs were a late addition to the ANSI standard, which would place them in the late 80s, well into the CRT terminal and dot-matrix era.
- dragonwriter 3 years ago
  
  I think the ISO/IEC 646 invariant character set is more the issue.
  
  randomswede 3 years ago
  
  In the "Swedish 7-bit ASCII", the C code "a || b" would look like "a öö b". The same character mappings were used in Finland, and that's why IRC counts the characters {|}[\] as letters (that would typically have been displayed as "äöåÄÖÅ").
  On the Compis II computer (a CP/M machine built on the 80186 CPU), there were places for {|}[\] in the character set, but they were in the top half of the 8-bit characters and not generally useful for programming.
- wglb 3 years ago
  
  Or an ASR33
dijonman2 3 years ago

Fantastic book. I used Learn C in 21 days and that is what started everything for me. I had a second book on Linux administration and installed Slackware from 1.44mb disks, ultimately setting up pppd and using Mosaic.
Great memories!

bradford 3 years ago

Trigraphs make this obfuscated C submission possible: (https://gist.github.com/Property404/e31b99deb3527159e183)

I've pasted it here for convenience (formatting fixed, thanks child comment!):

   //  Are you there god??/
   ??=define _(please, help)
   ??=define _____(i,m, v,e,r,y) r%:%:m
   ??=define ____ _____(a,f,r,a,i,d)
   main(__)<%____(!_(-~-??-((-~-??-!__<<-
   ??-!!__)<<-??-(!!__<<!!__))+-~-~-??--~-~
   -~-~-~-~-??-(-~-~-~-~-??-!!__<<-~!!__),-
   ??-!__))<%??>%>_(__,___)??<____
   (printf("please let me die??/r%d bottle%s"
   " of bee%s""""??/n",(!(___
   %-~-~!!___))?--__+!___++:__+!___++,!(__-!!___)
   &&___%-~-~!!___??!??!!(___%-~-~!!___??!??!__
   -(-~!!___))?"":"s",___%-~-??-!!___<-??-!!___?
   "r on the wall":"eeeeeeer! Take one down,pass ??/
   it around")&&__&&_(__,___),"mercy I'm in pain")??<??>??>

omoikane 3 years ago

Roughly the only good use of trigraphs these days is for obfuscated code, for example here: https://www.ioccc.org/years.html#1990_scjones
But trigraphs have gotten old even for IOCCC. In the guidelines for recent years, they specifically mention "We tend to dislike programs that ... obfuscate by excessive use of ANSI tri-graphs": https://www.ioccc.org/2020/guidelines.txt
thamer 3 years ago
How to format text on HN: https://news.ycombinator.com/formatdoc
```
  For code blocks, prefix each line with two or more spaces.
```
- eek2121 3 years ago
  
  Thanks (I haven't seen this despite lurking on HN for 'a long time' and interacting with it recently). However, you clearly didn't quote the doc, which says: "Text after a blank line that is indented by two or more spaces is reproduced verbatim. (This is intended for code.)"
  Small nitpick, however I am happy you linked the page.
sargstuff 3 years ago

Guess with tri-graph elimination & awk getting unicode support will have to gawk C with cpp using pipology theory.
But think the cpp has to go away first, after enough sed.
https://grayson.sh/blogs/using-piphilology-to-hide-strings
https://www.gnu.org/software/gawk/manual/gawk.html#Signature...
lifthrasiir 3 years ago

Note that this uses not only trigraphs but also digraphs (here `<%`, `%>` and `%:`), which are similar to trigraphs in intended usage but behave quite differently: a digraph is a proper token, not a preprocessor substitution pattern. With trigraphs enabled, `printf("??(foo??)<:bar:>%c", "quux"<:1:>)` prints `[foo]<:bar:>u`, for example. Therefore digraphs are deemed less dangerous (however obscure) than trigraphs and do not require any compiler options.
- DonHopkins 3 years ago
  
  Bjarne Stroustrup proposed Generalized Overloading for C++2000, which not only lets you override all kinds of white space, like between two symbols separated left to right by a space (i.e. "a b" to add a to b), or two symbols separated top to bottom by a newline (i.e. "a \n b" to divide a by b, like a fraction), or even by tabs, or either kind of comment, but it also lets you override writing two symbols next to each other without any separation (i.e. "ab" to multiply a by b, which mathematicians love)!
  Of course they also had to limit the number of characters per symbol to 1 in order to unambiguously support the "ab" syntax for multiplying a and b (or however you wanted to overload the "absence of white space" operator), but fortunately they mitigated that little problem by making C++ fully support Unicode, so you had thousands of single character Unicode variable names to choose from. His prophetic intuition was spot-on, now that there are so many expressive and inclusive Emoji characters to use for single character variable names!
  https://www.stroustrup.com/whitespace98.pdf
  I really appreciate Bjarne Stroustrup's clean simple design and coherent long term vision for C++2000, and I'm looking forward to using three dimensional white space overloading in C++3D.
  
  ralphb 3 years ago
  
  I am almost unhappy to learn that this was a joke. Would have been nice to put a final (personal) nail in the C++ coffin with this insanity. However, I guess it says enough that I did have to dig quite far into the paper to realize whether Bjarne was joking or not.
  
  gpderetta 3 years ago
  
  We do not yet have three dimensional white space overloading unfortunately, will have to wait for the next release train, targeting C++26.
  In the meantime, you can use multidimensional analog literals: http://www.eelis.net/C++/analogliterals.xhtml

rdlw 3 years ago

See also: "What is the "-->" operator in C++?"

https://stackoverflow.com/q/1642028

falcor84 3 years ago
And of course, its cousin the slides-to operator, described in the answer https://stackoverflow.com/a/8909176/493553 with the following example:
```
    while (x --\
                \
                 \
                  \
                   > 0)
         printf("%d ", x);
```
- DonHopkins 3 years ago
  
  Whenever somebody complains about how Python uses indentation instead of { } or BEGIN END, you can prove to them that it actually does support that and more, by demonstrating that they simply need to prefix their favorite brackets or keywords with Python's unary "#" operator, like:
    for i in range(10): #{
        print(i)
    #}
  or:
    for i in range(10): #BEGIN
        print(i)
    #END
  You can even mix-and-match them, like:
    for i in range(10): #BEGIN
        print(i)
    #}
  or:
    for i in range(10): #{
        print(i)
    #END
  or turn them inside-out, like:
    for i in range(10): #}
        print(i)
    #{
  or:
    for i in range(10): #END
        print(i)
    #BEGIN
  Python is extremely flexible that way, and can easily strangle and eat all other languages.
  
  krylon 3 years ago
  
  It swallows them whole, one might be tempted to say.
  
  paulluuk 3 years ago
  
  Ha, I chuckled
  
  ajoseps 3 years ago
  
  I mean that's fine for code that you write, but most python code is not like that at all
furyofantares 3 years ago

And sort of the opposite of that, I once had someone say they wanted to contribute to the C++ portion of our codebase, but the only problem was they didn't know how to make the "->" character, and did they need to get a special keyboard?
- icambron 3 years ago
  
  Is it possible that their editor provided ligatures but they didn't know about those and so assumed it was actually a character in the source?
  
  furyofantares 3 years ago
  
  No, it was much too long ago for that. They were just very new to programming and had interpreted it wrong the first time they saw it (most likely in a book, that's how we used to get our first introduction to a language).
- Izkata 3 years ago
  
  Semi-related: One of the people on my team uses a font that displays things like ">=" as "≥". I was a bit confused the first time I saw it.
teawrecks 3 years ago

Yeah, I thought this was going to involve the ternary operator. TIL about trigraphs.
- LorenPechtel 3 years ago
  
  Yeah, C# has some shortcuts these days of the form x <symbol>= y that compile as x = x <symbol> y. They are an actual advantage as x only needs to be stated once and it is absolutely clear that it is an assignment into the value: more information readily communicated with fewer characters. It also has the null coalescing operator ??. Put those together and you can have x ??= y (equivalent to if (x == null) x = y;), which is useful for lazy initialization, and it can be returned, making lazy initialization getters much clearer. This looked awfully similar, I was trying to figure out how you could negate null coalescing.
  
  pjmlp 3 years ago
  
  These days, like since C# 1.0.
  
  tuukkah 3 years ago
  
  It's interesting to see the history of any given piece of syntax. This specific one is called augmented assignment or compound assignment: https://en.wikipedia.org/wiki/Augmented_assignment
  +:= in Algol68
  =+ in B
  += in C
- sargstuff 3 years ago
  
  still can, just have to add precedence-changing characters ( )

layer8 3 years ago

From the ASCII Wikipedia page (https://en.wikipedia.org/wiki/ASCII#7-bit_codes):

> Almost every country needed an adapted version of ASCII, since ASCII suited the needs of only the US and a few other countries. For example, Canada had its own version that supported French characters.

> Many other countries developed variants of ASCII to include non-English letters (e.g. é, ñ, ß, Ł), currency symbols (e.g. £, ¥), etc. See also YUSCII (Yugoslavia).

> It would share most characters in common, but assign other locally useful characters to several code points reserved for "national use". […]

> Because the bracket and brace characters of ASCII were assigned to "national use" code points that were used for accented letters in other national variants of ISO/IEC 646, a German, French, or Swedish, etc. programmer using their national variant of ISO/IEC 646, rather than ASCII, had to write, and, thus, read, something such as

  ä aÄiÜ = 'Ön'; ü

instead of

  { a[i] = '\n'; }

> C trigraphs were created to solve this problem for ANSI C, although their late introduction and inconsistent implementation in compilers limited their use. Many programmers kept their computers on US-ASCII, so plain-text in Swedish, German etc. (for example, in e-mail or Usenet) contained "{, }" and similar variants in the middle of words, something those programmers got used to. For example, a Swedish programmer mailing another programmer asking if they should go for lunch, could get "N{ jag har sm|rg}sar" as the answer, which should be "Nä jag har smörgåsar" meaning "No I've got sandwiches".

dhosek 3 years ago

One of the challenges of | is that it was never entirely clear whether the ASCII | should be equivalent to EBCDIC’s | or ¦. As I recall, Waterloo C wanted ¦ as its vertical bar character, although I could be wrong. On the IBM system that I used back in the 80s, we had ASCII terminals which were run through a muxer to the actual system (which was part of the magic that allowed it to have thousands of concurrent users all getting real-time access—a lot of UI was offloaded to these systems which were essentially minicomputers on their own).

watersb 3 years ago

Great article (that appeared on HN somewhat recently) from Ken Shirriff on the history of display terminals, and a great photo of the IBM 2848 Display Controller.
http://www.righto.com/2019/11/ibm-sonic-delay-lines-and-hist...
The next-gen was far more common. The IBM 3270 terminal hooked to a local controller that talked to the mainframe. You could also hook a printer to the controller and print screens and simple forms independently from the mainframe.
You know all this, but I've always thought it was cool, and try to refresh my understanding of the setup. I no doubt have many details wrong.

NegativeLatency 3 years ago

There's also iso646.h which allows you to do some particularly python looking stuff:

  #include <iso646.h>
  #include <stdbool.h>
  #include <stdio.h>
  #define is ==
  
  bool is_whitespace(int c) {
    if (c is ' ' or c is '\n' or c is '\t') {
      return true;
    }
    return false;
  }
  
  int main(void) {
    /* Treat the start of input as following a newline, so `previous` is never read uninitialized. */
    int current, previous = '\n';
  
    while ((current = getchar()) not_eq EOF) {
      if (is_whitespace(current) and not is_whitespace(previous)) {
        putchar('\n');
      } else {
        putchar(current);
      }
      previous = current;
    }
  
    return 0;
  }

garaetjjte 3 years ago

Of course when you are willing to use preprocessor, you can do things like Bournegol: http://oldhome.schmorp.de/marc/bournegol.html
- sargstuff 3 years ago
  
  Or give C C++ functionality : https://libcello.org/
- sargstuff 3 years ago
  
  or throw exceptional loops : https://www.chiark.greenend.org.uk/%7Esgtatham/mp/
  
  sargstuff 3 years ago
  
  https://github.com/Hirrolot/metalang99
gpderetta 3 years ago

In C++ these are genuine operators and do not require the macros from iso646.
I quite like them, but then again, I have been writing way too much python lately.

chromatin 3 years ago

Wow, and I thought I knew C pretty well. Great post.

edited to add: I really like "Modern C" and just re-checked -- no mention of the preprocessor feature!

https://hal.inria.fr/hal-02383654/file/ModernC.pdf

ryandrake 3 years ago

I think the only remaining purpose for trigraphs is when you are at the very end of a C interview, and your amazing candidate has answered every question perfectly, and you just have to find something they might not know about--only then do you reach for the trigraphs.
- Someone 3 years ago
  
  No, that’s the next-to-last question. If they know that, you ask about digraphs
  <: and :> are [ and ]
  <% and %> are { and }
  %: is #
  (since C95, and handled at a later stage than trigraphs: they are tokens rather than a textual substitution)
  ‘Unfortunately’, none of the characters used here can be coded using trigraphs, so you can’t use trigraphs to generate digraphs in source.
  
  gpderetta 3 years ago
  
  A very long time ago I had to write some throwaway code on a laptop with a European keyboard with {} in very inconvenient positions (requiring pressing the alt key). I resorted to digraphs and I don't regret it.
- thrwyoilarticle 3 years ago
  
  Gasp! You mean they've heard the volatile question before?
  
  sidewndr46 3 years ago
  
  The register keyword is far more interesting.
  
  Thorrez 3 years ago
  
  How about the restrict keyword.
  
  WithinReason 3 years ago
  
  Wow, I know all these. I only recently discovered bit addressing in C though.
- mpalczewski 3 years ago
  
  Why would you do this? Some weird insecurity?
  
  ryandrake 3 years ago
  
  Haha no! When a candidate is that awesome, I sometimes get morbidly curious about whether there is actually an end to that depth of knowledge or if it just goes on forever. At that point, they already have the "HIRE" classification and I'm pretty much in awe!
  I love it when a candidate blows through my easy, medium and hard questions and leaves me scrambling.
  
  sirmoveon 3 years ago
  
  Salary negotiating leverage? The worse they feel, the more likely they are to accept a lowball offer.
richbell 3 years ago

I think C also has the elusive "down to" operator.
https://stackoverflow.com/a/1642035
- creativemonkeys 3 years ago
  
  "-->" is not an operator in the C language, it's just a way of writing the unary operator "--" and comparison operator ">" together without any whitespace between them, since whitespace is ignored by the lexer.
  
  creativemonkeys 3 years ago
  
  Have any of you downvoters read the C grammar? There is no --> operator in C. I'll never understand some people.
  
  richbell 3 years ago
  
  I didn't downvote you but I presume others did because the StackOverflow post I linked to essentially says the same thing.
Natsu 3 years ago

Honestly, I thought this was about a programming language called C? rather than C.
- rdlw 3 years ago
  
  In the spirit of C++ and C#, there could be a C?'1':'0'

billpg 3 years ago

"There's a problem. Some machines don't have some braces and vertical bars and such. We'll have to add keywords like OR and BEGIN and END."

"Are question marks fine?"

"Yes."

"I'll come up with something."

daptaq 3 years ago

See iso646.h, and https://en.cppreference.com/w/c/language/operator_alternativ....

cl3misch 3 years ago

This reminds me of a comment on a Python discussion from more than 2 years ago, which I often think of:

"Whether it's computer languages or human ones, as soon as you get into a discussion about the correct parsing of a statement, you've lost and need to rewrite in a way that's unambiguous. Too many people pride themselves on knowing more or less obscure rules and, honestly, no one else cares."

https://news.ycombinator.com/item?id=23051202

halileohalilei 3 years ago

Completely agree with that. In fact, it's the first thing I thought of when I saw the code snippet in question. Even if you replace the trigraph with the regular || operator, it's still hard to read that piece of code. Syntactic sugar and short circuits are cool and all, but most of the time they have no place in production code that's meant to be read by other developers.

kbob 3 years ago

I'd say, "Congratulations! You're one of today's lucky 10,000!", but trigraphs aren't really much fun. Just another reminder that C is old, and computing is even older.

I've used uppercase-only terminals, and I've used ancient C, but not at the same time.

WalterBright 3 years ago

Ancient C didn't have trigraphs. My copy of K+R (1978) doesn't mention them.
- kragen 3 years ago
  
  No, they were a design error introduced by the ANSI committee.
  
  krylon 3 years ago
  
  I thought ISO added them when C went from ANSI C (1989) to ISO C (1990), along with wchar.h and such. I might misremember, though, it's been a long time since I did anything serious with C.
  Come to think of it, didn't they remove trigraphs in one of the more recent iterations of the standard?
  
  kragen 3 years ago
  
  They did remove trigraphs in one of the more recent iterations. It's possible that it was as you say, but I have this vague memory that it was ANSI who added them. I think maybe what the ISO added were https://en.wikipedia.org/wiki/C_alternative_tokens.

kenniskrag 3 years ago

Trigraphs are removed in C++17.

https://en.m.wikipedia.org/wiki/C%2B%2B17#Removed_features

amelius 3 years ago

I've never seen them used anywhere.
- mfost 3 years ago
  
  They were really meant for writing C on machines whose available character set was even smaller than ASCII. So no wonder you never see them.
- shakna 3 years ago
  
  They were meant, mostly, for punch-card machines.
  So if you started programming anywhere after the point in time when you needed to hand off your code to a punch card operator, you're unlikely to have seen them.
  
  WalterBright 3 years ago
  
  They were meant to support EBCDIC.
- Rebelgecko 3 years ago
  
  They're good for obfuscating source code but AFAICT that's about it on modern machines
  
  sargstuff 3 years ago
  
  Less obscure if physical print type head doesn't have the corresponding trigraph representation.
  Unicode is a worthy successor to trigraphs -- no need for pre-processing!
  Guess with tri-graph elimination & awk getting unicode support will have to gawk C with cpp using pipology theory.
  But think the cpp has to go away first, after enough sed.
  https://grayson.sh/blogs/using-piphilology-to-hide-strings
  https://www.gnu.org/software/gawk/manual/gawk.html#Signature...
pjmlp 3 years ago

They are still around in C though.
- david2ndaccount 3 years ago
  
  They are being removed in C23.
  
  pjmlp 3 years ago
  
  I see, thanks.
- KerrAvon 3 years ago
  
  gone in C23
  
  sargstuff 3 years ago
  
  ... but if someone sed it back in .....
  
  pjmlp 3 years ago
  
  I see, thanks.
piesquaredarr 3 years ago

Huh, I never realized that C++ standards were removing C features. Time to be more careful about using g++ for everything.
- Denvercoder9 3 years ago
  
  C++ has never been a strict superset of C. The most obvious example is the "class" and "new" keywords, which can be used as identifiers in C but not in C++. There are more subtle differences as well, such as character literals having type int in C and char in C++.
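  
  A small sketch of both differences, assuming it is compiled as C (a C++ compiler rejects the first function and gives a different answer for the second); the function names are hypothetical:
  
  ```c
  #include <stddef.h>

  /* Valid C, invalid C++: class and new are ordinary identifiers in C. */
  int sum_class_and_new(void)
  {
      int class = 1;
      int new = 2;
      return class + new;
  }

  /* In C the literal 'a' has type int, so this is sizeof(int).
     In C++ the same expression would be sizeof(char), i.e. 1. */
  size_t char_literal_size(void)
  {
      return sizeof('a');
  }
  ```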
  
  wheels 3 years ago
  
  Another really common one is that converting from void * to any other pointer type doesn't require a cast in C, but it does in C++:
  
      #include <stdlib.h>
  
      int main() {
          int *foo = malloc(sizeof(int));
          return 0;
      }
  
  That works in C, but not in C++.
  There's actually another subtle difference in there: main() means "unspecified arguments" in C, and "no arguments" in C++. ("No arguments" in C would be main(void).) However, it's no longer commonly used that way in C, whereas implicit conversions from void * to other pointer types are very common in C.
  
  favorited 3 years ago
  
  The `func()` vs `func(void)` difference has been deprecated for a while, and is removed in C23.
  
  tialaramex 3 years ago
  
  Using unions for type punning is legal C, but the exact same code has UB in C++
  The modern C++ way to do this ~safely isn't legal C, and yet the type pun isn't safe in C++. I believe using memcpy() to launder the bits is legal in both languages and in some cases your compiler can figure out what you're doing and not actually emit the unnecessary copy.
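  
  A minimal sketch of the memcpy approach, assuming float is 32 bits wide (checked at compile time, which needs C11); float_bits is a made-up name:
  
  ```c
  #include <stdint.h>
  #include <string.h>

  /* Assumption: float and uint32_t have the same size on this platform. */
  _Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

  /* Returns the object representation of f as an unsigned 32-bit integer. */
  uint32_t float_bits(float f)
  {
      uint32_t bits;
      memcpy(&bits, &f, sizeof bits);   /* copies the bytes, no aliasing issues */
      return bits;
  }
  ```
  
  On a platform with IEEE 754 floats, float_bits(1.0f) is 0x3F800000.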
  
  sumtechguy 3 years ago
  
  I used a few different compilers for C in one project. I ended up using memcpy and byteswapping to get data between different instances of the code correctly (some ARM, MIPS, and x86, and each of those can set the byte order). Using a union is possible if it supports packing, the bytes happen to be in the same order, and the compiler keeps the struct in the same order. I found that is not true of all compilers by default. I was massively annoyed at having to rewrite about 50 file writes/reads that were nice and simple as massive memcpy cascades. Inside the same code on the same compiler you can get away with a lot of things. But port to another arch or try to get binary data out of your program into another (good luck). These days there are realistically 4 compilers people use and they tend to behave mostly the same, and there are also nice libs that do most of this for you. That was the same project where I learned that not all printfs are created equal. Different CRTs do very different things even in the same compiler family. There is a reason everyone decided to use JSON and XML to transport data: that mess.
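  
  One way to sidestep the byte-order and struct-layout issues is to serialize field by field with shifts, so the file format does not depend on the host. A rough sketch with made-up function names, not what the project above actually did:
  
  ```c
  #include <stdint.h>

  /* Stores v into out[0..3] in big-endian order, whatever the host byte order is. */
  void put_u32_be(unsigned char out[4], uint32_t v)
  {
      out[0] = (unsigned char)(v >> 24);
      out[1] = (unsigned char)(v >> 16);
      out[2] = (unsigned char)(v >> 8);
      out[3] = (unsigned char)v;
  }

  /* Reads a big-endian 32-bit value back from in[0..3]. */
  uint32_t get_u32_be(const unsigned char in[4])
  {
      return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16)
           | ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
  }
  ```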
  
  bee_rider 3 years ago
  
  Ah, what an elegant example, haha.
- turminal 3 years ago
  
  Using g++ for C code is a recipe to get badly burnt - for unrelated reasons. Trigraphs are disabled in gcc by default anyway.
  
  professoretc 3 years ago
  
  That's true for any C++ compiler, really. Although C++ tries to retain some element of compatibility with C, there have always been differences (you can name a variable `class` in C but not in C++).
- sltkr 3 years ago
  
  By default, GCC ignores trigraphs in C code too.
  You have to explicitly pass -std=c17 (or whatever) to get standard-conforming behavior including trigraphs.
- Jorengarenar 3 years ago
  
  https://mcla.ug/blog/cpp-is-not-a-superset-of-c.html

DonHopkins 3 years ago

Years ago I wrote a perfectly reasonable comment like /* WTF??!?!!?!???? */ and the old C compiler complained about "invalid trigraph". A syntax error in the middle of a comment!

Took me a while to figure out that "trigraph" was referring to some part of "??!?!!?!????" and not "WTF".

hvdijk 3 years ago

That's a bug: there is no such thing as an invalid trigraph. ?? followed by any character other than =, /, ', (, ), !, <, >, or - is not a valid trigraph, but that doesn't make it an invalid trigraph; it just makes it not a trigraph. It's perfectly valid to have ??? in a comment or in a string literal.
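
The complete set is small enough to write down as a lookup. A sketch, with a made-up function name, that takes the character following two question marks and returns its replacement, or 0 when the sequence is simply not a trigraph:

```c
/* Maps the third character of a candidate sequence to its replacement.
   Returns 0 if the three characters do not form a trigraph. */
char trigraph_replacement(char third)
{
    switch (third) {
    case '=':  return '#';
    case '/':  return '\\';
    case '\'': return '^';
    case '(':  return '[';
    case ')':  return ']';
    case '!':  return '|';
    case '<':  return '{';
    case '>':  return '}';
    case '-':  return '~';
    default:   return 0;    /* not a trigraph, and not an error either */
    }
}
```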
- DonHopkins 3 years ago
  
  Are you telling me that C compilers in the early 90's had bugs and confusing error messages??!!?!??? WTF?!??!?!?
- benj111 3 years ago
  
  Oh, so that's why they're called trigraphs, because there are 3 valid states?
  Valid
  Invalid
  ??? (Exercise for the reader to decide if this is a trigraph or not)

Agentlien 3 years ago

Every time I hear about trigraphs I think of this horror:

http://stackoverflow.com/questions/53315710/ddg#53315821

FabHK 3 years ago

There are two aspects to this, the trigraph, and using the short circuiting behaviour of the binary logic operator for control flow.

The latter is a very common idiom in Julia code, which I found obscure and puerile at first (“look how smart I am”), but have come to appreciate as concise and natural by now.

For example:

  function fact(n::Int)
     n >= 0 || error("n must be non-negative")
     n == 0 && return 1
     n * fact(n-1)
  end

https://docs.julialang.org/en/v1/manual/control-flow/#Short-...
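
The same short-circuit idiom can be sketched in C, which is roughly what the ??!??! line boils down to once the trigraphs become ||. The names complain and check_non_negative are made up for illustration:

```c
#include <stdio.h>

/* Hypothetical error handler: logs the message and returns nonzero. */
static int complain(const char *msg)
{
    fprintf(stderr, "error: %s\n", msg);
    return 1;
}

/* Returns 0 if n is non-negative, -1 otherwise. */
int check_non_negative(int n)
{
    int failed = 0;
    /* The right-hand side of || is evaluated only when n >= 0 is false. */
    (void)(n >= 0 || (failed = complain("n must be non-negative")));
    return failed ? -1 : 0;
}
```

The (void) cast is only there to keep compilers from warning about an unused expression value.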

divbzero 3 years ago

In addition to trigraphs, there are apparently a set of C alternative tokens defined as follows:

  #define and &&
  #define and_eq &=
  #define bitand &
  #define bitor |
  #define compl ~
  #define not !
  #define not_eq !=
  #define or ||
  #define or_eq |=
  #define xor ^
  #define xor_eq ^=

I suppose that allows for code like this:

  if (x or not y or not z) {
      return 1;
  }

https://en.wikipedia.org/wiki/C_alternative_tokens
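
In C these spellings are macros from <iso646.h>, so the snippet needs that include to compile. A minimal sketch, with a made-up function name:

```c
#include <iso646.h>

/* Returns 1 when x is nonzero or either y or z is zero, else 0. */
int any_flag(int x, int y, int z)
{
    if (x or not y or not z) {   /* expands to: x || !y || !z */
        return 1;
    }
    return 0;
}
```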

pwdisswordfish9 3 years ago

Makes for great obfuscated C++.

    template <typename T>
    void print(T const bitand foo) {
        std::cout << foo << std::endl;
    }

pjmlp 3 years ago

    void print(auto const bitand foo) {
        std::cout << foo << std::endl;
    }

Since C++20.

pavon 3 years ago

The instructor at the branch college where I learned C++ in the late 90's taught us that those were the preferred operators and that the old operators belonged in the wastebasket of history along with printf and str* functions.
It made for some amusing group projects when I got to university, when classmates had never seen those operators and were trying to figure out where they were coming from and why I would write such silly things. I trolled them by replacing all my brackets with `begin` and `end` in the next assignment before moving to the standard use of C operators for the rest of the class.

curling_grad 3 years ago

Anecdote: An online judge website (which is pretty well known in Korea) has an easy problem[0] asking to write a program which adds "??!" to input. A lot of beginners' C/C++ submissions got "Wrong Answer" verdict because of trigraphs.

[0]: https://www.acmicpc.net/problem/10926
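
A sketch of one way around it in C, assuming the task is to append those three characters to a name; append_surprise is a made-up name. Escaping each question mark as \? means the trigraph never appears in the source:

```c
#include <stdio.h>

/* Writes name followed by two question marks and an exclamation mark
   into out (at most cap bytes). Returns what snprintf returns. */
int append_surprise(char *out, size_t cap, const char *name)
{
    /* "\?\?!" is the same three characters, but spelled so that
       trigraph replacement has nothing to match. */
    return snprintf(out, cap, "%s%s", name, "\?\?!");
}
```

Splitting the literal as "?" "?!" also works, since trigraph replacement happens before adjacent string literals are concatenated.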

hgs3 3 years ago

Reminds me of the "goes to" operator [1]

[1] https://stackoverflow.com/questions/1642028/what-is-the-oper...

cesaref 3 years ago

This sort of practice goes back to BCPL, which Wikipedia says is the first braced programming language. Because { and } weren't universally available, compilers also supported the sequences $( and $) to represent these, which were typeable and printable on just about anything.

https://en.wikipedia.org/wiki/BCPL

This is the earliest example of this sort of thing I'm aware of. Is there an earlier example?

Also, BCPL supported // for comments, again, probably the first use of this sequence.

virtualritz 3 years ago

> Has Microsoft Windows finally been open-sourced or where did this come from?

This comment on the SO post made my day. :D

anfractuosity 3 years ago

In gcc I got:

    1.c:1:11: warning: trigraph ??< ignored, use -trigraphs to enable [-Wtrigraphs]

Is there a preprocessor directive to enable support out of curiosity?

sargstuff 3 years ago

from [1], trigraphs or not:

  int main() {
     [](){}();
  }

is still weird.

Wonder if there will be a request for an emacs macro to handle the replaced cpp trigraphs? [2]

[1] https://zygoloid.github.io/cppcontest2018.html [2] https://www.emacswiki.org/emacs/CppTemplate

planede 3 years ago

Good news, in C++20 you can add <> there somewhere, although probably it can't be empty.
Anyway, probably obscure enough:
```
  int main() {
      []<class=void>(){}();
  }
```
- pjmlp 3 years ago
  
  And in C++23 you can drop the argument parentheses, so this is also a valid lambda call :)
  int main() { []{}(); }
  
  planede 3 years ago
  
  []{}() was always valid, but you can drop the arguments in more cases in C++23.

Waterluvian 3 years ago

If we deprecated trigraphs and removed that step from the compiler would it speed compilation up much? I’m going to guess maybe by milliseconds?

pantalaimon 3 years ago

They are already deprecated and removed in C23
zik 3 years ago

Probably not by any measurable amount
quickthrower2 3 years ago

I imagine microseconds or less
- sargstuff 3 years ago
  
  0 if sed used to expand the trigraphs before passing output to cpp/compiler.
  
  quickthrower2 3 years ago
  
  Always zero if you make it someone else’s problem :-)
NavinF 3 years ago

microseconds, not milliseconds

chris_wot 3 years ago

C++17 removed trigraphs. Sadly will no longer work.

omnicognate 3 years ago

s/Sadly/Gladly/

olliej 3 years ago

Oh trigraphs may you never die
