Clang’s -O0 output: branch displacement and size increase

userbinator 12 days ago

This reminds me of fasm, the only assembler immediately coming to mind that will do multi-pass branch optimisation by default. Most other assemblers either choose the long form always unless specified explicitly as "jmps" or "jmp short" (and then complain when the target turns out to be too far away), or the short form only if the destination is known when it's encountered (backwards jump).
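To make the short-versus-long choice concrete, here is a minimal sketch of the single-pass decision for a jump whose target is already known (the backwards-jump case). The function name `encode_jmp` is hypothetical and is not taken from any assembler's source. It only relies on the x86 encodings `EB rel8` (2 bytes) and `E9 rel32` (5 bytes), with the displacement measured from the end of the instruction.

```python
def encode_jmp(src_offset: int, target_offset: int) -> bytes:
    """Encode an unconditional x86 JMP, preferring the 2-byte short form.

    Hypothetical helper for illustration only; both offsets are byte
    positions within the same section.
    """
    # The rel8 displacement is relative to the end of the 2-byte instruction.
    disp8 = target_offset - (src_offset + 2)
    if -128 <= disp8 <= 127:
        return bytes([0xEB]) + disp8.to_bytes(1, "little", signed=True)
    # Otherwise fall back to the 5-byte near form (rel32 from end of instruction).
    disp32 = target_offset - (src_offset + 5)
    return bytes([0xE9]) + disp32.to_bytes(4, "little", signed=True)


if __name__ == "__main__":
    print(encode_jmp(0x10, 0x10).hex())  # jump to itself, short form
    print(encode_jmp(200, 0).hex())      # backwards jump too far for rel8
```

For a forward jump the target offset is not known yet when the instruction is encountered, which is why a one-pass assembler has to guess or always pick the long form.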

I've long held the opinion that O0 on all the major compilers should be considered more like an O-1 because of the glaring stupidities it leaves in its output, which almost looks like it was pessimising instead of not optimising.

This article is also only the 2nd time I've seen "relaxation" used in this context. The first was https://news.ycombinator.com/item?id=10219007 over 8 years ago.

o11c 12 days ago

Despite the fact that you say "all the major compilers", GCC and Clang make very different decisions for each optimization level.
In particular, GCC generates fairly debuggable code at all optimization levels, so there is less motivation for -O0 in the first place.
- MaskRay 10 days ago
  
  There is ongoing work to improve debuggability for optimized code. https://discourse.llvm.org/t/rfc-redefine-og-o1-and-add-a-ne...
```
Mode | Execution Time | Debuggability | Compile Time
O0   | 1.0000         | 1.0000        | 1.0000
Og   | 0.3439         | 0.5357        | 1.8630
O1   | 0.3082         | 0.4241        | 1.7880
O2g  | 0.2823         | 0.4845        | 3.0420
O2   | 0.2514         | 0.3908        | 2.9380
```
pcwalton 11 days ago

> This reminds me of fasm, the only assembler immediately coming to mind that will do multi-pass branch optimisation by default. Most other assemblers either choose the long form always unless specified explicitly as "jmps" or "jmp short" (and then complain when the target turns out to be too far away), or the short form only if the destination is known when it's encountered (backwards jump).
Not true, gas will do the relaxation by default.
> This article is also only the 2nd time I've seen "relaxation" used in this context.
It's been the standard term among toolchain developers for quite a while. I remember seeing it all over the linker in 2007 when working on ARM stuff.
mkup 11 days ago

NASM has an option (-Ox) to specify how many passes it should take trying to optimize near jumps for short jumps. I usually specify -O9.
- MaskRay 10 days ago
  
  Thanks for mentioning nasm.
  Both the GNU assembler and LLVM's integrated assembler parse and match instructions only once. They then store an internal representation in memory and perform fixed-point iteration. The section/fragment representation gives a lot of flexibility.
  In contrast, nasm parses and matches instructions multiple times depending on the optimization level. It also assigns addresses during parsing and uses an ad-hoc method for JMP/JCC instructions. The end conditions of the fixed-point iteration algorithm (global_offset_changed and stall_count) seem unconventional. -O0 does not "relax all" short jumps to near jumps.
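The fixed-point iteration mentioned above can be sketched roughly as follows. This is a toy model under stated assumptions, not the actual gas, LLVM, or nasm algorithm: the fragment tuples and the `relax` function are hypothetical, only unconditional x86 jumps are modelled (2-byte short, 5-byte near), and every jump starts short and is only ever grown, which guarantees termination.

```python
SHORT, NEAR = 2, 5  # sizes in bytes of x86 JMP rel8 and JMP rel32


def relax(fragments):
    """Grow short jumps to near jumps until the layout stops changing.

    fragments is a list of ("data", nbytes), ("label", name) or
    ("jmp", label_name) tuples (a hypothetical representation).
    Returns ({fragment index: jump size}, total section size).
    """
    sizes = {i: SHORT for i, f in enumerate(fragments) if f[0] == "jmp"}
    while True:
        # Layout pass: assign an offset to every fragment and label.
        offsets, labels, pos = [], {}, 0
        for i, (kind, arg) in enumerate(fragments):
            offsets.append(pos)
            if kind == "label":
                labels[arg] = pos
            elif kind == "data":
                pos += arg
            else:
                pos += sizes[i]
        # Relaxation pass: grow any short jump whose target is out of rel8 range.
        changed = False
        for i, (kind, arg) in enumerate(fragments):
            if kind == "jmp" and sizes[i] == SHORT:
                disp = labels[arg] - (offsets[i] + SHORT)
                if not -128 <= disp <= 127:
                    sizes[i] = NEAR
                    changed = True
        if not changed:
            return sizes, pos


if __name__ == "__main__":
    # Growing the second jump pushes "end" out of range of the first one,
    # so a single pass is not enough here.
    program = [("jmp", "end"), ("data", 124), ("jmp", "far"),
               ("label", "end"), ("data", 200), ("label", "far")]
    print(relax(program))
```

In the sample program the first jump initially fits in the short form, but once the second jump grows by 3 bytes its target moves out of range, so it has to grow as well on the next iteration. That cascade is why a single pass is not sufficient.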
ryukoposting 12 days ago

I'll second this. As a firmware dev, I almost never encounter situations where O0 gets me anything in terms of asm readability/debugging that O1 didn't already give me.
mati365 11 days ago

Not the only one. My assembler (and C compiler), written in TypeScript, does the same, despite being painfully slow and useless.
https://github.com/Mati365/ts-c-compiler
dataflow 12 days ago

Are you aware of -Og? It might be what you want.
- usefulcat 11 days ago
  
  I believe -Og is only meaningful for gcc. IIRC it’s the same as O1 for clang.

zhouzhouyi 11 days ago

When I debug a program, the first thing I do is compile with "-O0", so it is very nice to see "-mrelax-all" removed as the default for -O0, because "-mrelax-all" increases both the VM size and the file size.

MaskRay 12 days ago

Thanks for posting:)

ezekiel68 11 days ago

> people generally care less about -O0 code size.

Right. A ~5% additional increase in debug artifact size is really not a high tax.