Svelte Hacker News

points by erlehmann_ 8 years ago

An issue I have with make is that it cannot handle non-existence dependencies. DJB noted this in 2003 [1]. To quote myself on this [2]:

> Especially when using C or C++, often target files depend on nonexistent files as well, meaning that a target file should be rebuilt when a previously nonexistent file is created: If the preprocessor includes /usr/include/stdio.h because it could not find /usr/local/include/stdio.h, the creation of the latter file should trigger a rebuild.

I did some research on the topic using the repository of the game Liberation Circuit [3] and my own redo implementation [4] … it turns out that a typical project in C or C++ has lots of non-existence dependencies. How do make users handle non-existence dependencies – except for always calling “make clean”?

[1] http://cr.yp.to/redo/honest-nonfile.html

[2] http://news.dieweltistgarnichtso.net/posts/redo-gcc-automati...

[3] https://github.com/linleyh/liberation-circuit

[4] http://news.dieweltistgarnichtso.net/bin/redo-sh.html (redo-dot gives a graph of dependencies and non-existence dependencies)
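
To make the stdio.h example concrete, here is a minimal Python sketch of a first-match header lookup. It is only an illustration under assumptions: `resolve_include` is a hypothetical helper, the two directories are temporary stand-ins for /usr/local/include and /usr/include, and a real preprocessor has more lookup rules than this.

```python
import os
import tempfile


def resolve_include(name, search_path):
    """Return (found, missing): the first existing candidate and the
    nonexistent candidates that were checked before it."""
    missing = []
    for directory in search_path:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate, missing
        missing.append(candidate)
    return None, missing


# Hypothetical stand-ins for /usr/local/include and /usr/include.
root = tempfile.mkdtemp()
local_inc = os.path.join(root, "usr", "local", "include")
system_inc = os.path.join(root, "usr", "include")
os.makedirs(local_inc)
os.makedirs(system_inc)
open(os.path.join(system_inc, "stdio.h"), "w").close()

found, missing = resolve_include("stdio.h", [local_inc, system_inc])
print("existence dependency:    ", found)
print("non-existence dependency:", missing)

# Creating the higher-precedence header changes which file would be picked,
# although no file that was previously read has changed.
open(os.path.join(local_inc, "stdio.h"), "w").close()
found_after, _ = resolve_include("stdio.h", [local_inc, system_inc])
print("after creating the shadowing header:", found_after)
```

Every path in `missing` is a place where a newly created file would change the build result, which is what a non-existence dependency records.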

tinus_hn 8 years ago

Make has no memory so it can't remember things. It simply compares the dates of files. If a dependency is newer than a target the target is rebuilt.

If you want to keep some kind of memory you have to build and keep track of it yourself.
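
As a rough illustration of that date comparison (a sketch only, not make's actual implementation; `needs_rebuild` and the file names are made up for the example):

```python
import os
import tempfile


def needs_rebuild(target, dependencies):
    """make-style check: rebuild if the target is missing or any dependency is newer."""
    if not os.path.exists(target):
        return True
    target_mtime = os.stat(target).st_mtime_ns
    return any(os.stat(dep).st_mtime_ns > target_mtime for dep in dependencies)


workdir = tempfile.mkdtemp()
source = os.path.join(workdir, "main.c")
obj = os.path.join(workdir, "main.o")

open(source, "w").close()
missing_target = needs_rebuild(obj, [source])   # no target yet

open(obj, "w").close()
os.utime(source, ns=(1_000_000_000, 1_000_000_000))
os.utime(obj, ns=(2_000_000_000, 2_000_000_000))
up_to_date = needs_rebuild(obj, [source])       # target is newer than the source

os.utime(source, ns=(3_000_000_000, 3_000_000_000))
stale = needs_rebuild(obj, [source])            # source is now newer than the target

print(missing_target, up_to_date, stale)
```

Nothing here is remembered between runs: the decision is recomputed from the file dates every time, so a file that did not exist during the last build leaves no trace to compare against.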

But the problem you point at is simply poor design. It is not a normal occurrence for system header files to move around like you state. If they do, a full rebuild is indeed required. It shouldn't happen so often that that is an issue.

feelin_googley 8 years ago
"It simply compares the dates of files."
test(1) also compares the dates of files
```
   test file1 -nt file2
   test file1 -ot file2
```
Is there anything else that make does in addition to comparing dates of files?
(Besides running the shell.)
tsort(1) does topological sorting
tsort + sed + sort + join + nm = lorder(1)
lorder can determine interdependencies
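
For readers who have not used tsort(1): the ordering step can be sketched in Python with the standard library's graphlib (Python 3.9+). The dependency graph below is hypothetical and only illustrates topological sorting, not what lorder(1) extracts from object files.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical dependency graph: each key depends on the files in its set.
graph = {
    "a.out": {"main.o", "util.o"},
    "main.o": {"main.c", "util.h"},
    "util.o": {"util.c", "util.h"},
}

# static_order() yields every node after all of its dependencies.
order = list(TopologicalSorter(graph).static_order())
print(order)
```

The exact order among independent nodes is not fixed; the only guarantee is that each file appears after everything it depends on.
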
beagle3 8 years ago

If an apt-get upgrade fixed an issue in a system header or a library, but the date of the fix predates the last build (quite common: last build from yesterday, fix from two days ago but downloaded today), then make will do nothing (or some subset of the right things, but not all of them), while "make clean; make" will do the right thing.
Relying on time stamps is a design decision that was good for its time, but it is no longer robust (or sane) in a constantly connected, constantly updated, everything networked world.
djb redo takes it to one logical conclusion (use cryptographic hashes to verify freshness)
There are other ways in which make is lacking: operations are non atomic (redo fixes that too), dependency granularity is file level (so dependency on compiler flags is very hard, dependency on makefile is too broad; redo fixes this too); dependency is manual (redo doesn't fix this; AFAIK the only one that properly does is tup)
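
The upgrade scenario above can be reproduced in a small Python sketch. It is an illustration under assumptions, not the mechanism of any particular redo implementation: `content_hash` is a hypothetical helper, the file names are invented, and the timestamps are set by hand to mimic "built yesterday, fix dated two days ago".

```python
import hashlib
import os
import tempfile


def content_hash(path):
    """SHA-256 of the file contents, used as the freshness record."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


workdir = tempfile.mkdtemp()
header = os.path.join(workdir, "fixed.h")
target = os.path.join(workdir, "main.o")

with open(header, "w") as f:
    f.write("#define LIMIT 10\n")
recorded = content_hash(header)                       # remembered at build time

open(target, "w").close()
os.utime(target, ns=(2_000_000_000, 2_000_000_000))   # "built yesterday"

# The fix is installed later but carries an older date ("two days ago").
with open(header, "w") as f:
    f.write("#define LIMIT 20\n")
os.utime(header, ns=(1_000_000_000, 1_000_000_000))

newer_than_target = os.stat(header).st_mtime_ns > os.stat(target).st_mtime_ns
hash_changed = content_hash(header) != recorded
print("newer/older comparison sees a change:", newer_than_target)
print("hash comparison sees a change:       ", hash_changed)
```

The newer/older comparison misses the change because the header still looks older than the target, while comparing against the recorded hash detects it.
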
- vog 8 years ago
  
  > Relying on time stamps is a design decision that was good for its time, but it is no longer robust
  I agree with the sentiment, but a small nitpick:
  Relying on time stamps for older/newer comparisons is not robust.
  Using time stamps (and perhaps file size) for equality checks is quite robust. And the combination with cryptographic hashes is even better (if a file is recreated but has the same contents afterwards, timestamp checks would trigger an unneeded rebuild, while a crypto hash check would recognize that there's nothing to rebuild).
- pm215 8 years ago
  
  Typically if a system header has changed or been added due to upgrading a library package you'll need to rerun any configure script anyway (since it very likely checks for header features and those decisions will change). So unless your build system magically makes configure depend on every header file used in every configure test it runs, you'll need to redo a clean build anyway, pretty much.
  Make has a whole pile of issues, but this one really isn't an aggravation in practice, I find.
  
  beagle3 8 years ago
  
  apt-get upgrade does not usually upgrade a package, despite the name; 99.9% of the time it applies a bug or security fix, almost never changing any functionality or interface - and would result in the same config script.
  And that assumes you actually have a config script, which is also a nontrivial assumption.
  Djb redo lets you track e.g. security fixes that change libc.a if you are linking statically, but that's not usually done.
  The only build system I know of that guarantees a rebuild whenever, and only when, it is needed is tup. (Assuming you have only file system inputs.)

tedunangst 8 years ago

Is this a common problem? I can't think of any project that does this, and there's a simple solution as well: don't shadow system headers. That's just asking for pain, regardless of how well make handles it.

tomjakubowski 8 years ago

I don't think this problem is limited to system headers. Something as innocent as #include "foo/bar.h" can be affected by this if you pass -I with at least two unique paths to the compiler.
- tedunangst 8 years ago
  
  ok, sure, I revise my answer to don't shadow any header.
  
  tomjakubowski 8 years ago
  
  Easier said than done, especially integrating over the lifetime of a years-long project with many ever-changing dependencies :-)

boris 8 years ago

Could you summarize how you handle this in redo? Also what about the case where a header file does exist but is out-of-date and because of that triggers an error (e.g., version compatibility check with #error) -- how do you handle that?

JdeBP 8 years ago

I, for one, handle it with a tool that mimics the compiler's preprocessing phase and emits both redo-ifchange information for all of the headers that are used, and redo-ifcreate information for all of the non-existent headers that are looked for during the process.

    JdeBP %cat test.cpp
    #include <cstddef>
    void f() {}
    JdeBP %/package/prog/cc/command/cpp test.cpp --iapplication . --icompiler-high /usr/local/lib/gcc5/include/c++ --icompiler-low /usr/local/lib/gcc5/include/c++/x86_64-portbld-freebsd10.3 --iplatform /usr/local/include --iplatform /usr/include  -MD -MF /dev/stderr 2>&1 > /dev/null|fgrep redo
    redo-ifcreate ./cstddef ./bits/c++config.h /usr/local/lib/gcc5/include/c++/bits/c++config.h /usr/local/include/bits/c++config.h /usr/include/bits/c++config.h ./bits/os_defines.h /usr/local/lib/gcc5/include/c++/bits/os_defines.h /usr/local/include/bits/os_defines.h /usr/include/bits/os_defines.h ./bits/cpu_defines.h /usr/local/lib/gcc5/include/c++/bits/cpu_defines.h /usr/local/include/bits/cpu_defines.h /usr/include/bits/cpu_defines.h ./stddef.h /usr/local/lib/gcc5/include/c++/stddef.h /usr/local/include/stddef.h ./sys/cdefs.h /usr/local/lib/gcc5/include/c++/sys/cdefs.h /usr/local/include/sys/cdefs.h ./sys/_null.h /usr/local/lib/gcc5/include/c++/sys/_null.h /usr/local/include/sys/_null.h ./sys/_types.h /usr/local/lib/gcc5/include/c++/sys/_types.h /usr/local/include/sys/_types.h ./machine/_types.h /usr/local/lib/gcc5/include/c++/machine/_types.h /usr/local/include/machine/_types.h ./x86/_types.h /usr/local/lib/gcc5/include/c++/x86/_types.h /usr/local/include/x86/_types.h
    redo-ifchange /usr/local/lib/gcc5/include/c++/cstddef /usr/local/lib/gcc5/include/c++/x86_64-portbld-freebsd10.3/bits/c++config.h /usr/local/lib/gcc5/include/c++/x86_64-portbld-freebsd10.3/bits/os_defines.h /usr/local/lib/gcc5/include/c++/x86_64-portbld-freebsd10.3/bits/cpu_defines.h /usr/include/stddef.h /usr/include/sys/cdefs.h /usr/include/sys/_null.h /usr/include/sys/_types.h /usr/include/machine/_types.h /usr/include/x86/_types.h
    JdeBP %/package/prog/cc/command/cpp test.cpp --iapplication . --icompiler-high /usr/local/lib/gcc5/include/c++ --icompiler-low /usr/local/lib/gcc5/include/c++/x86_64-portbld-freebsd10.3 --iplatform /usr/local/include --iplatform /usr/include  -MMD -MF /dev/stderr 2>&1 > /dev/null|fgrep redo
    redo-ifcreate ./cstddef ./bits/c++config.h ./bits/os_defines.h ./bits/cpu_defines.h ./stddef.h ./sys/cdefs.h ./sys/_null.h ./sys/_types.h ./machine/_types.h ./x86/_types.h
    redo-ifchange
    JdeBP %

I also have a wrapper that takes arguments in the forms that one would invoke g++ -E and clang++ -E, tries to work out all of the platform and compiler include paths, and invokes this tool with them.

It's then a simple matter of invoking these redo-ifchange and redo-ifcreate commands from within the redo script that is invoking the compiler.

You can see this plumbed into redo in a real system in the source archives for the nosh toolset and djbwares.

* http://jdebp.eu./FGA/introduction-to-redo.html#CompilerDefic...

* https://news.ycombinator.com/item?id=15044438
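
For readers who want the general idea in code, here is a much simplified Python sketch. It is not the tool described above: `scan` is a hypothetical function that only follows literal #include lines and ignores macros, conditional compilation, comments and #include_next, and the directory layout is invented for the example.

```python
import os
import re
import tempfile

INCLUDE_RE = re.compile(r'^\s*#\s*include\s*([<"])([^>"]+)[>"]', re.MULTILINE)


def scan(source, search_path, found=None, missing=None):
    """Collect headers that were found and candidate paths that did not exist."""
    found = [] if found is None else found
    missing = [] if missing is None else missing
    with open(source) as f:
        text = f.read()
    for kind, name in INCLUDE_RE.findall(text):
        # Assumption: quoted includes are looked up next to the including
        # file first, then along the search path.
        dirs = ([os.path.dirname(source)] if kind == '"' else []) + list(search_path)
        for directory in dirs:
            candidate = os.path.join(directory, name)
            if os.path.isfile(candidate):
                if candidate not in found:
                    found.append(candidate)
                    scan(candidate, search_path, found, missing)
                break
            if candidate not in missing:
                missing.append(candidate)
    return found, missing


# Invented layout: a.h exists only in "system", b.h exists in "local".
root = tempfile.mkdtemp()
app, local_inc, system_inc = (os.path.join(root, d) for d in ("app", "local", "system"))
for d in (app, local_inc, system_inc):
    os.makedirs(d)
with open(os.path.join(app, "test.c"), "w") as f:
    f.write("#include <a.h>\n")
with open(os.path.join(system_inc, "a.h"), "w") as f:
    f.write("#include <b.h>\n")
open(os.path.join(local_inc, "b.h"), "w").close()

found, missing = scan(os.path.join(app, "test.c"), [local_inc, system_inc])
print("redo-ifcreate", *missing)
print("redo-ifchange", *found)
```

The two printed lines have the same shape as the output shown above: one list of places where a new file would change the result, and one list of files that were actually read.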

erlehmann_ 8 years ago
I use strace(1) to look for stat(2) syscalls that fail with ENOENT. An advantage of this approach is that I do not have to imitate the C preprocessor, so parser differentials can never happen. The following default.o.do file from my blog post [1] handles the case:
```
  #!/bin/sh
  redo-ifchange $2.c
  strace -e stat,stat64,fstat,fstat64,lstat,lstat64 -f 2>&1 >/dev/null\
   gcc $2.c -o $3 -MD -MF $2.deps\
   |grep '1 ENOENT'\
   |grep '\.h'\
   |cut -d'"' -f2 2>/dev/null\
   >$2.deps_ne
  
  read d <$2.deps
  redo-ifchange ${d#*:}
  
  while read -r d_ne; do
   redo-ifcreate $d_ne
  done <$2.deps_ne
  
  chmod a+x $3
```
This approach is also used for building Liberation Circuit if strace is installed [2].
I think the compiler should output the necessary information. To quote Jonathan de Boyne Pollard [3]:
> As noted earlier, no C or C++ compiler currently generates any redo-ifcreate dependency information, only the redo-ifchange dependency information. This is a deficiency of the compilers rather than a deficiency of redo, though. That the introduction of a new higher-precedence header earlier on the include path will affect recompilation is a fact that almost all C/C++ build systems fail to account for.
> I have written, but not yet released, a C++ tool that is capable of generating both redo-ifchange information for included files and redo-ifcreate information for the places where included files were searched for but didn't exist, and thus where adding new (different) included files would change the output.
JdeBP, could you please release your tool under a free software license? I suspect it has fewer errors than the similar CMake approach [4].
[1] http://news.dieweltistgarnichtso.net/posts/redo-gcc-automati...
[2] https://github.com/linleyh/liberation-circuit/blob/master/sr...
[3] http://jdebp.eu./FGA/introduction-to-redo.html#CompilerDefic...
[4] https://github.com/Kitware/CMake/blob/master/Source/cmDepend...
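
The filtering done by the grep and cut pipeline can also be sketched in Python. This is an illustration only: the sample lines are hand-written in the style of strace output rather than captured from a real run, `nonexistent_headers` is a hypothetical helper, and it keeps only paths that end in ".h", which is slightly stricter than the grep above.

```python
import re

# First quoted string on a line that ends in a failed lookup, e.g.
#   stat("/path/x.h", 0x7ffd5c0) = -1 ENOENT (No such file or directory)
ENOENT_RE = re.compile(r'"([^"]+)".*= -1 ENOENT')


def nonexistent_headers(strace_lines):
    """Return the .h paths whose lookup failed with ENOENT, in order, without duplicates."""
    result = []
    for line in strace_lines:
        match = ENOENT_RE.search(line)
        if match and match.group(1).endswith(".h") and match.group(1) not in result:
            result.append(match.group(1))
    return result


# Hand-written sample in the style of strace -f output (not a real log).
sample = '''\
stat("/usr/local/include/stdio.h", 0x7ffd5c0) = -1 ENOENT (No such file or directory)
stat("/usr/include/stdio.h", {st_mode=S_IFREG|0644, st_size=29665, ...}) = 0
[pid  4242] stat("/usr/local/include/stdio.h", 0x7ffd5c0) = -1 ENOENT (No such file or directory)
[pid  4242] stat("/usr/lib/gcc/specs", 0x7ffd5c0) = -1 ENOENT (No such file or directory)
'''

headers = nonexistent_headers(sample.splitlines())
for path in headers:
    print("redo-ifcreate", path)
```

Which syscalls a compiler uses for these lookups varies between systems, so the list of traced calls may need adjusting; the parsing step stays the same.
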
- JdeBP 8 years ago
  
  Just for the record: My personal preference is for Clang and GCC to be instrumented to emit the names of both found and non-existent header files.

bch 8 years ago

I'm missing something.

Are you saying you want to be able to compile with either /usr/include/stdio.h or /usr/local/include/stdio.h, but remember which header the last compilation used, know which header the next compilation would use, and, if they differ, mark the target as stale and perform the action?

I guess you'd need to keep a log of the build and test cpp invocations for diffs.

I've never run into this scenario.

cjhanks 8 years ago

An obvious case would be a developer supporting multiple versions of a 3rd party library.
- bch 8 years ago
  
  This is where I saw the beauty of including dependencies with a project. Even on my own systems, as environments change, things break, and having a stable in-tree reference has paid off.
  It's a tough situation, but I find myself leaning toward @tedunangst's position over the years. Usually I try to adapt my machines (incl. software) to my needs, but in this case I need to take control/responsibility, and here be dragons. Does cmake actually solve this? Do other build systems?

Matheus28 8 years ago

I personally use scons instead of Makefiles. Its dependency analysis is amazing, I haven't seen it fail a single time.

erlehmann_ 8 years ago

Please elaborate: What do you find amazing about scons?

Also, how does scons handle non-existence dependencies?

What would be a scons dependency graph for this C code?

  #include <stdio.h>
  int main(void) {
   printf("hello, world\n");
   return 0;
  }

You can see a dependency graph I generated with redo here: http://news.dieweltistgarnichtso.net/posts/redo-gcc-automati...

Matheus28 8 years ago

I love that I get to use python to write the dependency graph, it allows for some interesting stuff.

Other than that, it's mostly the ease of use. This is enough to compile a C++ project (that has all its .c and .cpp files in the same directory as the SConstruct file), and it'll pick up on all dependencies correctly:

    Program(target = 'a.out', source = Glob('*.c') + Glob('*.cpp'))

I also know for a fact that it's able to pick up on how the presence of a new file might trigger a rebuild of what could require it.

Regarding the last question, using --tree=all it prints:

    +-.
      +-SConstruct
      +-a.out
      | +-main.o
      | | +-main.c
      | | +-/usr/bin/gcc
      | +-/usr/bin/gcc
      +-main.c
      +-main.o
        +-main.c
        +-/usr/bin/gcc

I'm not sure if it's hiding dependencies on system headers or not. But I can force it to show them by adding /usr/include and /usr/local/include to CPPPATH (excuse the long code block):

    +-.
      +-SConstruct
      +-a.out
      | +-main.o
      | | +-main.c
      | | +-/usr/include/stdio.h
      | | +-/usr/include/Availability.h
      | | +-/usr/include/_types.h
      | | +-/usr/include/secure/_stdio.h
      | | +-/usr/include/sys/_types/_null.h
      | | +-/usr/include/sys/_types/_off_t.h
      | | +-/usr/include/sys/_types/_size_t.h
      | | +-/usr/include/sys/_types/_ssize_t.h
      | | +-/usr/include/sys/_types/_va_list.h
      | | +-/usr/include/sys/cdefs.h
      | | +-/usr/include/sys/stdio.h
      | | +-/usr/include/xlocale/_stdio.h
      | | +-/usr/include/AvailabilityInternal.h
      | | +-/usr/include/sys/_types.h
      | | +-/usr/include/secure/_common.h
      | | +-/usr/include/sys/_posix_availability.h
      | | +-/usr/include/sys/_symbol_aliasing.h
      | | +-/usr/include/machine/_types.h
      | | +-/usr/include/sys/_pthread/_pthread_types.h
      | | +-/usr/include/i386/_types.h
      | | +-/usr/bin/gcc
      | +-/usr/bin/gcc
      +-main.c
      +-main.o -- This part was removed to decrease comment size, it's the same as the main.o part above

The SConstruct for this last block is:

    Program(target = 'a.out', source = ['main.c'], CPPPATH = ['/usr/local/include', '/usr/include'])

Note that these were generated on macOS.

JdeBP 8 years ago

From what you are showing us, the answer to "How does scons handle non-existence dependencies?" is that it does not handle them at all.
Go and look at M. Moskopp's graph. It has a lot of dependencies for non-existent files that the compiler would have used in preference to the ones that it actually used, had they existed.

bsder 8 years ago

Did scons finally get a little less opinionated?
It used to be that scons really forced you to use subsidiary SConscript child files for anything more complicated than a couple files in a single directory instead of being able to lump it all into a single SConstruct.
- cjhanks 8 years ago
  
  I think that was quite long ago. Yes, `SCons` can fit into a single file if you so choose. But its behavior under recursive builds (with nested directory structure) is far more predictable than most build systems I have seen.