mhh__ 2 years ago

1. Might want to mention FDO (feedback-directed optimization), even if it doesn't use the gcc-native profiling data. (A sketch of the two-step build is below the list.)

2. Do not use instrumenting profilers for measuring code.
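
For point 1, a minimal sketch of what FDO looks like with GCC. The -fprofile-generate/-fprofile-use flags are real GCC flags; the file name and toy workload are made up:

    /* fdo_demo.c -- toy candidate for feedback-directed optimization.
     * Hypothetical two-step build:
     *   gcc -O2 -fprofile-generate fdo_demo.c -o fdo_demo
     *   ./fdo_demo              # run a representative workload; writes fdo_demo.gcda
     *   gcc -O2 -fprofile-use fdo_demo.c -o fdo_demo
     */
    #include <stdio.h>

    int main(void) {
        long hot = 0, cold = 0;
        for (long i = 0; i < 100000000; i++) {
            if (i % 1000 == 0)   /* profile data teaches GCC this branch is rarely taken */
                cold++;
            else
                hot++;
        }
        printf("hot=%ld cold=%ld\n", hot, cold);
        return 0;
    }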

  • mistrial9 2 years ago

    > Do not use instrumenting profilers for measuring code.

    except when that is useful 8-)

    • mhh__ 2 years ago

      Use a sampling profiler: it's faster, more accurate, and doesn't require recompiling.

      • AlotOfReading 2 years ago

        On the other hand, it's a lot more difficult to implement a sampling profiler than an instrumenting profiler on a bare-metal target. There's definitely a niche for the latter, and I've made use of the GCC instrumentation hooks for all sorts of analysis tools over the years.
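
        For the curious, a minimal sketch of those hooks, assuming GCC's -finstrument-functions (the hook names and signatures are GCC's own; the logging format is made up):

            /* hooks_demo.c -- build: gcc -finstrument-functions hooks_demo.c -o hooks_demo
             * GCC inserts calls to these two hooks on every function entry/exit.
             * Addresses can be mapped back to names with addr2line or dladdr. */
            #include <stdio.h>

            /* Keep the hooks themselves uninstrumented to avoid infinite recursion. */
            __attribute__((no_instrument_function))
            void __cyg_profile_func_enter(void *fn, void *call_site) {
                fprintf(stderr, "enter %p (called from %p)\n", fn, call_site);
            }

            __attribute__((no_instrument_function))
            void __cyg_profile_func_exit(void *fn, void *call_site) {
                fprintf(stderr, "exit  %p (called from %p)\n", fn, call_site);
            }

            static int work(int x) { return x * 2; }

            int main(void) { return work(21) - 42; }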

        • mhh__ 2 years ago

          I suppose you're right, but working on a compiler/language has made me very wary of "But what about embedded!" as a technical point. You can argue basically any point using embedded.

          • AlotOfReading 2 years ago

            That's a totally fair objection, and one I'm sensitive to as a vocal advocate for moving embedded-systems teams away from the custom crap we've spent the past 40 years doing.

            I still think the weird limitations you get from "what about embedded" are often just obvious canaries for problems present throughout much of the vast diversity of computing. For example, sampling profilers used to have serious issues on WSL (where perf must still be manually compiled) and WINE. Having instrumentation hooks also makes things like AFL and custom sanitizers possible. The flexibility isn't always reasonable (e.g. don't assume little endian, don't assume 8-bit bytes), but there may be valid use cases that weren't considered at the toolchain/language level.

      • gnufx 2 years ago

        "more accurate"? The rationale for (typically expensive) tracing in HPC circles is that sampling can miss things you want to pick up. I think that's also covered in the vi-hps.org workshop introductions, for instance. (It doesn't necessarily involve recompilation.)

        • mhh__ 2 years ago

          "More accurate" is a condensed version of a longer (disclaimer-attached) statement about how I think they're more useful for typical profiling.

          If you want to know the exact ratios of how hot code is, then yes, instrumentation can be a godsend. However, in my experience most people are simply misled by instrumenting profilers, because a naive one usually does not have enough information to capture the right context (i.e. the call stack).

          For most applications profiling is basically just an exercise in proving your own mental model of execution correct, not fine-tuning (initially at least).

          I gave a talk on this in November: https://youtu.be/6TDZa5LUBzY

  • Veserv 2 years ago

    Do not use bad instrumenting profilers. A good modern tracing-based instrumenting profiler provides so much more actionable information and insight into where problems lie than a sampling profiler that it is ridiculous.

    As an example, consider viztracer [1] for Python. By using an aggregate visualizer such as a flame graph, you can figure out what is taking the most time; then you can use a tracing visualizer to figure out the exact call stacks, system execution, and state that caused it. Not only that, a tracing visualizer lets you diagnose whole-system performance and makes it trivial to identify 1-in-1000 anomalous execution patterns (on a 4K screen, an anomalous execution pattern stands out like a 4-pixel dead spot). In addition, you get vastly less biased information for parallel execution, with easy insight into parallel slowdowns, interference, contention, and blocking behavior. (A minimal C analogue of the tracing idea is sketched after the reference below.)

    The only advantages highlighted in your video that still hold against a good instrumenting profiler are:

    1. Multi-language support.

    2. Performance counters (though that is solved by doing manual tracking after you know the hotspots and causes).

    3. Overhead (if you are using a low sampling frequency). Even then, a good tracing instrumentation implementation should incur only low double-digit percentage overhead, and maybe 100% overhead in truly pathological cases involving only small functions, where the majority of execution time is literally spent in function call overhead.

    4. No need for recompilation. But you are already looking to make performance changes and test them, so you already intend to rebuild frequently to run those experiments. Besides, the relative difference in information is so enormous that this is not worth contemplating unless it is a hard requirement, such as evaluating something in the field.

    [1] https://github.com/gaogaotiantian/viztracer
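
    viztracer itself is Python-only; as a hedged illustration of the same tracing idea in C, here is a minimal sketch that turns the GCC instrumentation hooks into Chrome trace-event JSON ("name"/"ph"/"ts"/"pid"/"tid" are the real trace-event keys; the file names are made up). The output loads in Perfetto or chrome://tracing as a flame-graph-style timeline; a single-threaded sketch, not a production tracer:

        /* tracehooks.c -- emit Chrome trace-event JSON from GCC instrumentation.
         * Build: gcc -finstrument-functions -rdynamic app.c tracehooks.c -ldl
         * View : open trace.json in Perfetto or chrome://tracing. */
        #define _GNU_SOURCE
        #include <dlfcn.h>
        #include <stdio.h>
        #include <time.h>
        #include <unistd.h>

        static FILE *out;

        __attribute__((no_instrument_function))
        static double now_us(void) {
            struct timespec ts;
            clock_gettime(CLOCK_MONOTONIC, &ts);
            return ts.tv_sec * 1e6 + ts.tv_nsec / 1e3;
        }

        __attribute__((no_instrument_function))
        static void emit(const char *ph, void *fn) {
            Dl_info info;   /* best-effort symbol name for the address */
            const char *name =
                (dladdr(fn, &info) && info.dli_sname) ? info.dli_sname : "?";
            if (!out) {
                if (!(out = fopen("trace.json", "w")))
                    return;
                fputs("[\n", out);   /* viewers tolerate a missing closing "]" */
            }
            /* "B" = begin event, "E" = end event; tid fixed at 1 for simplicity. */
            fprintf(out, "{\"name\":\"%s\",\"ph\":\"%s\",\"ts\":%.1f,"
                         "\"pid\":%d,\"tid\":1},\n",
                    name, ph, now_us(), (int)getpid());
        }

        __attribute__((no_instrument_function))
        void __cyg_profile_func_enter(void *fn, void *cs) { (void)cs; emit("B", fn); }

        __attribute__((no_instrument_function))
        void __cyg_profile_func_exit(void *fn, void *cs) { (void)cs; emit("E", fn); }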

flakiness 2 years ago

Is anyone still using these instrumentation-based profilers (vs. sampling profilers like perf)? What is it like today?

I stopped using them a long time ago since they were so slow and not thread-safe. But I occasionally miss the comprehensive coverage they provided. If the situation has changed, I'd love to give them another shot.

  • AlotOfReading 2 years ago

    Thread safety isn't an issue that I've observed, but it's still dog slow and pretty much always will be. Sampling profilers are generally better when you can use them.

    • gnufx 2 years ago

      Nothing is generally better when it comes to serious performance engineering. The standard introduction to the performance engineering workshops under vi-hps.org stresses that you need a variety of tools/techniques available.

  • gnufx 2 years ago

    Instrumentation is widely used in HPC performance engineering, but you may still do sampling, rather than tracing, with the result. Often only a specific set of function calls is instrumented (e.g. MPI, I/O), often using LD_PRELOAD and hooks provided for the purpose.
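
    A minimal sketch of that LD_PRELOAD style of interposition, timing write(2) (the library name is made up; real HPC tools wrap MPI calls the same way):

        /* iotrace.c -- time every write(2) via LD_PRELOAD interposition.
         * Build: gcc -shared -fPIC iotrace.c -o iotrace.so -ldl
         * Use  : LD_PRELOAD=./iotrace.so ./your_app */
        #define _GNU_SOURCE
        #include <dlfcn.h>
        #include <stdio.h>
        #include <time.h>
        #include <unistd.h>

        ssize_t write(int fd, const void *buf, size_t count) {
            static ssize_t (*real_write)(int, const void *, size_t);
            if (!real_write)   /* look up the libc symbol we are shadowing */
                real_write = (ssize_t (*)(int, const void *, size_t))
                                 dlsym(RTLD_NEXT, "write");

            struct timespec t0, t1;
            clock_gettime(CLOCK_MONOTONIC, &t0);
            ssize_t ret = real_write(fd, buf, count);
            clock_gettime(CLOCK_MONOTONIC, &t1);

            /* Log via the real write() directly, so we don't re-enter this wrapper. */
            char msg[96];
            int n = snprintf(msg, sizeof msg, "write(fd=%d, %zu bytes): %ld ns\n",
                             fd, count,
                             (t1.tv_sec - t0.tv_sec) * 1000000000L
                                 + (t1.tv_nsec - t0.tv_nsec));
            real_write(2, msg, (size_t)n);
            return ret;
        }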