Show HN: Copy-and-patch compiler for hard real-time Python

github.com

63 points by Saloc 5 days ago

I built Copapy as an experiment: Can Python be used for hard real-time systems?

Instead of an interpreter or JIT, Copapy builds a computation graph by tracing Python code and uses a custom copy-and-patch compiler. The result is very fast native code with no GC, no syscalls, and no memory allocations at runtime.
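To make the tracing idea concrete, here is a toy sketch in plain Python. It is not Copapy's implementation; the names `Node`, `wrap`, `linearize`, and `run` are made up for illustration. Operator overloading records each arithmetic operation as a graph node, and the graph is then flattened into a straight-line program with no Python-level control flow left in it.

```python
class Node:
    """One operation in the traced graph (hypothetical, not a Copapy class)."""

    def __init__(self, op, args=(), value=None):
        self.op, self.args, self.value = op, args, value

    def __add__(self, other):
        return Node("add", (self, wrap(other)))

    __radd__ = __add__  # addition is commutative, so the same node works

    def __mul__(self, other):
        return Node("mul", (self, wrap(other)))

    __rmul__ = __mul__


def wrap(x):
    """Turn plain numbers into constant nodes; leave nodes untouched."""
    return x if isinstance(x, Node) else Node("const", value=float(x))


def linearize(root):
    """Flatten the graph into a list of nodes in dependency order."""
    order, seen = [], set()

    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for arg in node.args:
            visit(arg)
        order.append(node)

    visit(root)
    return order


def run(program, inputs):
    """Evaluate the straight-line program; a stand-in for native code."""
    vals = {}
    for n in program:
        if n.op == "input":
            vals[id(n)] = inputs[n.value]
        elif n.op == "const":
            vals[id(n)] = n.value
        elif n.op == "add":
            vals[id(n)] = vals[id(n.args[0])] + vals[id(n.args[1])]
        elif n.op == "mul":
            vals[id(n)] = vals[id(n.args[0])] * vals[id(n.args[1])]
    return vals[id(program[-1])]


def controller(error, gain):
    # Ordinary Python; calling it with a Node records the operations.
    return gain * error + 0.5


program = linearize(controller(Node("input", value="error"), 2.0))
result = run(program, {"error": 3.0})
```

The list in `program` has a fixed length and a fixed order, which is what makes the later code-generation step and timing analysis straightforward.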

The copy-and-patch compiler currently supports x86_64 as well as 32- and 64-bit ARM. It comes as a small Python package with no other dependencies: no cross-compiler, nothing except Python.

The current focus is on robotics and control systems in general. This project is early but already usable and easy to try out.

Would love your feedback!

written-beyond a day ago

THAT'S INSANE!

I always wondered if this could be possible. Like you fuzz a program, map out each possible allocation and deallocation and optimize the code with those hot paths and some statistics.

Very interesting project, would love some sort of write up on it.

  • Saloc 19 hours ago

    Thanks for your comment, I'll give a full write up a try.

    I think this concept has a sweet spot in deterministic control applications. In conventional code the number of branch combinations can blow up easily, but here you need to guarantee worst-case execution time, which forces you to be very careful with branching anyway.

    The how-it-works README section, extended with the generated machine code for stencils and a simple example, is at https://copapy.nonan.net/compiler.html.

genjipress a day ago

This looks like it could in time become an excellent alternative not just to NumPy and Numba, but also to Cython. I know that may be more ambitious than your original intentions, but that's absolutely what sprang to mind.

nmstoker 17 hours ago

Impressive stuff but it would be polite to the potentially interested users if the line, "... this package is currently a proof of concept with limited direct use", had been put a little earlier. It's about nine or so dense paragraphs in.

It's fine that it is still in development but it just seems worth being upfront.

TheCodeDecoders 3 hours ago

Try using Nanobind. Even though the Python C API is what you aim for, Nanobind produces smaller binaries than pybind11. It is a reputable project and is used in many new libraries, including mine. Will it work?

bigbadfeline 18 hours ago

> The result is very fast native code with no GC, no syscalls, and no memory allocations at runtime.

I guess that's only achievable for certain kinds of code, already designed with hard real-time in mind. It would be good to have some information about the limitations of this approach.

vsskanth 14 hours ago

How is this different to casadi?

  • Saloc 4 hours ago

    Casadi either uses an interpreter or emits C code, whereas Copapy directly runs machine code. It would be very interesting to benchmark Copapy against compiled Casadi C code; I'm looking into it.

newzino a day ago

For hard real-time systems, the bottleneck isn't raw speed but determinism. You need guaranteed worst-case execution time (WCET), and GC pauses, syscalls, and dynamic allocation all make that impossible to analyze. Stripping all three out at the language level is the right call.

The copy-and-patch technique is the same approach CPython 3.13 adopted for its new JIT tier. You precompile a library of code "stencils" (one per operation), then generate native code by copying stencils into a buffer and patching in the correct memory addresses. No LLVM, no compiler infrastructure at runtime.
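A toy sketch of the copy-then-patch step, under loud assumptions: the three stencils below are hand-written x86-64 encodings chosen for illustration, the `STENCILS` table and `emit` function are hypothetical names, and the holes are filled with immediate values rather than the memory addresses a real compiler would patch in. Nothing is executed; the code only assembles bytes.

```python
import struct

# name -> (template bytes, offset of the hole, struct format of the hole)
# Hand-written x86-64 encodings, used here purely as an illustration.
STENCILS = {
    "load_const": (bytes.fromhex("48b8") + bytes(8), 2, "<q"),  # mov rax, imm64
    "add_const": (bytes.fromhex("4805") + bytes(4), 2, "<i"),   # add rax, imm32
    "ret": (bytes.fromhex("c3"), None, None),                   # ret
}


def emit(ops):
    """Copy each stencil into the buffer, then patch its hole in place."""
    buf = bytearray()
    for name, operand in ops:
        template, hole, fmt = STENCILS[name]
        start = len(buf)
        buf += template  # copy
        if hole is not None:
            struct.pack_into(fmt, buf, start + hole, operand)  # patch
    return bytes(buf)


code = emit([("load_const", 40), ("add_const", 2), ("ret", None)])
print(code.hex())
```

Code generation here is a linear pass of memory copies and fixed-width writes, which is why no compiler infrastructure is needed on the target.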

How does the tracing approach handle control flow in practice? The README says conditions must be known at compile time, with cp.iif() as a branchless workaround. For robotics control loops that's probably fine since the structure is usually fixed. A PID controller is all arithmetic. But a state machine with variable-length trajectories would be harder to express this way.
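As a rough illustration of why a PID loop fits this model, here is a plain-Python sketch in which the only "decision" (output clamping) is expressed as an arithmetic select. The `select` helper is a hypothetical stand-in written for this example; it is not `cp.iif()`, whose exact signature and behavior are not shown in this thread.

```python
def select(cond, if_true, if_false):
    """Arithmetic select: both inputs are always computed, no branch is taken."""
    c = float(cond)  # True -> 1.0, False -> 0.0
    return c * if_true + (1.0 - c) * if_false


def pid_step(state, setpoint, measured, kp, ki, kd, dt, out_max):
    """One PID update; state is (integral, previous_error)."""
    integral, prev_error = state
    error = setpoint - measured
    integral = integral + error * dt
    derivative = (error - prev_error) / dt
    out = kp * error + ki * integral + kd * derivative
    # Clamp to [-out_max, out_max] without an if-statement.
    out = select(out > out_max, out_max, out)
    out = select(out < -out_max, -out_max, out)
    return out, (integral, error)


clamped, state = pid_step((0.0, 0.0), 1.0, 0.0,
                          kp=2.0, ki=0.0, kd=0.0, dt=0.1, out_max=1.5)
```

Every call performs the same sequence of operations regardless of the inputs, so the execution time does not depend on which side of the clamp is active.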

The autograd via cp.grad() is clever for the robotics angle. Inverse kinematics through gradient descent means you can handle arbitrary linkage geometries without deriving closed-form solutions by hand.
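A minimal sketch of that idea for a two-link planar arm, in plain Python. Two assumptions to flag: central finite differences stand in for automatic differentiation (this does not use `cp.grad()`), and the link lengths, learning rate, and step count are arbitrary illustrative values.

```python
import math

L1, L2 = 1.0, 1.0  # link lengths (assumed values)


def forward(a, b):
    """End-effector position for joint angles a (shoulder) and b (elbow)."""
    x = L1 * math.cos(a) + L2 * math.cos(a + b)
    y = L1 * math.sin(a) + L2 * math.sin(a + b)
    return x, y


def loss(a, b, target):
    """Squared distance between the end effector and the target."""
    x, y = forward(a, b)
    return (x - target[0]) ** 2 + (y - target[1]) ** 2


def grad(a, b, target, eps=1e-6):
    """Central finite differences, standing in for autograd."""
    da = (loss(a + eps, b, target) - loss(a - eps, b, target)) / (2 * eps)
    db = (loss(a, b + eps, target) - loss(a, b - eps, target)) / (2 * eps)
    return da, db


def solve_ik(target, a=0.3, b=0.3, lr=0.05, steps=2000):
    """Fixed number of gradient-descent steps, so the loop bound is known."""
    for _ in range(steps):
        da, db = grad(a, b, target)
        a -= lr * da
        b -= lr * db
    return a, b


angles = solve_ik((1.0, 1.0))
reached = forward(*angles)
```

Only `forward` encodes the geometry; swapping in a different linkage means changing that one function, with no closed-form inverse to derive.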

  • Saloc 3 hours ago

    In some fields it's quite common to implement state machines in an imperative style with if-blocks and flags. That should be possible with Copapy by having decorated functions where the decorator parses the AST and replaces if-blocks with cp.iif() to end up with branchless code.
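A sketch of what such a rewrite could look like, using only the standard `ast` module. This is an assumption-heavy illustration, not an existing Copapy feature: the `IfToSelect` transformer is hypothetical, it only handles the simplest pattern (an if/else where both sides assign the same single name), and the local `iif` function is a plain-Python stand-in rather than `cp.iif()`.

```python
import ast


class IfToSelect(ast.NodeTransformer):
    """Rewrite `if c: x = a` / `else: x = b` into `x = iif(c, a, b)`."""

    def visit_If(self, node):
        self.generic_visit(node)  # rewrite nested if-blocks first
        body, orelse = node.body, node.orelse
        simple = (
            len(body) == 1 and len(orelse) == 1
            and isinstance(body[0], ast.Assign)
            and isinstance(orelse[0], ast.Assign)
            and len(body[0].targets) == 1 and len(orelse[0].targets) == 1
            and isinstance(body[0].targets[0], ast.Name)
            and isinstance(orelse[0].targets[0], ast.Name)
            and body[0].targets[0].id == orelse[0].targets[0].id
        )
        if not simple:
            return node  # leave anything more complex untouched
        call = ast.Call(
            func=ast.Name(id="iif", ctx=ast.Load()),
            args=[node.test, body[0].value, orelse[0].value],
            keywords=[],
        )
        new_assign = ast.Assign(targets=body[0].targets, value=call)
        return ast.copy_location(new_assign, node)


def iif(cond, if_true, if_false):
    """Stand-in select; a traced version would emit a branchless operation."""
    return if_true if cond else if_false


SOURCE = '''
def step(speed, limit_hit):
    if limit_hit:
        speed = 0.0
    else:
        speed = speed + 0.1
    return speed
'''

tree = IfToSelect().visit(ast.parse(SOURCE))
ast.fix_missing_locations(tree)
namespace = {"iif": iif}
exec(compile(tree, "<rewritten>", "exec"), namespace)
step = namespace["step"]
```

One semantic difference to keep in mind: after the rewrite both branch expressions are evaluated on every call, so this only preserves behavior when neither side has side effects.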

    However, in my experience this programming style is OK for simple state machines, but it's definitely not great, and as things get more complex it gets really hard to keep it comprehensible and correct.

    I think the main challenge is designing an API that fits state machines. Concerning the WCET, branchless code should, on average, be no worse than branched code.