That's fine for a tight loop. Performance might still matter in a bigger application. This benchmark is measuring the overhead, which is relevant in all contexts; the fact that it does it with a loop is a statistical detail.
Using a Cython binding instead of the ctypes one gives a speedup of a factor of 3. That's still not very fast. Now, putting the whole thing into a Cython program, like so:
    cdef extern from "newplus/plus.h":
        cpdef int plusone(int x)

    cdef extern from "newplus/plus.h":
        cpdef long long current_timestamp()

    def run(int count):
        # long long matches the return type of current_timestamp(),
        # so the timestamp is not truncated to a C int
        cdef long long start
        cdef long long out
        cdef int x = 0
        start = current_timestamp()
        while x < count:
            x = plusone(x)
        out = current_timestamp() - start
        return out
Actually yields 597 compared to the pure C program yielding 838.
> If you need a fast loop in Python then switch to Cython.
If you need a fast loop do not use Python.
I am a Python hater, but this is unfair. Python is not designed to do fast loops. Crossing the FFI boundary happens very few times compared to iterations of tight loops.
(I have very little experience using FFI, but I am about to - hence keen interest)
The point is Python's exception mechanism is a particularly heavyweight way to do loop control. This benchmark is heavily dominated by that overhead in a way other interpreted languages aren't.
Can be optimized by assigning plusone = libplus.plusone, before using it as plusone(i).
Otherwise it will do an attribute lookup in each loop iteration. Python has no way to assume that function calls are free of side effects, so it has to allow for libplus.plusone having been rebound to something new inside the plusone call.
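A minimal sketch of that hoisting, in case it helps. Note that `libplus` here is a hypothetical pure-Python stand-in (a `SimpleNamespace`), not the real ctypes library from the benchmark, so the timings only illustrate the attribute-lookup cost and say nothing about FFI overhead itself.

```python
import timeit
from types import SimpleNamespace

# Hypothetical stand-in for the ctypes library object discussed above.
libplus = SimpleNamespace(plusone=lambda x: x + 1)

def loop_with_lookup(count):
    x = 0
    while x < count:
        x = libplus.plusone(x)  # attribute lookup on every iteration
    return x

def loop_hoisted(count):
    plusone = libplus.plusone  # attribute lookup done once, before the loop
    x = 0
    while x < count:
        x = plusone(x)
    return x

if __name__ == "__main__":
    n = 200_000
    print("lookup :", timeit.timeit(lambda: loop_with_lookup(n), number=5))
    print("hoisted:", timeit.timeit(lambda: loop_hoisted(n), number=5))
```

Both loops return the same value; only the number of attribute lookups differs.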
C FFI takes 123 seconds?! That's pretty insane, but if you mean 123.2 ms, it's still very bad.
Doesn't feel like that would be the case from using NumPy, PyTorch and the like, but they also typically run 'fat' functions, where it's one function with a lot of data that returns something. You usually don't chain or loop much there.
Edit: the number was for 500 million calls. Yeah, don't think I've ever made that many calls. 123 seconds feels fairly short then, except for demanding workflows like game dev maybe.
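For scale, a back-of-the-envelope sketch of the per-call cost, taking the quoted figures (123 seconds, 500 million calls) at face value. The helper name `ns_per_call` is made up for illustration.

```python
def ns_per_call(total_seconds, calls):
    """Average cost of a single call, in nanoseconds."""
    return total_seconds / calls * 1e9

if __name__ == "__main__":
    print(f"{ns_per_call(123, 500_000_000):.0f} ns per call")
```

That works out to roughly 246 ns per call, assuming the 123 seconds covers only those calls.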
All python code generally does is call C/C++ code and you're telling me it is slow to do that as well? Yikes.
It's probably the Python loop that is slow rather than calling the code.
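One rough way to check that is to time the bare loop and the loop with a foreign call separately and subtract. The sketch below is not the benchmark from the thread: it assumes a POSIX system where `ctypes.CDLL(None)` exposes libc's `abs`, and silently falls back to the builtin `abs` otherwise. All function names are made up for illustration.

```python
import ctypes
import timeit

def load_c_abs():
    """Return libc's abs() via ctypes, or the builtin abs as a fallback."""
    try:
        libc = ctypes.CDLL(None)  # POSIX: handle to the running process, which includes libc
        c_abs = libc.abs
        c_abs.argtypes = [ctypes.c_int]
        c_abs.restype = ctypes.c_int
        return c_abs
    except Exception:
        return abs  # assumption: good enough as a stand-in where libc is unavailable

def loop_only(count):
    x = 0
    while x < count:
        x += 1
    return x

def loop_with_call(func, count):
    x = 0
    while x < count:
        func(x)  # one call per iteration, result discarded
        x += 1
    return x

def split_overhead(count=100_000, repeat=5):
    """Return (bare loop seconds, extra seconds attributable to the calls)."""
    func = load_c_abs()
    bare = min(timeit.repeat(lambda: loop_only(count), number=1, repeat=repeat))
    full = min(timeit.repeat(lambda: loop_with_call(func, count), number=1, repeat=repeat))
    return bare, full - bare

if __name__ == "__main__":
    bare, calls = split_overhead()
    print(f"loop: {bare:.4f}s, calls: {calls:.4f}s")
```

If the second number dominates, the call overhead matters more than the loop itself on that machine.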
cffi is probably the canonical way to do this in Python. I wonder what the performance is there.
edit: 30% improvement, still 100x slower than e.g. Rust.
If you need a fast loop in Python then switch to Cython.
> 500 million calls in 123 seconds
I think that's the time to run the whole benchmark suite. Compare to the results for go, for example.