kyrra 5 years ago

There is discussion about C# and Java being faster than Go, but one interesting thing to note is that both C# and Java have to use C to interface with the kernel.

Java: https://github.com/ixy-languages/ixy.java/blob/master/ixy/sr...

C#: https://github.com/ixy-languages/ixy.cs/blob/master/src/ixy_...

Java needs a bit more C to make it work. C# only seems to need it for DMA access. But when you look at the Go code, they got away with being pure Go, using the syscall and unsafe packages. So that's at least one plus for Go.

(The main README calls this out, but it's worth mentioning here too.)

As a Java coder for my day-job, I do like the breakdown they have of the performance of the different GCs for their Java implementation. https://github.com/ixy-languages/ixy-languages/blob/master/J...

  • pjmlp 5 years ago

    You can perfectly well do the same in C#, just like they did with Go.

    We do it all the time in low-level Windows coding within .NET. Why they didn't beats me; most likely they weren't knowledgeable enough about .NET's capabilities.

    As for Java, hopefully with Projects Valhalla, Panama and Metropolis, Java will finally get the performance-oriented language features that should have been part of Java 1.0.

    • voltagex_ 5 years ago

      Page 15 of https://www.net.in.tum.de/fileadmin/bibtex/publications/pape.... They couldn't access mlock in .NET. The amount of C involved is tiny.

      • pjmlp 5 years ago

        The document has 14 pages and the only information I see is the mlock description, nothing about how they were unable to use System.Runtime.InteropServices, unsafe and DllImport to make use of it.

        • voltagex_ 5 years ago

          Sorry, wrong link: https://www.net.in.tum.de/fileadmin/bibtex/publications/thes...

          >"As C# cannot call mlock or get a raw pointer from a memory mapped file, DMA memory allocation is performed in C and called with the C# P/Invoke mechanism. Fortunately, this is the only instance of the driver calling a C function and the total amount of C code is only around 30 lines."

          • optimiz3 5 years ago

            Odd that they couldn't call mlock using PInvoke; it sounds like the PInvoke export metadata wasn't there rather than C actually being required.

            On Windows, even for really obscure functions you can almost always P/Invoke if you know the offsets, and if you really want to be evil you can traverse the PEB. There isn't much in low-level terms that is beyond the reach of C#, since you can manipulate memory directly. I've also accessed hidden COM interfaces by traversing v-tables using the same direct memory techniques you would use in C.

          • pjmlp 5 years ago

            So something like

                [DllImport(..., EntryPoint = "mlock")]
                static extern int MLock(UIntPtr addr, UIntPtr len);
            
            And then getting the pointer either from MemoryMappedViewAccessor or AllocHGlobal?
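
            Something like this minimal sketch, perhaps (assuming mlock comes from libc; names and error handling are illustrative, not taken from the ixy.cs sources):

                using System;
                using System.Runtime.InteropServices;

                static class PinnedMemory
                {
                    // mlock(2): int mlock(const void *addr, size_t len); size_t maps to UIntPtr.
                    [DllImport("libc", EntryPoint = "mlock", SetLastError = true)]
                    static extern int MLock(IntPtr addr, UIntPtr len);

                    // Allocate unmanaged memory and lock its pages into RAM.
                    public static IntPtr AllocateLocked(int bytes)
                    {
                        IntPtr buffer = Marshal.AllocHGlobal(bytes);
                        if (MLock(buffer, (UIntPtr)(uint)bytes) != 0)
                        {
                            Marshal.FreeHGlobal(buffer);
                            throw new InvalidOperationException("mlock failed, errno " + Marshal.GetLastWin32Error());
                        }
                        return buffer;
                    }
                }

            For a memory-mapped file, the raw pointer would come from MemoryMappedViewAccessor.SafeMemoryMappedViewHandle.AcquirePointer instead of AllocHGlobal.
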
  • emmericp 5 years ago

    Neither C# nor Java has C in the hot path; C# uses unsafe mode, Java uses sun.misc.Unsafe. The JNI/C# native calls are either in initialization or in an alternate implementation used for comparison.

  • dep_b 5 years ago

    I can't really believe the Swift implementation needs to be that slow. Objective-C used to be 100% C compatible, and Swift more or less has complete bridging to C because of the need to use those APIs.

    Objective-C was often called slow because iterating an NSArray was much slower than doing it in C. Well, if you needed to do it fast in Objective-C, you wouldn't do it using the user-friendly and safe (for 1984) higher-level objects.

    I think only Rust really allows you to write really safe and still really fast code though.

    • emmericp 5 years ago

      Yes, we could write most of the critical part in C and it would probably be faster. But then it wouldn't be a Swift driver.

      • skohan 5 years ago

        Are these benchmarks single-threaded? I took a brief look at the Swift codebase, and I noticed that you are using semaphores, but there doesn't seem to be any parallel execution anywhere in the project.

        • emmericp 5 years ago

          The semaphore is only used during initialization, never in the critical path; see the profiling results in the main repo.

      • jmull 5 years ago

        I don't think you need to write the critical part of the driver in C to speed up the swift implementation.

        I clicked through to a performance analysis showing ARC taking about 3/4 the time (in the release build).

        You don't really need to be doing a lot of ARC in the inner loops if you don't want to.

        • emmericp 5 years ago

          Yes, we do if we want to have an idiomatic interface for the application on top of the driver.

          Pull requests proving otherwise are welcome

          • jmull 5 years ago

            I don’t think it’s idiomatic to do a bunch of unnecessary memory management in inner loops, right?

            • mpweiher 5 years ago

              The memory management is being done by the language.

              • jmull 5 years ago

                ARC is implemented at the language level but it's not required. You can use it when and where you like.

                One fundamental aspect of swift is the distinction between reference types -- which are reference counted -- and value types -- which are not. Generally in Swift you'd use a value type over a reference type unless you have reasons not to. E.g.: https://developer.apple.com/documentation/swift/choosing_bet...

                I mean, I don't know what the right approach for this library is. The authors are going to have to fix their own code. IMO, coming up with a demonstrably poor solution and trying to defend it as "idiomatic" is pretty weak.

      • dep_b 5 years ago

        Well if the C# and Java implementations use C they're definitely not C# and Java drivers. But you don't need to use C directly, just use the C features that are supported by Swift itself.

        • EpicEng 5 years ago

          C isn't in the critical path of either of those two implementations. GP is saying that it would be in a Swift version. Whether or not that's accurate I have no idea.

    • metroholografix 5 years ago

      > I think only Rust really allows you to write really safe and still really fast code though.

      Snabb (https://github.com/snabbco/snabb/) is written in LuaJIT . I assume an equivalent project in Rust would be a lot more expensive in implementation time and also lines of code.

      There is also Common Lisp and SBCL in particular, which can produce extremely fast code without compromising on safety.

  • oaiey 5 years ago

    The TechEmpower benchmarks cover this in a more rigorous way. When coded right, the frameworks are typically on par in performance in the plaintext tests (where only processing matters). In the end, these drivers were mostly thesis projects.

  • kartickv 5 years ago

    It would be good to have them measure aspects other than performance: how long it took to build each one, whether there was a learning curve because the language was unfamiliar, how secure the resulting code is, etc.

    • mr__y 5 years ago

      I'm pretty sure that an exhaustive answer to your last question is "history will tell", with the additional assumption that all those drivers get deployed in many production environments.

      • kartickv 5 years ago

        An approximate answer is useful too.

    • Arnt 5 years ago

      Can you suggest ways to measure those things?

      • kartickv 5 years ago

        To measure how long it took to build it, track how many days it took to implement in each language.

        To measure how secure the resulting code is, have a test suite of malformed packets or other input and see how many of them the code in each language handles.
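
        As a rough sketch of what that second measurement could look like (hypothetical parsePacket delegate; C# just for illustration):

            using System;

            static class MalformedPacketHarness
            {
                // Feed randomly sized, randomly filled frames to a parser under test and
                // count how many inputs it survives without throwing.
                public static void Run(Func<byte[], bool> parsePacket, int iterations = 10_000)
                {
                    var rng = new Random(42);          // fixed seed so every language gets the same inputs
                    int survived = 0;
                    for (int i = 0; i < iterations; i++)
                    {
                        var frame = new byte[rng.Next(0, 1519)];   // 0..1518-byte frames
                        rng.NextBytes(frame);
                        try { parsePacket(frame); survived++; }
                        catch { /* an unhandled exception counts as a failure */ }
                    }
                    Console.WriteLine(survived + "/" + iterations + " malformed frames handled");
                }
            }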

        • Arnt 5 years ago

          I'm not qualified to judge the latter. But I can say something about the former.

          That measures how long a particular developer takes, with some noise induced by the working environment (say there's construction noise during Python week but not during Java week, or more meetings one week than another, or the developer has relationship troubles at home). Randomness happens.

          Deducing a number that's more generally valid requires having n workers doing the same work and then doing statistical analysis. Happily that also takes care of the single-worker problem. Still, the cost of the experiment easily rises by a factor of twenty or a hundred, depending on how well the noise can be controlled and how much accuracy is needed.

          Asking for improvements that would increase the cost of an experiment by many thousand per cent is a %$#@!%#@! $%#@!%@#$ thing to do. IMO.

          • kartickv 5 years ago

            The perfect is the enemy of the good. I'll take a rough estimate over no data at all. If the Python guy took 10 days and the Java guy took 20, then the conclusion is that Python is more productive than Java. Is it more productive exactly by 100%? No, maybe 60%, or maybe 120%. But whether the benefit is 60% or 120%, I know which tool I'll choose next time.

tylerl 5 years ago

If you can't see or interpret the graphs (mobile browser, etc.) here's a quick description of the relative performance in terms that might be useful even without the graphs.

Bidirectional forwarding, packets per second: Here, the batch size matters; small batches have a lower packet rate across the board. Each language's throughput increases with batch size up to some point, and then the chart goes flat. Python is by far the slowest, never even diverging from the zero line. C is consistently the fastest, but flattens out at a 16-packet batch size at 27Mpps. Rust is consistently about 10% slower than C until C flattens out; Rust then catches up at the 32-packet batch size, and both are flat at 27Mpps. Go is ever so slightly faster than C# until the 16-packet batch size where they cross (at 19Mpps), then C# is consistently about 2Mpps faster than Go. At the 256-packet batch size, C# reaches 27Mpps and Go 25Mpps. Java is faster than C# and Go at very low batch sizes, but at 4 packets per batch Java slows down (10Mpps) and quickly reaches its peak of 11 to 12 Mpps. OCaml and Haskell follow a similar curve, with Haskell consistently about 15% slower than Java and OCaml somewhere between the two. Finally, Swift and JavaScript are indistinguishable from each other, both about half the speed of Haskell across the board.

Latency at the 90th, 99th, 99.9th, 99.99th, etc. percentiles, at 1Mpps: All have near-zero latency at the 90%ile point, then JavaScript latency quickly jumps to 150us, and at the 99.99%ile it jumps again to 300us. C# is the next to increase: from the 99%ile mark there's a steady increase until it hits 40us at the 99.99%ile, then a steady increase to about 60us. Haskell keeps it at about 10us until the 99.99%ile, then a steady increase to about 60us, and a sudden spike at the end to 250us. Java latency remains low until the 99.95%ile, then it quickly spikes up, reaching a max of 325us. Next, OCaml spikes at around the 99.99%ile, reaching a max of about 170us. Next comes Swift, with a maximum of about 70us. Finally, C, Rust, and Go have the lowest latency. Rust and C are indistinguishable, and Go latency diverges to about 20% higher than the other two at the 99.999%ile mark, where it wavers, eventually hitting around 25us while C and Rust hit about 22us.

  • gnode 5 years ago

    The Rust page also compares the performance of the Rust implementation using prefetching, which slightly outperforms C for some batch sizes. https://github.com/ixy-languages/ixy.rs#performance

    It would be a bit of a cheat, as it isn't portable, but it would be nice to see prefetching in the C implementation for the sake of comparison.

userbinator 5 years ago

Cross-language comparisons are always interesting to look at; if I had the time, I'd really like to write one in Asm and see how it compares.

I've written NIC drivers for some older chipsets, and IMHO it's not something that's particularly "algorithmic" in computation or could necessarily show off/exercise a programming language well; what's really measured here is probably an approximation to how fast these languages can copy memory, because that's ultimately what a NIC driver mostly does (besides waiting.) To send, you put the data in a buffer and tell the NIC to send it. To receive, the NIC tells you when it has received something, and you copy the data out. Nonetheless, the astonishingly bad performance of the Python version is surprising.

Although I haven't looked at the source in any detail, I know that newer NICs do a lot more of the processing (e.g. checksums) that would've been done in the host software, so that would be another way in which the performance of the host software wouldn't be evident.

One other thing I'd like to see is a chart of the binary sizes too (with and without all the runtime dependencies).

  • emmericp 5 years ago

    Real NIC drivers spend most of their time fiddling with bit fields. It's mostly about translating a hardware-agnostic version of a packet descriptor (mbuf, sk_buffs, ...) into a hardware-specific DMA descriptor.

    If your driver copies memory you are doing something wrong.

  • bsder 5 years ago

    > Nonetheless, the astonishingly bad performance of the Python version is surprising.

    In the paper, they point out that the Python version is the only one they didn't bother to optimize.

    However, my takeaway is that practically everybody can handle north of 1 Gigabits per second (2 Million packets per second x 64 bytes per packet) even on a 1.6GHz core. I find THAT quite a bit more astonishing actually.

    • fgonzag 5 years ago

      I don't see why it's that surprising. We've been stuck on 1Gbps for the better part of 20 years. What's surprising to me is that wired networking was sorta left behind the tech wave; sure, 10Gbps exists, but it's still not that affordable or widespread.

      • Goz3rr 5 years ago

        I wouldn't say it was exactly left behind, because the average consumer will not really benefit from anything over 1Gbit. 1Gbit is already enough to saturate most consumer harddrives.

        I run 10Gbit inside my home and it didn't even cost me that much (if you go with 10Gbit fiber instead of copper), with the sole reason being quicker transfers between my PC and NAS. My NAS has 4 SFP+ ports and functions as a switch. I bought second-hand PCIe SFP+ NICs for $40 each and matching transceivers for $15 each. 10m of fiber costs less than $10.

        There's no point in going higher, because 10Gbit is already way past the sequential writing speed of the drive array in my NAS, and it's pretty much saturating the NVMe cache drive in the NAS or the NVMe storage in my PC.

        That's not to say you can't go faster, because 100, 200 and 400Gbit are very much possible and in use in datacenters and the like.

        • cure 5 years ago

          > I wouldn't say it was exactly left behind, because the average consumer will not really benefit from anything over 1Gbit. 1Gbit is already enough to saturate most consumer harddrives.

          That hasn't been true for a long time. Even one single spinning rust hard drive made in the last decade can do sequential reads at ~120-150MiB/sec, which is easily enough to saturate a 1 Gbit/s link.

          SSDs have way, way higher throughput for sequential read and write. Good SSDs will also beat that number handily for random read/writes.

          And of course, any machine with more than 1 hard drive can easily saturate a 1Gbit/s network.

          I also find it surprising that wired networking has been 'stuck' on 1Gbit/s for decades.

      • bsder 5 years ago

        > What's surprising to me is that wired networking was sorta left behind the tech wave

        Lack of necessity.

        Since the telcos are a gigantic bottleneck to everything in the cloud, and now that everything is in the cloud, there is no need for >1Gbps home networking.

      • Arnt 5 years ago

        Because >1Gbps on 1.6GHz means <1.6 cycles per transmitted/received bit, or under about 13 cycles per byte if you prefer to count bytes.

        That's not shabby for a language like Python.

  • ummonk 5 years ago

    Yeah, for real life applications as well, I shy away from Python for anything where performance might one day be an issue. Most languages can at least get within an order of magnitude of state of the art (at which point ergonomic considerations can matter more), but Python is just incredibly slow in practice.

    • mr__y 5 years ago

      > I shy away from Python for anything where performance might one day be an issue

      If there are only a few bottlenecks that have a significant impact on the performance, you could rewrite only those parts in C/Rust. This might be a good approach especially in a situation where performance is not an issue right now but might be in the future. When it actually becomes an issue, only the part that actually affects the performance would then be rewritten in C. Of course this approach doesn't always make sense, but quite often there is a small part of the code that impacts performance the most, and only that part would need to be rewritten, while the rest of the code could still enjoy a language more productive to write in. Similarly, a microservice(ish) architecture comes in handy for this.

      • ummonk 5 years ago

        In a language as slow as Python where even basic memory copying is slow, you don't really just have a few bottlenecks. You can optimize the most important 10% of the code and the rest of the code will still be slowing everything down. It's also a lot of work to have to rewrite parts in C/Rust and interface with Python code.

saurik 5 years ago

That JavaScript and Swift have essentially the same performance here is extremely telling: there are essentially four performance regimes (five if you count Python, but clearly from the graphs you should not ;P), and what would really be interesting--and which this page isn't bothering to even examine?! :(--is what is causing each of these four regimes. I want to know what is so similar about C# and Go that is causing them to have about the same performance, and yet much more performance (at higher batch sizes) than the regime of Java/OCaml/Haskell (a group which can't be explained by their garbage collectors as one of the garbage collectors tested for Java was "don't collect garbage" and it had the same performance). It frankly makes me expect there to be some algorithmic difference between those two regimes that is causing the difference, and it has nothing to do with language/runtime/fundamental performance.

  • emmericp 5 years ago

    Swift spends 76% of the time incrementing/decrementing reference counts; ARC is just very bad at pushing tens of millions of objects through it every second.

    There's some more evaluation for Swift here: https://github.com/ixy-languages/ixy.swift/tree/master/perfo...

    It's just a coincidence that JavaScript and Swift end up with almost the same performance; there is nothing similar between these two runtimes and implementations.

    • skohan 5 years ago

      This is also a clear optimization target. It is very possible to write Swift code which requires very little reference-counting overhead.

      • dep_b 5 years ago

        The problem is that 99% of all Swift developers use the language to create front-ends for powerful devices and you never need to squeeze the last drop of performance out of them.

  • masklinn 5 years ago

    > I want to know what is so similar about C# and Go that is causing them to have about the same performance, and yet much more performance (at higher batch sizes) than the regime of Java/OCaml/Haskell […] and it has nothing to do with language/runtime/fundamental performance.

    The authors specifically call out the issue of avoiding heap allocations when asked about Java v C# (as they're pretty similar languages), noting that they couldn't get under ~20 bytes allocated per forwarded packet in Java. C# (and Go) would have much better facilities to work entirely out of the stack, avoid memory copies and reuse allocations in the main loop.

    I expect Haskell and OCaml have similar issues.
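
    To make the C#/Go point concrete, a minimal sketch of that style in C# (hypothetical types, not the ixy.cs code): the per-packet descriptor is a value type and the batch array is allocated once, so the forwarding loop itself never touches the GC heap.

        // Value type: no GC object per packet, lives inline in the array below.
        struct PacketDescriptor
        {
            public ulong PhysAddr;
            public ushort Length;
        }

        sealed class Forwarder
        {
            // Allocated once and reused for every batch.
            private readonly PacketDescriptor[] batch = new PacketDescriptor[256];

            public void ForwardBatch(int count)
            {
                for (int i = 0; i < count; i++)
                {
                    ref PacketDescriptor desc = ref batch[i];   // ref local: no copy, no boxing
                    desc.Length = Process(desc.PhysAddr, desc.Length);
                }
            }

            private static ushort Process(ulong physAddr, ushort len) => len;  // placeholder
        }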

  • Gibbon1 5 years ago

    > want to know what is so similar about C# and Go that is causing them to have about the same performance.

    I looked at the C# thesis. I think with care the programmer was able to reduce the amount of heap allocation and memory copying enough that it's similar to the mix in the Go driver. I think also that modern processors' ability to execute multiple instructions/code paths in parallel tends to negate the advantage of efficiently compiled languages like C and Go. So cache misses, heap allocation, and garbage collection tend to dominate over the raw number of instructions executed.

  • skohan 5 years ago

    > JavaScript and Swift have essentially the same performance here

    One thing that I notice about the Swift version is that he's making heavy use of classes, and in the performance section it mentions that there is quite a lot of time spent on retain-release. There's probably a lot of room for performance optimization in this implementation.

  • bsder 5 years ago

    For Java it could be binary wrangling. You have to do some weird things to unpack binary blobs in Java due to the lack of unsigned types. So, for performance you have to wrangle things through native ByteBuffers correctly or you will get killed.

  • quietbritishjim 5 years ago

    > JavaScript and Swift have essentially the same performance here

    Only for throughput. The latency difference is enormous.

  • gameswithgo 5 years ago

    C# and Go both have value types and reference types; Java only has reference types. C# also has a variety of tools for controlling how memory is used and accessed vs Java. Avoiding the heap may be the first-order difference.

antoinealb 5 years ago

The author of this project presented it last year at CCC, here is the video: https://media.ccc.de/v/35c3-9670-safe_and_secure_drivers_in_...

  • ksangeelee 5 years ago

    Thanks, that was interesting. If anyone is excited enough to try driving peripherals in userspace via hardware registers, I can recommend starting with a Raspberry Pi, since it has several well documented peripherals (UART, SPI, I2C, DMA, and of course lots of GPIO), and the techniques described in this talk are transferable.

    A search for 'raspberry pi mmap' will yield a lot of good starting points.

kerng 5 years ago

Cool to see C# being up there close to C and ahead of Golang.

I haven't used C# much over the last year due to a job change, but it always felt like one of the most mature languages out there. Now I'm working in Go and it's a bit frustrating in comparison.

  • tylerl 5 years ago

    Go isn't designed to feel mature; it's designed to be boring and effective. It's designed to keep code complexity low even as the complexity of problems and solutions increases. It's designed to allow large teams of medium-skill programmers to consistently produce safe and effective solutions. The most precise description I've heard to date is: "Go is a get shit done language."

    • kerng 5 years ago

      C# 1.0 was also boring and effective - modern versions are still effective, and I'd say they are more powerful.

      Golang in 15 years will likely converge and embrace many of the missing features of mature languages (its happening now already), especially if it wants to reach broader adoption.

      A reference like "Go is a get shit done language." very much reflects overall immaturity of the language that I see day to day.

      • grumpydba 5 years ago

        >Golang in 15 years will likely converge and embrace many of the missing features of mature languages (its happening now already), especially if it wants to reach broader adoption.

        Go is 10 years old already and is picking up new features at an extremely low rate, with no hints of a pace change.

        I think error management and generics should be the only major changes to expect within the next 5 years. C# is more complex by an order of magnitude... And thus its evolution was and is still way faster.

    • pjmlp 5 years ago

      Basically it is designed for writing boilerplate libraries and code generators to cover up the lack of language features, which even well-known projects are forced to make use of (k8s).

      I bet a G2EE variant isn't too far away.

      • geodel 5 years ago

        It is already there in some form as the 'Go Cloud Development Kit' [1]. Though many claim it is not really enterprise scale until a petstore application can be created in it. And a flawless implementation of EJB 2.1 made enterprises fall in love with J2EE. I am not sure Go can deliver anything remotely as powerful as that.

        1. https://gocloud.dev/

      • grumpydba 5 years ago

        Yet on the infrastructure side, it's much more used than C#. Go figure.

        • pjmlp 5 years ago

          You again.

          What infrastructure, those riding the consulting and conference Docker and K8s 2019 wave fad?!?

          • grumpydba 5 years ago

            Right now I'm using prometheus and grafana to monitor around 8k database servers (sql server too BTW). We have pricing applications using influxdb. Docker. Openstack. Minio. We also have mattermost.

            All of this in a conservative big bank. My friends in the banking sector tell the same story.

            True there are lots of c# enterprisey web apps.

            However given the amount of boilerplate you describe, I cannot understand how such useful and reliable tools can be delivered in Go.

            A hint: just because a language is not to your liking, that does not mean it is not useful, performant and reliable.

            • joelfolksy 5 years ago

              "However given the amount of boilerplate you describe, I cannot understand how such useful and reliable tools can be delivered in Go."

              I don't follow your logic. Multitudes of useful and reliable tools were built with assembly languages - is that evidence that assembly code doesn't have a lot of boilerplate (relative to modern languages)?

            • pjmlp 5 years ago

              One anecdote doesn't make the IT industry.

              • grumpydba 5 years ago

                I'm talking about the whole banking sector in France.

                • pjmlp 5 years ago

                  And yet I haven't seen any of that on our Fortune 500 French clients, which naturally includes banks, go figure.

                  • grumpydba 5 years ago

                    Are you working in operations and infrastructure? My take is that writing enterprisey applications you are not exposed to those tools. I'm in ops.

                    • pjmlp 5 years ago

                      Not personally, but we do have mixed teams.

                      AWS, Azure, actual hardware racks, plain old VMs, JEE containers, .NET packages, Ansible, Puppet, Chef, whatever scripting stack, but surely not one line of Go related code.

        • gpderetta 5 years ago

          As someone that is neither a C# or Go programmer, I would be very surprised if that's true.

          • travisjeffery 5 years ago

            Here's a list of prominent Go infrastructure projects and by no means a complete list: Kubernetes, Docker, Etcd, Consul (and the rest of Hashicorp's projects), CockroachDB, Prometheus, TiDB. Maybe I'm just blind to C# but I don't remember coming across a single similar project that's written in C#.

            • pjmlp 5 years ago

              Azure, Orleans, ASP.NET, Windows,SQL Server, Bing, IIS, Kestrel,...

              Yep, they aren't pure C#, still way more relevant to the world IT infrastructure than anything Go.

              • Thaxll 5 years ago

                None of this is relevant or even close to being as popular as the previously mentioned Go tools; k8s and Docker alone crush your entire list. Another very popular one is Grafana, also written in Go.

                • pjmlp 5 years ago

                  In what? Only if we are talking about Github stars or Silicon Valley coffee shops.

                  Fortune 500 prefer to care about actual delivered business value.

                  • Thaxll 5 years ago

                    I worked at two F500 companies; both are using k8s, Docker/Grafana and Go. They also use C#, but not in the cloud / operations / infra world.

                    Go talk to some SRE team in F100 and F500 and ask them what they think about C# infra side lol.

                    The fact that C# was running on Windows only until 2 years ago explains why.

                    • pjmlp 5 years ago

                      I only work for Fortune 500s, on projects of a scale where license costs are a tiny drop compared to overall project costs.

                      Plenty of them do run production servers on Windows.

                      You forgot there are 498 left to check.

                • kerng 5 years ago

                  That's a very distorted and interesting view of the software industry as a whole.

              • grumpydba 5 years ago

                Sql server written in C#? Please. None of it.

                • pjmlp 5 years ago

                    Then better learn how to use it properly, especially .NET stored procedures, the OLAP engine and SSMS.

                  • grumpydba 5 years ago

                    You also have python and R stored procedures. Is it written in python?

                    • pjmlp 5 years ago

                      Yes, the modules that make up the API surface for Python and R respectively.

                      • grumpydba 5 years ago

                        I think saying 'written in X' != 'has API wrappers for X' .

                        • pjmlp 5 years ago

                          In correct English one would state fully, completely written in X.

                          Modern stacks are seldom pure blood language X, thus if 5% of it is written in Y, the product is written in a mix of X and Y.

                          Which I also mentioned in my comment, "Yep, they aren't pure C#", naturally overlooked when one intends to champion one's language as the "Year of Desktop Linux" of IT infrastructures.

                          • grumpydba 5 years ago

                            Playing games with the commonly accepted meaning of words is a known sophism.

                            The Linux kernel is 0.2% shell scripts, yet no one would say it's written in shell. Same for Windows or SQL Server. .NET is marginal there.

                            Btw I do code in Go, but mostly I use Go apps and enjoy their small memory footprint and ease of deployment.

                            I must not be the only one, as I see Go apps pretty much everywhere in ops teams.

                • kerng 5 years ago

                  Depends on which parts; entire products of the SQL Server product line are in C#, like SQL Server Reporting Services for instance.

                  The core SQL Server RDBMS engine hosts the .NET runtime for a few things, but there it's marginal compared to C++.

                  • grumpydba 5 years ago

                    Exactly. It's marginal. Saying "written in .NET" about SQL Server made the Microsoft PFEs laugh during our coffee break, though.

                    • pjmlp 5 years ago

                      Happy to be able to help you get through the day in a happier mood.

    • mlindner 5 years ago

      > It's designed to keep code complexity low even as the complexity of problems and solutions increases.

      Honestly I think that's pretty wrong. Go is designed to let you get coding quickly. It's not designed to make your ultimate solution well designed or easily refactorable. In a few years there's going to be a ton of Go code that becomes almost as bad as C, where it becomes untouchable because people quickly threw something together and didn't think about long-term design.

    • dev_dull 5 years ago

      I didn’t read the thesis for each implementation, but it would have been cool to see how long it took an engineer to write each network driver. I bet the Go version, while definitely not the fastest, was one of the fastest to finish.

chrisaycock 5 years ago

A specific finding from this research is on the front page:

https://news.ycombinator.com/item?id=20944403

Rust was found to be slightly slower than C because of bounds checking, which the compiler keeps even in production builds.

  • mlindner 5 years ago

    Except their answer is wrong, because Rust (LLVM rather) does eliminate bounds checks. They're comparing GCC vs LLVM here more than they are comparing C vs Rust; they should have compiled their C code with LLVM. Their implementation is littered with uses of "unsafe", which makes it almost impossible for the compiler to eliminate the bounds checks.

    • GrayShade 5 years ago

      There's a per-packet bounds check here [1] which probably can't be eliminated by the compiler because it cycles over the array. I imagine that's noticeable.

      [1]: https://github.com/ixy-languages/ixy.rs/blob/master/src/ixgb...

      • ChrisSD 5 years ago

        So the bounds check is:

            queue.bufs_in_use[rx_index]
        
        If so the bounds check could possibly be safely eliminated by the programmer because I think `wrap_ring` ensures that rx_index will always be in bounds?
        • GrayShade 5 years ago

          Yes. It wouldn't be too unidiomatic to use get_unchecked in those two places, perhaps with a debug_assert! in place.

          It would be really nice if this wasn't needed, but it's a valid use of unsafe code.

    • csande17 5 years ago

      > Their implementation is littered with uses of "unsafe", which makes it almost impossible for the compiler to eliminate the bounds checks.

      Does `unsafe` actually impede optimization in this way? I thought it just disabled certain type checks and error messages but didn't affect anything on the LLVM level.

      • dbaupp 5 years ago

        You're essentially right. The grandparent may be meaning that they're using a lot of raw pointers (which requires unsafe), and raw pointers means less aliasing information, and so less precise optimisations. This can affect bounds checks, because a raw pointer could potentially alias the length field of a slice or vector, and so LLVM has to be conservative around writes to them.

        • leshow 5 years ago

          Are their uses of unsafe necessary? After briefly looking over the implementation it seems like they could just be using references in a bunch of places. Take for example:

             let queue = &mut self.rx_queues[queue_id as usize];
             rx_index = queue.rx_index;
             last_rx_index = queue.rx_index;
          
             for i in 0..num_packets {
                 let desc = unsafe { queue.descriptors.add(rx_index) as *mut ixgbe_adv_rx_desc };
                 let status =
                     unsafe { ptr::read_volatile(&mut (*desc).wb.upper.status_error as *mut u32) };
          • dbaupp 5 years ago

            I've got no idea. One would have to understand the code fairly deeply to tell that. For instance, are the volatile reads/writes meant to be atomic ones (which could be done safely) or are they truly volatile?

    • gameswithgo 5 years ago

      LLVM cannot always eliminate bounds checking, and GCC vs LLVM does not explain all of the difference.

chvid 5 years ago

So why the difference in "language" speeds?

Some of the results don't quite follow conventional expectations. For example, the Swift implementation is as slow as JavaScript, JavaScript is a lot faster than Python, and Java is considerably slower than the usually very similar C#.

The implementation is fairly complex; so it is a bit hard to see what is going on. But it must be possible to pin the big performance differences implied by the two graphs to something?

  • ygra 5 years ago

    Python is interpreted bytecode. This means that for every small instruction on the bytecode there's a round trip to the Python interpreter that has to execute that instruction. This is faster than parsing and interpreting at the same time, such as shells often do, but it's still a lot slower than JIT compilers.

    Now, a just-in-time (JIT) compiler transforms the code into machine code at runtime, usually from bytecode. Java, C# and JavaScript all predominantly use this model these days. This takes a bit of work at runtime and you cannot afford the more complicated optimizations that a C or C++ compiler would do, but it comes close (and for certain reasons is sometimes even better). So that's the main reason why JavaScript is faster than Python. There's a Python JIT compiler, PyPy, that might close the gap, though. And for Python in particular there are also other options to improve speed somewhat; one of them involves converting the Python code to C. Not too idiomatic, usually, though.

    As for Java and C#, that's a point where it can sometimes show that C# has been designed to be a high-level language that can drop down to low levels if needed. C# has pointers and the ability to control memory layout of your data, if you need it. This turns off a lot of niceties and safeties that the language usually offers (you also need the unsafe keyword, which has that name for a reason), but can improve speed. Newer versions of C# increasingly added other features that allow you to safely write code that performs predictably fast. But even value types and reified generics go a long way of making things faster by default than being required to always use classes and the heap.
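
    As a small illustration of those two features (a hedged sketch, not code from the driver): an explicitly laid-out struct standing in for a hypothetical DMA descriptor, read through a raw pointer in unsafe code.

        using System.Runtime.InteropServices;

        // Field offsets pinned to match a (hypothetical) hardware descriptor layout.
        [StructLayout(LayoutKind.Explicit, Size = 16)]
        struct RxDescriptor
        {
            [FieldOffset(0)]  public ulong BufferAddr;
            [FieldOffset(8)]  public uint StatusError;
            [FieldOffset(12)] public ushort Length;
        }

        static class DescriptorRing
        {
            // Read a descriptor's status straight out of a mapped ring buffer, no copies.
            public static unsafe uint ReadStatus(byte* ringBase, int index)
            {
                RxDescriptor* desc = (RxDescriptor*)(ringBase + index * sizeof(RxDescriptor));
                return desc->StatusError;
            }
        }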

    Java, on the other hand, has few of those features where the developer is offered low-level control. It has one major advantage, though, in that its JIT compiler is a lot more advanced and can do some crazy transformations and optimizations. One might argue that Java needs that much magic because you don't have much control at the language level to make things fast; so, as far as performance goes between C# and Java, this may be pretty much the tradeoff between a complicated language and a complicated JIT compiler.

    Which benchmarks show Java being faster than C# depends a bit on how the code was written, but recently .NET has become a lot better as well, and popular multi-language benchmarks often show C# faster than Java.

  • csande17 5 years ago

    I'd imagine Python is so slow in this benchmark because it doesn't have any kind of optimizing compiler. All the other languages are either compiled ahead of time or just-in-time compiled into more efficient machine code.

    I wonder how PyPy would do on this benchmark...

  • jsiepkes 5 years ago

    I find the performance of Java rather suspicious. It starts out fast for the smallest batch sizes but then kind of falls flat for the rest.

AlEinstein 5 years ago

Surprisingly good performance for the C# implementation!

  • fgonzag 5 years ago

    Unsurprising if you've kept up with .NET. The new primitive types (spans) allow direct low-level manipulation of memory slices. A NIC driver, at its core, really only copies data to and from shared buffers, so it gets a tremendous benefit from this new type.
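
    For instance (a rough sketch, not the ixy.cs code), a Span<byte> can wrap raw unmanaged packet memory directly, so header fields can be rewritten in place, with bounds checks but without copies:

        using System;

        static class PacketView
        {
            // Swap the destination and source MAC addresses of an Ethernet frame in place.
            public static unsafe void SwapMacs(IntPtr packet, int length)
            {
                var data = new Span<byte>((void*)packet, length);   // a view over the memory, no copy
                Span<byte> dst = data.Slice(0, 6);
                Span<byte> src = data.Slice(6, 6);
                Span<byte> tmp = stackalloc byte[6];                 // scratch space on the stack
                dst.CopyTo(tmp);
                src.CopyTo(dst);
                tmp.CopyTo(src);
            }
        }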

    C# recently getting new low-level memory types definitely gave it the edge here, though it does not reflect real-world scenarios very accurately.

    • creato 5 years ago

      > C# recently getting new low-level memory types definitely gave it the edge here, though it does not reflect real-world scenarios very accurately.

      In my experience, C# is by a large margin the most performant "managed" language vs. Java, Python.

      • fgonzag 5 years ago

        If you go down to bytecode engineering you can create some seriously fast JVM code. But I agree that making .net go fast seems easier than getting the JVM to go fast (if anything because many of the standard libraries are anything but fast, and you have to go hunting for high performance ones or roll your own).

    • GordonS 5 years ago

      Was going to say much the same thing.

      I ported a C hash function to C# recently, and using the low-level features that C# offers, performance was very close to the C version.

      C# is really nice to work with for this kind of thing.

      And .NET Core 3.0 introduced support for hardware intrinsics, so I could probably bridge the gap if I spent some time on vectorising the C# code.

  • acchow 5 years ago

    Astonished to see it perform so much faster than OCaml. I had to double-check to see that they compiled a native binary (ocamlopt).

    https://github.com/ixy-languages/ixy.ml/blob/master/app/dune...

    • emmericp 5 years ago

      The OCaml version is probably our most optimized implementation, it was scrutinized (and improved) by lots of people at the MirageOS retreat earlier this year.

      • e12e 5 years ago

        Any ideas as to why it (ocaml) ends up being so slow?

  • jcranmer 5 years ago

    For me, that was the line that surprised me the most. The .NET VM has had a reputation as being a worse variant of the JVM, but it seems that now the tables have turned.

    • fortran77 5 years ago

      Really? To me it was always a runtime VM done right! The .net CLR is much more stable, leak-proof, and performant in my experience. I have .net services that run on servers for years without ever being restarted.

      Given that C# and "Rust" are neck and neck, I'd rather have a nice GC language to work with.

      • thethirdone 5 years ago

        Go and C# are pretty much neck and neck until the batch size gets large. Rust is always ahead of C#.

      • littlestymaar 5 years ago

        C and Rust are really close, but if you look at latency C# is still lagging behind (Go performs way better in that regard).

    • manigandham 5 years ago

      Where did you see that? I've never come across that reputation before, only that the JVM has more usage (in high-perf scenarios) because it was cross-platform and had a bigger community.

      .NET always had very good performance and the new .NET Core cross-platform framework and runtime is now consistently among the fastest in various performance benchmarks for all kinds of applications.

    • Rapzid 5 years ago

      Oh man, yes they have. For a couple of years now the CLR has surpassed the JVM in straight-line performance for many workloads. A lot of effort has been going into providing escape hatches without sharp edges for getting closer to the hardware, with the likes of Span<T>, Vector<T>, and the upcoming hardware intrinsics. It has always started up faster, too.

    • kasey_junk 5 years ago

      I’d guess (cause I’m lazy and don’t want to verify the code) that it’s because the CLR has much better unmanaged support. It’s a first class feature.

      • Someone1234 5 years ago

        Indeed. It was designed to interact with C APIs directly from day 1 and they've made it more powerful fairly recently with Span<T> and Memory<T> (essentially "safe pointers").

        You can literally develop entire applications in unsafe mode, with C's level of unsafety. Nobody does, but you could.

    • gameswithgo 5 years ago

      it has been around five years since that was even vaguely true

    • meddlepal 5 years ago

      > but it seems that now the tables have turned.

      I'm not sure that's the conclusion to draw from a niche benchmark.

  • BuckRogers 5 years ago

    As someone on Team C# (I bet my career on it after careful consideration and comparison with every other option I had on the table), I had the exact same thought. But it's Microsoft; in my opinion they know software better than anyone around. They cover a lot of ground and fail sometimes, but overall I consistently have high expectations of them. I use their platform every day. Good work, Microsoft!

    • romanovcode 5 years ago

      If only the MS stigma would go away people would realise how amazing C# language is.

      • BuckRogers 5 years ago

        I agree, but to me that's an odd phenomenon. I didn't grow up with that stigma and I'm not sure what age group it affects more. I started on a Commodore, which used Commodore BASIC (based on Bill Gates's Microsoft BASIC). As far as business criticism against them goes, I view the underlying economic system that allows and perpetuates it as the culprit, not a single entity like MS, Amazon, Oracle, etc. I'm not a cherry-picker; it's not productive and it's not rational. Bad actors are smacked down, and it would be all of them if they thought they could get away with it. I'm in favor of changing the business model, not harping on the "new bad guy" every few years. Today it seems to be Google.

        I think it's mostly a lack of insightfulness and thought that leads to cherry-picking bad guys when the system is structured in a way where unscrupulous behavior is worth risking. I could be wrong, maybe Microsoft is evil incarnate but at the level I operate on, I don't see it.

        But yes, no reason to not embrace the good parts from Microsoft, or anyone else. That's how I do it. I chose C# to stick with because I believe in what they're doing around it. Huge fan of Blazor, appreciate their focus on long-term support, their product integration, and their excellent tooling. The language is good, and I can get a job doing it anywhere in the country without being in a major metropolitan area. In some of these metrics, I personally don't think you can beat the C# ecosystem.

      • OrangeMango 5 years ago

        The MS stigma isn't going to go away as long as MS insists on shipping telemetry with .Net core.

molyss 5 years ago

That's a very interesting experiment on many levels. I haven't taken the time to look at the paper yet, but I'm curious how you got your number of pps vs Gb/s in the README:

"full bidirectional load at 20 Gbit/s with 64 byte packets (29.76 Mpps)". sounds like 20Gb/s should be closer to 40Mpps than to 30Mpps. Did you hit CPU limits on the packet generator, or am I missing some packet header overhead ?

Did you try bigger that 64-byte packets ? I'm curious how various runtimes would handle that.

And how long did you run the benchmarks? I couldn't really figure it out from the GitHub repo or the paper. I'm mostly wondering if Java and other GC'd languages showed improvement or degradation over time. I could see the JITs kicking in, but I could also see the GCs causing latency spikes.

  • benou 5 years ago

    > am I missing some packet header overhead ?

    Yes: Ethernet adds 20 bytes: 8 byte preamble/start of frame delimiter + 12 byte interframe gap

    => the "on-the-wire" size is actually 84-bytes

    => 20Gbps/84-bytes = 29.76Mpps

    > Did you try bigger that 64-byte packets ? I'm curious how various runtimes would handle that.

    In typical forwarding, packet size does not impact forwarding that much until you hit some bandwidth limit (PCIe, DDR and/or L3 cache), because you only touch the packet header (typically the first 64-byte cache line of the packet). The data transfer itself will be done by the NIC's DMA.

    • emmericp 5 years ago

      PCIe bandwidth also decreases with increasing packet size as there's a lot of overhead per packet. Memory isn't used; it's all handled in cache, and hitting main memory is super slow.

  • yaantc 5 years ago

    You're only considering the MAC-level size of 64 bytes, but there is also the physical-layer overhead, which pushes the effective size of a packet to 84 bytes (see [1]): a 7-byte preamble, a 1-byte start-of-frame delimiter and 12 bytes of inter-packet gap. If you use 84 bytes at 20 Gbps you get the 29.76 Mpps.

    [1] https://en.wikipedia.org/wiki/Ethernet_frame

azhenley 5 years ago

The fact that Go is slower than C# really amazes me! Not long ago I switched from C# to Go on a project for performance reasons, but maybe I need to go back.

  • zeeboo 5 years ago

    It's only slower at the highest batch sizes. I'd say their throughput here is comparable, except the Go version has much better latencies (don't be confused by the first graph like I was: that green line at the top is actually JavaScript).

    • ummonk 5 years ago

      Thanks. I was looking at that and scratching my head as to why Go's latency was so bad here...

  • apta 5 years ago

    What made you come to the conclusion that golang was faster than C#? The hype and claims we see in blogs that are not backed up by anything?

    Both C# and Java are faster than golang.

    • hermitdev 5 years ago

      Usually where I see C# slow down it's not because of the language, but because of the over-engineered "enterprisey" solutions that Java has a bad rep for, e.g. FactoryProviderFactory-type idioms.

      A lot of the projects I work on, for instance, heavily utilize dependency injection for no gain. There's only one implementation; there are no test mocks. It's just over-engineered and obfuscated for no reason.

      Coming from a predominantly C++ background, we eschew virtual wherever possible, favoring compile time polymorphism to runtime whenever possible, because we're cognizant of the overhead of the indirect dispatch and the likely loss of opportunities to inline trivial calls.

      For sure, one can write C# or Java that can keep up with, or even outperform, C++ in some circumstances, but you're not going to do it with "enterprise" patterns hiding behind interfaces and factories and dependency injection.

      • pjmlp 5 years ago

        That isn't what Turbo Vision, OWL, VCL, MFC, Motif++, PowerPlant, C Set++, ATL, Qt, Unreal, COM/UWP, wxWidgets and JUCE look like.

        There are the CppCon talks, the Modern C++ advocacy, and then there is the code that everyone at most corporations actually write.

        • blt 5 years ago

          Virtual dispatch is particularly well suited for constructing dynamic GUIs at runtime. Doesn't mean that "everyone" is writing code like that.

          • pjmlp 5 years ago

            COM/UWP is not only for GUIs, it is the full area of modern Windows APIs.

            Then there are ORM like the ill fated POET.

            Yeah, just like not everyone is writing code that "eschew virtual wherever possible, favoring compile time polymorphism to runtime whenever possible", especially at large corporations with mixed-language teams.

            Beyond C++ conference talks, I have yet to see stuff like SFINAE and tag dispatching in the C++ code I occasionally deal with. Granted, those are libraries that get called from Java/.NET projects.

            • hermitdev 5 years ago

              I have written a fair amount of C++ template metaprogramming and policy-based libraries. One library I wrote, in particular, was a templated generic matching engine primarily used in the self-clearing of trades. Through template policies, it could be configured to do one-to-one, one-to-many, or many-to-many matching based upon template args, for example. I also did a bit of SFINAE in writing a home-grown ORM lib. I haven't really written any libs using tag dispatching, but I've certainly used my fair share (looking at you, Boost MultiIndex).

              You don't usually see these sorts of types wrapped for Java or .Net, and if they are, you usually have some sort of proxy in between to hide the templates.

    • gouggoug 5 years ago

      Saying "The hype and claims we see in blogs that are not backed up by anything?" followed by "Both C# and Java are faster than golang" without substantiating the claim is the pot calling the kettle black.

      If you do have links to share, however, please do.

    • Thaxll 5 years ago

      I mean, this benchmark itself proves it's as fast or faster; the latency is well below C# and Java, and probably the memory usage too.

      Now for some "popular" benchmarks:

      https://www.techempower.com/benchmarks/

      https://benchmarksgame-team.pages.debian.net/benchmarksgame/...

      You see that Go is sometimes faster, sometimes slower, but the memory usage and latency are way below both of those languages.

      • apta 5 years ago

        I know of the TechEmpower benchmarks. You'll notice that the Go results that end up scoring highly are using "atreugo", a customized low-allocation implementation. That's how it gets its speed. Java and C# on the other hand are using fully fledged framework implementations (e.g. Vertx or ASP.NET).

        You end up with a highly customized implementation not suited for wide use to get the higher performance benefits (and it still doesn't beat java on benchmarks like single/multiple queries and JSON serialization).

        All these benchmarks should be taken with a large grain of salt. The Golang compiler doesn't even pass function arguments in registers (they're all passed on the stack as far as I know), let alone do any of the advanced inlining and optimizations the JVM does.

        • Thaxll 5 years ago

          This is not accurate; if you look at the dependencies of the project (https://github.com/savsgio/atreugo/blob/master/go.mod), it uses well-known libs like github.com/valyala/fasthttp.

          • apta 5 years ago

            Which is still a specialized library, and not widely used like Go's standard library. And it comes with its own set of disadvantages.

            • Thaxll 5 years ago

              Vertx is not part of the standard library either; saying that fasthttp is a specialized lib is very misleading since it's widely used.

    • thethirdone 5 years ago

      > Both C# and Java are faster than golang.

      In this case Golang outperforms (in terms of throughput) Java on batch sizes > 4 and does so by nearly 2x at batch size of 256.

  • non-entity 5 years ago

    Not sure if this was a web project, but I imagine when you add an entire framework and web server, you may see less performance than with a small binary, regardless of the respective language speed.

non-entity 5 years ago

Is there a compelling reason to write high level user mode drivers like this over traditional kernel drivers? I remember finding this repo a few years back and being fascinated.

Shorel 5 years ago

Rust has definitely earned my respect.

Someone add D lang to this test! I want to know!

mister_hn 5 years ago

It misses C++

  • pjmlp 5 years ago

    While C++ is way better than using C, it doesn't forbid "writing C with a C++ compiler", which renders useless all the safety features it offers, if one isn't allowed to tame the team via static analysis tooling.

Katzenjammer 5 years ago

Rust comes off looking good here, which to me is no surprise. C#'s really good showing was a surprise to me though. Microsoft has done some impressive work.

yc12340 5 years ago

I am calling into question the validity of this project as a benchmark.

The author asserts that "it's virtually impossible to write allocation-free idiomatic Java code, so we still allocate... 20 bytes on average per forwarded packet". This sounds questionable: does that mean that he actually performs a JVM memory allocation for _every_ packet?! Furthermore, the specifics of memory management look murky. One implementation uses "volatile" C writes [1] (simply storing data to memory). Another implementation of the same thing uses a full CPU memory barrier [2]. Which one is right?

In my opinion, significant inconsistencies between implementations render any comparison between them invalid. And when a whole cross-language test suite is written by one person, you can be sure, that they don't really excel in many of those languages.

This is why I like the Benchmarks Game: all benchmarks are submitted by users, so they are a lot closer to how decent real-world programmers would solve the problem. Still not perfect, but at least it counts as an attempt.

1: https://github.com/ixy-languages/ixy.java/blob/fcad50339e537...

2: https://github.com/ixy-languages/ixy.java/blob/fcad50339e537...

  • emmericp 5 years ago

    Java reaches 52% of C's speed in the Benchmarks Game ("fastest measurement at the largest workload" data set, geometric mean); we reach 38%. Seems like our implementation is within a reasonable range for something that's usually not done in Java.

    A full memory barrier is not required, but some languages only offer that; for example, Go had the same problem. It's not a bottleneck because it goes to MMIO PCIe space, which is super slow anyway (it awaits a whole PCIe round trip).
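
    To illustrate the distinction in C# terms (purely a sketch, not taken from any of the linked implementations): a release-ordered volatile store versus an ordinary store bracketed by full fences.

        using System.Threading;

        static class RegisterWrite
        {
            // A plain volatile (release-ordered) store to a device register.
            public static unsafe void WriteVolatile(uint* reg, uint value)
            {
                Volatile.Write(ref *reg, value);
            }

            // A store bracketed by full memory barriers: stronger (and slower) than needed,
            // which is the situation described above for languages that only offer a full fence.
            public static unsafe void WriteFenced(uint* reg, uint value)
            {
                Thread.MemoryBarrier();
                *reg = value;
                Thread.MemoryBarrier();
            }
        }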

    And no, it obviously wasn't written by only one person but a team of 10.

    No, we are not saying that we allocate for every packet. We say that we allocate 20 bytes on average per packet.

  • masklinn 5 years ago

    > The author asserts that "it's virtually impossible to write allocation-free idiomatic Java code, so we still allocate... 20 bytes on average per forwarded packet". This sounds questionable

    The code is public; I'm sure they'd be happy to have your insight and fix this issue. It doesn't seem like they were happy about it either.