TruffleC: A C implementation on top of JVM (2014)

dl.acm.org

82 points by sigsev_251 3 years ago

The JVM was originally designed for set-top boxes (STBs), I believe. The problem there was the variety of architectures; a virtual machine solved it by letting you write code for the virtual machine rather than the physical machine, so applications could run on various STBs.

OpenTV provided such a 'middleware' where the main language was C on such a VM, and was (possibly still is) widely used.

captaincaveman 3 years ago

Downvoted, I suspect, because people don't believe Java was originally invented for STBs, aka interactive TV: https://www.javatpoint.com/history-of-java (there are other references to this on the web too).
I am possibly conflating the JVM with Java; however, I was under the impression they were designed as one to begin with.
- TimTheTinker 3 years ago
  
  The language, the compiler, and the JVM were developed at Sun starting in the early 90s under the leadership of James Gosling.
  
  salmo 3 years ago
  
  Gosling started more abstractly based on virtual machines he had built in the past for some pretty bizarre hardware. This was to be general-purpose.
  Set top boxes were a very early use case. It coincided with the boom of these types of systems that were growing past the traditional “embedded” boundaries. Gosling and the team were working on that when it was still Oak and hadn’t been branded.
  There’s an amazing video of him telling the story. I think it’s on YouTube.
  
  exikyut 3 years ago
  
  What sort of bizarre hardware? That sounds interesting!
BirAdam 3 years ago

Yeah, the information superhighway was NOT originally the web. Many influential people, like BillG, thought interactive TV and other such innovations would be the “big thing” until Netscape came out and began to completely transform the industry. Then, after a short bit, Netscape began supporting Java Applets and the world was made slightly crappier than it might otherwise have been.
EDIT: I say crappier due to security flaws that were present with JVM and Applets early on

tpoindex 3 years ago

NestedVM was another early attempt (2004) to run C, C++, Fortran, etc. on the JVM, but at substantial performance degradation. This was a clever hack that ran a Java coded MIPS emulator and loaded a compiled binary into a large array. Performance was 20-40% of native speed, but still fast enough for some applications, such as SQLite as a JDBC driver.

Website, source repo, and paper seem to still be available, try archive.org for dead links:

http://nestedvm.ibex.org/

drdebug 3 years ago

Is there an implementation that can be used for experimenting or verifying performance claims? I could not find anything. edit: Found a reference there in case it helps others https://www.graalvm.org/22.0/community/publications/

rschatz 3 years ago

I don't think the original TruffleC is available anymore. But TruffleC evolved into the GraalVM LLVM runtime. We're now running the C programming language on GraalVM indirectly, by compiling it to LLVM bitcode first. The core idea is still the same from back then, but it's a lot less work to implement.
The code is available at https://github.com/oracle/graal/tree/master/sulong

th0ma5 3 years ago

Only 7% slower seems amazing. To me at least.

pantulis 3 years ago

The JVM runtime is not easy to beat performance-wise for what it does. I wonder how many man-hours have been put into the platform itself by Sun, Oracle, et al.
- languageserver 3 years ago
  
  The JVM is an absolutely beautifully constructed piece of software and protocol. I hope it stands for millennia, just like the Colosseum, even if not in active use.
  
  MaxBarraclough 3 years ago
  
  I think it will do better than just stand. In recent years there's been remarkable progress in low-pause garbage collectors, and ahead-of-time compilation. I'm not sure about the status of the RISC-V port, but that is (or will be) another nice addition.
  
  samus 3 years ago
  
  The RISC-V port is gonna be shipped in Java 19.
  
  BenoitP 3 years ago
  
  Nice!
  Is it still interpreter only, or will we have C1 and C2 JIT support as well?
  
  samus 3 years ago
  
  All of them. Basically, JEP 422 is merely merging it into OpenJDK. Caveat: only the RV64GV (general-purpose 64bit with vector instructions) is supported.
  https://openjdk.org/jeps/422
  
  pantulis 3 years ago
  
  What's the protocol part you are referring to? The proper bytecode?
  
  yakorevivan 3 years ago
  
  Absolutely correct. And frankly I don't understand the hate people have against Java. Especially the new generation developers. Maybe the language part only. But have seen much hate towards JVM too.
  With projects like loom, valhalla, graalVM, JVM/Java is everything a modern language/runtime needs, plus a lot lot more.
  I frankly believe the only other commercially viable language that has similar philosophy to JVM development, is Rust.
  Opinions?
  
  pron 3 years ago
  
  I'm not sure I understand the comparison to Rust. The JVM is a virtual machine, analogous to the LLVM virtual machine that Rust targets. But that aside, I don't agree with the sentiment. Rust and C++ are languages that emphasise full low-level control. As such they offer different constructs the programmer chooses from for different characteristics -- e.g. virtual vs static dispatch -- in cases that are abstracted into a single abstraction in the JVM which then relies on the JIT compiler to profile the execution and pick the best implementation (virtual or static dispatch). Java, therefore, aims for a balance between ease of use and performance -- with an emphasis on _amortised_ performance -- whereas C++/Rust aim for maximum control with an emphasis on the worst case.
  As for the hatred of Java, some of it is due to experience with "old Java" which is then negatively compared to some "new X" (rather than comparing "new X" to "new Java"). Some just comes from popularity. In 2022 Java is the dominant server-side language, with no other language coming remotely close in that domain. In 2002 Java was also the most popular server-side language. Very few languages have ever achieved such success for such a long time (e.g. COBOL and PHP never did), and I believe the list includes just C, Java, JavaScript, and, to a lesser extent, Python. Of those four, only C, Java, and JavaScript are commonly/mostly used in large projects, and people often hate large codebases. Those three languages receive roughly equal amounts of hate. C++, which isn't as popular but is also mostly used in large projects, also receives a lot of hate.
  I think people believe that some language X easily fixes some flaw they see in Java, but so far it's come at the cost of other shortcomings they don't see, which explains why X never becomes as popular -- which makes those people disappointed -- which, in turn, means that it rarely has big, old codebases, and so is never hated but just fades to remain fondly-remembered, or lingers as a niche language (although sometimes a not very small niche). This is somewhat like the Betamax vs. VHS debate. A relatively small group of ardent fans liked Betamax because it was superior in a metric they cared about, but was inferior in metrics other people -- a larger group than the first -- cared about.
  In short, I think it's a combination of unfamiliarity with "new Java", the language's extended popularity, and its use in large codebases.
  
  pjmlp 3 years ago
  
  Forgetting about .NET?
  
  pron 3 years ago
  
  No, but it's hard to define what .NET is, because while there has been something called .NET for 20 years, it hasn't quite been the same thing. Is .NET now (.NET Core) the same platform as .NET of 2005?
  Also .NET is not as popular, and its adoption is also a little weird. It's quite popular overall, but doesn't dominate any domain. It's used on the server but not nearly as much as Java; it's used on the client, but not nearly as much as JavaScript. I think it's still largely confined to Microsoft shops.
  So both in longevity and popularity it's not quite in the same class as those others.
  
  pjmlp 3 years ago
  
  Java surely hasn't taken any bite out of .NET desktop development, or the game industry, with exception of Minecraft.
  Regarding defining what Java is, what about Android Java, real time Java, Java on mainframes, forks like microEJ, Graal, OpenJ9, Azul....
  Just as hard to define.
  
  pron 3 years ago
  
  I never said Java took a share of .NET's desktop, but the dominant client-side platform these days is neither Java nor .NET. And by Java I mean the 27-year-old Java SE specification (real-time Java makes up for a negligible portion of Java's market share, and while JavaCard is incredibly popular, I count it separately).
  
  pjmlp 3 years ago
  
  You're right, it is Android Java, which isn't Oracle's Java.
  I'm at a .NET/Java shop and we have like 50% of both stacks in every, single, project.
  Nowadays I tend to spend most of my time on the .NET side, because I get to enjoy Valhalla, Vectors, async/await, TPL, since years now, instead of waiting for what might never come.
  I get to be on both sides of the coin, which would be advisable to the Java team to better know the competition.
  
  pron 3 years ago
  
  Android "Java" isn't Java, full stop, because Java is a specification for a platform and there has never been a Java platform specification that Android conformed to. But the important aspect for this particular discussion is that both longevity and popularity apply to Java SE.
  As for competition, we follow what's going on in .NET, JavaScript, Python, Go, and even far less popular languages such as Haskell and Erlang. But regardless of your personal preferences -- some share them while for others Java's superiority in GC, monitoring, compilation, backward compatibility, and language simplicity matter more -- C#/.NET is not yet in the same club as C, Python, JS, and Java in terms of longevity and popularity, and that's what I'm talking about here.
  
  languageserver 3 years ago
  
  > C#/.NET is not yet in the same club as C, Python, JS, and Java in terms of longevity and popularity
  C# is more than 20 years old, it runs on everything and everywhere. It has multiple toolchains and IDEs
  
  pron 3 years ago
  
  It's not really more than 20 years old because the current platform is not backward-compatible with the old one, and even if it were it doesn't have the same status and/or popularity as those other languages.
  
  hawk_ 3 years ago
  
  > And frankly I don't understand the hate people have against Java.
  There are two kinds of programming languages : ones that people complain about and ones that no one uses.
  
  doctor_eval 3 years ago
  
  For my part the problems with the JVM are manifold:
  - it’s a heavyweight blob that you have to ship with your binary. It can be many times the size of the thing I’m trying to run.
  - it’s a word salad of technologies that apparently I’m supposed to care about
  - many of the claims of how great it is are really just ongoing claims of how great it’s going to be.
  To the latter point, value types (discussed in this thread) have been discussed since I was still using Java - a quick Google shows results from 2014. I haven’t written Java code commercially since 2018. I can’t find anything suggesting it’s here yet.
  Keeping track of all these technologies and trying to understand when they will arrive and make my day to day life better was a nightmare.
  Compared to that, I can build a Go binary, scp it to some machine and run it. That’s not possible in the JVM world, at least not without a level of calisthenics I’m simply no longer willing to deal with.
  GraalVM and friends may be wonderful technologies, but honestly I don’t care. I don’t want to have to thread the needle of this ecosystem every time I have work to get done.
  
  edg5000 3 years ago
  
  The JVM is a beautiful place for code to run, and the language is great as well. The only and main weakness is interfacing with the OS and libraries. TCP and file access is builtin and is no problem, but to access a serial device you'd need JNI. To interact with OpenGL, you need JNI. To interact with the Linux kernel, JNI. C/C++ library: JNI. JNI is fine if you have no other choice (e.g. to invoke Android's Java-only SDK's, e.g. for Bluetooth), but voluntarily, much less so. It is much less of a pain to invoke Linux or Windows C libraries (or Apple's Obj-C APIs) from, say, C++ than it is from Java.
  
  origin_path 3 years ago
  
  Panama is giving Java/JVM a much better FFI. You won't need JNI anymore after that. That said, it won't let you call C++ classes. But that's of course normal.
  
  pjmlp 3 years ago
  
  What I miss from Panama, is that it is still early steps, even with jextract it is a bit more involved than using P/Invoke.
  
  samus 3 years ago
  
  TCP and file system access use JNI under the hood as well. For most other use cases there are already libraries that provide wrappers. And after Panama is shipped, they will eventually be upgraded to use that faster FFI. Btw, aren't serial devices treated as files under Unix?
  
  pantulis 3 years ago
  
  My opinion: people dislike the Java platform mainly for the language, perhaps too ceremonial and verbose for today's trends. Also older developers remember Java from the J2EE/Struts/applets days as something overarchitected, slow and with cumbersome tooling, but that would not apply to the younger cohorts. Maybe a new programmer just sees Java and thinks "legacy", and we love to feel we are on the edge of technology.
  
  nmhancoc 3 years ago
  
  I agree, it's mostly the language.
  However, remember that newer developers get onboarded by older developers. I led the charge a few years ago to get my then team to adopt Java 8 style streams and Vert.x; otherwise the stack was the typical Spring Boot annotation affair with the downsides you mention.
  In a similar vein, one of Java's touted strengths is the package ecosystem but those packages often are written in the same class-hierarchy-heavy style. When the code you see and use is written in that style, writing in that style becomes the default unless you actively choose to do something else.
  
  vips7L 3 years ago
  
  > I agree, it's mostly the language.
  Which I don't understand at all. Modern Java is far more expressive than Go which everyone seems to love. Streams, switch expressions, records, pattern matching, and variable type inference all make Java an extremely expressive and fun language to write while also maintaining readability for when you come back to the code later.
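For readers who have not touched Java in a while, here is a small sketch of the features listed above. It assumes Java 16 or newer, and `ModernJavaDemo`, `Point`, `describe`, `quadrant`, and `quadrants` are made-up names used only for illustration:

```java
import java.util.List;
import java.util.stream.Collectors;

final class ModernJavaDemo {
    // A record generates the constructor, accessors, equals, hashCode and toString.
    record Point(int x, int y) {}

    // Pattern matching for instanceof binds `p` without an explicit cast.
    static String describe(Object value) {
        if (value instanceof Point p) {
            return "point at " + p.x() + "," + p.y();
        }
        return "something else";
    }

    // A switch expression yields a value directly; no break statements needed.
    static String quadrant(Point p) {
        int signs = (p.x() >= 0 ? 1 : 0) + (p.y() >= 0 ? 2 : 0);
        return switch (signs) {
            case 3 -> "I";
            case 2 -> "II";
            case 0 -> "III";
            default -> "IV";
        };
    }

    // `var` infers the local type; the stream replaces an explicit loop.
    static List<String> quadrants(List<Point> points) {
        var labels = points.stream().map(ModernJavaDemo::quadrant).collect(Collectors.toList());
        return labels;
    }

    public static void main(String[] args) {
        System.out.println(describe(new Point(1, 2)));
        System.out.println(quadrants(List.of(new Point(1, 1), new Point(-1, 5), new Point(1, -1))));
    }
}
```

Whether this counts as "more expressive than Go" is a matter of taste; the sketch only shows what the listed features look like in practice.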
  
  pjmlp 3 years ago
  
  We also remember that when compared with CORBA and DCOM, it was a pleasure to work with.
  
  brabel 3 years ago
  
  What do you mean by similar philosophy? The JVM and Rust are quite different in every aspect, practical and philosophically speaking... except for both having lots of corporate backing perhaps?
  Perhaps you meant to say WASM, which is indeed very similar to the JVM in goals and philosophy.
  
  pjmlp 3 years ago
  
  > I frankly believe the only other commercially viable language that has similar philosophy to JVM development, is Rust.
  Not really, .NET ecosystem is the only match to Java in tooling.
  Hence why during the last 20 years I enjoy both platforms.
  
  Banana699 3 years ago
  
  I think you're (perhaps intentionally) overestimating the hate the JVM gets to paint Java critics as unreasonable, but the fact of the matter is that the JVM and Java the language are light years apart in quality and (subsequently) the attitude they get from their users. One is a decent VM with man-centuries of work and inspiration from academia (Self, whose VM heavily influenced hotspot), the other is an ugly mess of a language whose syntax and semantics is pure unfiltered paperwork and bureaucracy.
  I have elaborated plenty of times on why Java is a 1980s language that was obsolete as soon as it was released, the latest is my comment on https://news.ycombinator.com/item?id=32128271, which I reproduce in full at the very bottom of this comment to save you a ctrl-f.
  >loom, valhalla, graalVM
  Every single one of those has nothing to do with Java and everything to do with the JVM as a high-tech pseudo-OS that challenges the classic preconceptions about performance-productivity tradeoffs. I don't want to attribute dishonesty to you, but again, I see absolutely no reason to confuse a VM with an (incredibly inferior and badly designed) human-level programming language just because it happens to be the first language to run on the VM.
  >frankly believe the only other commercially viable language that has similar philosophy to JVM development, is Rust.
  Come again? How is Rust similar or even comparable to the JVM ?
  ----------
  Reproduced Comment
  ----------
  >>>>I'm a Java hater. Here are the reasons I hate it:
  - Baking the difference between primitives and objects into the language itself : an ugly mistake with far reaching consequences, made by a language designed in 1995 while another designed in 1980 (smalltalk), in 1991 (Python) and 1995 (Ruby) all didn't fall for it.
  The difference is an irrelevant VM-level optimization detail, there is no reason to uglify the human-level language with it. Once the initial mistake had been made, the correct response was NOT to make the even uglier hack of wrapper classes, but to make the primitives objects in the newer releases of the language. This won't break old code, as valid uses of objects are a superset of valid uses of primitives, except perhaps that objects need to be allocated explicitly with "new", but this can be a special case for primitives (i.e. "int is a special kind of object that you don't need to allocate explicitly"). The compiler can figure out whether it needs to be represented as objects or as primitives; you can leave hooks and knobs for people to tell the compiler they need the primitives to be represented as primitives, but it shouldn't be mandatory.
  - Baking in choices about object representations : Like the fact that objects are always passed by reference, or that they are always allocated on the heap. Why the "always" part ? why not give developers the choice between pass-by-value and pass-by-reference like C# does ? why not give developers the choice to allocate on the stack (and complain as loud as you want when they want to do something unsafe with it, like escaping from methods), which, unfortunately, even C# doesn't ?
  Every time you see something like "foo deepCopy()" that's a failure of the language, forcing you to explicitly pay attention to the fact that foo objects need to be copied deeply every time they are copied, instead of just once when you define the object by marking it as a "struct" or whatever word to signify that object has value semantics, and then deep copy is just assignment or passing as a parameter. Why make it the default to be inefficient with the heap when it's very easy to give developers the choice to be efficient in situations where it's always safe ?
  - No operator overloading : I get the hate, it's a powerful tool. But it's misguided to ban it, operators should not be special, languages like Haskell and Raku go even further and allow you to define new operators entirely and control their precedence and other things. You don't need to go that far, why can't objects use the already built-in symbols the language supports ? because it might be confusing ? anything can be confusing, you can write assembly in any programming language, and it will be even worse than assembly because of the more powerful and obscure abstractions.
  - Generics : The overall theme of forcing you to do things its way seems to be a staple with Java. Why do I need to use type-erased generics ? why shouldn't I get the choice to specify whether I need a new class generated for runtime efficiency or use the type-erased catch-all for size efficiency? there is no need to bake VM-level support for this, it can all be done at compile time (possibly with help of additional metadata files or special fields in the .class of the generic type).
  - Overall verbosity : Why "extends" and "implements" ? do you really need to know whether you're inheriting a class or an interface ? and can't those be lighter symbols like "<" and ":" perhaps ? why is "private/public/protected" a must in front of every method and field ? most people align fields and methods by their visibility, C++'s way is that you declare "public:" and then everything declared below that is public. In the worst case you can always recover Java's way by "public : <method> ; private : <method> ; public : <method>" and so on, but it's nice to at least have the choice of not repeating yourself.
  Why aren't any constructors generated ? there are at least 2 very obvious ones : the empty one, and the one that assigns all the non-defaulted fields (and can take optional arguments to override the default fields). Why aren't generated getters and setters available with a small and light request, like C#'s "get ; set ;" ? Java's design is just full of things like this. It feels like a weird sort of disrespect for your time, "yeah you must write those routine 25 lines of code all by yourself, you have anything better to do?", how about actually writing my application instead of pleasing your language with weird and unnecessary incantations ? It's like a modern COBOL.
  - Horrible OOP excesses : Not really the language's fault (except that it encourages verbosity and loves it) and already mentioned, but worth mentioning again.
  Overall, I treat Java as assembly. I write Kotlin in my spare time, and whenever I'm confused about the semantics of some construct I make IntelliJ show the bytecode then hit "decompile" to see a Java rendition of the code; the exact semantics will be obvious but verbose. A language that took this literally is Xtend, a high-level augmented Java which transpiles to Java and is a strict superset of it, but with the option that the Xtend compiler figures out all the verbosity for you. Groovy also takes the "Superset and Augment" approach but doesn't transpile. And of course Kotlin is very good with its interoperability; every JVM language is, but Kotlin's mixture of being close to Java semantics (unlike, say, Scala or Clojure) and IntelliJ's excellent support for mixed projects makes it at least somewhat special.
  I like the JVM and its cutting-edge research and performance, and these days the Java standard writers seem to show signs of finally waking up to reality after years of being behind every mainstream language, and they regularly augment and modernize the language. But you can't undo 20 years or so of bad design, not easily and not painlessly.
  >Indeed, tooling is the new syntax.
  Very much agreed, long long gone are the days when a compiler or an interpreter is the only thing expected out of a language. But it's not a panacea to treat any bad design, at best it's just a band-aid for bad designs that makes them barely bearable. The language has to be designed from the start with the knowledge of "this is going to run in an IDE" baked in to make full use of the full range of fantastic things an IDE can do.
  ---------------------
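To make two of the complaints in the reproduced comment concrete (the primitive/wrapper split and type-erased generics), here is a minimal sketch. `ErasureDemo` and its method names are invented for illustration; the behaviour shown is standard Java:

```java
import java.util.ArrayList;
import java.util.List;

final class ErasureDemo {
    // Both lists share one runtime class: the type argument is erased at compile time.
    static boolean sameRuntimeClass() {
        List<String> strings = new ArrayList<>();
        List<Integer> ints = new ArrayList<>();
        return strings.getClass() == ints.getClass();
    }

    // Primitives cannot be type arguments, so the int is boxed into an Integer object.
    static boolean storesBoxedInteger() {
        List<Integer> ints = new ArrayList<>();
        ints.add(42); // autoboxing: int -> Integer
        Object element = ints.get(0);
        return element instanceof Integer;
    }

    public static void main(String[] args) {
        System.out.println(sameRuntimeClass());
        System.out.println(storesBoxedInteger());
    }
}
```

Both methods return true: there is a single `ArrayList` class regardless of the element type, and an `int` can only enter a generic collection through its wrapper class.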
  
  kaba0 3 years ago
  
  I won’t reply to all of these, but primitives:
  With the initial, interpreter-only operation of the JVM, performance considerations were very pressing. Generics, given the context, were a correct choice, and the maintainer team’s vision even back then was quite right: Valhalla seems to be able to heal the rift between primitives and objects. So in this view auto-boxing was again a sane choice.
  Regarding stack/heap allocation: I believe the beauty and longevity of Java code is partially thanks to their avoidance of over-specifying language semantics. Sure, C# may have won a few percent better performance by introducing yet another feature, pushing the responsibility onto the developer. So only a select few programs are affected. While Java’s continued improvements to the JIT compiler, GC and escape analysis brought even decades old code bases written for the first versions of Java considerable performance upgrades for free. Of course, value types are still an important performance win left on the table, but it will be solved - yet again, in a not over-specified way. Primitive and value types as per the current view only specify semantics, leaving the allocation strategy up for the JIT compiler to optimize.
  Operator overloading is a difficult feature, we have seen plenty of failed attempts and a few where it seems to work fine. I also think that partial operator overloading (a la Rust, adding an Add, Multiply, etc interfaces with single methods corresponding to + and *) could be useful, but left uncontrolled it can be a catastrophe ( a <<! b, what does it mean?). Nonetheless, it is not that big of a pain point, using BigInteger::add 3 times from time to time is more than fine.
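As a rough illustration of what "using BigInteger::add" looks like without operator overloading, here is a sketch that computes a * b + c * d. `NoOverloadDemo` and `sumOfProducts` are hypothetical names; `multiply`, `add`, and `valueOf` are the standard `java.math.BigInteger` methods:

```java
import java.math.BigInteger;

final class NoOverloadDemo {
    // Computes a * b + c * d; each operator becomes an explicit method call.
    static BigInteger sumOfProducts(BigInteger a, BigInteger b, BigInteger c, BigInteger d) {
        return a.multiply(b).add(c.multiply(d));
    }

    public static void main(String[] args) {
        BigInteger result = sumOfProducts(
                BigInteger.valueOf(2), BigInteger.valueOf(3),
                BigInteger.valueOf(4), BigInteger.valueOf(5));
        System.out.println(result);
    }
}
```

Whether the method-call form is a real pain point or just a minor annoyance is exactly the disagreement in this subthread.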
  Generics were executed very elegantly in my opinion given the constraints. It has some edge cases, but reification is overblown as a problem and overloading may come once primitives and objects are unified.
  Verbosity: come on, seriously? I didn’t benchmark it, but I am fairly sure I actually press fewer keys in a good IDE writing Java than writing the equivalent in a “more concise” programming language, while the end result will be that much more readable. Like, honestly you believe that programming takes a long time because you have to write class A extends B instead of class A <: B? Like, you spend multiple orders of magnitude more time over a single line thinking. Writing is never the bottleneck.
  
  vips7L 3 years ago
  
  Have you read the JEPs? The primitive/object split is being fixed and value classes are coming. The rest of your post really just seems like preferences, personally I do not like operator overloading (scala ptsd), and I don't see the value in < or : over extends and implements, we all type > 120 words per minute, and I think Goetz's plan for withers over records will solve the properties problem.
  The fact that an experienced developer has to transcompile Kotlin to Java to understand some parts of it also doesn't seem like a plus to me.
  
  Someone 3 years ago
  
  I think having both int as a value type and Integer as an object, etc. is defensible given what the original Java had to run on. The thing I never understand, however, is why, at the same time, they went with a language that allows one to use synchronized on any object.
  Because object allocation and the use of an allocated object in a synchronized part can be far away from each other both in time and in source code (quite possibly even in different jars), I would think that creates object overhead that’s hard to optimize away.
  What would Java have lost if it required a synchronizable keyword on class definitions to allow callers to synchronize on instances of a class?
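A short sketch of the point about `synchronized`: any reference can be used as a lock, so every object has to be able to act as a monitor, whether or not anyone ever locks it. `MonitorDemo` and `countWithLock` are names made up for this example:

```java
final class MonitorDemo {
    // Increments a shared counter from several threads, guarded by whatever object is passed in.
    static int countWithLock(Object lock, int threads, int incrementsPerThread) {
        int[] counter = {0};
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                    synchronized (lock) { // works for any object type
                        counter[0]++;
                    }
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join(); // join makes the final counter value visible here
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter[0];
    }

    public static void main(String[] args) {
        System.out.println(countWithLock(new Object(), 4, 10_000));
        System.out.println(countWithLock(new java.util.ArrayList<String>(), 4, 10_000));
    }
}
```

The lock can be a bare `Object`, a collection, or any other instance. Under the hypothetical `synchronizable` keyword proposed above, only classes that opted in would be accepted here.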
  
  batmanturkey 3 years ago
  
  I dislike everything the JVM stands for and is. There’s no real gentle way to put that. I don’t support the notion of a fat runtime platform. The LLVM IR code is a much better conceptualization of where and WHEN code executes. My feelings about Oracle and the entities around Java are also less than optimal. I only feel free to be so blunt as you asked for opinions. I gather that you disagree.
  Rust is rather unrelated. It’s a compiled static language with no runtime, in which you manage memory. The JVM is a thick VM and you basically only get to script it, which is fine and all considering it’s Turing complete, etc, but you really aren’t anywhere near the metal where rust can go bare metal with no OS or stdlib
  
  chrisseaton 3 years ago
  
  > The LLVM IR code is a much better conceptualization of where and WHEN code executes.
  Funny - because LLVM IR makes the 'when' (the required partial ordering of instructions) implicit and so both too loose and too constrained at the same time, because it's a linear IR, while the JVM's Graal and C2 IRs make the 'when' completely explicit and a first-class part of the representation, because they're graphical IRs.
  
  kaba0 3 years ago
  
  Could you please expand on this/link me somewhere? I am not familiar with LLVM, and I am only familiar with the JVM spec (currently in the process of writing a templated interpreter), but not yet familiar with OpenJDK’s existing code base, nor a complete JIT compiler.
  
  chrisseaton 3 years ago
  
  Given this C code:
  int foo(int a, int b, int c, int d) { return a * b + c * d; }
  LLVM gives you a single total-ordering of all operations in this code, which makes it appear like computing %5 has to happen before %6, but in reality it doesn't - they could be swapped.
  define dso_local i32 @foo(i32 noundef %0, i32 noundef %1, i32 noundef %2, i32 noundef %3) local_unnamed_addr #0 !dbg !7 {
    %5 = mul nsw i32 %1, %0, !dbg !18
    %6 = mul nsw i32 %3, %2, !dbg !19
    %7 = add nsw i32 %6, %5, !dbg !20
    ret i32 %7, !dbg !21
  }
  Java's IRs instead tell you that %5 and %6 need to be computed before %7, but don't apply an ordering between them otherwise. They can do this because they use a graph of instructions, not a linear list of instructions.
  https://chrisseaton.com/truffleruby/basic-graal-graphs/
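The difference between a linear list and a dependency graph can be sketched in a few lines of plain Java. This is not Graal's or C2's actual IR; `DataFlowGraphDemo`, `Node`, `buildFoo`, `eval`, and `isValidSchedule` are invented for illustration. Each node stores only its inputs, so both orderings of the two multiplications are equally valid schedules:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class DataFlowGraphDemo {
    // A node records only which values it consumes, not a position in a list.
    static final class Node {
        final String op;         // "param", "mul" or "add"
        final int value;         // used only by "param" nodes
        final List<Node> inputs;

        Node(String op, int value, Node... inputs) {
            this.op = op;
            this.value = value;
            this.inputs = Arrays.asList(inputs);
        }
    }

    // Builds the graph for a * b + c * d and returns {mulAB, mulCD, add}.
    static Node[] buildFoo(int a, int b, int c, int d) {
        Node mulAB = new Node("mul", 0, new Node("param", a), new Node("param", b));
        Node mulCD = new Node("mul", 0, new Node("param", c), new Node("param", d));
        Node add = new Node("add", 0, mulAB, mulCD);
        return new Node[] {mulAB, mulCD, add};
    }

    // Evaluates a node by following its input edges.
    static int eval(Node n) {
        switch (n.op) {
            case "param": return n.value;
            case "mul": return eval(n.inputs.get(0)) * eval(n.inputs.get(1));
            case "add": return eval(n.inputs.get(0)) + eval(n.inputs.get(1));
            default: throw new IllegalArgumentException(n.op);
        }
    }

    // A schedule is valid when every node comes after all of its non-parameter inputs.
    static boolean isValidSchedule(List<Node> order) {
        List<Node> seen = new ArrayList<>();
        for (Node n : order) {
            for (Node in : n.inputs) {
                if (!"param".equals(in.op) && !seen.contains(in)) {
                    return false;
                }
            }
            seen.add(n);
        }
        return true;
    }

    public static void main(String[] args) {
        Node[] n = buildFoo(2, 3, 4, 5);
        System.out.println(eval(n[2]));
        System.out.println(isValidSchedule(Arrays.asList(n[0], n[1], n[2])));
        System.out.println(isValidSchedule(Arrays.asList(n[1], n[0], n[2])));
        System.out.println(isValidSchedule(Arrays.asList(n[2], n[0], n[1])));
    }
}
```

The first two schedules pass and the third fails, because only the edges into `add` constrain the order. A real compiler IR also tracks control flow and side effects, which this sketch leaves out.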
  
  kaba0 3 years ago
  
  Thank you very much!
  
  origin_path 3 years ago
  
  It's highly ineffective to be close to the metal for most software. Progress comes through abstraction. Even video games are like that these days. Not many companies are writing their own game engines anymore: they all ship with giant "runtimes" like Unreal.
steeleduncan 3 years ago

If the JVM is AOT compiling the bytecode back to native code there is a good chance that it ends up executing as object code that is more or less what a C compiler would have generated in the first place.

JavaOnlyGuy 3 years ago

How are pointers implemented in a language that doesn't support them?

suprjami 3 years ago

I don't think this is how it works.
The JVM is a specification which describes a pretend computer and its instruction set.
This TruffleC doesn't translate C to Java and run a Java program. This compiles C to bytecode which operates on the JVM.
Whatever Java does or doesn't support is irrelevant to this compiler. TruffleC has nothing to do with the Java programming language at all.
Just like you can compile C and get a memory address of a stack or heap location on any physical computer supported by a C compiler, likewise you can compile C with TruffleC and get a memory address within the stack or heap of the pretend computer called the JVM.
This must be how it works, unless the JVM itself has no concept of memory addresses, which seems very unlikely to me. Let me know if I am wrong?
- chrisseaton 3 years ago
  
  > This compiles C to bytecode which operates on the JVM.
  No, it compiles C to an AST, which it then interprets. The AST, which is also the interpreter in the Truffle design, is then partially evaluated to produce machine code. No bytecode is generated at any point, and in fact you can run it on a JVM that doesn't use bytecode, and then there is no bytecode anywhere.
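"The AST is also the interpreter" can be sketched in plain Java without any Truffle classes. The names here (`AstInterpreterDemo`, `Node`, `Arg`, `Mul`, `Add`) are hypothetical and this is not Truffle's API; the sketch only shows the shape of a self-executing tree for the C expression a * b + c * d:

```java
final class AstInterpreterDemo {
    // Each node knows how to execute itself, so the tree is the interpreter.
    interface Node {
        int execute(int[] frame);
    }

    // Reads a function argument from the frame.
    static final class Arg implements Node {
        private final int index;
        Arg(int index) { this.index = index; }
        public int execute(int[] frame) { return frame[index]; }
    }

    static final class Mul implements Node {
        private final Node left;
        private final Node right;
        Mul(Node left, Node right) { this.left = left; this.right = right; }
        public int execute(int[] frame) { return left.execute(frame) * right.execute(frame); }
    }

    static final class Add implements Node {
        private final Node left;
        private final Node right;
        Add(Node left, Node right) { this.left = left; this.right = right; }
        public int execute(int[] frame) { return left.execute(frame) + right.execute(frame); }
    }

    // AST for: int foo(int a, int b, int c, int d) { return a * b + c * d; }
    static Node buildFoo() {
        return new Add(new Mul(new Arg(0), new Arg(1)), new Mul(new Arg(2), new Arg(3)));
    }

    static int callFoo(int a, int b, int c, int d) {
        return buildFoo().execute(new int[] {a, b, c, d});
    }

    public static void main(String[] args) {
        System.out.println(callFoo(2, 3, 4, 5));
    }
}
```

On an ordinary JVM this is just a tree-walking interpreter. As described above, the speedup comes when the compiler partially evaluates `execute` against a fixed tree and emits machine code for it.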
  
  chaosite 3 years ago
  
  I learned most of what I know about Truffle and Graal from your blog posts, so you obviously know more about this than me. However, I was under the impression that Truffle is quite closely integrated into GraalVM, that is, you can't use Truffle on a different JVM. Is that not true?
  
  origin_path 3 years ago
  
  Not so. Truffle is just a Java library like any other. You can therefore run Truffle languages on any JVM. However, they will run slow as they are just interpreters, then. To get the speedups you need to use Graal, which recognizes Truffle as a library and treats it specially.
  
  chaosite 3 years ago
  
  Well, OK, sure, but Truffle without partial evaluation is just an interpreter written in a very particular way...
  I see what you mean though, thanks!
  
  chrisseaton 3 years ago
  
  > Well, OK, sure, but Truffle without partial evaluation is just an interpreter written in a very particular way..
  That's what it was to start with. Partial evaluation came later.
  
  rschatz 3 years ago
  
  Truffle and partial evaluation also work on native-image. You could say this is a VM where there are no bytecodes anymore.
  
  chaosite 3 years ago
  
  Oh, of course, but native-image is still a Graal feature, and I was asking about Truffle without Graal.
  
  entropicdrifter 3 years ago
  
  native-image was created as part of the Graal project but I think it's a separate JVM implementation from GraalVM
- dzaima 3 years ago
  
  the JVM bytecode does not have any memory address type. Just various width integers & floats, and references to managed heap objects. Arbitrary pointers would have to be done with 'long's one way or another.
  
  rschatz 3 years ago
  
  You can still use pointers. It's a bit hidden, but there are things like `Unsafe.allocateMemory`, `Unsafe.getByte` and so on ;)
  
  dzaima 3 years ago
  
  right; at which point the subset of jvm you're using is a subset of any other IR/VM, the 'j' in 'jvm' being only useful as an implementation/runtime.
  
  chaosite 3 years ago
  
  Sure, but don't discount all of the JIT optimizations that were implemented in the JVM and the huge number of engineer years invested in that particular implementation/runtime...
quietbritishjim 3 years ago
I guess a really brute force way would be to have a huge dictionary mapping from "memory address" (really just an arbitrary number) to JVM object. malloc() would add to the dictionary and free() would remove an entry. Pointer dereference would look up in it but would need to be able to find the nearest lower entry (for when you have an array and dereference an entry in it, or use a pointer to a field in a struct).
I would hope that there's a much more efficient way to do it; this idea is just evidence that it could be done in principle. But I don't see what that more efficient way would be. You certainly need to keep a secret reference to each JVM object somehow, because C doesn't require you to keep any pointer to an object, e.g.
```
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        // Store the address only as an integer
        intptr_t x = (intptr_t)malloc(sizeof(int));
        *(int*)x = 99;
        bool did_subtract_50 = false;
        if (x > 50) {
            did_subtract_50 = true;
            x -= 50;
        }
        // Now there is no pointer or even integer that contains the address

        // ... later ...
        // Retrieve the address and use and free it
        int* y = (int*)(x + 50 * did_subtract_50);
        printf("value: %d\n", *y);
        free(y);
        return 0;
    }
```
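To show that the brute-force dictionary is at least workable, here is a rough sketch of it in plain C. It assumes a small fixed-size table kept sorted by fake base address and uses a binary search for the "nearest lower entry" lookup. The names (`fake_malloc`, `fake_free`, `resolve`) are invented for this sketch, and a `calloc`'d buffer stands in for the JVM object. It is not how TruffleC does it.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAX_ALLOCS 64

/* One live allocation: a fake address range plus the "object" backing it. */
typedef struct {
    uint64_t base;          /* fake address handed to the C program */
    size_t size;            /* length of the range in bytes */
    unsigned char *backing; /* stand-in for the managed object */
} Entry;

static Entry table[MAX_ALLOCS];     /* kept sorted by base */
static size_t count = 0;
static uint64_t next_base = 0x1000; /* fake addresses only ever grow */

/* Hand out a fresh fake address; 0 plays the role of NULL. */
uint64_t fake_malloc(size_t size) {
    if (size == 0 || count == MAX_ALLOCS) return 0;
    unsigned char *backing = calloc(size, 1);
    if (backing == NULL) return 0;
    /* next_base is monotonic, so appending keeps the table sorted. */
    table[count++] = (Entry){ next_base, size, backing };
    uint64_t addr = next_base;
    next_base += size + 16; /* gap so adjacent ranges never touch */
    return addr;
}

/* Binary search for the entry with the greatest base <= addr. */
static Entry *floor_entry(uint64_t addr) {
    size_t lo = 0, hi = count; /* search in [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (table[mid].base <= addr) lo = mid + 1; else hi = mid;
    }
    return lo == 0 ? NULL : &table[lo - 1];
}

/* Translate a fake address (possibly interior) to real storage, or NULL. */
unsigned char *resolve(uint64_t addr) {
    Entry *e = floor_entry(addr);
    if (e == NULL || addr - e->base >= e->size) return NULL;
    return e->backing + (addr - e->base);
}

/* Remove the entry whose base is exactly addr; returns 0 on success. */
int fake_free(uint64_t addr) {
    Entry *e = floor_entry(addr);
    if (e == NULL || e->base != addr) return -1;
    free(e->backing);
    size_t idx = (size_t)(e - table);
    memmove(e, e + 1, (count - idx - 1) * sizeof *e);
    count--;
    return 0;
}
```

Because the table itself holds the only reference to each backing buffer, the program is free to hide the fake address in integer arithmetic as in the snippet above: `resolve(x + 50 * did_subtract_50)` finds the object again as long as the reconstructed number lands inside a live range.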
chrisseaton 3 years ago

A class wrapping a long value with the pointer address in it.
- sitkack 3 years ago
  
  Ha! Your paper is a "highly influential citation"
  https://www.semanticscholar.org/paper/TruffleC%3A-dynamic-ex...
  
  chrisseaton 3 years ago
  
  The side-bar says 'highly influential' but the badge lower down says 'highly influenced' which sounds like a bad thing doesn't it?
  
  forgotpwd16 3 years ago
  
  Probably meant as "[this paper has] highly influenced [citing paper]".
  
  sitkack 3 years ago
  
  Semantic Scholar is calling out when it thinks the researchers were using drugs.
- MaxBarraclough 3 years ago
  
  How is the C memory modelled? One big Java array, or are there multiple data-structures?
  For instance, what happens when you call a function-pointer?
  
  chrisseaton 3 years ago
  
  > How is the C memory modelled?
  Using a combination of native memory and JVM managed memory, depending on what the memory is needed for.
  > For instance, what happens when you call a function-pointer?
  This is a good example - because TruffleC can inline-cache a function-pointer, inlining the called function!
  All this is in the linked paper, of course.
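  For anyone who hasn't met inline caching before, here is a minimal sketch of the guard that makes it work, written in plain C. The `CallSite` struct and `call_cached` function are hypothetical names for this illustration and are not taken from the paper. C cannot inline at run time, so the comment in the fast path marks where a JIT would substitute the callee's body.

  ```c
  #include <assert.h>
  #include <stddef.h>

  typedef int (*IntFn)(int);

  /* Hypothetical per-call-site cache: remembers the last target seen here. */
  typedef struct {
      IntFn cached; /* target the fast path is specialised for */
      long hits;    /* calls that matched the cached target */
      long misses;  /* calls that had to re-specialise */
  } CallSite;

  int twice(int x) { return 2 * x; }
  int negate(int x) { return -x; }

  /* Call fn through the site, guarding on the cached target. */
  int call_cached(CallSite *site, IntFn fn, int arg) {
      assert(site != NULL && fn != NULL);
      if (fn == site->cached) {
          /* Guard passed: the target is known here, so a JIT could replace
             this indirect call with the inlined body of the callee. */
          site->hits++;
          return site->cached(arg);
      }
      /* Guard failed: do the generic call and re-specialise the site. */
      site->misses++;
      site->cached = fn;
      return fn(arg);
  }
  ```

  Starting from `CallSite site = { NULL, 0, 0 };`, repeated calls with the same function pointer take the fast path after the first miss. A call site that keeps seeing different targets just keeps missing, which is the case where a real implementation would give up on the cache.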
  
  aardvark179 3 years ago
  
  It can be done in a few different ways. Native memory can be managed as plain native memory (under the hood you can use Unsafe to access that memory) but the real advantage is that pointers to many objects can be kept as managed pointers and not converted to a native value most of the time. For example Ruby C extensions often use VALUEs to refer to Ruby objects which are normally tagged pointers. In TruffleRuby we use ValueWrapper objects to represent these, and maintain a fast map between native values and these objects when necessary.
samus 3 years ago

Well-behaved usages of pointers according to the C standard can be implemented by whatever means fit best. Fat pointers with metadata about the destination and a huge block of memory for generic cases come to mind. The rest is undefined behavior where the runtime can just nuke the program, aka segfaulting.
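A rough sketch of what such a fat pointer could look like, written in plain C for illustration. The `FatPtr` type and the `fat_*` helpers are hypothetical names, and returning `false` on an out-of-bounds access stands in for whatever the runtime would actually do (trap, abort, and so on).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical fat pointer: the allocation it came from plus an offset. */
typedef struct {
    unsigned char *base; /* start of the underlying allocation */
    size_t size;         /* allocation length in bytes */
    ptrdiff_t offset;    /* current position relative to base */
} FatPtr;

/* Allocate size zeroed bytes (size must be > 0 in this sketch). */
FatPtr fat_alloc(size_t size) {
    FatPtr p = { calloc(size, 1), size, 0 };
    assert(p.base != NULL);
    return p;
}

/* Pointer arithmetic only moves the offset; the metadata travels along. */
FatPtr fat_add(FatPtr p, ptrdiff_t delta) {
    p.offset += delta;
    return p;
}

/* True if n bytes starting at p lie inside the allocation. */
bool fat_in_bounds(FatPtr p, size_t n) {
    return p.offset >= 0
        && (size_t)p.offset <= p.size
        && n <= p.size - (size_t)p.offset;
}

/* Checked load: stores the byte in *out, or returns false when out of bounds. */
bool fat_load(FatPtr p, unsigned char *out) {
    if (!fat_in_bounds(p, 1)) return false;
    *out = p.base[p.offset];
    return true;
}

/* Checked store: writes value, or returns false when out of bounds. */
bool fat_store(FatPtr p, unsigned char value) {
    if (!fat_in_bounds(p, 1)) return false;
    p.base[p.offset] = value;
    return true;
}
```

With `FatPtr p = fat_alloc(4);`, an access through `fat_add(p, 3)` succeeds while `fat_add(p, 4)` and `fat_add(p, -1)` are rejected. The check happens at the access rather than at the arithmetic, so wandering out of range and back again is harmless here.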

notorandit 3 years ago

It's good only if you can end up compiling a JVM with it.

realitysballs 3 years ago

Way over my head here, but can someone perhaps explain the value and/or use case of running C within a JVM?

aardvark179 3 years ago

So, many dynamic languages have a C API that allows you to write methods in C, and for those methods to do things to objects of the higher-level language. This is normally a big optimisation boundary because you have to assume any value could have been altered during a C call. However, if you run your C extensions through the same framework you use to implement your high-level language, then you can remove that optimisation boundary entirely.
kaba0 3 years ago

I believe it’s more about truffle’s llvm interpreter (which is perhaps the successor of this project), but one motivation is that many scripting languages (python, ruby) use C (and fortran) libraries extensively through FFI.
Truffle can give these scripting languages a huge boost in performance (TruffleRuby is 3x faster than the second fastest implementation), but the JVM “doesn’t like to” rely on FFI all that much - and also, truffle is polyglot with the ability to optimize between different languages. So by creating an LLVM interpreter, ruby or python calling into that can be also optimized by e.g. inlining, in certain cases bettering the performance compared to native FFI.
Other than becoming truly cross-platform, running on top of a single runtime gives it the ability to observe these parts as well (which is in itself a huge advantage, because the JVM has some killer observability tools), so for example project loom might be applicable to a python script using C libs for IO, putting the whole thing on a virtual thread and magically making its blocking calls non-blocking.
rvieira 3 years ago

I could be wrong, but can't Truffle languages freely interoperate within the JVM? (https://www.graalvm.org/22.0/reference-manual/polyglot-progr...)
Tiddles-the2nd 3 years ago

As an interpreter, it could be very powerful for development, prototyping, and developing POCs.
captaincaveman 3 years ago

Benefit of the JVM without the bloat of the Java language. I suspect particularly for resource-constrained systems, RAM in particular. Not particularly convinced myself!
- pjmlp 3 years ago
  
  Graal is a JVM implemented in Java....
dan-robertson 3 years ago

The equivalent of link time optimisation with Java, perhaps?
EmilyHughes 3 years ago

Mostly migration of legacy programs probably, so companies can say they ported their stuff to java. It's stupid but companies do it.
- kgeist 3 years ago
  
  It's easier to gradually refactor a legacy program to a different language using the Strangler pattern when they run on the same runtime and there's easy and performant interoperation between old and new code. I wouldn't call it stupid.
  
  EmilyHughes 3 years ago
  
  yeah but in most cases it's better to just rewrite everything fresh. it's more of a quick fix to me.
  
  kgeist 3 years ago
  
  We are actively in the process of rewriting a legacy system using the Strangler pattern, 50 devs and 2 years later it's still less than 50% because business wants new features as well, they can't afford waiting 1-2 years with no business value added in the meantime.
  
  pantulis 3 years ago
  
  Rewriting everything fresh is not always an option in terms of cost and opportunity.
- pawelmurias 3 years ago
  
  Why is just running, say, your existing Ruby program on Truffle with a big performance gain stupid? A lot of the languages use C extensions, and running them on the JVM is faster than using the FFI to call them from the JVM.