brap 3 days ago

I know you probably don’t want an LLM in your decompiler, but assigning meaningful names could be a good task for an LLM.

  • Hackbraten 2 days ago

    There’s some prior art on this [0] [1], and it’s worked decently well for me on obfuscated JS.

    [0]: https://thejunkland.com/blog/using-llms-to-reverse-javascrip...

    [1]: https://github.com/jehna/humanify/blob/main/README.md#exampl...

    • viraptor 2 days ago

      It's also good at picking up patterns which are common enough, but may not be known to everyone. For example, I couldn't tell that some function was doing a CRC via a lookup table - but Claude knew.

      • BobbyTables2 2 days ago

        Often googling the first few entries of such tables will show CRC implementations. Same for SHA hash constants…
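
        For example, the common CRC-32 variant (the reflected polynomial 0xEDB88320, used by zip and PNG) has a table whose first entries are distinctive, searchable constants. A quick sketch of how such a table is built:

        ```java
        public class CrcTable {
            // Build the 256-entry lookup table for the common CRC-32
            // (reflected polynomial 0xEDB88320, as used in zip/PNG).
            static int[] buildTable() {
                int[] table = new int[256];
                for (int i = 0; i < 256; i++) {
                    int c = i;
                    for (int k = 0; k < 8; k++) {
                        c = (c & 1) != 0 ? (c >>> 1) ^ 0xEDB88320 : c >>> 1;
                    }
                    table[i] = c;
                }
                return table;
            }

            public static void main(String[] args) {
                int[] t = buildTable();
                // The first entries are recognizable constants:
                // 0x00000000, 0x77073096, 0xEE0E612C, 0x990951BA, ...
                for (int i = 0; i < 4; i++) {
                    System.out.printf("0x%08X%n", t[i]);
                }
            }
        }
        ```

        A different polynomial or reflection setting yields a completely different table, which is why a custom setup like the one viraptor describes won't show up in search results.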

        • viraptor 2 days ago

          The tables depend on the CRC parameters and in my case there would be no Google hits - a unique setup was used.

  • cogman10 2 days ago

    That'd make sense if the jar is obfuscated. Java preserves method and class names by default.

  • p0w3n3d 2 days ago

    One day I was using Ghidra to decompile something to find out how it works, and the LLM helped a lot. It was a game changer for refactoring the decompiled assembly-that-looked-like-C code.

asplake 3 days ago

> Fernflower is the first actually working analytical decompiler for Java and probably for a high-level programming language in general.

That really deserves a link. What is an “analytical” decompiler?

  • lbalazscs 2 days ago

    The link about Stiver has some details:

    > Stiver decided to write his own decompiler as a side project. To overcome the weaknesses of existing alternatives, he took a different approach. After reading the bytecode, he constructed a control-flow graph in static single-assignment form, which is much better to express the program semantics abstracting the particular shape of bytecode. At the beginning of this project, Stiver knew little about static analysis and compiler design and had to learn a lot, but the effort was worth it. The resulting decompiler produced much better results than anything available at that time. It could even decompile the bytecode produced by some obfuscators without any explicit support.

    https://blog.jetbrains.com/idea/2024/11/in-memory-of-stiver/

  • jakewins 2 days ago

    Someone apparently had the exact same question in 2020: https://stackoverflow.com/questions/62298929/what-is-an-anal...

    The answer is pretty vague, but it sounds like it's about not trying to "reverse" what the compiler did, but rather trying to "analytically" work out what source code would likely have yielded the bytecode it's looking at?

    • rhdunn 2 days ago

      Yes, that's what it is doing.

      A compiler turns a given language expression or statement into a particular sequence of assembly/bytecode instructions. For example, it converts `a + b` to `ADD a b`.

      A reversing decompiler will look at the `ADD a b` and produce `a + b` as the output. This is the simplest approach, as it is effectively just a collection of these kinds of mappings. While this works, the output can be harder to read and noisier than the actual source code. This is because:

      1. it does not handle annotations like @NotNull correctly -- these are shown as `if (arg == null) throw ...` instead of the annotation because the if/throw is what the compiler generated for that annotation;

      2. it doesn't make complex expressions readable;

      3. it doesn't detect optimizations like unrolling loops, reordering expressions, etc.

      For (1) an analytical decompiler can recognize the `if (arg == null) throw` expression at the start of the function and map that to a @NotNull annotation.

      Likewise, it could detect other optimizations like loop unrolling and produce better code for that.
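
      As a concrete sketch of (1): the guard below is the kind of preamble such instrumentation produces, and an analytical decompiler that matches this exact if/throw pattern at the top of a method can fold it back into an annotation (the exception message and the commented annotation are illustrative, not any particular tool's output):

      ```java
      // What a reversing decompiler shows: the null check appears
      // literally as an if/throw at the start of the method.
      class Raw {
          static int length(String name) {
              if (name == null) {
                  throw new IllegalArgumentException("'name' must not be null");
              }
              return name.length();
          }
      }

      // What an analytical decompiler could recover once it matches the
      // guard pattern: the check collapses into a parameter annotation
      // (shown as a comment since no annotation library is on the classpath).
      class Recovered {
          static int length(/* @NotNull */ String name) {
              return name.length();
          }
      }
      ```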

      • vbezhenar 2 days ago

        I'm not sure the @NotNull example is appropriate. The Java compiler does not add any checks for @NotNull annotations; those annotations exist for IDEs and linting tools, and the compiler doesn't care. Maybe there are tools like Lombok or non-standard compilers which do add those checks, but I think a Java decompiler shouldn't make assumptions about these additional tools.

        • rhdunn a day ago

          I was trying to think of examples.

          A better example for Java would be something like lambda expressions on functional interfaces [1]. There, the compiler is creating an anonymous object that implements the interface. A reversing decompiler will just see the anonymous class instance, whereas an analytical decompiler can detect that it is likely a lambda expression, because it is an anonymous class object implementing a single-method interface and is being passed to a function argument that takes that interface as a parameter.

          In C# yield is implemented as a state machine, so an analytical decompiler could recognise that construct.

          And yes, for JVM decompilers it could have language heuristics to detect (or be specifically for) Lombok, Scala, Groovy, Kotlin, etc.

          [1] https://docs.oracle.com/javase/tutorial/java/javaOO/lambdaex...
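
          To make that desugaring concrete, a sketch: both calls below behave identically, but the anonymous-class form compiles to a synthetic inner class in the bytecode, which is all a reversing decompiler would echo back, while the lambda form is what an analytical decompiler could present once it notices the single-method interface.

          ```java
          public class LambdaDesugar {
              static String run(Runnable r) {
                  r.run();
                  return "done";
              }

              public static void main(String[] args) {
                  // Anonymous-class form: what a reversing decompiler
                  // would echo back from the synthetic inner class.
                  run(new Runnable() {
                      @Override
                      public void run() {
                          System.out.println("anonymous class");
                      }
                  });

                  // Lambda form: what an analytical decompiler could recover,
                  // since Runnable has exactly one abstract method.
                  run(() -> System.out.println("lambda"));
              }
          }
          ```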

        • rhdunn 2 days ago

          https://www.jetbrains.com/help/idea/annotating-source-code.h...

          > When you compile your project with IntelliJ IDEA build tool, the IDE adds assertions to all code elements annotated with @NotNull. These assertions will throw an error if the elements happen to be null at runtime.

          • vbezhenar 2 days ago

            That's not the Java compiler, that's the IntelliJ compiler. I'd say that's a very weird anti-feature, because your build in the IDE and your Maven build will behave differently.

            • wokkel 2 days ago

              When using Lombok, it will use a compiler plugin for this, so Maven builds have @NonNull generated as if-statements. I don't know if IntelliJ uses their own plugin, but they do support Lombok in Maven projects, so maybe that's where this is coming from. AFAIK IntelliJ has no built-in compiler but relies on javac.

              • Bjartr 2 days ago

                Lombok hijacks the compiler to do its own thing, and violates the contract Java compiler plugins are supposed to follow.

                See this comment by an OpenJDK tech lead: https://news.ycombinator.com/item?id=37666793

                • lisbbb 2 days ago

                  I was initially impressed with Lombok and then ran into all the downsides of it and it was institutionally abandoned at one particular firm I was with (100s of devs).

  • krackers 2 days ago

    As far as I can tell (although I'm a novice at RE), in the native world all non-trivial decompilers are "analytical", doing things like control-flow recovery and such. I guess the only reason the first Java decompilers were "non-analytical" is that the bytecode (at least in the early days) was simple enough that you could basically pattern-match it back to source statements.

    So if I had to give a definition pulled out of my ass:

    * non-analytical decompiler: "local", works only at the instruction or basic-block level, probably done by just pattern-matching templates

    * analytical: anything that does non-local transformations, working across basic blocks to recover logic and control flow

userbinator 3 days ago

The correct name is Fernflower, not FernFlower.

I found this amusing, from a Java perspective. The 3-character command-line options are also very "not Java-ish". However, since this one is also written in Java, a good test is whether it can decompile itself perfectly and the result can be recompiled to a matching binary, much like how bootstrapping a compiler involves compiling itself and checking for a fixed point.

hunterpayne 2 days ago

I'm using this decompiler in my project right now. It's the best of the bunch, and JetBrains actively maintains it with good support.

p0w3n3d 2 days ago

Is it only me, or does Fernflower not put the code on the correct lines, so that debugging fails to navigate over the code in IntelliJ IDEA?

  • bartekpacia 2 days ago

    This sounds like a bug – I'd appreciate it if you could share an example of such behavior.

    [I work at JetBrains]

  • gf000 2 days ago

    I mean, in the general case isn't it impossible to "put the code in the correct lines"?

    Maybe I'm just misunderstanding you, but even if the bytecode sequence is reconstructed as the original code that produced it, stuff like whitespace and comments is simply lost, with no way to recover it.

    (Also, local variable names, certain annotations depending on their retention level, etc)

nunobrito 2 days ago

Can the decompiled result be compiled again?

  • jeroenhd 2 days ago

    It's not a perfect decompiler; some obfuscated code gets decompiled into commented-out bytecode.

    However, most of the time it'll output perfectly valid Java code that'll compile if you just create the necessary Maven/Ant/Gradle build configuration to get all of the sources loaded correctly.

  • dunham 2 days ago

    I've actually had this fix a bug before: an O(n^2) issue adding a character at a time to a string inside a loop.

    I had decompiled the class, fixed the issue, checked in the original decompiled source and then the change. Then a coworker pointed out that the original decompiled source also fixed the issue.

    After a bit of digging, I learned that the HotSpot compiler had code to detect and fix the issue, but it was looking for the pattern generated by a modern compiler, and the library was compiled with an older one.

    (It's been a while, but I think it was the JAI library, and the issue was triggered by long comments in a PNG.)
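
    The usual shape of that bug, as a sketch (not the actual JAI code): appending with `+=` copies the whole string on every iteration, so n appends cost O(n^2) character copies, while the StringBuilder form is linear.

    ```java
    public class SlowConcat {
        // Quadratic: each += allocates a new String and copies all
        // previous characters, so n appends do ~n^2/2 character copies.
        static String quadratic(int n) {
            String s = "";
            for (int i = 0; i < n; i++) {
                s += 'x';
            }
            return s;
        }

        // Linear: StringBuilder appends into a growable buffer at
        // amortized O(1) per character; this is the explicit fix.
        static String linear(int n) {
            StringBuilder sb = new StringBuilder(n);
            for (int i = 0; i < n; i++) {
                sb.append('x');
            }
            return sb.toString();
        }

        public static void main(String[] args) {
            System.out.println(quadratic(5).equals(linear(5))); // prints "true"
        }
    }
    ```

    Since javac itself desugars `+=` on strings into StringBuilder calls, HotSpot's rewrite presumably keyed on the exact call sequence a modern javac emits, which would explain why the older compiler's output wasn't matched.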

BinaryIgor 2 days ago

...written in Java! Recursion going strong :)