Show HN: Panoptisch – A recursive dependency scanner for Python projects

github.com

45 points by r9295 a year ago

Hello all,

Very excited to share this project with you all!

Panoptisch scans your Python file or module to find its imports (aka dependencies) and recursively does the same for all dependencies and sub-dependencies. It then generates a dependency tree in JSON that you can parse to enforce import policies.
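
To illustrate the idea of collecting imports and checking them against a policy, here is a minimal sketch using the standard library's ast module. It is not Panoptisch's actual implementation: it inspects a single source string statically and does not recurse into dependencies. The names find_imports, policy_violations, BANNED and SAMPLE are hypothetical.

```python
import ast

# Hypothetical policy: modules a YAML parser should have no reason to import.
BANNED = {"socket", "sys"}


def find_imports(source):
    """Return the top-level names of modules imported anywhere in `source`."""
    found = set()
    # ast.walk visits every node, so imports nested in functions or
    # try/except blocks are found as well as module-level ones.
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                found.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # level > 0 means a relative import ("from . import x"); skip it.
            if node.module and node.level == 0:
                found.add(node.module.split(".")[0])
    return found


def policy_violations(source, banned=BANNED):
    """Return the banned modules that `source` imports, sorted by name."""
    return sorted(find_imports(source) & set(banned))


SAMPLE = """
import json
try:
    import socket
except ImportError:
    socket = None
from os import path
from . import helpers
"""

print(policy_violations(SAMPLE))
```

A scan like this only sees import statements written literally in the source, which is the limitation discussed in the comments below.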

Supply chain attacks are no joke, and this is one way to transparently analyze your dependencies to see if any malicious imports are taking place. For example, neither your YAML parser nor its sub-dependencies should import socket or sys.

Panoptisch is in its early stages and has known limitations (for now). I welcome feedback, testing and contributions.

Also, happy to answer any questions!

woodruffw a year ago

First of all: I'm glad that more people are trying to tackle this problem!

That being said, I'm not sure if I would encourage this approach: this conflates modules (a property of the Python language) with dependencies (a thing that maps roughly to packages/distributions, which are a property of Python packaging). The two actually aren't that connected: there's no guaranteed 1-1 (or even 1-N) mapping between a dependency's package name and its importable modules, meaning that knowledge of a malicious package doesn't imply that you can derive how that package's module(s) get imported at runtime.

More perniciously: module names aren't static. It's pretty easy to construct a dynamic module object, or to rename (or alias) an existing module object to avoid this kind of detection.
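
A minimal sketch of what that evasion can look like, using only the standard library. The module names totally_harmless and yaml_helpers are made up for illustration, and socket merely stands in for any module a policy might flag.

```python
import importlib
import sys
import types

# 1. Aliasing: register an existing module under an innocuous name.
#    A later "import totally_harmless" is served from sys.modules.
sys.modules["totally_harmless"] = importlib.import_module("socket")
import totally_harmless

# 2. Dynamic module object: build a module at runtime from a string,
#    so no file with this name ever exists on disk.
fake = types.ModuleType("yaml_helpers")
exec("def connect():\n    return 'connected'", fake.__dict__)
sys.modules["yaml_helpers"] = fake
import yaml_helpers

# 3. Computed name: the string "socket" never appears literally.
hidden = importlib.import_module("".join(reversed("tekcos")))

print(totally_harmless.__name__, yaml_helpers.connect(), hidden.__name__)
```

In all three cases, a scanner that matches on the names in import statements sees nothing called socket.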

Finally: walking a project's import tree isn't safe in the general case! Lots of packages have side effects when imported, and malicious dependencies definitely take advantage of that ability. Running this tool might find a malicious import by virtue of actually running malicious code, which isn't ideal.

If your goal is to detect malicious API patterns at runtime (which is effectively what you're doing when you walk the package import tree), I think runtime audit hooks[1] are probably a better fit. Those aren't foolproof either, but they'll probably be more reliable (and don't require as much context awareness to determine maliciousness).

[1]: https://peps.python.org/pep-0578/
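
A minimal sketch of an audit hook that records imports of watched modules, assuming Python 3.8 or later (where sys.addaudithook is available). The WATCHED set and the flagged list are hypothetical; a real policy would need more thought. As far as I understand, the "import" event is raised when the import system actually has to load a module, so a module already cached in sys.modules may not raise it again.

```python
import sys

# Hypothetical policy: top-level module names worth flagging.
WATCHED = {"socket", "ctypes"}
flagged = []


def audit_hook(event, args):
    """Record "import" audit events whose module is on the watch list."""
    if event == "import":
        module_name = args[0]  # first argument of the event is the module name
        if module_name.split(".")[0] in WATCHED:
            flagged.append(module_name)


# Hooks cannot be removed once added; this one stays for the process lifetime.
sys.addaudithook(audit_hook)
```

A hook can also raise an exception to abort the operation instead of just recording it. Other events, such as socket.connect or open, can be checked in the same way.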

  • maweki a year ago

    > It's pretty easy [...] to avoid this kind of detection.

    And by Rice's theorem, it is generally undecidable whether there are hidden modules loaded.

    Static (or in this case maybe not so static) analysis of arbitrary code will never lead to 100% safety. You'll always need some static restrictions on what code you're even allowed to write.

    • janalsncm a year ago

      Coming at this from a naive perspective, do we care if a module is provably, definitely imported? I think for supply chain attacks it should be sufficient to say this module might be imported and deserves attention. I assume Rice’s theorem would also say that even though malicious code is imported it might not run the malicious bits today or on this machine [1]. But that doesn’t mean I’m ok with having it there.

      [1] https://en.m.wikipedia.org/wiki/Stuxnet

      • maweki a year ago

        It's undecidable whether there is even an import of anything. Every non-trivial property is undecidable.

        So the question of whether a certain line is reached is as undecidable as the question of what values the arguments can have at that line.

        So it's undecidable whether a dynamic import statement is reached and what its import argument will be. And even if the value is static, it's undecidable whether the content of the imported module has just been changed.

gegtik a year ago

I've been using pip-compile from https://github.com/jazzband/pip-tools for this use case; a standard project Makefile defines "make update" which pip-compiles the current requirements, and "make install" installs the frozen requirements list.

This way I can install the same bill of materials every time.

  • r9295 a year ago

    I think we have different motivations. pip-compile can only fetch and install dependencies which have been declared.

    For example, let's say I have a malicious yaml parser package. It should not need requests as a dependency. The odds are that a project already has requests installed as a sub-dependency of another dependency. I can then just try to import requests in a try/except block and, if it is available, use it to fetch malicious artefacts, for example. Panoptisch would report this.

    Also, the usage of operating system or builtin modules such as socket, sys or importlib is not something which is analyzed by pip-compile.