saurik 12 years ago

The parent article is targeted not just at people writing shell scripts, but also at people writing libraries that implement cryptographic processes, generate keys, etc.: the article even goes so far as to state that people generating long-term keys should use /dev/urandom over /dev/random.

Now, as a developer, you write a library or daemon, or some other piece of userland code. You do not know on what operating systems, or at what point in the boot process, it will be used. You know it works correctly if and only if /dev/urandom has been seeded, and so you use /dev/urandom.

However, apparently, people install your daemon at boot and generate the keys at boot, on VMs that are often created from scratch. There's nothing inherently wrong with these computers: they have entropy sources, they just don't have any entropy at boot. This is the referenced "boot-time entropy hole".

Your code now "doesn't work". It doesn't crash, and it doesn't block: in fact, there's no way at all to tell that anything went wrong, except that at some point, long afterwards, someone finds out that every encryption key you generated with it comes from a limited, guessable, "weak" subset.

The argument here seems to be that this is the fault of the person who built the computer. Honestly, I somewhat agree. However, let's imagine the real-world conversation that would happen if the developer of this library/daemon actually stated this limitation up-front in their README somewhere.

README: "Note: this program only works correctly if the entropy pool is adequately seeded before running it."

User: "Wow, ok; can you tell me if Ubuntu is up to the challenge?"

Developer: "No. It isn't that simple: the system doesn't block boot waiting for the entropy pool to fill, so whether your system is set up correctly depends on whether entropy fills fast enough; your chosen distribution is one aspect of this, but we will also need to look at what kinds of hardware you have attached and external factors such as whether the network you are using gets much traffic."

User: "Seriously? Is there some simple test I can run on my computer that would let me determine if my system is broken?"

Developer: "No. These are complex factors that are difficult to verify statically. Your attempts to analyze the data yourself will also almost certainly come long after the entropy pool has already received the critical mass required for /dev/urandom to become secure. Running the test would also directly affect many of the entropy sources, including network and input."

User: "But it is possible if I spent more time on it?"

Developer: "Well, sure: there was a paper published where some researchers instrumented the kernel to analyze these issues. They were able to determine that OpenSSH on Red Hat was generating its host keys only just barely after the entropy cutoff on standard desktop computers. You could make similar patches to your kernel and attempt to see whether your usage seems safe."

User: "That sounds ludicrous... was there really no way for you to code around this?"

Developer: "Oh, sure, we could use /dev/random; but then the program would sometimes block under heavy load when it really didn't need to. Rather than just changing to /dev/urandom, some of our users responded by hardcoding our program's randomness entirely, defeating all of the security in our system. To protect these people, we decided to switch to /dev/urandom ourselves."

User: "Couldn't you have made that an option, choosing the most generally secure option by default, but providing /dev/urandom as a sane alternative 'when using under heavy load', thereby guiding people who have this issue to not do something insane? Maybe you could read /dev/random once when your program starts, to get the blocking behavior, and then switch to /dev/urandom for later repeated use?"

Developer: "No. This is apparently a black and white issue, and we have to decide emphatically whether /dev/random should always be used or should never be used. The idea that this might be a complex issue that requires nuance in thought and might have to be decided on a case by case basis, or for which there might be some solution that involves working around the issues with one by using the other momentarily, should not occur to us."

[edit:] (So, it sounds like maybe the correct solution at this point is that the user seeds /dev/urandom themselves using /dev/random. Of course, this solution only works if the user is smart enough to do this and has access to do it, neither of which is likely true on many Android devices. The developer could do this for the user, but that's back into the "nuance" discussion. Instead of being so emphatic that "/dev/random should never be used", people should be advocating that developers use actual solutions, potentially "workarounds", to the problems that come up in real-world deployment :/.)
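The "read /dev/random once when your program starts, then switch to /dev/urandom" idea from the dialogue above can be sketched in a few lines of Python. This is a minimal illustration under assumptions: the class `OnceGatedRandom` and helper `_read_exact` are hypothetical names, the 16-byte gate size is arbitrary, and whether one successful /dev/random read really implies a well-seeded /dev/urandom depends on the kernel version. It is the heuristic being proposed, not a guarantee.

```python
import threading


def _read_exact(path, n):
    """Read exactly n bytes from a device, tolerating short reads."""
    out = bytearray()
    # buffering=0 so we never pull more bytes from the device than asked for.
    with open(path, "rb", buffering=0) as f:
        while len(out) < n:
            chunk = f.read(n - len(out))
            if not chunk:
                raise EOFError(f"{path} returned no data")
            out += chunk
    return bytes(out)


class OnceGatedRandom:
    """Hypothetical helper: block on `blocking_path` once per process,
    then serve every request from `fast_path`."""

    def __init__(self, blocking_path="/dev/random", fast_path="/dev/urandom",
                 gate_bytes=16):
        self.blocking_path = blocking_path
        self.fast_path = fast_path
        self.gate_bytes = gate_bytes
        self._gated = False
        self._lock = threading.Lock()

    def random_bytes(self, n):
        with self._lock:
            if not self._gated:
                # The bytes are discarded: the only goal is to wait until the
                # kernel believes it has collected enough entropy.
                _read_exact(self.blocking_path, self.gate_bytes)
                self._gated = True
        # All later (and repeated) use goes to the non-blocking device.
        return _read_exact(self.fast_path, n)
```

Only the first call in a process can block; a server generating keys "under load" pays that cost once per start.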

ScottBurson 12 years ago

I think this is the best comment I've seen in this thread yet.

It can be a tough call. In a previous job I worked on a product that used /dev/random. We never had a problem with it until one customer installed our product on a newly created VM, and it hung during startup.

We switched to urandom, but we definitely had misgivings about it, as this was exactly the case where it could matter. It's also a situation where reading from /dev/random only the first time wouldn't fix the perceived bug in our product.

Perhaps it would be useful to have a userland process monitor /dev/random and, in a case like this where there has never been sufficient entropy in the pool, send a notification to the console (though there's no guarantee it will be noticed). On Android it could pop up a window.
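A userland monitor like the one suggested above could poll the kernel's entropy estimate in `/proc/sys/kernel/random/entropy_avail` (a real Linux file, though its exact meaning has changed across kernel versions). This is a sketch only: the function names, the 128-bit threshold and the polling schedule are assumptions, and "notify" here just writes to stderr where a real tool might use the console or, on Android, a popup.

```python
import sys
import time

ENTROPY_AVAIL = "/proc/sys/kernel/random/entropy_avail"


def read_entropy_estimate(path=ENTROPY_AVAIL):
    """Return the kernel's current entropy estimate, in bits."""
    with open(path) as f:
        return int(f.read().strip())


def watch_for_low_entropy(threshold=128, interval=1.0, checks=10,
                          path=ENTROPY_AVAIL, notify=None):
    """Poll up to `checks` times. Return True as soon as the estimate reaches
    `threshold`; otherwise call notify() once and return False."""
    if notify is None:
        notify = lambda msg: print(msg, file=sys.stderr)
    for i in range(checks):
        if read_entropy_estimate(path) >= threshold:
            return True
        if i + 1 < checks:
            time.sleep(interval)
    notify(f"warning: entropy estimate never reached {threshold} bits")
    return False
```

As noted above, there's no guarantee anyone sees the warning; the monitor can only report, not fix, an unseeded pool.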

  • saurik 12 years ago

    > It's also a situation where reading from /dev/random only the first time wouldn't fix the perceived bug in our product.

    If this was the case, that the block was happening on the very first use of random data, then you actually managed to build a system that wasn't secure: the fact that /dev/random blocked on that first read on a newly created VM meant that, once you switched to /dev/urandom, the customer was getting deterministic data from it, and no one ever realized because there is no error condition for this. (Maybe the system clock was accurate on these new VMs and not set later by ntpd; if so, there might be hope that at least there were a few bits of entropy involved ;P.)

    • ScottBurson 12 years ago

      Yes -- that's what I meant by "this was exactly the case where it could matter".

      • saurik 12 years ago

        Sorry, I had presumed you meant simply "we were doing key generation", not "we realized that this directly led to not actually getting random data", especially given that "misgivings" seemed like a mild word for it.

        • ScottBurson 12 years ago

          Sorry I was unclear.

          I think we only had "misgivings" because the key we were generating had only relatively minor use in the product. Alas, I don't recall exactly what that was, and it's certainly possible, in retrospect, that we didn't care as much about the implications of using /dev/urandom as we should have. But that was pre-Snowden.

acqq 12 years ago

In your long answer you jump from being a userland code writer to being responsible for "is Ubuntu up to the challenge."

My question was about those who write user programs, not those who configure Ubuntu to boot right the first time. I claim that I, as the writer of user code, am not supposed to be responsible for the initial seeding, unless I'm writing interactive code, where I can show you a nice-looking UI that says "now type on the keyboard to provide more entropy" while I'm waiting for a background thread to stop blocking on /dev/random.

The initial entropy of routers etc., which don't have keyboards, is something that must be taken care of by the designers of the system, not by an application or library writer, especially if the latter doesn't have access to any hardware/external entropy source.
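The interactive case acqq describes (a background thread blocked on /dev/random while the UI asks the user for input) looks roughly like this. It's a hedged sketch: `wait_with_prompt` is a hypothetical helper, `read_blocking` would in practice be something like a small read from /dev/random, and `prompt` would redraw the "type on the keyboard" message.

```python
import threading


def wait_with_prompt(read_blocking, prompt, poll_seconds=0.5):
    """Run read_blocking() in a background thread, calling prompt() every
    poll_seconds until it returns. Re-raises any exception from the read."""
    box = {}

    def worker():
        try:
            box["value"] = read_blocking()
        except BaseException as exc:  # hand the error back to the caller
            box["error"] = exc

    t = threading.Thread(target=worker, daemon=True)
    t.start()
    t.join(timeout=poll_seconds)
    while t.is_alive():
        prompt()  # e.g. "now type on the keyboard to provide more entropy"
        t.join(timeout=poll_seconds)
    if "error" in box:
        raise box["error"]
    return box["value"]
```

The point of the design is that the UI thread stays responsive, and the user's own typing is what eventually unblocks the read.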

  • saurik 12 years ago

    My alternative claim is that, as the developer of userland code, you should code defensively against mistakes that might be made by the people using your program, and should thereby work around this issue yourself, because the user of your code is likely not going to understand that it is a problem (especially given that it fails silently); I'm backing this up with real-world evidence of large numbers of systems that are affected by this issue. If the developer of the software realizes "ok, Linux has this issue, and in practice no one knows that this is a problem", it becomes their responsibility to fix the issue or educate their users. I believe that this becomes even more complex when you are a library developer building a component that is used in larger systems, where the developers using your library are between you and the final users.

    Put differently: why are we encouraging people to "run with scissors" and allowing this mistake to even be possible? Either demand that Linux itself fix the boot-time entropy hole, encourage developers to seed the entropy pool before relying on /dev/urandom, or encourage developers to only use /dev/random; but it seems "actively harmful" to encourage developers to only use /dev/urandom and then just pray that the system integrator or user understood entropy well enough to make the system have a working /dev/urandom by the time it is needed. If I were a pen-tester, this would be one of the first things I'd check for on every setup, as it is a really esoteric issue that almost no one understands well and that, apparently, developers of even popular software like OpenSSL/Dropbear completely ignore in the name of not accidentally blocking under load.

    • acqq 12 years ago

      Your assumption that the "hole" should be magically filled by every library writer like OpenSSL is simply wrong. If the operating system doesn't know how to provide enough initial boot entropy, OpenSSL can know that even less. Remember, OpenSSL can be initialized by a practically unlimited number of applications, a practically unlimited number of times after one boot. Do you claim that all these invocations should actively wait on /dev/random? Even if OpenSSL manages to keep global state and do this only once after boot (how can it know that?), every other library would need its own global state and its own new /dev/random call, because it doesn't know that other libraries already did this. See how it becomes less and less reasonable?

      Once again, the kernel can't manage to do it only once after boot, in general. Library/server application writers can't be expected to do more or better. UI applications can at least collect mouse movements, if they actually know they have access to a mouse! Enough initial entropy is a hardware problem, not a question of "simply calling the right calls, and 'random' is righter."

      • saurik 12 years ago

        I seriously doubt all these instances of OpenSSL are generating keys; if they are, you are doing something else totally inane, and there is a better fix. The only situation where the blocking argument is reasonable at all is something like a webserver, where you are generating keys over and over and over again (what I describe as "under load"), and in that case seeding once per webserver start seems like a no-brainer. I don't understand why you are seriously advocating running with scissors when there is a sensible and simple alternative. If every single process that generated a key initialized /dev/urandom once, your desktop system or server would be unlikely to flinch, and given that your router is probably only running three such programs (and one of them probably only runs the first time you turn it on), you are unlikely to notice either (especially given that the thing probably takes a while to boot anyway for unrelated reasons).
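As a rough sketch of "seed once" that also speaks to the question of how a process could know it already happened during this boot: Linux exposes a fresh UUID per boot in `/proc/sys/kernel/random/boot_id`, so a process could record that UUID after blocking once. The function and marker-file scheme below are hypothetical; they assume a marker location that every participating process can read and write, and two racing processes may both wait, which is merely redundant.

```python
import os

BOOT_ID_PATH = "/proc/sys/kernel/random/boot_id"  # Linux: new UUID each boot


def _current_boot_id(boot_id_path):
    with open(boot_id_path) as f:
        return f.read().strip()


def ensure_seeded_once_per_boot(marker_path, wait_for_entropy,
                                boot_id_path=BOOT_ID_PATH):
    """Call wait_for_entropy() (e.g. a small /dev/random read) only if no
    process has recorded doing so during the current boot.
    Returns True if it waited, False if the marker showed it was done."""
    boot_id = _current_boot_id(boot_id_path)
    try:
        with open(marker_path) as f:
            if f.read().strip() == boot_id:
                return False
    except FileNotFoundError:
        pass
    wait_for_entropy()
    # Write atomically so a concurrent reader never sees a partial marker.
    tmp = f"{marker_path}.{os.getpid()}.tmp"
    with open(tmp, "w") as f:
        f.write(boot_id)
    os.replace(tmp, marker_path)
    return True
```

Whether every library would agree on one shared marker is exactly the coordination problem raised above; this only shows that "once per boot" is detectable from userland.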

        • acqq 12 years ago

          > I seriously doubt all these instances of OpenSSL are generating keys

          If you seriously doubt that an encryption library like OpenSSL has to always generate keys then this discussion is between the two people who shouldn't have even started exchanging opinions.

          Ditto for your claim that blocking for entropy is something that doesn't happen or is trivial. It's not, the kernel writers know.

          If you still believe you're right, try to explain to the kernel writers how they can just initialize the urandom pool just once after the boot, and you solved the problem for everybody at once! You may even enter some hall of fame.

          • saurik 12 years ago

            > If you seriously doubt that an encryption library like OpenSSL has to always generate keys then this discussion is between two people who shouldn't have even started exchanging opinions.

            I use OpenSSL in a ton of programs for the purposes of computing SHA1 hashes. So yeah: I doubt most programs linked against OpenSSL are generating keys. Only processes that generate a key matter for the purpose of seeding entropy.

            What, exactly, do you think your computer is generating so many new keys for during a routine boot process? Can you tell me what these new keys are for?

            > Ditto for your claim that blocking for entropy is something that doesn't happen or is trivial. It's not, the kernel writers know.

            Clearly it does happen: the question is how often it happens; a desktop system generates enough entropy that it can easily keep up with generating enough random data from /dev/random to seed /dev/urandom every now and then because a new SSL-enabled daemon spawned.

            > If you still believe you're right, try to explain to the kernel writers how they can just initialize the urandom pool just once after the boot, and you solved the problem for everybody at once! You may even enter some hall of fame.

            1) Apparently FreeBSD does this. Can you tell me how FreeBSD apparently does the impossible? 2) The kernel can clearly store a boolean; I mean, the kernel does all kinds of things only once... are you seriously thinking that the reason the kernel hasn't implemented this is because they can't?

            • acqq 12 years ago

              Using OpenSSL for SHA1 is shooting a fly with a cannon. When using OpenSSL for, you know, encrypted communication, there's an actual need for new keys all the time. Most keys aren't the permanent ones you save yourself or write down on paper.

              > a desktop system

              How is OpenSSL supposed to know if it's used on a desktop system?

              > Can you tell me how FreeBSD apparently does the impossible?

              I guess by pretending that problem doesn't exist: e.g. postulating a desktop system, like you do. But do ask them and pass the advice to the Linux kernel writers! They just waited for you to get that idea to ask the FreeBSD guys (and I'd be grateful if you post the results of that here, please, we all want to learn).

              > are you seriously thinking that the reason the kernel hasn't implemented this is because they can't?

              Yes, in general, they can't! Entropy on most systems still doesn't grow on CPU trees. But please do prove me wrong.

    • ScottBurson 12 years ago

      I think we should go with option (1): demand that Linux itself fix the boot-time entropy hole.

      And it has to be done at the distro level. Distros have to be persuaded to make random seeding a standard part of their installation process. They should also give the user a command to run in newly cloned VMs.

      The new x86 RDRAND instruction would be very useful here, of course, if we ever decide we can trust it. One could argue that calling RDRAND at boot would still be better than having no entropy at all; the NSA may have a backdoor, but probably no one else does.

      • acqq 12 years ago

        The Linux kernel can't "fix" it without hardware support. You're right that just accepting RDRAND as an entropy source would magically make /dev/random nonblocking on the CPUs that have it, as it is quite fast. I'd personally use it as one of the valid entropy sources. The last time I read about its use in the Linux kernel, if I understood correctly, it was not counted as such, being XORed in only after the entropy estimation of other sources with much, much less bandwidth.

        So even if once Intel gets to be "less evil in the eyes of worried people" for a "sin" of providing RDRAND although the same people otherwise trust the company to let its CPU's execute all their programs, what should the poor kernel writers do on the systems without RDRAND?

        But you are completely right: having VMs and distributions agree on protocols for providing initial randomness to the VM is certainly a step in the right direction and the proper solution to one real problem.

        Whereas promoting belief in regular user-space use of /dev/random as the magic solution is, in my humble opinion, something like a cargo cult. I'm not commenting on your statements here, just attempting to make my position in this whole discussion clear.

        • ScottBurson 12 years ago

          > So even if once Intel gets to be "less evil in the eyes of worried people" for a "sin" of providing RDRAND although the same people otherwise trust the company to let its CPU's execute all their programs

          This isn't fair. Most instructions compute deterministic functions of their operands. It's easy for people to tell if the CPU is computing the correct result. (Remember the Pentium FDIV bug?) RDRAND, by definition and design, is nondeterministic. Only by examining a long sequence of its results could one gain any confidence that it isn't backdoored ... and then one might still be overlooking something.

          • agwa 12 years ago

            > Only by examining a long sequence of its results could one gain any confidence that it isn't backdoored

            Actually, you can't. A properly backdoored RNG is indistinguishable from a non-backdoored RNG. Imagine a backdoored RNG that is simply a counter encrypted with AES. The output looks completely random, but it can be completely broken by the holder of the AES key.
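agwa's counter-plus-cipher example is easy to make concrete. In this toy sketch HMAC-SHA256 stands in for AES (Python's standard library has no AES), and `CounterModeRNG` is a hypothetical name; the property illustrated is the same: the output stream looks random to anyone without `key`, yet anyone holding `key` can regenerate every byte.

```python
import hashlib
import hmac


class CounterModeRNG:
    """Toy 'backdoored' RNG: a keyed PRF (HMAC-SHA256 here, standing in for
    AES) applied to an incrementing counter."""

    def __init__(self, key: bytes, counter: int = 0):
        self._key = key
        self._counter = counter
        self._buf = b""

    def _next_block(self) -> bytes:
        msg = self._counter.to_bytes(16, "big")
        self._counter += 1
        return hmac.new(self._key, msg, hashlib.sha256).digest()

    def read(self, n: int) -> bytes:
        # Keep leftover bytes so chunked reads match one big read.
        while len(self._buf) < n:
            self._buf += self._next_block()
        out, self._buf = self._buf[:n], self._buf[n:]
        return out
```

For example, a "device" built as `CounterModeRNG(secret)` and an attacker's copy built with the same `secret` produce identical streams, while no statistical test on the output alone reveals that.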

            • ScottBurson 12 years ago

              I see. That is certainly a problem as far as our ability to ever come to trust RDRAND.

              But it supports the point I was making: trusting RDRAND is a very different thing from trusting the rest of the instruction set.

dlitz 12 years ago

What should actually happen is that /dev/random and /dev/urandom should provide the exact same interface: block at boot-time until they're seeded with "sufficient" entropy, and then never block again until the system is rebooted. Periodically, some entropy should be saved to disk, to be used to re-seed during the next boot.
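The "save some entropy to disk, re-seed at next boot" half of this proposal can be sketched in userland. The seed path and function names below are hypothetical. Two caveats matter: on Linux, writing to /dev/urandom mixes the data into the pool but does not raise the kernel's entropy count (crediting requires the privileged `RNDADDENTROPY` ioctl), and a seed file baked into a VM image is shared by every clone of that image.

```python
import os


def save_seed(seed_path, nbytes=512):
    """Write fresh random bytes to seed_path for use at the next boot."""
    data = os.urandom(nbytes)
    tmp = seed_path + ".tmp"
    fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # make sure it survives a crash
    os.replace(tmp, seed_path)


def restore_seed(seed_path, pool_path="/dev/urandom"):
    """At boot: mix the saved seed into the kernel pool, then immediately
    replace it so the same seed is never used twice. Returns False if no
    seed file exists yet."""
    try:
        with open(seed_path, "rb") as f:
            data = f.read()
    except FileNotFoundError:
        return False
    with open(pool_path, "wb") as pool:
        pool.write(data)  # mixed in, but not credited as entropy
    save_seed(seed_path, len(data))
    return True
```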

Also, both should be a lot faster. With some buffering you should be able to use them to replace libc's rand() with little to no performance penalty.

Every application developer who wants "random" numbers should simply get cryptographically-strong random numbers, period. We should not expect them to choose, since it's just another opportunity to make a mistake.
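On the "with some buffering" point: the usual trick is to pull a large chunk from the kernel once and serve many small requests from it. A minimal sketch, assuming a hypothetical `BufferedRandom` class with a 4 KiB chunk size; `randbelow` uses rejection sampling so that results are uniform rather than skewed by a modulo.

```python
import os


class BufferedRandom:
    """Serve small random values from a large os.urandom() buffer so that
    most calls make no system call."""

    def __init__(self, chunk=4096):
        self._chunk = chunk
        self._buf = b""
        self._pos = 0

    def read(self, n):
        out = bytearray()
        while len(out) < n:
            if self._pos >= len(self._buf):
                self._buf = os.urandom(self._chunk)  # refill from the kernel
                self._pos = 0
            take = min(n - len(out), len(self._buf) - self._pos)
            out += self._buf[self._pos:self._pos + take]
            self._pos += take
        return bytes(out)

    def randbelow(self, bound):
        """Uniform integer in [0, bound) via rejection sampling."""
        if bound <= 0:
            raise ValueError("bound must be positive")
        nbits = (bound - 1).bit_length()
        nbytes = max(1, (nbits + 7) // 8)
        mask = (1 << nbits) - 1
        while True:
            r = int.from_bytes(self.read(nbytes), "big") & mask
            if r < bound:
                return r
```

One design caveat: a userland buffer is copied into child processes on `fork()`, so parent and child would hand out the same bytes unless the buffer is discarded after forking. That is one reason this is harder to get right in a library than it looks.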

  • acqq 12 years ago

    The present contract is that "/dev/random" delivers new "entropy" (which may not exist at all on virtual machines, routers, etc., and for which the caller can wait indefinitely!) while "/dev/urandom" delivers a cryptographically good pseudorandom stream. The title text simply points out that programmers of normal applications and libraries typically never need the former. I understand your suggestion as changing the contract to satisfy those who falsely think they actually need "/dev/random" even if they don't.

    I however prefer having both and expecting from the programmers to actually understand what they need. I see it's not easy: in my discussions here, you can see that people still believe that "random" is "righter" than urandom for their actual urandom needs, and that they have to "do something, and that something is waiting for the entropy", even when only the system designer or owner should be responsible for proper urandom seeding. It's hard.

    I suggest changing the man pages of these two devices to explain the basic idea better. Then those wanting to "do something" would at least lose that support. I know it's against the tradition of man pages to be clear, though.

    • Tomte 12 years ago

      No. /dev/random does not deliver "new entropy", it delivers the very same cryptographically good pseudorandom stream.

      Look at the sketches in the article.

      • acqq 12 years ago

        > /dev/random does not deliver "new entropy"

        Sorry, you're wrong. /dev/random in Linux is made to deliver new entropy to you as the user, even if these "entropy" bits do pass through the block you marked as CSPRNG. That doesn't contradict my claim. The fact that they pass through the CSPRNG block doesn't change the fact that the system makes its own best attempt to collect actual entropy and pass it to you as the user. That's also the reason why it blocks until it has collected as many new bits as you requested: it does its best not to lie to you. These bits are not the "continuous CSPRNG stream"; they are just nicely polished entropy bits.

        Think about it this way: a system stream could be made that would deliver the unprocessed "source" entropy bits. But such bits would have to be additionally processed (in effect, passed through something like a CSPRNG or whatever; there are enough papers written about that) to be useful in any sensible way. Even very random natural processes (e.g. radioactive decay) can't be used in their pure form for cryptographic purposes. So the kernel does that additional processing for you. That's all. They are still entropy bits, only additionally shuffled with the "encryption" code to improve their characteristics for cryptographic purposes.
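The "raw bits must be post-processed" point can be illustrated with a toy conditioning step. This is not what the kernel's random.c does (it has its own mixing and extraction code); here SHA-256 from Python's hashlib stands in, and the timing-jitter source and function names are hypothetical, chosen only because they are portable.

```python
import hashlib
import time


def raw_timing_samples(n):
    """Collect n low-order bytes of timing deltas. Raw, biased and
    correlated: NOT safe to use directly as key material."""
    samples = bytearray()
    last = time.perf_counter_ns()
    for _ in range(n):
        now = time.perf_counter_ns()
        samples.append((now - last) & 0xFF)
        last = now
    return bytes(samples)


def condition(raw: bytes) -> bytes:
    """Compress raw samples into 32 uniform-looking bytes with SHA-256.
    Hashing cannot add entropy; it only spreads whatever entropy the
    input actually had across the output."""
    return hashlib.sha256(raw).digest()
```

The output of `condition` always looks random, which is exactly why the estimate of how much real entropy went in, not the appearance of the output, is what matters.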

        Now you may claim that the claimed entropy doesn't exist in the case that the estimator estimates wrong, but it's still the best attempt there is. Accepting the imperfections of the real world, /dev/random in Linux is there to give you quality entropy bits, based not only on some known states but on very real "entropy" bits, leaving aside here how these can be collected, estimated, and have their characteristics improved.

        The meaning of "entropy" here is loosely "the non-calculable randomness which can be observed by the functioning and the interaction of the system, not the direct calculable product of the system itself." The system running in the VM and without special inputs has much less chance to observe much of such randomness.

        I am aware that, according to the claims I've read here, by not using blocking, entropy estimations or whatever, FreeBSD is not delivering the entropy once it stops blocking. But it's just the FreeBSD's difference in the contract.

        In Linux, it is actually the other way around from your claim: not only does /dev/random on Linux strive to return new entropy, but /dev/urandom doesn't just return a CSPRNG stream either; it occasionally gets the added benefit of real random bits collected via the entropy collectors. It's just guaranteed not to block while, again, doing its best.

        The source of kernel's random.c already linked in this discussion is actually very nice and clean:

        https://github.com/torvalds/linux/blob/master/drivers/char/r...

        Do compare random_read vs. urandom_read.

    • dlitz 12 years ago

      > I understand your suggestion as changing the contract to satisfy those who falsely think they actually need "/dev/random" even if they don't.

      Not exactly. /dev/urandom also sucks, because it's slow, and it never blocks, even when the system knows that it can't deliver cryptographically-strong entropy.

      There should only be one interface, and it should deliver cryptographically-strong random bytes at high speed and with high reliability. There is currently no character device on Linux that does this.

      > I however prefer having both and expecting from the programmers to actually understand what they need.

      I'd prefer it too, but it's unrealistic. Keeping up with the state-of-the-art in crypto is a job for specialists, and even halfway-decent crypto implementers make RNG mistakes on a regular basis.
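On the "one interface" point: Linux 3.17 added the getrandom(2) system call (a syscall rather than a character device), whose default mode blocks only until the kernel's pool has been initialized once after boot and never blocks afterwards; with the `GRND_NONBLOCK` flag it fails instead of blocking. Python exposes it as `os.getrandom` (Python 3.6+, Linux only). The two wrapper functions below are hypothetical conveniences, not part of any library.

```python
import os


def strong_random(n, block=True):
    """Read n bytes via getrandom(2). With block=True this waits only until
    the pool is initialized after boot; with block=False it raises
    BlockingIOError if the pool is not initialized yet."""
    flags = 0 if block else os.GRND_NONBLOCK
    out = b""
    while len(out) < n:  # getrandom may return fewer bytes than requested
        out += os.getrandom(n - len(out), flags)
    return out


def pool_initialized():
    """True if the kernel pool has been initialized since boot."""
    try:
        os.getrandom(1, os.GRND_NONBLOCK)
        return True
    except BlockingIOError:
        return False
```

Unlike /dev/urandom, this lets a program detect the boot-time entropy hole instead of silently reading from an unseeded pool.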