30

Obviously, if you need a random number for cryptography, your code should use an API that gets it from hardware. However, not all hardware has an SRNG available. If you are working on a security-critical application and a hardware RNG is not available, then the appropriate course of action is to abort execution.

BUT, not every application would be deemed security critical. That is to say, while you want security, you are not worried about overly complex attacks because: 1. your application is not worth the effort of being cracked by sophisticated hackers, and 2. none of your data is considered strictly confidential (you don't want it breached, but it would not be the end of the world).

Assuming a non-security-critical application, running on a machine without a hardware RNG or ECC memory, would it be viable to allocate a very large amount of memory (perhaps in a long loop) and use the errors that eventually occur as a source of randomness? Would this be more secure than a PRNG?

TheCatWhisperer
  • 469
  • 4
  • 9
  • 52
    My assumption would be that if you have memory that would have errors often enough to make it a usable source of entropy, well, you probably have problems running any program reliably...If you're talking about deliberately shutting off the dynamic memory refresh, and waiting a while for the memory capacitors to partially discharge, well, that has been proposed before; it wasn't well received (IIRC)... – poncho Nov 07 '17 at 20:22
  • 1
    @poncho Well, you would have to do something like, for example, allocating 2GB of memory and checking it for errors 100,000 times, or something along those lines. – TheCatWhisperer Nov 07 '17 at 20:25
  • 7
    What is your definition of "not security critical?" /dev/urandom is good enough for 99.9999% of not-security-critical numbers by my definition of that term. – Cort Ammon Nov 08 '17 at 15:58
  • 1
    The problem with defining an application as non-security-critical is that you can't predict how people will use it and how much trust they will place in it. That's part of why HTTPS is being pushed universally, even though some devs would argue their app doesn't need it. I do think it's a worthwhile question whether you can derive good-quality entropy from memory errors for security-critical applications, but I don't like the idea of saying something as fundamental as an RNG is "good enough" for some security applications and not others. – user371366 Nov 08 '17 at 17:50
  • 3
    @TheCatWhisperer, I ran a RAM test on a newly-built computer last week. A 24-hour memtest run did the equivalent of your checking 2GB for errors 100,000 times, and found nothing. – Mark Nov 08 '17 at 22:15
  • Would branch prediction failures be an okay source? https://stackoverflow.com/questions/11437523/can-i-measure-branch-prediction-failures-on-a-modern-intel-core-cpu – Tschallacka Nov 09 '17 at 10:06
  • 1
    @Tschallacka I imagine that would be a very poor source of entropy. Branch prediction failures are reliant on the programs running on the computer.... most programs are deterministic. There is a difference between opaqueness and entropy. – TheCatWhisperer Nov 09 '17 at 14:43
  • @TheCatWhisperer Is there a reason you couldn't just use the low bits of the cpu temperature? as far as I know it's a pretty common technique... – Aaron Nov 09 '17 at 14:51
  • @Aaron, this was asked out of curiosity. Perhaps that would be a good source of entropy. – TheCatWhisperer Nov 09 '17 at 15:27
  • @TheCatWhisperer ACPI is a fairly common and standard interface that would be able to give temperature readings as a source of entropy short of user interaction, although as others have mentioned, if it's a server you'll have user interaction timing associated with network requests. – Aaron Nov 09 '17 at 19:39
  • @poncho Probably wasn't well received because DRAM degrades in seemingly random but quite predictable ways. – forest Dec 31 '17 at 06:02

8 Answers

50

(..) would it be viable to allocate a very large amount of memory (perhaps in a long loop) and use the errors that eventually occur as a source of randomness?

No. Practical use of memory errors as a source of randomness would require manipulation of DRAM timing parameters, as envisioned in this answer and its reference.

The rate of memory errors in a correctly working computer at ground level is very low; anything more than one error detected per week (as reported by a machine with ECC logging) is reason for maintenance [#]. Hence the proposed method requires a terminally impractical delay to get any entropy, or an unreliable computer; and for one well built, errors mostly happen at altitude (especially in the South Atlantic Anomaly). Plus, if the machine has ECC, errors are hard to detect portably (reporting of ECC errors tends to be a proprietary mess); and if it does not, and errors are sizably frequent, there is reason to fear one will eventually prevent the program from running as intended.

More practical sources of entropy include

  • Audio or video input; more generally output of an ADC.
  • Time of events measured to high accuracy relative to the CPU clock (e.g. by way of a performance counter as obtained by the RDTSC instruction or an API wrapping it). Sources include:
    • key presses (value gives extra entropy)
    • mouse/pointing device movement (position gives extra entropy)
    • arrival of network packets
    • availability of data from a spinning hard disk
    • change of second in a real-time clock IC with independent crystal (low entropy rate, but quite reliably entropic)

Extracting the entropy is relatively easy: essentially, hash anything entropic you can get. The one extremely difficult thing is asserting a reliable lower bound on how much entropy was really gathered. Be extremely conservative, and don't assume that something that holds now will keep holding. Things change unexpectedly, especially when there's an active adversary, as we assume in crypto.
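
For concreteness, here is a minimal Python sketch of that "hash anything entropic" approach. The sources and the entropy estimates passed to mix() are illustrative placeholders, not measured values; the hard part, as said above, is justifying those estimates:

    import hashlib
    import os
    import time

    pool = hashlib.sha256()
    gathered = 0.0  # claimed entropy in bits; keep the claims conservative

    def mix(sample: bytes, est_bits: float) -> float:
        # Length-prefix each sample so concatenated inputs stay unambiguous.
        pool.update(len(sample).to_bytes(4, "big"))
        pool.update(sample)
        return est_bits

    for _ in range(1024):
        # High-resolution timestamps of "events"; only the jittery low bits
        # carry entropy, hence the tiny per-sample credit.
        gathered += mix(time.perf_counter_ns().to_bytes(8, "big"), 0.1)

    gathered += mix(os.urandom(16), 128)  # the OS pool, when available

    if gathered >= 128:
        seed = pool.digest()  # 32 bytes, e.g. to seed a CSPRNG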


[#] Reliability studies on DRAM give widely varying estimates. However, we are talking like 100 FIT per DRAM IC within a factor of 100, where a Failure In Time is the expected number of failures per $10^9$ device⋅hours. Taking the upper extreme $10^4$ FIT, for a computer with 4 DIMMs each with 18 DRAM ICs (enough for 32 GiB with ECC using DDR4, mainstream by late 2017), we get about 0.12 expected failures per week.
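
A quick check of that arithmetic (the figures are the footnote's own, not new measurements):

    # Expected DRAM errors per week for 4 DIMMs x 18 ICs at the
    # upper-extreme rate of 1e4 FIT per IC.
    FIT = 1e4                  # failures per 1e9 device-hours
    ics = 4 * 18               # DRAM ICs in the machine
    hours_per_week = 7 * 24

    print(FIT / 1e9 * ics * hours_per_week)  # ~0.12 failures per week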

My limited field experience with servers is that when there are ECC alarms more often than once per week, there's a cause: most often a mismatch in memory specs or settings (for new machines only), or a particular bit in one particular IC of a particular DIMM that's marginal (or worn out, for machines that have been humming quietly for some time). The error log's main virtue is to help identify such a condition, and which DIMM to replace rather than scrapping the whole lot.

There are techniques to purposely increase the rate of occurrence of DRAM errors, including lowering the refresh rate much below manufacturer specs and/or using a RowHammer read/write pattern. However, that's extremely dependent on the DRAM model. Doing this reliably, portably, and within an OS is very difficult; there's a reason MemTest wants to own as much of the machine as feasible.

fgrieu
  • 140,762
  • 12
  • 307
  • 587
  • I second the idea of using an audio input, when it's disconnected from anything. A disconnected audio input will pick up a lot of noise at a fairly low level - switch the input to "microphone" rather than "line-in" mode to activate the built-in preamp and you'll have a useful source of electrical noise that's probably about as random as any HRNG based on the principle of electrical noise. Be aware though that an attacker can of course record the same audio input as you and thus have access to the same random data. – micheal65536 Nov 08 '17 at 07:50
  • @MichealJohnson Could an attacker actually do that? If you use live-recorded audio (not a pre-recorded one), then how could anybody manage to record the exact same signal as the mic of your machine? Even an almost-identical mic would be in a different position and would record levels differently, from different distances to each source. Also, minute hardware differences come into play - even 1% tolerance could make a difference. – Simone Nov 08 '17 at 08:31
  • 3
    @Simone No you're misunderstanding. I'm not suggesting to use a microphone to record the sound, I'm suggesting to use a disconnected microphone input as a source of electrical noise (a disconnected input will always have noise and this noise is amplified by the internal pre-amp). An attacker who has malicious software running on the same computer as the target application could record the same audio input as the target application, and thus have access to the same noise source. – micheal65536 Nov 08 '17 at 08:35
  • @MichealJohnson True; however, they would need to have access to the target computer. In that case, the system is already compromised, and it's really hard to figure out a secure source of entropy (or information) in that scenario... – Simone Nov 08 '17 at 08:39
  • 3
    @Simone I imagine that some hardware random number generators will isolate access between different applications. For that matter, a hardware random number generator's value is always changing, so two applications can never query and get the exact same value because they're querying at different points in time. An audio input is always recorded into a buffer (however small) and the same data may quite easily be captured at one point in time and then placed into multiple buffers if multiple applications are recording at the same time. – micheal65536 Nov 08 '17 at 08:49
  • Do servers generally have audio inputs? – TheCatWhisperer Nov 08 '17 at 13:56
  • 4
    @TheCatWhisperer: uh, the servers I use most do not have audio input. They do have Intel's RDRAND. – fgrieu Nov 08 '17 at 14:11
  • 2
    @Simone When using RNGs, it's typically not obvious how those numbers will be used. In fact, that's a major design factor for RNGs. You don't know what aspects of the noise will be important in the end use. Thus, the goal is to minimize the likelihood of anyone being able to emulate the stream, and typically that is done to a level far beyond the level of entropy that would be truly required, just in case it turns out that your estimator of entropy was very far off. Better safe than sorry. – Cort Ammon Nov 08 '17 at 15:56
  • @MichealJohnson - I'm not sure that a disconnected audio input is a good idea. Since the relevant pre-amps are inside the computer (and thus, most of the noise they pick up is from the computer itself), it's conceivable that an attacker could perform actions that swamp the pre-amps such as to make the RNG output more predictable. – Michael Kohne Nov 09 '17 at 12:43
  • A broken microphone might be better than an unplugged one, find a guide about reducing white noise and then do the opposite until you have a recording that sounds like this https://upload.wikimedia.org/wikipedia/commons/9/98/White-noise-sound-20sec-mono-44100Hz.ogg – daniel Nov 09 '17 at 12:59
  • @daniel The useful part is that pretty much any computer system will have a disconnected audio input (unless the user's actively using the audio input). You'll have a hard time finding a computer with a broken microphone connected. Let's remember that this is a "fallback" for systems without an HRNG or SRNG in "non-critical applications". By the time you've set up a broken microphone, you might as well get a CPU with an SRNG built in (seriously I can't think of anything other than an embedded system or a ridiculously outdated system that won't have an SRNG in the CPU). – micheal65536 Nov 09 '17 at 14:20
  • Although on the other hand, in the absence of both an HRNG and an SRNG, there are better sources of entropy than audio inputs (disk activity and user input are popular choices, although both are less secure and more open to attack/manipulation than an HRNG or SRNG). – micheal65536 Nov 09 '17 at 14:21
  • @MichealJohnson yup, some computers exist in a fairly deterministic world too (VMs). I'm agreeing with the other guy that an empty mic port may not be good. There's the whole hardware "My internal mic stops working when I plug in Headphones" problem on top of some unknowns. – daniel Nov 09 '17 at 14:29
  • @daniel There are a lot of unknowns (including your example of VMs, which typically won't have noise at their audio inputs). But we're looking at a "worst-case" scenario here, where a system lacks both an HRNG and an SRNG and it's too deterministic to use disk or user activity for entropy. – micheal65536 Nov 09 '17 at 16:27
  • @James K: see new last paragraph. If that answers your comment, you might want to remove it so as to cut on the clutter (I'll clear the present comment after you had the opportunity to read it). – fgrieu Nov 10 '17 at 06:21
  • The one extremely difficult thing is asserting a reliable lower bound of how much entropy was really gathered. This is a great point that is often overlooked. Calculating, or even estimating the "min entropy" (I believe that is the term from information theory) is a difficult problem. Anyone who doesn't believe this should check out the Wikipedia page for "min entropy". – Dan Nov 14 '17 at 19:55
11

It is hard to imagine that you could afford enormous quantities of RAM for your system, but can't afford a hardware RNG, such as a cheap model costing 0.25 USD, or down to 0.01 USD at the cost of some ergonomics, available in internationalized varieties in other countries too. You might even have one in your pocket.

But let's suppose you used ECC RAM which reported to your operating system the count of error events, which in turn you could query easily from your operating system. Consider a Poisson-distributed count of error events at a rate of one per week, as @fgrieu suggested as an upper bound on the rate before you throw it all out and buy new RAM. What is the entropy you attain from the count of errors over the course of a week? The probability of a count $k$ at rate $\lambda$ is $$\operatorname{Poisson}(k; \lambda) = \frac{\lambda^k e^{-\lambda}}{k!}.$$ There's no really nice closed form expression for the entropy of a Poisson-distributed variable, but we can use the series $$\lambda \cdot (1 - \log \lambda) + e^{-\lambda} \sum_{k = 0}^\infty \frac{\lambda^k \log k!}{k!}.$$ If we fix $\lambda = 1$, meaning we measure for a week, this simplifies to $$1 - \log 1 + e^{-1} \sum_{k = 0}^\infty \frac{\log k!}{k!} \approx 1.3\;\mathrm{nat} \approx 1.9\;\mathrm{bit}$$ if I did my calculation correctly.

That is, you get just under two bits of entropy by waiting around twiddling your thumbs for a week—thumbs you could have used to flip coins instead during a much shorter time for much greater entropy. And that's Shannon entropy, not min-entropy like cryptographers use! The min-entropy is exactly $1\;\mathrm{nat} \approx 1.4\;\mathrm{bit}$.
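
Those figures are easy to check numerically, for instance in Python (truncating the sum at $k = 64$, where the tail is negligible):

    from math import exp, factorial, log

    lam = 1.0
    p = [lam**k * exp(-lam) / factorial(k) for k in range(64)]

    shannon = -sum(pk * log(pk) for pk in p if pk > 0)
    min_ent = -log(max(p))

    print(shannon, shannon / log(2))  # ~1.30 nat ~ 1.88 bit
    print(min_ent, min_ent / log(2))  # 1.00 nat ~ 1.44 bit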

Exercise for the reader: Suppose you could measure the inter-arrival times at some clock resolution (say, days, hours, minutes, seconds), rather than the count at the end—maybe you get hardware interrupt on an event, or you just poll every minute/second/etc. What is the (Shannon, min-) entropy in that case?

Squeamish Ossifrage
  • 48,392
  • 3
  • 116
  • 223
  • 3
    Wouldn't you get entropy from where an error happened? Assuming it is uniformly distributed, you get 32 bits (assuming you have 4 GiB of RAM) - significantly more than from timing. – Maja Piechotka Nov 08 '17 at 04:14
  • 3
    @MaciejPiechotka Right -- and you could improve that further by incorporating information about when the error happened. If you're constantly scrubbing for errors, there's no reason you shouldn't be able to pin this down to the second (DDR4-2400 can manage a throughput of 76 GB/s, way more than you'd need) and so that's another ~19 bits or so, depending on how far you push it. – Charles Nov 08 '17 at 07:32
  • 4
    In fairness I should mention that the energy costs of scrubbing high-speed RAM constantly will far outweigh the cost of a hardware RNG. – Charles Nov 08 '17 at 07:33
  • @Charles Given the existence of RDRAND, the cost might be 0. The point was more that it's incredibly slow (bits/week). – Maja Piechotka Nov 08 '17 at 17:41
  • @MaciejPiechotka Right, and it remains slow if you add timing information. – Charles Nov 08 '17 at 17:55
  • @MaciejPiechotka: If you can pinpoint where the error happened, you could do that. That's additional cost. How much is the additional cost? Are bit flips 0$\rightarrow$1 as likely as 1$\rightarrow$0? Are they actually uniformly distributed across the memory, or are there likely to be clustered depending on manufacturing defects? All these questions have plausible answers that would reduce your entropy estimates, but I don't know what the answers actually are. This ‘answer’ was more an exercise in modelling the system and estimating entropy in the model; fgrieu's answer is the real one. – Squeamish Ossifrage Nov 08 '17 at 20:50
  • Your dime flipping references are witty to be sure, but if they form the bulk of your answer (for generating entropy), how's a rack mounted machine meant to flip 'em? They ain't got thumbs... – Paul Uszak Nov 08 '17 at 21:39
  • You can use it to generate a seed which is updated, and/or you can put down the couple extra cents to pay for a slightly more expensive hardware RNG like the ones found on every modern x86 CPU, most modern ARM SoCs, etc. – Squeamish Ossifrage Nov 08 '17 at 22:22
  • @SqueamishOssifrage You don't know that those chips have TRNGs though do you? You can't possibly check for yourself. You have to trust the US government that they do. I think I prefer to trust your dimes. – Paul Uszak Nov 09 '17 at 13:02
10

I'm assuming that you actually mean memory errors and not initial power up state.

It wouldn't work an iota. There are no memory errors. I say again, there are no memory errors. Modern RAM is pretty good. Just like microprocessors, it either works or doesn't. I've run MemTest86 loads of times. There are three scenarios I've found:

  1. Memory passes with zero failures after running all night.
  2. Memory fails with loads of errors. Remove memory cards, polish/sand the connectors with sandpaper + screwdriver or file, and retest. Contact oxidation is a PITA. Go to 1 virtually every time.
  3. Memory fails with a smidgen of errors. Chuck chip away and/or get a refund.

Scenario 3 happens very rarely. That's why most computers (statistically speaking) run non-ECC chips. Also, Intel doesn't allow consumer ECC, and most desktops run okay. Your phone, laptop or tablet won't have it either. And they work most of the time, don't they? I have non-ECC servers with 365+ days of uptime. I don't think that a long loop would find any errors. It makes AMD's support for ECC rather questionable as a competitive strategy.

The following graph is an extract from a paper titled Memory Errors in Modern Systems. It's only one paper of many, so make of it what you will:-

FIT graph

In summary, the graph shows a monthly error rate for largish computers. You can see that per DRAM chip, the error rate is small. Early on, they measured a maximum of 30 FIT /chip /month, but that rate decreases with time. A single FIT is a single failure in 1 billion hours of a single device's operation. It is not 30 failures per day. So it only really matters in the exascale machines they analysed. As your phone proves.
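
For scale, under the standard definition of FIT (failures per $10^9$ device⋅hours), here is what a rate of 30 FIT per chip implies for one chip over one month:

    fit = 30                     # failures per 1e9 device-hours
    hours_per_month = 730        # ~30.4 days

    p_month = fit / 1e9 * hours_per_month
    print(p_month, 1 / p_month)  # ~2.2e-5, i.e. about 1 in 46,000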

The upshot is that you cannot rely on errors as an entropy source, as there aren't any in statistically significant amounts. And if you do find some, that's an error condition that can't be relied upon, so you can't release code that exploits them for entropy distillation. You would have no indication of which machines had errors or how many there were. It seems from Revisiting Memory Errors in Large-Scale Production Data Centers: Analysis and Modeling of New Trends from the Field that it's very difficult to identify a machine that might be suitable for DRAM entropy extraction. They studied the entire Facebook server estate for over a year. Their findings indicate that:-

  1. Errors are very concentrated on specific individual computers. 1% of servers were responsible for 97.8% of the errors. Fix those servers, and you have virtually no entropy. So on the flip side there's a 99% probability that the server you're pointing to at random produces virtually no errors.

  2. It depends on what the machine is doing specifically. Workload type can influence server failure rate by up to 6.5 times.

  3. The majority of errors (unspecified %age) are actually caused by the memory controller and not the DRAM cells.

  4. A final good result from the hypothesis' perspective is that there is a clear correlation between error rate and cell and CPU density. We can all look forward to much less reliable memory (and more entropy) in the future.

Finally, as an exercise, how would you use memory errors? It's not possible. MemTest can find them because it's a tiny application running exclusively on a machine. Coming across a randomly located error doesn't crash MemTest. I guess that errors could happen within its program space, but you'd never know, as it would just crash and not report it. On a multi-tasking operating system with virtual machines and massive IO, the whole thing might just crash if significant errors occurred. I don't think you could have a situation where memory errors only occurred in convenient and accessible areas of the memory map.

There are better ways to get entropy from a computer.

Paul Uszak
  • 15,390
  • 2
  • 28
  • 77
  • 1
    I remember reading somewhere that if you transfer enough data from a hard drive over the bus, something on the order of several TB, you are going to get errors, which makes working with extremely large files difficult. – TheCatWhisperer Nov 08 '17 at 13:53
  • @TheCatWhisperer: You're not going to get errors merely from shuffling data around, not even terabytes of it. (If that were the case, just backing up a significant amount of data would be next to impossible.) It's just that if you already have flaky hardware, long periods of hard work increase the chance it'll flake out. – cHao Nov 08 '17 at 16:19
  • @cHao: Filesystem developers (especially of ZFS which checksums your data) do say that scrubbing disks is useful. https://blogs.oracle.com/7000tips/disk-scrub-why-and-when. I'm really not sure whether undetected errors over SATA and memory buses are a thing in non-broken desktops, though, or if this is mostly just to detect data corruption early from hardware that's getting flaky, before it corrupts all your files. – Peter Cordes Nov 08 '17 at 18:40
  • 2
    @cHao: also: https://en.wikipedia.org/wiki/Dynamic_random-access_memory#Error_detection_and_correction says that in one year, there's about a 1/3rd chance that any given computer will suffer a memory error. If it doesn't use ECC memory, it will be a real error. It only takes hours or days to copy terabytes, not a year, so I agree that it's generally safe to copy around your data. It's not a bad idea to store checksums (or even a small amount of par2 forward error correction data), though. – Peter Cordes Nov 08 '17 at 18:49
  • 1
    @Peter: Don't get it twisted: Wikipedia is a secondary source. It says that a study claimed that in one year, yada yada. Other studies gave different results. It also says a number of studies reported that most errors were "hard" errors (ie: due to flaky hardware rather than, say, cosmic rays). – cHao Nov 08 '17 at 19:22
  • @cHao: ok right, but that kind of error rate seemed plausible to me. Non-zero, but enough to be a problem for a backup server, especially if doing data compression with a large block size (so a bit error corrupts the rest of the block) – Peter Cordes Nov 08 '17 at 19:24
  • Yeah, i would hope for some kind of error-correction capability. Particularly considering i don't do backups very often -- so if i'm doing a backup, chances are it's because the hardware is already a bit iffy and i'm looking to replace it. :) – cHao Nov 08 '17 at 19:34
  • Why "30 FITs /chip /month"? I read the article as reporting errors of 30 FIT/chip. Thus 30 FIT in the graph is 1 chance in 46 thousands that there is an error in a given chip in a month of operation. Independently: I have never seen advise to use sandpaper or screwdriver for cleaning of DIMMs that was meant to be taken literally; for DIMMs and Smart Cards alike, I use this kind of plastic rubber, which leaves the thin precious metal plating unharmed (this is typically gold, rhodium, or palladium, sometime with nickel). – fgrieu Nov 09 '17 at 13:12
  • @fgrieu Fig.3's caption is from the paper. I didn't type it and they clearly & confusingly state it's "per month". What can I say? There's a lot of interchangeable timescales in that paper. Your rubber won't get into the DIMM socket. That's where the screwdriver comes in. As gentlemen though, we should always use the tool gently. – Paul Uszak Nov 09 '17 at 15:01
  • @Paul Uszak: I'm reading the graph of figure 3 as the variation of error rate averaged over a month, but still expressed in FIT, that is (AFAIK universally) errors per $10^9$ device⋅hour, not per device⋅month. This is consistent with Figure 4 and Table 3 also giving rates of like 30 FIT, without indication of month; and some other references cited. Plus, 30 errors per device⋅month is an awful lot (one error every 10 minutes on each of Hopper's 6000 nodes, since each of these has 32x18 DRAM devices), and that's at least 100 times more than I would consider bearable. – fgrieu Nov 09 '17 at 17:18
6

Not from memory errors, no.

The other answers cover this in detail.

But there's at least one paper that discusses the idea of using the SRAM memory contents of RFID chips just after power-on to obtain both a fingerprint of the device and some random bits.

The summary is that SRAM cells are symmetric structures of 6 transistors, with two stable configurations, 0 and 1. At power-on, the two sides of the SRAM cell compete for dominance. Which one wins depends on small intrinsic differences between the transistors, and on noise.

In some cells, these manufacturing differences dominate, and the cell reliably powers up as 0, or 1. These can be used to derive a fingerprint.

In other cells, these differences are so small that noise dominates the outcome, and these cells may power up as either value. These can be used to obtain some entropy.
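
As an illustration of how that classification might be done in software, assuming you could somehow obtain raw power-up snapshots of the same SRAM region (the dumps argument below is hypothetical):

    def classify_cells(dumps: list[bytes]):
        """Split cells into fingerprint bits (stable) and entropy bits
        (noisy), given several power-up snapshots of the same region."""
        nbits = len(dumps[0]) * 8
        ones = [0] * nbits
        for d in dumps:
            for i in range(nbits):
                ones[i] += (d[i // 8] >> (i % 8)) & 1
        stable = [i for i, c in enumerate(ones) if c in (0, len(dumps))]
        noisy = [i for i, c in enumerate(ones) if 0 < c < len(dumps)]
        return stable, noisy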

This is an image from the paper. On the left, the 0/1 skew drawn as a diagram. On the right, a visual representation of a memory area: black cells are bits that reliably initialize to 0; white cells reliably initialize to 1; and the gray cells may initialize to either state:

diagram from the paper

The paper is Initial SRAM State as a Fingerprint and Source of True Random Numbers for RFID Tags, by Daniel E. Holcomb, Wayne P. Burleson, and Kevin Fu.

It focuses on RFID tags (which often lack the hardware for proper RNG), but I imagine it should be viable on many other platforms. It also targets SRAM, because of its symmetrical structure. I'm not sure if the technique generalizes to DRAM or other memory types, but many computing devices contain some SRAM somewhere that may be used for this (e.g.: processor caches).

marcelm
  • 586
  • 3
  • 6
  • DRAM works by storing bits in capacitors. These discharge over time. In other words, the bits naturally reset. That is why DRAM needs a periodic refresh. Given low-level enough hardware access, you could stall the refresh and determine which bits are the fastest to reset. – MSalters Nov 09 '17 at 12:52
4

No; as already indicated, waiting for memory errors is not a good idea. First of all, you would very much rely on the specific memory type: if somebody replaced the memory with Error Checking & Correcting (ECC) memory, then you would already run into problems. The number of memory errors would be minimal to non-existent on modern computers, and allocating and scanning the entire memory would consume a lot of CPU time and memory bandwidth. The number of memory errors may also be environmentally sensitive. An attacker could, for instance, lower the temperature to try to minimize the entropy generated by memory errors.

What is sometimes done is to allocate memory and read it out without wiping it, reading back any information that might be present in such memory. This needs to be an OS function, because the OS could very well wipe the memory itself. This is usually only performed to add a bit of entropy to the RNG, not as the main source. Furthermore, it has proven dangerous: the Debian and Ubuntu Linux distributions were affected when a static code analysis tool flagged the reading of uninitialized memory and a developer removed this source of entropy - unfortunately along with all the other entropy sources.


Using small timing differences (in nanoseconds, if a good timer is available) seems to be one of the best ways to collect entropy if a good true random number generator isn't available. It still has some problems, such as running in a headless environment or a VM. Obviously, with SSDs becoming prevalent, using an HDD's timing characteristics as a random source is not such a good idea anymore.
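
As a rough illustration of the timing approach, one can sample nanosecond-timer deltas and see how concentrated the jitter is. The estimate below is optimistic; a real assessment needs far more care (e.g. NIST SP 800-90B style testing):

    import time
    from collections import Counter
    from math import log2

    deltas = []
    prev = time.perf_counter_ns()
    for _ in range(100_000):
        now = time.perf_counter_ns()
        deltas.append(now - prev)
        prev = now

    p_max = max(Counter(deltas).values()) / len(deltas)
    print(-log2(p_max), "bits of (optimistic) min-entropy per sample")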

R1w
  • 1,952
  • 4
  • 20
  • 45
Maarten Bodewes
  • 92,551
  • 13
  • 161
  • 313
  • Can you actually get ns timing information on DRAM? I know of no way to actually get to the DRAM timers outside of a controller. When we make ICs, we have access to these because we need to blow the rows on bad DRAM blocks; however, I don't know of anyone ever doing late latching. Do you have a good reference for this? – b degnan Nov 07 '17 at 23:22
  • 1
    @bdegnan: I believe that the idea is to look at the cycle counter on the CPU; that is, small differences in how long it takes the DRAM to respond will show up as delta in the number of CPU cycles... – poncho Nov 07 '17 at 23:39
  • I think you got the "uninitialized memory read" explanation wrong. As I recall it, the risk was that the compiler spotted an uninitialized memory read and propagated that to the RNG state itself. The problem there was, from an optimizer viewpoint, <uninitialized> XOR 1 = <uninitialized>, so the XOR can be safely removed. – MSalters Nov 09 '17 at 13:20
4

It has already been pointed out that memory errors are too infrequent to be useful. There's another problem, though: memory errors due to imperfections in the chip (as opposed to being in the path of a cosmic ray) aren't very random. They're the result of data bleeding somewhere. Your keys will show a very non-random distribution, and anyone attacking them will probably know that and prioritize which keys to try first.

Besides, if you need randomness without a high security situation, bang on the keyboard or wiggle the mouse.

  • I don't think that's a problem. Obviously you'd whiten the entropy with a hash or similar. – Martin Bonner supports Monica Nov 08 '17 at 10:21
  • 2
    @MartinBonner That's actually a huge problem, depending on how the error is used. For example, if the position of the error is used (as proposed in another answer), and the actual errors only happen at a few certain positions, then the real entropy is much, much lower than estimated - and hashing cannot increase entropy (no deterministic algorithm can). – tylo Nov 08 '17 at 12:22
2

Yes, this is possible, see for example this paper from ETH Zurich and Carnegie Mellon University.

We propose D-RaNGe, a mechanism for extracting true random numbers with high throughput from unmodified commodity DRAM devices on any system that allows manipulation of DRAM timing parameters in the memory controller. D-RaNGe harvests fully non-deterministic random numbers from DRAM row activation failures, which are bit errors induced by intentionally accessing DRAM with lower latency than required for correct row activation.

It can provide a sustained throughput of megabytes per second of randomness if a large amount of RAM is used (the amount used can be changed on the fly; it doesn't interfere with the rest of the RAM), even hundreds of MB/s.

Nobody
  • 123
  • 6
  • Meta-stability is commonly leveraged for TRNGs, but do you think that D-RaNGe can be made to run on standardised kit rather than via FPGAs? – Paul Uszak Mar 05 '21 at 13:20
  • @PaulUszak I added a quote from the conclusion section of the paper that should answer your question. So some systems would need to be changed, but it's small cheap changes that don't require an FPGA per se. – Nobody Mar 05 '21 at 13:24
  • @PaulUszak I just realized that my own summary of the technique was actually wrong, which is why I deleted it in favour of the quote. They actually reduce latency, not increase it, which is also why such high bandwidths can be attained. They don't give the sense amplifiers enough time to work, that's where the randomness comes from. They don't let charge dissipate so that the sense amplifiers randomly go either way, even though that might also work. – Nobody Mar 05 '21 at 13:36
  • The catch is "on any system that allows manipulation of DRAM timing parameters in the memory controller". How realistic is this for the practitionner? – fgrieu Mar 05 '21 at 13:46
  • 1
    @fgrieu I think it's more targetted at developers of SoC/chipsets/CPUs. They can easily enable such a function. A consumer likely cannot. – Nobody Mar 05 '21 at 13:49
1

I'm not sure about memory errors, but I did successfully write a shuffling algorithm using the unknown execution order of parallel computing. I would move cards (ints) from one deck (list) to another by simultaneously passing every card to a different thread and having all the threads 'race' to reinsert them into another deck. When performed repeatedly for a few seconds, the deck seems completely shuffled, and I never got the same result twice, even though I ran the same code over and over on the same computer in the same environment. But I guess whether it's random enough depends heavily on the environment it's executing in.
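
A minimal sketch of that kind of race-based shuffle, in Python for concreteness (the answer's original language isn't stated). Note that the OS scheduler, and in CPython the GIL, make each pass far more deterministic than it looks, which is exactly the concern raised in the comments below:

    import threading

    def race_pass(deck):
        out = []
        lock = threading.Lock()

        def worker(card):
            with lock:  # a card's final position is decided by the race
                out.append(card)

        threads = [threading.Thread(target=worker, args=(c,)) for c in deck]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        return out

    deck = list(range(52))
    for _ in range(100):  # repeated passes, as described above
        deck = race_pass(deck)
    print(deck)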

Kevin
  • 21
  • 1
  • 1
    I would be very concerned with using this for entropy... The OS that is scheduling the threads is still deterministic. I suspect that the entropy generated by this technique would be quite low – TheCatWhisperer Nov 09 '17 at 14:33
  • @TheCatWhisperer I'm not so sure. I've been playing with the same thing. Whilst the instruction-to-instruction steps are deterministic, given the billions of ops, branches & interrupts, there's a certain level of unpredictable chaotic behaviour at the macroscopic scale. And Java's SecureRandom uses this method to self-seed when necessary. It might be usable entropy. – Paul Uszak Nov 09 '17 at 21:38
  • @TheCatWhisperer Actually it's surprisingly high and pretty much confirmed. I've proved it with jitter in Java's nanoTime(), Shipilёv has, and there is the Haveged daemon, which really boosts /dev/random. It's called deterministic chaos. – Paul Uszak Mar 05 '21 at 14:07