Understanding "randomness"

drblast · on Oct 18, 2010

The top answer is an excellent demonstration of the central limit theorem and why the question's assumption that rand() and rand() * rand() are "equally random" is wrong.

The central limit theorem is one of the most initially surprising and fascinating things in all of math. Please support the central limit theorem by learning about it on Wikipedia, or wherever fine theorems are discussed.

Edit to add: If you haven't learned about this enough to know that implementing systems that depend on randomness, like cryptographic systems, is really, really difficult and error-prone, please just use a standard library instead.

It's the most counter-intuitive thing in the world that rand() * rand() is less secure than rand(); shouldn't it be twice as unpredictable? But it's not. Cryptography and related fields are rife with counter-intuitive gotchas.

hvs · on Oct 18, 2010

My favorite quote from that page:

Don't take Statistics and Intuition to the same party ....

Eliezer · on Oct 18, 2010

It's my favorite quote too, but intuitions, unlike probability theory, can be changed.

onedognight · on Oct 18, 2010

Here's some intuition about why rand() * rand() is different than rand().

Let's say you are doing rand(2) which produces either 0 or 1, each equally often, then rand(2) * rand(2) will produce 0 much more often than 1.

Addition has a similar problem.

If you use the right range and wrap the result. e.g. if you do mod(rand(2^n) + rand(2^n), 2^n) then addition will not affect the distribution of your numbers.

danger · on Oct 18, 2010

Or you could argue formally by pointing out that the maximum entropy distribution on the interval [0,1] is the uniform distribution, so modifying that distribution will only decrease its entropy, which means that it's "more predictable".

http://en.wikipedia.org/wiki/Entropy_(information_theory)

http://en.wikipedia.org/wiki/Maximum_entropy_probability_dis...

pjscott · on Oct 18, 2010

I remember when I first learned about the central limit theorem, having this sudden realization:

So that's where all those normal distributions come from!

In the statistics classes that most people take, they just dump this weird equation on you, and say that the normal distribution often arises in the real world. But they don't explain why, or in what kinds of situations you're likely to see things being normally distributed. It's like suddenly finding a walrus in your living room: everybody assures you that it's supposed to be there, but where the hell did it come from?

The central limit theorem de-mystifies the normal distribution and makes the world easier to understand.

cdavidcash · on Oct 18, 2010

It's the most counter-intuitive thing in the world that rand() rand() is less secure than rand(); shouldn't it be twice as unpredictable?*

Under some reasonable assumptions, rand() + rand() mod 1 is twice as random! If both calls return independent samples, and at least one of the calls to rand() returns a uniform sample in [0,1], then the sum will be uniform on [0,1].

So if you assume you get at least one good shot and that the other is not specifically correlated to it, you'll be improving your randomness.

thesz · on Oct 19, 2010

I think, you meant "rand() + rand() mod 2".

This is used to get more randomness from not-so-random source.

Just an example. If our rand() returns 0 2/3 of time and 1 1/3 of time, our rand() + rand() mod 2 will return:

0 + 0 mod 2 = 0 2/32/3 = 4/9 times, 1 + 0 mod 2 = 1 1/32/3 = 2/9 times, 0 + 1 mod 2 = 1 2/31/3 = 2/9 times and 1 + 1 mod 2 = 0 1/31/3 = 1/9 times.

So we will get 0 5/9 times and 1 4/9 times. Now we get random source that is much more uniform.

I think it is possible to obtain random bits even from source with correlated samples.

jpr · on Oct 18, 2010

A link for the lazy: http://en.wikipedia.org/wiki/Central_limit_theorem

sophacles · on Oct 18, 2010

Oh man, I would totally upvote you in thanks, but I'm too lazy. Not to mention the moralist in me wanting to instead berate you for keeping the lazy that way. Luckily the karma whore in me has trumped all that and convince the rest to reply with snark :)

samd · on Oct 18, 2010

Anyone interested in understanding randomness should endeavor to understand what probability actually is. As a Bayesian I think that probability is not a property of any object but rather a subjective measure of your own confidence, your own knowledge. To say something is random is just to say that you lack enough information to predict any one outcome over another. But everything could be predicted perfectly (barring quantum events) with enough knowledge. If I knew that a coin was weighted to favor heads my degree of belief in the outcome of the toss would become stronger. The probability for me would become, say, 75% heads, while for the observer without such knowledge it would be 50%.

Here's a good article on probability theory for those interested: http://plato.stanford.edu/entries/probability-interpret/

iskander · on Oct 18, 2010

>@Thilo No, it doesn't... ? If a random variable is uniformly distributed in the range (0,1), and you sample the variable n times, and take the sum, it will just be uniformly distributed in the range (0,n). – user359996 12 hours ago

It's so easy to sound certain and be wildly wrong.

danger · on Oct 18, 2010

Why has nobody in this thread or the stack overflow thread linked to the Wikipedia page on entropy?

http://en.wikipedia.org/wiki/Entropy_(information_theory)

danger · on Oct 18, 2010

Update: finally a good answer. If you want to do the people at stackoverflow a favor, upvote this answer: http://stackoverflow.com/questions/3956478/understanding-ran...

cdavidcash · on Oct 18, 2010

This is because Shannon entropy is basically useless when it comes to proving anything about randomized algorithms or cryptography.

danger · on Oct 18, 2010

How is that related to the SO question? The post is asking about how to tell if one distribution is "more random" than another, which is what entropy is all about.

pjscott · on Oct 18, 2010

Consider a random number generator that produces integers between 0 and 15. Here's a really crappy algorithm:

1. Start with a truly random seed between 0 and 15.

2. Increment it each time to generate a random number, modulo 16.

Suppose you start at 11. Your sequence of random numbers will be 11, 12, 13, 14, 15, 1, 2, 3 ....

This is obviously not very random. But look at the entropy of it. All values between 0 and 15 are equally likely, so this will have the maximum possible entropy: 4 bits.

The problem here is that, for entropy to be an accurate measurement of information content, you have to assume that you're measuring independent identically distributed random variables. The outputs from a pseudo-random number generator are not independent.

danger · on Oct 18, 2010

If we are interested in a _sequence_ of pseudorandom numbers, then we should be talking about the _joint_ entropy of the sequence. But information theory is plenty capable of describing "how random" a sequence is.

The generator you are talking about is far from producing a uniform distribution (which would have maximum entropy) jointly over the n-dimensional hypercube that would represent a sequence of n draws from your generator, so it has less entropy and is thus "less random" than the distribution you'd get from n independent draws from a perfect rand().

pjscott · on Oct 19, 2010

You're right (have an upvote); I was warning more against rash misuse of the concept of entropy, which I see quite a bit.

GavinB · on Oct 18, 2010

More interesting to grasp is that even rand() * rand() / rand() is less evenly distributed than rand().

It does seem like it should be possible to remap the numbers onto an even distribution if you knew how uneven to expect it to be. That would probably lead to some loss of fidelity at the limits of accuracy of the variable, however.

chrisaycock · on Oct 18, 2010

I like the comments about dice. With one die, an outcome of 4 or 2 is equally likely. With two dice, an outcome of 4 is three times more likely than 2.

So while one and two dice games are both random (as in unpredictable), only the one-die game has uniformly distributed outcomes.