Wednesday 14 April 2010

Real Number Generator

Du you like numbers? I sure do! I was reading up on the number Aleph-naught recently and thought it would be fun to implement Cantor's diagonal theory as a C# enumerator over doubles.

As much fun as it was to write, the generator also turned out to be a very versatile tool that I'm finding new uses for each day. But, before going into detail on its many many uses, I'll give a short demonstration of how it works.

The code

Let's start with the simple stuff, our class needs to implement IEnumerable<double> and IEnumerator<double> so we can use it in a foreach branch:

class RNG : IEnumerable<double>, IEnumerator<double> {
}

First off, our class need some state. I assume you have already read the diagonal theory, so it is obvious we need two integers n and m which are basically row and column coordinates on an infinite table where each row is a list of all integers:

class RNG : IEnumerable<double>, IEnumerator<double> {
    private int n;
    private int m;
    void IEnumerator.Reset() {
        n = 0;
        m = 0;
    }
}

Now we get to the important bit, the IEnumerator.MoveNext() which advances the state one tick:

bool IEnumerator.MoveNext() {
    if (n == 0) {
        n = m + 1;
        m = 0;
    } else {
        n -= 1;
        m += 1;
    }
    return true;
}

If you try this out for yourself a bit on a subset of the infinite table on paper, you will see how this diagonal method of movement will eventually touch each and every table cell you can think of. Since we can never run out of integers, the call always succeeds by returning true.

Now on to the other slightly tricky bit, we need to get this state represented as a double (real) number. This is done by procedurally generating the contents of the cell in row n column m of the infinite table. Remembering that each real number consists of an integral part and a fractional part, it is immediately obvious that we can just let the numbers n and m "run out from" the decimal separator. I.e., n gets to be the integral part and a reversed representation of m gets to be the fractional part. In code:

double IEnumerator<double>.Current {
    get {
        string integralPart = n.ToString();
        char[] fracChars = m.ToString().ToCharArray();
        Array.Reverse(fracChars);
        string fractionalPart = new string(fracChars);
        // CultureInfo.InvariantCulture just means that this code won't crash
        // on you if your country uses something else than "." as the decimal
        // inductor.
        return double.Parse(integralPart + "." + fractionalPart,
                            CultureInfo.InvariantCulture);
    }
}

And we're basically done! After adding some boilerplate to get it to compile, the complete class file looks like this:

using System;
using System.Collections;
using System.Collections.Generic;
using System.Globalization;

namespace RealNumberGenerator {
    class RNG : IEnumerable<double>, IEnumerator<double> {
        private int n;
        private int m;

        bool IEnumerator.MoveNext() {
            if (n == 0) {
                n = m + 1;
                m = 0;
            }
            else {
                n -= 1;
                m += 1;
            }
            return true;
        }

        double IEnumerator<double>.Current {
            get {
                string integralPart = n.ToString();
                char[] fracChars = m.ToString().ToCharArray();
                Array.Reverse(fracChars);
                string fractionalPart = new string(fracChars);
                return double.Parse(integralPart + "." + fractionalPart,
                                    CultureInfo.InvariantCulture);
            }
        }

        IEnumerator<double> IEnumerable<double>.GetEnumerator() {
            return this;
        }

        IEnumerator IEnumerable.GetEnumerator() { return this; }

        object IEnumerator.Current {
            get { return (this as IEnumerator<double>).Current; }
        }

        void IEnumerator.Reset() {
            n = 0;
            m = 0;
        }

        void System.IDisposable.Dispose() { }
    }
}

A simple usage of the RNG which outputs every number you can think of if you wait long enough:

static void Main(string[] args) {
    foreach (double n in new RNG())
        System.Console.WriteLine(n);
}

And that's the proof of the pudding! Feel free to steal this code and use it for your own projects.

The uses

So we've got our fancy new generator up and running, but for what purpose? As I mentioned, its uses are manifold, but I'll just describe the two biggest use cases:
  • You need a bunch of numbers in a hurry.
  • A cheap, efficient and secure random number generator.
I know all the points are not immediately obvious, so I'll touch on each of the points in detail:

You need a bunch of numbers in a hurry.

The most common use of a number generator is just retrieving arbitrary numbers for unit test data generation, example data, etc. You don't want your test data to have any linear patterns, because linear patterns tend to be too regular to give the tested code a good straining. Also, example data with linear patterns is proven to be hard to understand for most users. Going for a full random number generator is not good either, because since it produces random results, there is no easy way to re-run the exact same tests when you see a problem. Solution: The Real Number Generator.

A cheap, efficient and secure random number generator.

I know this point is highly controversial and hard to "get", but you should do yourself a favour and at least try to understand the following explanation on how it works out. I believe all programmers should strive to learn new things, even if they are a little difficult to grasp.

First off, some terminology. Since random number generators are often confusingly referred to as RNGs, let us call them RaNGs here to differentiate them from real number generators.

That the RNG is cheap and efficient compared to a RaNG is obvious, so I won't go into that here. I will rather explain how it qualifies as a random number generator. First, let us observe that the word "random" simply means "without uniformity". Since our generator can be mathematically proven not to generate uniform numbers, it is by definition a completely valid random number generator.

We usually expect more of our RaNGs, though; we expect them to satisfy the other definition of "random", namely "occurring without definite aim, reason, or pattern". While a normal user might see a pattern in the outputted numbers, if we simply drop 6 out of every 7 number from the generator, we will get a list of numbers that really have no obvious pattern. For instance, the first such 9 numbers are [1.0, 1.2, 5.0, 5.1, 6.1, 8.0, 1.7, 4.5, 8.2], which appears random enough for most users.

I know you - a well-educated programmer - will probably be able to guess the nature of this sequence and start predicting the next number, but remember that the person who will end up using your software is a normal human being that first off likely doesn't have your math skills, and most importantly have no interest in analyzing seemingly random numbers! To your users it will be random, and your users are what should matter. Countless old and wise developers constantly urge us to think like a user. Let us take their advice.

Should you for some reason really need truly random numbers, that can be fixed easily. Instead of skipping 6 out of every 7 numbers, use a RaNG to skip a random amount of numbers between each output number! Why not just use the RaNG directly instead? I'm glad you asked, because that brings us to the last bit; why this is a secure random number generator:

Note that plain RaNG can output any number at any time. This will occasionally cause problems when the same number is repeated twice in a row, or the same sequence of numbers is repeated after just a short interval. For example, this causes massive outrage when two lottery sequences are repeated twice in a short time. While one can always opt for a lengthy debate with the uninformed public on why such repetitions are bound to happen sooner or later, it is safer just to guarantee that the issue never comes up. Since the RNG provably never repeats itself, it has the terribly useful property that it is secure from repetition.

That's all for this time, thanks for reading! And remember, the code is free to copy and use for any purpose. :-)

If you felt anything was a bit hard to understand, just drop a comment below and I'll try my best to give a helpful response.

4 comments:

  1. First; Nice blog.

    2nd, this is as u say very useful. I am needing to implement an encryption scheme myself, and I'll try fit ur "ReNG" into it. The efficiency is a huge selling point, as the computational slowness of traditional encryption, based on RaNG, hinders the more widespread use. I think many technologies would of been based on encryption if it was faster. What do u think?

    ReplyDelete
  2. Hi, thanks for the comment!

    Unfortunately, I don't think this can be used for encryption.

    I must andmit I don't know encryption very well, but I know that basically you need prime numbers everywhere. Since my RNG isn't very intergrated with prime numbers, it can't be used directly.

    However, if you find out how to make it work please let me know! :)

    ReplyDelete
  3. congratulations

    you've successfully implemented http://dilbert.com/strips/comic/2001-10-25/

    ReplyDelete
  4. You put the Poe in Poe's Law. Excellent post.

    ReplyDelete