Sunday 25 April 2010

Safe memory usage in C# part 2/2

This is the second and last part of my article on safe memory usage in C# (you can find the first post here). While the previous post discussed the "why", this post will cover the "how".

As promised, this is where I will show you how to create, use and retrieve complex real-world structures whose data live entirely in the LinearMemory.  If you are of the impatient type, you can download the code and play around with it a bit before reading further. The code is open source, which means you can do whatever you want with it.

Representation of data structures

Given the nature of the LinearMemory, it is obvious that each data structure will have to be something very similar to C structs, i.e. a queue of fixed-size data like:

struct Apple {
    int Color; // always 2 bytes
    float Weight; // always 4 bytes
} // 6 bytes total

This is, however, not very object-oriented. What we really want to do is to use classes, but not actual objects. Objects are allocated on the heap, while we want some sort of object-like behaviour where the data are entirely contained in the LinearMemory.

If you're thinking "static class", well done!  A static class in which every method takes a LinearMemory base pointer (address of the start of the data structure) in addition to their "normal" parameters will fit our needs nicely. Each of these static classes will be tailored towards one specific data structure, and will contain the necessary logic to create, modify and access the values of the data structure. There is nothing innovative about this syntax, it is actually how normal objects methods are accessed in IronPython and similar languages already!

Avoiding serialization

Given that we already have string-to-byte and byte-to-string functions, using XML as object representation springs to mind. After all, C# has excellent support for serializing and deserializing objects to and from XML via System.Xml.Serialization.XmlSerializer. Using that, we now have everything we need to store any data in LinearMemory. The problem is, of course, that we cannot use the data without deserializing it. Since we need to allocate heaped memory to store the deserialized object before using it, we have lost all the safety we're working so hard to get!

Serialization is a no-go; we need to write functions to read and manipulate live data inside the LinearMemory. For the new structures I intend to introduce, we only need to extend the LinearMemoryTools a little so it can interpret integers in addition to strings. Each number is simply encoded in 5 big-endian bytes:

//PARTIAL LinearMemoryTools.cs
/// <summary>
/// Access the bytes of a int.
/// </summary>
/// <param name="i">The int.</param>
/// <returns>The bytes.</returns>
public static byte[] IntBytes(int i) {
    if (i > int.MaxValue) throw new OverflowException("int too big!");
    bool positive = false;
    if (i > 0) positive = true;
    i = Math.Abs(i);
    byte[] bytes = new byte[5];
    bytes[0] = (byte)(i / 16777216);
    i = i % 16777216;
    bytes[1] = (byte)(i / 65536);
    i = i % 65536;
    bytes[2] = (byte)(i / 256);
    i = i % 256;
    bytes[3] = (byte)(i);
    bytes[4] = positive ? (byte)'P' : (byte)'N';
    return bytes;
}

/// <summary>
/// Retrieve a int from memory.
/// </summary>
/// <param name="ptr">Ptr to int.</param>
/// <param name="len">Must be 5.</param>
/// <returns>The int.</returns>
public static int BytesInt(int ptr, int len) {
    int i = 0;
    i += 16777216 * LinearMemory.mem[ptr + 0];
    i += 65536 * LinearMemory.mem[ptr + 1];
    i += 256 * LinearMemory.mem[ptr + 2];
    i += LinearMemory.mem[ptr + 3];
    if (LinearMemory.mem[ptr + 4] == (byte)'N')
        i = -i;
    return i;
}

Note that the second parameter to BytesInt could strictly be removed. I decided to leave it in for two reasons: 1) I plan to extend the method to support integers of arbitrary length in the future, and 2) it forces the user to keep the length of the representation in mind.

Enter the structures

To keep motivated when coding this, I decided to create a database that we all dream of administrating sometime, a company! A company needs employees:

//Employee.cs
public static class Employee {
    private const int OFF_NAME____ =  0;
    private const int OFF_NAME_LEN = OFF_NAME____ + 20;
    private const int OFF_SALARY__ = OFF_NAME_LEN + 5;
    private const int OFF_PHONE___ = OFF_SALARY__ + 5;

    public static int Size = OFF_PHONE___ + 5;
    
    public static int Create(string name, int salary, int phone) {
        if (name.Length > 20)
            throw new OverflowException("name too long! max 20 chars");
        int p = LinearMemory.malloc(Size);
        LinearMemoryTools.Insert(p + OFF_NAME____,
                                 LinearMemoryTools.StringBytes(name));
        LinearMemoryTools.Insert(p + OFF_NAME_LEN,
                                 LinearMemoryTools.IntBytes(name.Length));
        LinearMemoryTools.Insert(p + OFF_SALARY__,
                                 LinearMemoryTools.IntBytes(salary));
        LinearMemoryTools.Insert(p + OFF_PHONE___,
                                 LinearMemoryTools.IntBytes(phone));
        return p;
    }
}

Easier than you feared, eh? The object doesn't do much of anything yet, but we'll get to that later. Now for the structure that ties our database together:

//Company.cs
class Company {
    private const int OFF_NAME____ = 0;
    private const int OFF_NAME_LEN = OFF_NAME____ + 20;
    private const int OFF_NUM_EMPL = OFF_NAME_LEN + 5;
    private const int OFF_EMP_PTRS = OFF_NUM_EMPL + 5;

    private static int Size = OFF_NUM_EMPL + 5 + 500;

    public static int Create(string name) {
        if (name.Length > 20)
            throw new OverflowException("name too long! max 20 chars");
        int p = LinearMemory.malloc(Size);
        LinearMemoryTools.Insert(p + OFF_NAME____,
                                 LinearMemoryTools.StringBytes(name));
        LinearMemoryTools.Insert(p + OFF_NAME_LEN,
                                 LinearMemoryTools.IntBytes(name.Length));
        // Start out with 0 employees.
        LinearMemoryTools.Insert(p + OFF_NUM_EMPL,
                                 LinearMemoryTools.IntBytes(0));
        return p;
    }

A bit trickier; you can see we're keeping track of a list of pointers to employees. If pointers confuse you, think of this as a wonderful opportunity to learn how to use them. As programming guru Joel says, real programmers need to understand pointers. To elaborate, what we store in the company structure in offset 30 onwards is just a queue of integers that each represent an address of (=a pointer to!) an employee structure. When looking at the LinearMemory at each of these addresses, we will find a complete working employee data structure.  See? No magic at all.

Accessors, and modifiers, and reports! Oh my!

Note that we keep the structure's offset values private. This is deliberate, since the code that uses the data structure should never need to know the details of the representation. Instead of writing "LinearMemoryTools.BytesInt(employeePtr, Employee.OFFSET_SALARY, 5)" all over the place, we can write "Employee.GetSalary(employeePtr)", which is much more readable. Conversely, we want to modify the object data with "Employee.SetSalary(employeePtr)". These setters and getters give us a tremendous advantage; we can now refer to objects via a single value - the address to the start of the object. Since we can refer to each object by a unique value, we are now fully object-oriented!

The GetXXX methods are called accessors and are coded as follows:

//Employee.cs
public static string GetName(int ptr) {
    i = LinearMemoryTools.BytesInt(ptr + OFF_NAME_LEN, 5);
    return LinearMemoryTools.BytesString(ptr + OFF_NAME____, i);
}

public static int GetPhone(int ptr) {
    return LinearMemoryTools.BytesInt(ptr + OFF_PHONE___, i);
}

public static int GetSalary(int ptr) {
    return LinearMemoryTools.BytesInt(ptr + OFF_SALARY__, i);
}

//Company.cs
public static int GetEmployee(int ptr, int idx) {
    return LinearMemoryTools.BytesInt(ptr + OFF_EMP_PTRS + 5 * idx, 5);
}

public static string GetName(int ptr) {
    i = LinearMemoryTools.BytesInt(ptr + OFF_NAME_LEN, 5);
    return LinearMemoryTools.BytesString(ptr + OFF_NAME____, i);
}

public static int GetNumEmployees(int ptr) {
    return LinearMemoryTools.BytesInt(ptr + OFF_NUM_EMPL, 5);
}

Not too hard.

Now for the modifiers. Modifiers are, as you'd guess, methods that modify the object data. They don't know what's stored already, they store what you tell them to store. We call them SetXXX:

//Employee.cs
public static void SetName(int ptr, string name) {
    if (name.Length > 20)
        throw new OverflowException("name too long! max 20 chars");
    LinearMemoryTools.Insert(ptr + OFF_NAME____,
                             LinearMemoryTools.StringBytes(name));
    LinearMemoryTools.Insert(ptr + OFF_NAME_LEN,
                             LinearMemoryTools.IntBytes(name.Length));
}

public static void SetPhone(int ptr, int phone) {
    LinearMemoryTools.Insert(ptr + OFF_PHONE___,
                             LinearMemoryTools.IntBytes(phone));
}

public static void SetSalary(int ptr, int amount) {
    LinearMemoryTools.Insert(ptr + OFF_SALARY__,
                             LinearMemoryTools.IntBytes(amount));
}

//Company.cs
public static void AddEmployee(int ptr1, int ptr2) {
    // Get number of employees.
    i = LinearMemoryTools.BytesInt(ptr1 + OFF_NUM_EMPL, 5);
    // Add 1 and store new number.
    i += 1;
    LinearMemoryTools.Insert(ptr1 + OFF_NUM_EMPL,
                             LinearMemoryTools.IntBytes(i));
    // Ptr to new employee ptr.
    j = OFF_EMP_PTRS + 5 * (i - 1);
    // Be sure not to write out of object bounds!
    if (j > Size) throw new OverflowException("too many employees!");
    LinearMemoryTools.Insert(ptr1 + j, LinearMemoryTools.IntBytes(ptr2));
}

public static void SetName(int ptr, string name) {
    LinearMemoryTools.Insert(ptr + OFF_NAME____,
                             LinearMemoryTools.StringBytes(name));
    LinearMemoryTools.Insert(ptr + OFF_NAME_LEN,
                             LinearMemoryTools.IntBytes(name.Length));
}

And now we basically have all we need!

To make our life easier during debugging, though, we might want to add reports. These are simply wrappers around the accessors that present the entire data structure in a human-readable format.  They basically serve the same needs as ToString() methods on native data structures:

//Employee.cs
public static void Print(int ptr, TextBox stdout) {
    stdout.Text += "\r\n--Employee: " + GetName(ptr);
    stdout.Text += "\r\n  Salary..: " + GetSalary(ptr);
    stdout.Text += "\r\n  Phone...: " + GetPhone(ptr);
}

//Company.cs
public static void Print(int ptr, TextBox stdout) {
    stdout.Text += "\r\n======= COMPANY =======";
    stdout.Text += "\r\nName: " + GetName(ptr);
    j = LinearMemoryTools.BytesInt(ptr + OFF_NUM_EMPL, 5);
    stdout.Text += "\r\n" + j + " employees:";
    for (i = 0; i < j; i++) {
        k = GetEmployee(ptr, i);
        Employee.Print(k, stdout);
    }
}

Oh my! So, what else do we need for our database? Nothing! You can now create, modify and read every piece of data. Everything on top of that is just a matter of user interface, the nuances of which will have to be a topic for another post.

Making it work

To save you the work of typing in all the code and creating a user interface, I have made a fully working Visual Studio 2008 downloadable solution for you to play with. When you start it up, you will see the mainframe:

From the mainframe, you have access to all implemented subprocesses via the button rack on the top. The large text area accumulates reports from each subprocess you run. The most interesting subprocess button, "Run Company", launches the Company Database interface. With that interface, you can edit the company name, add employees, and edit the Company database's Employee data structures:

You can also launch the visual debugger (available from the "debug" button on the mainframe), where you can view the current state of the LinearMemory. To aid in debugging memory "leaks", the allocated memory is colored differently from the unallocated memory:


In the screenshot above, I have allocated an entire company data structure with lots of empty space saved for pointers to employees, and two employee data structures.

And that's it! I hope the code is self-explanatory. If not, just toss me a comment below, and I'll do my best to enlighten you. :)

10 comments:

  1. It's like you're patting yourself on the back for discovering that lead paint tastes delicious.

    ReplyDelete
  2. this is quite breath teaking, you are essentially going back to the roots of programming and we can see how powerful it is. i i was reading on wikipedia about turing machine and correct me if im wrong but this looks like it the same deal, its really about some big array where you move around symbols, just with minor differences.

    ReplyDelete
  3. I like the memory viewer very good. Nice, professional. I think that says it all.

    ReplyDelete
  4. Anonymous #2:

    Thanks. :) Turing is truly one of the greatest heroes of WWII. As I recall, his Turing Machine was instrumental in breaking the German ENIGMA encryption and turning the war around.

    While I don't know exactly the algorithm for Turing's machine, I think the LinearMemory is well suited to encryption in general, sice you have free access directly to the bytes. You could even encrypt and decrypt an object in-place!

    ReplyDelete
  5. Nice, but I think you forgot a static method on the classes which deallocates the indicated object on the big byte array -- a Destroy(int ptr) to balance the Create(..).

    ReplyDelete
  6. @chris: Yes it is. A better idea, however, would be to add reference counting and some ReferenceInc() and ReferenceDec() methods. ReferenceDec() would then do the work of Destroy() when the reference crosses 0. This way we get automatic garbage collection for free. :)

    ReplyDelete
  7. Emulating C memory management for mission-critical apps, that was amusing. But this:

    "While I don't know exactly the algorithm for Turing's machine, I think the LinearMemory is well suited to encryption in general"

    HAHAHAH! You sir are a practical joker of top rank.

    ReplyDelete
  8. If you don't know something as simple as how to implement a Turing machine, you probably shouldn't be writing a heap allocator. This has to be one of the most misguided expeditions I've seen in a long time.

    ReplyDelete
  9. I really hope this is a joke:

    if (i > int.MaxValue) throw new OverflowException("int too big!");

    ReplyDelete
  10. Did you really use underscores to align your equal signs? All of this is atrocious. It has to be a joke.

    ReplyDelete