Wednesday, July 18, 2007

Serializing lots of different objects into a single file

Here's a neat trick I discovered a while back that I thought I'd share. We .NET programmers are always doing serialization for one reason or another. The built-in BCL binary serializer, System.Runtime.Serialization.Formatters.Binary.BinaryFormatter, is a really easy way of persisting objects to disk or any other kind of binary stream. What I didn't realise until I discovered this trick is that you can serialize one object after another onto a single stream and then read them back one by one. You can create a file, serialize some objects to it, close it, then open it again and append a few more. The objects don't have to be the same type either: the BinaryFormatter just reads to the next object boundary and returns the result typed as object. Nor do you have to read all the objects back into memory at once. So long as you remember the stream position just after the last object you deserialized, you can carry on from there at some later date. This is really efficient if you've got huge collections of things you want to store and process.

Of course, if you want to serialize a lot of independent objects (or object graphs) you could always insert them into some data structure like an ArrayList and then serialize the ArrayList. But that means creating all the objects in memory at once and reading them all back into memory at once, which is fine with small collections but isn't a good strategy for larger amounts of data.

Here's a little demo. The meat of it is the functions WriteAnimalToFile and ReadAnimalFromFile. WriteAnimalToFile opens a file, appends one Animal object to it and then closes the file. In the demo we do this for 10 different animals; note that Animal is an abstract base class that's specialized by Cat and Dog. ReadAnimalFromFile opens a file, seeks to the given position, reads one animal back and then closes the file. It returns the animal and, through the ref parameter, the new position. In the demo we read back all the animals we created with WriteAnimalToFile.

Note that in ReadAnimalFromFile we don't have to tell the BinaryFormatter what kind of object to expect; it just reads to the next object boundary. If the position is at the end of the file, we just return null.
using System;
using System.IO;
using NUnit.Framework;

namespace SerializerTest
{
    [TestFixture]
    public class SerializerTests
    {
        [Test]
        public void SerializeLotsOfObjects()
        {
            // get the path for the file we're going to serialize into
            string path = @"c:\SerializedObjects.ser";

            // create the file we're going to use
            using(File.Create(path)){}

            // create some animals
            for(int i=0; i<10; i++)
            {
                Animal animal;
                string name = string.Format("Animal_{0}", i);
                int age = 5+i;

                // make even numbers dogs, odd numbers cats
                if((i % 2) == 0)
                {
                    bool trained = ((i % 3) == 0);
                    animal = new Dog(name, age, trained);
                }
                else
                {
                    int lives = 9-i;
                    animal = new Cat(name, age, lives);
                }

                // append each animal to the same file
                WriteAnimalToFile(path, animal);
            }

            // read the animals back one by one.
            long position = 0;
            while(true)
            {
                Animal animal = ReadAnimalFromFile(path, ref position);
                if(animal == null) break;
                Console.WriteLine(animal.Introduce());
            }
        }

        private void WriteAnimalToFile(string path, Animal animal)
        {
            // create a new formatter instance
            System.Runtime.Serialization.Formatters.Binary.BinaryFormatter formatter = 
                new System.Runtime.Serialization.Formatters.Binary.BinaryFormatter();

            // open the file in append mode so the new object goes after any existing ones
            using(FileStream stream = new FileStream(path, FileMode.Append, FileAccess.Write))
            {
                formatter.Serialize(stream, animal);
            }
        }

        private Animal ReadAnimalFromFile(string path, ref long position)
        {
            // create a new formatter instance
            System.Runtime.Serialization.Formatters.Binary.BinaryFormatter formatter = 
                new System.Runtime.Serialization.Formatters.Binary.BinaryFormatter();
            
            // read back the animal at the given position
            Animal animal = null;
            using(FileStream stream = new FileStream(path, FileMode.Open, FileAccess.Read))
            {
                if(position < stream.Length)
                {
                    stream.Seek(position, SeekOrigin.Begin);
                    animal = (Animal)formatter.Deserialize(stream);
                    position = stream.Position;
                }
            }
            return animal;
        }
    }

    [Serializable]
    public abstract class Animal
    {
        string _name;
        int _age;

        public Animal(string name, int age)
        {
            _name = name;
            _age = age;
        }

        public abstract string Introduce();

        public string Name{ get { return _name; } }
        public int Age{ get { return _age; } }
    }

    [Serializable]
    public class Dog : Animal
    {
        bool _isTrained;

        public Dog(string name, int age, bool isTrained) : base(name, age)
        {
            _isTrained = isTrained;
        }

        public override string Introduce()
        {
            return string.Format("I am a dog called {0}, age {1}, {2}trained.", Name, Age, 
                (_isTrained ? "": "not "));
        }

        public bool IsTrained{ get { return _isTrained; } }
    }

    [Serializable]
    public class Cat : Animal
    {
        int _lives;

        public Cat(string name, int age, int lives) : base(name, age)
        {
            _lives = lives;
        }

        public override string Introduce()
        {
            return string.Format("I am a cat called {0}, age {1}, with {2} lives", 
                Name, Age, _lives);
        }

        public int Lives{ get { return _lives; } }
    }
}
The output should look like this:
I am a dog called Animal_0, age 5, trained.
I am a cat called Animal_1, age 6, with 8 lives
I am a dog called Animal_2, age 7, not trained.
I am a cat called Animal_3, age 8, with 6 lives
I am a dog called Animal_4, age 9, not trained.
I am a cat called Animal_5, age 10, with 4 lives
I am a dog called Animal_6, age 11, trained.
I am a cat called Animal_7, age 12, with 2 lives
I am a dog called Animal_8, age 13, not trained.
I am a cat called Animal_9, age 14, with 0 lives
Note that you can't do this trick with the XmlSerializer, since you have to specify the type you're expecting when you create it. Also, a single file containing multiple XML documents would be malformed.
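
The demo reopens the file for every object, which keeps the two functions simple. If you want to walk the whole file in one go, a variation is to keep a single stream open and hand the objects back lazily. This is only a minimal sketch, assuming the .NET Framework (where BinaryFormatter is available) and C# 2.0 iterators; ObjectFile and ReadAll are hypothetical names, not part of the demo:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

// Hypothetical helper, not part of the original demo.
public static class ObjectFile
{
    // Lazily yields every object in the file, in the order it was written.
    // The stream stays open while the caller is enumerating.
    public static IEnumerable<object> ReadAll(string path)
    {
        BinaryFormatter formatter = new BinaryFormatter();
        using (FileStream stream = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            // each Deserialize call reads one object graph and leaves the
            // stream positioned at the start of the next one
            while (stream.Position < stream.Length)
            {
                yield return formatter.Deserialize(stream);
            }
        }
    }
}
```

With that in place, the read loop in the test could be written as foreach(Animal animal in ObjectFile.ReadAll(path)) Console.WriteLine(animal.Introduce()); the foreach does the cast from object to Animal for you. The trade-off is that the file stays open until the enumeration finishes, whereas ReadAnimalFromFile lets you stop and pick up again later from the saved position.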

4 comments:

Anonymous said...

Great article. So useful in so many situations.

Mike Hadlow said...

Thanks James

Anonymous said...

Hi Mike,
Is there a way in .NET to delete a particular item after it's been serialized to a binary-formatted file? I've scoured the Internet for an example, but can't seem to come up with one. Maybe it's not possible; perhaps I'm thinking of this too much like an XML file, where elements can be updated or deleted.
Using your example here, would you be able to locate Animal_4 by index and delete that item by way of its position within the file?
I know this can be done with a hash table, by deserializing the entire file and removing the item at a certain index, but as you pointed out you have to read and write the rest of the data back into the file, which seems redundant.
If you could point me in the right direction that would be great. Thanks James

Unknown said...

Awesome, thanks.