Tuesday, January 4, 2011

Compress and Decompress Strings with C#

To reduce the data size in our SQL tables, we compress the full received message for every message older than 90 days.
To do this I use a compression function and a decompression function in C#.

I've come up with two different solutions.
One uses the System.IO.Compression namespace; the other uses an external library, SharpZipLib.

First I'll show how to do this using only the built-in .NET functions.
The function to compress the string looks like this:


    public static string Zip(string value)
    {
        // Transform the string into a byte[], one byte per char.
        // Note: the cast to byte does not preserve characters above U+00FF.
        byte[] byteArray = new byte[value.Length];
        int indexBA = 0;
        foreach (char item in value)
        {
            byteArray[indexBA++] = (byte)item;
        }

        // Compress into an in-memory buffer
        using (System.IO.MemoryStream ms = new System.IO.MemoryStream())
        {
            using (System.IO.Compression.GZipStream sw = new System.IO.Compression.GZipStream(
                ms, System.IO.Compression.CompressionMode.Compress))
            {
                sw.Write(byteArray, 0, byteArray.Length);
            } // The GZipStream must be closed (disposed) here; Flush() alone leaves bytes missing.

            // Transform the compressed byte[] into a string, one char per byte.
            // ToArray() still works after the MemoryStream has been closed.
            byteArray = ms.ToArray();
            System.Text.StringBuilder sB = new System.Text.StringBuilder(byteArray.Length);
            foreach (byte item in byteArray)
            {
                sB.Append((char)item);
            }

            return sB.ToString();
        }
    }
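The "close, do not just flush" remark in the function above can be checked with a small experiment. The following is only a sketch: the class and method names (`FlushVersusClose`, `Measure`) are made up for illustration. It records how many bytes have reached the `MemoryStream` after `Flush()` and again after the `GZipStream` has been disposed.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class FlushVersusClose
{
    // Returns { bytes in the buffer after Flush(), bytes after the GZipStream is disposed }.
    public static long[] Measure(byte[] input)
    {
        long afterFlush;
        MemoryStream ms = new MemoryStream();
        using (GZipStream gz = new GZipStream(ms, CompressionMode.Compress))
        {
            gz.Write(input, 0, input.Length);
            gz.Flush();
            afterFlush = ms.Length; // the underlying stream is still open here
        } // Dispose finishes the gzip stream (final block and trailer) and closes ms

        // ToArray() still works on a closed MemoryStream
        long afterClose = ms.ToArray().Length;
        return new long[] { afterFlush, afterClose };
    }
}
```

The second number should be larger than the first, because the gzip trailer (a checksum and the original length) is only written when the stream is closed. Data captured before that point is therefore incomplete.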

Decompression is done by the following function:


    // Requires: using System.IO; using System.IO.Compression; using System.Text;
    public static string DecompressData(string sData)
    {
        // Transform the string back into the compressed byte[], one byte per char
        byte[] byteArray = new byte[sData.Length];

        int indexBa = 0;
        foreach (char item in sData)
            byteArray[indexBa++] = (byte)item;

        using (MemoryStream memoryStream = new MemoryStream(byteArray))
        using (GZipStream gZipStream = new GZipStream(memoryStream, CompressionMode.Decompress))
        {
            byteArray = new byte[1024];

            StringBuilder stringBuilder = new StringBuilder();

            // Read the decompressed data in chunks until the stream is exhausted
            int readBytes;
            while ((readBytes = gZipStream.Read(byteArray, 0, byteArray.Length)) != 0)
            {
                for (int i = 0; i < readBytes; i++)
                    stringBuilder.Append((char)byteArray[i]);
            }

            return stringBuilder.ToString();
        }
    }
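The two functions above move between `string` and `byte[]` by casting each char to a byte and back, which only round-trips characters up to U+00FF and produces a string full of control characters. As a hedged alternative (not what the functions above do), the sketch below encodes the text as UTF-8 before compressing and stores the result as Base64. The class name `Base64Gzip` and its method names are hypothetical.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class Base64Gzip
{
    // Compresses the UTF-8 bytes of the text and returns them as a Base64 string.
    public static string Compress(string value)
    {
        byte[] raw = Encoding.UTF8.GetBytes(value);
        using (MemoryStream ms = new MemoryStream())
        {
            using (GZipStream gz = new GZipStream(ms, CompressionMode.Compress))
            {
                gz.Write(raw, 0, raw.Length);
            } // closing the GZipStream completes the compressed data

            return Convert.ToBase64String(ms.ToArray());
        }
    }

    // Reverses Compress: Base64 -> gzip bytes -> UTF-8 text.
    public static string Decompress(string base64)
    {
        byte[] compressed = Convert.FromBase64String(base64);
        using (MemoryStream input = new MemoryStream(compressed))
        using (GZipStream gz = new GZipStream(input, CompressionMode.Decompress))
        using (MemoryStream output = new MemoryStream())
        {
            byte[] buffer = new byte[4096];
            int read;
            while ((read = gz.Read(buffer, 0, buffer.Length)) > 0)
            {
                output.Write(buffer, 0, read);
            }

            // Decode once at the end so multi-byte characters are never split
            return Encoding.UTF8.GetString(output.ToArray());
        }
    }
}
```

The trade-off is size: Base64 uses four characters for every three bytes, so the stored string is about a third larger than the compressed bytes. If the column can hold binary data, storing the `byte[]` directly avoids that overhead.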


Another way to solve this is to use the SharpZipLib library.
This results in the following methods:


    // Requires: using System; using System.IO;
    //           using ICSharpCode.SharpZipLib.Zip.Compression.Streams;

    private static string Compress(string strInput)
    {
        try
        {
            byte[] bytData = System.Text.Encoding.UTF8.GetBytes(strInput);
            MemoryStream ms = new MemoryStream();
            Stream s = new DeflaterOutputStream(ms);
            s.Write(bytData, 0, bytData.Length);
            s.Close(); // closing finishes the compressed stream
            byte[] compressedData = ms.ToArray();
            return ConvertByteToString(compressedData);
        }
        catch (Exception e)
        {
            Console.WriteLine(e.ToString());
            return null;
        }
    }

    private static string DeCompress(string strInput)
    {
        byte[] bytInput = ConvertStringToByte(strInput);
        byte[] writeData = new byte[4096];

        try
        {
            using (Stream s2 = new InflaterInputStream(new MemoryStream(bytInput)))
            using (MemoryStream result = new MemoryStream())
            {
                // Collect all decompressed bytes first
                int size;
                while ((size = s2.Read(writeData, 0, writeData.Length)) > 0)
                {
                    result.Write(writeData, 0, size);
                }

                // Decode once, with the same encoding Compress used (UTF-8, not ASCII),
                // so multi-byte characters split across two reads stay intact.
                return System.Text.Encoding.UTF8.GetString(result.ToArray());
            }
        }
        catch (Exception e)
        {
            Console.WriteLine(e.ToString());
            return null;
        }
    }
    
    private static byte[] ConvertStringToByte(string strInput)
    {
        byte[] byteArray = new byte[strInput.Length];
    
        int indexBa = 0;
        foreach (char item in strInput)
            byteArray[indexBa++] = (byte)item;
        return byteArray;
    }
    private static string ConvertByteToString(byte[] compressedData)
    {
        System.Text.StringBuilder sB = new System.Text.StringBuilder(compressedData.Length);
        foreach (byte item in compressedData)
        {
            sB.Append((char)item);
        }
        return sB.ToString();
    }


Conclusion:
The second method, using the SharpZipLib library, compresses better than the first, so SharpZipLib gives the smallest compressed string.
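One factor that may contribute to the difference (an assumption, not something measured in this post) is container overhead. The gzip format wraps the Deflate data in a header and trailer of at least 18 bytes, whereas the zlib format, which SharpZipLib's `DeflaterOutputStream` writes by default as far as I know, adds only 6. The sketch below uses only built-in classes to compare gzip output with raw Deflate output for the same input. The class name `ContainerOverhead` and its methods are hypothetical.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class ContainerOverhead
{
    // Size in bytes of the input compressed in the gzip container format.
    public static int GzipSize(byte[] data)
    {
        using (MemoryStream ms = new MemoryStream())
        {
            using (GZipStream gz = new GZipStream(ms, CompressionMode.Compress))
            {
                gz.Write(data, 0, data.Length);
            }
            return ms.ToArray().Length;
        }
    }

    // Size in bytes of the same input as a raw Deflate stream (no header or trailer).
    public static int RawDeflateSize(byte[] data)
    {
        using (MemoryStream ms = new MemoryStream())
        {
            using (DeflateStream df = new DeflateStream(ms, CompressionMode.Compress))
            {
                df.Write(data, 0, data.Length);
            }
            return ms.ToArray().Length;
        }
    }
}
```

Running both methods on a representative message shows how much of the stored size is container overhead rather than compressed payload. For short messages those fixed bytes are a noticeable share of the total.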
