top of page

Hash Keys

Many years ago, I went to a course in developing better SQL select statements. On that course the teacher asked what a database is good at?

I always come back to the answer on that question in my thinking of database strategy. A database is good at two things:

  1. Finding data in a large amount of data.

  2. Math.

Number one explains why we bather to build structured data warehouses at all. Number two is why I start this blog with these lines.

When I first came across hash or checksum functions it was when checking the send result in an FTP program I wrote. I first calculated the hole text string I was supposed to send using a checksum algorithm, I they run the same checksum algorithm on the receiver side to see if I got the same result. If not, a glitch has destroyed the string and I had to retry sending it.

First time I used in a database was when I was supposed to load data from a system that did not keep any unique key value to its data. I remined me of the FTP program and tried out to string all data in each post and send the throe a hash algorithm to achieve a bit array which I could convert to a char (32). The result then became my key value. This way I could check for duplets and I had a key value for my table. This was an amazingly effective approach because of the fast maths in the SQL database I used.  

This way of finding unique rows in a large amount of data could of course be used to handle the big amount of data we now days get to the DWHs. Not only you get a unique key you could also calculate duplets.

Hash recipe (Swedish one pyttipanna)

Cut six medium size potatoes into 1 cm cubes, put in a hot frying pan with chopped onions. Cut ham into 1 cm cubes and put them in the same pan. Fry for 10-20 min. Serve with fried eggs and some sliced beetroots.

Hash C# recipe 

This is a simple C# code that calculates a hash value from a string (the InString).

using System;
using System.Security.Cryptography;
using System.Text;

namespace HashFunction
{
    class Program
    {
        static void Main(string[] args)
        {
            string InString = "Add a value to calculate hash here";
            Console.WriteLine(InString);
            Console.WriteLine(HashValue(InString));
            Console.ReadLine();
        }
        /// <summary>
        /// HashValue calculates InString to the Byte array b and 
        /// then via the StringBuilder function appends the value 
        /// to string hash that is returned 
        /// </summary>
        /// <param name="InString">Any string value</param>
        /// <returns>MD5 Hash value as a string</returns>
        static string HashValue(string InString)
        {
            MD5CryptoServiceProvider md5 = new MD5CryptoServiceProvider();
            StringBuilder hash = new StringBuilder();

            byte[] b = md5.ComputeHash(new UTF8Encoding().GetBytes(InString));

            for (int i = 0; i < b.Length; i++)
            {
                hash.Append(b[i].ToString("x2"));
            }
            return hash.ToString().ToUpper();
        }
    }
}

Result from this code is a 32 character long string value based on the string

"Add a value to calculate hash here" 

hash: [D0461BE2210183CCBD3AFEDCF3B2E2D9]

SQL Server Hash recipe

Uses the [HASHBYTES] function in SQL Server and then convert the result via [fn_varbintohexstr] function found in the master DB

DECLARE @Instring varchar(max) = 'Add a value to calculate hash here',
        @outString char(32)
        
/*select md5 hash of @Instring and return bytes*/
select HASHBYTES('md5', ISNULL(@Instring,'')) 'Binary value'

/*Use internal function convert varbinary to string so it could be stored as a key*/
SET @outString = (select SUBSTRING(UPPER(master.dbo.fn_varbintohexstr(HASHBYTES('md5', ISNULL(@Instring,'')))),3,32))

 

SELECT @outString 'Char32 value'

Result from this code is a 32 character long string value based on the string

"Add a value to calculate hash here" 

hash: [D0461BE2210183CCBD3AFEDCF3B2E2D9]

C# Hash Recipe
SQL Server Hash recipe
bottom of page