Hash Keys
Many years ago, I went to a course in developing better SQL select statements. On that course the teacher asked what a database is good at?
I always come back to the answer on that question in my thinking of database strategy. A database is good at two things:
-
Finding data in a large amount of data.
-
Math.
Number one explains why we bather to build structured data warehouses at all. Number two is why I start this blog with these lines.
When I first came across hash or checksum functions it was when checking the send result in an FTP program I wrote. I first calculated the hole text string I was supposed to send using a checksum algorithm, I they run the same checksum algorithm on the receiver side to see if I got the same result. If not, a glitch has destroyed the string and I had to retry sending it.
First time I used in a database was when I was supposed to load data from a system that did not keep any unique key value to its data. I remined me of the FTP program and tried out to string all data in each post and send the throe a hash algorithm to achieve a bit array which I could convert to a char (32). The result then became my key value. This way I could check for duplets and I had a key value for my table. This was an amazingly effective approach because of the fast maths in the SQL database I used.
This way of finding unique rows in a large amount of data could of course be used to handle the big amount of data we now days get to the DWHs. Not only you get a unique key you could also calculate duplets.
Hash recipe (Swedish one pyttipanna)
Cut six medium size potatoes into 1 cm cubes, put in a hot frying pan with chopped onions. Cut ham into 1 cm cubes and put them in the same pan. Fry for 10-20 min. Serve with fried eggs and some sliced beetroots.
Hash C# recipe
This is a simple C# code that calculates a hash value from a string (the InString).
using System;
using System.Security.Cryptography;
using System.Text;
namespace HashFunction
{
class Program
{
static void Main(string[] args)
{
string InString = "Add a value to calculate hash here";
Console.WriteLine(InString);
Console.WriteLine(HashValue(InString));
Console.ReadLine();
}
/// <summary>
/// HashValue calculates InString to the Byte array b and
/// then via the StringBuilder function appends the value
/// to string hash that is returned
/// </summary>
/// <param name="InString">Any string value</param>
/// <returns>MD5 Hash value as a string</returns>
static string HashValue(string InString)
{
MD5CryptoServiceProvider md5 = new MD5CryptoServiceProvider();
StringBuilder hash = new StringBuilder();
byte[] b = md5.ComputeHash(new UTF8Encoding().GetBytes(InString));
for (int i = 0; i < b.Length; i++)
{
hash.Append(b[i].ToString("x2"));
}
return hash.ToString().ToUpper();
}
}
}
Result from this code is a 32 character long string value based on the string
"Add a value to calculate hash here"
hash: [D0461BE2210183CCBD3AFEDCF3B2E2D9]
SQL Server Hash recipe
Uses the [HASHBYTES] function in SQL Server and then convert the result via [fn_varbintohexstr] function found in the master DB
DECLARE @Instring varchar(max) = 'Add a value to calculate hash here',
@outString char(32)
/*select md5 hash of @Instring and return bytes*/
select HASHBYTES('md5', ISNULL(@Instring,'')) 'Binary value'
/*Use internal function convert varbinary to string so it could be stored as a key*/
SET @outString = (select SUBSTRING(UPPER(master.dbo.fn_varbintohexstr(HASHBYTES('md5', ISNULL(@Instring,'')))),3,32))
SELECT @outString 'Char32 value'
Result from this code is a 32 character long string value based on the string
"Add a value to calculate hash here"
hash: [D0461BE2210183CCBD3AFEDCF3B2E2D9]