DW and BI considerations
Data Warehouse
What is a Data Warehouse and what is it good for?
According to the Cambridge Dictionary, the definition is:
"A large amount of information stored on one computer, or on a number of computers in the same place."
This data can be used for reporting and for analyzing both the current state and future trends. Who would not want a system to check their figures against, as well as to look into the future and make better decisions?
Now we also have the possibility to use more comprehensive and faster methods to scan through large amounts of data, whether it is unstructured, semi-structured, or fully structured.
Unstructured data lacks metadata, keys, and other information that would help us organize it. Examples are pictures, documents, e-mails, or text from Facebook.
Semi-structured data has keys, data types, and rules, like data from a database-driven legacy system.
Fully structured data is data prepared for business analysts or business consumption, like Kimball star schemas or Data Vault models.
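The distinction above can be sketched in a few lines. This is a minimal, hypothetical example: the JSON record and the fact-table row are invented for illustration, not taken from any real system.

```python
import json

# A semi-structured record: it carries keys and types,
# but no fixed schema enforced on it
raw_event = json.loads('{"customer": "Acme", "amount": 120.5, "tags": ["web"]}')

# A fully structured row, shaped for a hypothetical Kimball-style
# fact table: fixed columns, surrogate keys pointing at dimensions,
# and one numeric measure
fact_sales_row = {
    "customer_key": 1,        # surrogate key into a customer dimension
    "date_key": 20240101,     # key into a date dimension
    "amount": raw_event["amount"],
}
```

Unstructured data, such as a picture or a free-text e-mail, would carry neither the keys of the JSON record nor the fixed columns of the fact row.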
What components do we have in a Data Warehouse?
Ingestion
First, we need to ingest the data from the legacy systems we have. This can be done into a staging area or a data lake where we keep our raw data. In my last assignment we were working in Azure Synapse and therefore used a data lake. Ingestion can be done using semi-structured or unstructured files.
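As a sketch of the ingestion step, the snippet below lands a raw CSV extract in a "raw" zone untouched. A local temporary folder stands in for cloud storage such as the lake behind Azure Synapse; the folder layout and file names are assumptions for illustration.

```python
import csv
import pathlib
import tempfile

# Hypothetical raw zone of a data lake; a temp folder stands in
# for cloud storage (e.g. ADLS behind Azure Synapse)
lake_raw = pathlib.Path(tempfile.mkdtemp()) / "raw" / "orders"
lake_raw.mkdir(parents=True)

# Ingest a semi-structured CSV extract from a legacy system as-is:
# no transformation, the raw data stays untouched in the lake
source_rows = [["order_id", "amount"], ["1", "99.90"], ["2", "45.00"]]
with open(lake_raw / "orders_2024-01-01.csv", "w", newline="") as f:
    csv.writer(f).writerows(source_rows)

ingested_files = list(lake_raw.glob("*.csv"))
```

Keeping the raw files verbatim is the point: any later transformation can be re-run from this zone if business rules change.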
In a cloud-based lake it is often rather cost effective to store large amounts of data, so it can be a good starting point for building a Lakehouse. As Databricks writes in their blog, this is a new paradigm in Data Warehouse technologies. It might be a buzzword right now, but I can see its potential. The idea is to use machine learning to go fishing in the lake and do the analysis that way.
Structured Data Warehouse
To make development faster and simplify analysis, it is good practice to build a well-formed Data Warehouse. There are mainly three different ways to do this: Inmon, Kimball, or Data Vault.
Structured data is organized into business objects in the Warehouse. Traditionally this was done via an ETL (Extract, Transform, Load) flow, but more and more businesses are moving to ELT (Extract, Load, Transform), depending on their strategy and on how much data they have.
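The ELT pattern can be sketched with a small example: raw rows are loaded into the warehouse first, and the transformation happens inside the engine afterwards. An in-memory SQLite database stands in for the warehouse here; the table names are hypothetical.

```python
import sqlite3

# In-memory database stands in for the warehouse engine
con = sqlite3.connect(":memory:")

# Extract + Load: raw rows land in the warehouse untransformed,
# everything still typed as text, exactly as extracted
con.execute("CREATE TABLE raw_orders (order_id TEXT, amount TEXT)")
con.executemany("INSERT INTO raw_orders VALUES (?, ?)",
                [("1", "99.90"), ("2", "45.00")])

# Transform: done inside the warehouse, after loading (the T in ELT),
# casting the raw text columns into proper types
con.execute("""
    CREATE TABLE orders AS
    SELECT CAST(order_id AS INTEGER) AS order_id,
           CAST(amount AS REAL)      AS amount
    FROM raw_orders
""")

total = con.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
```

In ETL the cast would instead happen in a separate tool before the rows ever reached the warehouse; ELT pushes that work to the warehouse engine itself, which scales well when the engine is a cloud service.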
To make the data even more accessible for analysis, Data Marts are often used. If the Warehouse is built in a good manner, it should not take long to build a specialized and easy-to-understand mart for the analysts.
The Mart is constructed to fit well with analysis tools. In most cases the standard used is Kimball star schemas.
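A star schema boils down to a fact table joined to dimension tables. The toy mart below, with invented names and numbers, shows the kind of query an analyst would run against it: join the fact to a dimension and aggregate a measure.

```python
# Minimal star-schema sketch: one dimension with descriptive
# attributes, one fact table with keys and a measure
dim_customer = {
    1: {"customer_name": "Acme", "country": "SE"},
}
fact_sales = [
    {"customer_key": 1, "date_key": 20240101, "amount": 99.9},
    {"customer_key": 1, "date_key": 20240102, "amount": 45.0},
]

# A typical analyst query: join fact to dimension,
# then sum the measure per country
sales_by_country = {}
for row in fact_sales:
    country = dim_customer[row["customer_key"]]["country"]
    sales_by_country[country] = sales_by_country.get(country, 0.0) + row["amount"]
```

Because the dimension carries the readable attributes and the fact carries the numbers, the mart stays easy to understand for the analyst and easy to join for the tool.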
Business Intelligence, or BI, is the set of analysis methods and tools consuming the Data Warehouse. Sometimes you want to calculate things and compare data in a different way downstream of the Mart. Then you can use a Cube.
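The idea of a cube is to pre-aggregate a measure across combinations of dimensions so the BI tool can slice and roll up instantly. This is a toy sketch of that idea with invented data, using "*" to mark a rolled-up dimension; a real cube engine does much more.

```python
from itertools import product

# Mart-level rows: two dimensions (country, year) and one measure
rows = [
    {"country": "SE", "year": 2023, "amount": 100.0},
    {"country": "SE", "year": 2024, "amount": 50.0},
    {"country": "NO", "year": 2024, "amount": 30.0},
]

# Pre-aggregate every dimension combination: (country, year),
# (country, *), (*, year) and the grand total (*, *)
cube = {}
for r in rows:
    for c, y in product([r["country"], "*"], [r["year"], "*"]):
        cube[(c, y)] = cube.get((c, y), 0.0) + r["amount"]
```

After this pass, any slice the analyst asks for, total sales for SE across all years, or the grand total, is a single dictionary lookup rather than a scan of the mart.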
A Data Lakehouse is a new way of using lake data with AI or machine learning tools such as Databricks.