Consider starting a new Data Warehouse project
You must consider a lot of things before you start a new DW project. The first and probably most important question is the purpose of the Warehouse.
​
-
Is the goal to only show operational report or are you going more advanced analytics?
-
Are you aloud to store data in the cloud?
-
Do you have plan to use big data ingestion?
-
Are you planning to use AI or Machine Learning towards the data stored in the warehouse?
Is it a small BI only for operational reports to check of the data in a legacy system? If the legacy system has their own history, you should check this out first. It could save both money and time to take this approach. Some transformations can also damage the data quality.
​
On other hand building a data warehouse could give you the benefit of combining data from more than one system. Any way a good idea is to inventory the company to gather business requirements and data realities before deciding methods, tools or platform.
​
For an example if the organisation requires agil methods it could be good to select a strategy that is modular. My experience is that if you chose a Kimball star schema approach this could be hard. It could be better to go with Data Vault that is much more modular and changeable. Or convince them to work with waterfall project methods instead.
​
I have taken over some projects where another consultancy firm has been that was specialized in a method or a tool. The result of this could be rely bad. So, a good idea is to check some basic requirement before hiring a consultant.
​
Platform considerations is also tricky and should also be based on the long-term requirements. An on-premise solution could be expensive compered to cloud. But gives you control over where your data is stored. Many organisations have requirements that do not allow data outside the country and that's also reason for on-premise solutions.
On other hand if you decide to go for a cloud environment there are many choices. I have experience of two Snowflake and Azure Synapse Analytics. I can see benefits and drawbacks with both. They are both scalable in both speed and growth. It could also be a more cost-effective way to keep a warehouse in the cloud. I will go deeper into Snowflake and Azure Synapse Analytics in a separate blog post.
​
If you consider AI or machine learning to create your analysis you do not necessary need structured data (Kimball star schema or Data Vault) you could use semi structured data in a Data Lake. Both Snowflake and Azure Synapse Analytics provides good solutions for that. A semi structured data storage is a file area like a blob storage where you store well-structured files. The files could be formatted as XML, JSON, Parquet, Avro, ORC etc. I will go deeper into that in a separate blog post. And you could build yourself a Lakehouse.