Happy New Year! It has been a while my friends… here goes 😉

ContosoRetailDW on Azure SQL Data Warehouse

This post assumes you are familiar with, or have already been introduced to, Azure SQL Data Warehouse. If not, I strongly encourage you to read and follow the SQL Data Warehouse Documentation to get you going.

Optionally, if you would like a local on-premises copy of the ContosoRetailDW database – a fictitious retail demo dataset used for presenting Microsoft Business Intelligence products – you can download it here -> https://www.microsoft.com/en-ca/download/details.aspx?id=18279

As much as we love AdventureWorks, it is also worthwhile to explore and work with bigger data volumes to develop and test new concepts! The entire ContosoRetailDW database contains more than 34 million rows, so I decided to make it happen… ContosoRetailDW on Azure SQL Data Warehouse.

Before we start: the scripts (code) for this post are available here on GitHub -> https://github.com/sfrechette/contosoretaildw-azuresqldw

First, let's create a blank SQL Data Warehouse database named ContosoRetailDW (Create a SQL Data Warehouse).

[Screenshot: creating the SQL Data Warehouse in the Azure portal]
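If you prefer scripting to the portal, the database can also be created with plain T-SQL against the master database of your logical SQL server. A minimal sketch – the DW400 service objective and max size below are example values, not recommendations:

-- Run against the master database of your Azure SQL logical server
CREATE DATABASE ContosoRetailDW
(
    EDITION = 'DataWarehouse',
    SERVICE_OBJECTIVE = 'DW400',  -- example DWU setting, scale to taste
    MAXSIZE = 10240 GB            -- example maximum size
);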


Next, we need to create a slightly redesigned and modified ContosoRetailDW schema for the data to be imported and used in Azure SQL Data Warehouse. I have removed the following three columns from each table – ETLLoadID, LoadDate and UpdateDate – as well as columns of data types geography and geometry, which are not supported in SQL Data Warehouse.

After successfully creating the SQL Data Warehouse database, connect to your newly created blank database with Visual Studio (Connect to SQL Data Warehouse with Visual Studio) and run the following dsql script to create the schema for the ContosoRetailDW database.

For detailed information on table design -> Table design in SQL Data Warehouse

ContosoRetailDW DDL schema script – ContosoRetailDW_ddl.dsql
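To give you an idea of what the script contains, here is an abbreviated excerpt of the kind of table definition you will find in it. The column list is trimmed and the distribution choice shown here is purely illustrative – the authoritative definitions are in ContosoRetailDW_ddl.dsql:

-- Abbreviated excerpt for illustration only, see ContosoRetailDW_ddl.dsql for the real definition
CREATE TABLE [dbo].[FactOnlineSales]
(
    [OnlineSalesKey]  INT      NOT NULL,
    [DateKey]         DATETIME NOT NULL,
    [StoreKey]        INT      NOT NULL,
    [ProductKey]      INT      NOT NULL,
    [SalesQuantity]   INT      NOT NULL,
    [SalesAmount]     MONEY    NOT NULL
    -- ...remaining columns omitted for brevity
)
WITH
(
    DISTRIBUTION = HASH([ProductKey]),  -- hash-distribute the large fact table on a join key
    CLUSTERED COLUMNSTORE INDEX         -- the default storage format in SQL Data Warehouse
);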

Got Data!
We need to load some data! The GZip data files – 25 of them totalling approx. 413MB – are located here on OneDrive -> https://1drv.ms/f/s!Al_yetnJltJSmUP0M0MBPTUSNOqa. You will need to upload these files to Azure Blob Storage. James Serra has a great post to help you get started: Getting data into Azure Blob Storage. For the purpose of this exercise I created a blob storage container named ‘contosoretaildw’ with a ‘data’ folder and uploaded the GZip files using the AzCopy Command-Line Utility (Transfer data with the AzCopy Command-Line Utility).
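For reference, a classic AzCopy upload of the files could look like the following – the local path, storage account name and key are placeholders you will need to substitute with your own values:

AzCopy /Source:C:\contoso\data ^
    /Dest:https://<yourstorageaccount>.blob.core.windows.net/contosoretaildw/data ^
    /DestKey:<yourstoragekey> ^
    /Pattern:*.gz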

[Screenshot: the ‘contosoretaildw’ blob storage container with the GZip data files in the ‘data’ folder]

We will be loading data from blob storage into SQL Data Warehouse using PolyBase (Load data with PolyBase in SQL Data Warehouse). The following script assumes we have already defined and created the schema for the ContosoRetailDW database. Alternatively, you can load the data using CTAS (Create Table As Select) if you have not yet created the ContosoRetailDW schema – CTAS creates each table as part of the load. The script for this scenario, PolybaseLoadCTAS.dsql, is also available on GitHub.

Loading data from Azure Blob Storage – PolybaseLoadInsert.dsql
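In a nutshell, the PolyBase load pattern boils down to the following steps, shown here as a trimmed-down sketch for a single table. The credential identity, field terminator and file name below are assumptions for illustration – the real values and full column lists are in PolybaseLoadInsert.dsql on GitHub:

-- 1. One-time setup: a credential to reach the storage account
CREATE MASTER KEY;

CREATE DATABASE SCOPED CREDENTIAL AzureStorageCredential
WITH IDENTITY = 'user', SECRET = '<yourstoragekey>';

-- 2. An external data source pointing at the 'contosoretaildw' container
CREATE EXTERNAL DATA SOURCE AzureStorage
WITH (
    TYPE = HADOOP,
    LOCATION = 'wasbs://contosoretaildw@<yourstorageaccount>.blob.core.windows.net',
    CREDENTIAL = AzureStorageCredential
);

-- 3. A file format describing the GZip-compressed delimited text files
CREATE EXTERNAL FILE FORMAT GzipTextFile
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (FIELD_TERMINATOR = '|'),  -- assumed delimiter
    DATA_COMPRESSION = 'org.apache.hadoop.io.compress.GzipCodec'
);

-- 4. An external table over the file(s) in the 'data' folder
CREATE SCHEMA [ext];
GO
CREATE EXTERNAL TABLE [ext].[DimDate]
(
    [DateKey] DATETIME NOT NULL
    -- ...remaining columns omitted for brevity
)
WITH (
    LOCATION = '/data/DimDate.txt.gz',  -- assumed file name
    DATA_SOURCE = AzureStorage,
    FILE_FORMAT = GzipTextFile
);

-- 5. Load into the internal table created earlier by the DDL script
INSERT INTO [dbo].[DimDate]
SELECT * FROM [ext].[DimDate];

The CTAS variant in PolybaseLoadCTAS.dsql replaces step 5 with a single statement that creates and loads the internal table in one shot:

CREATE TABLE [dbo].[DimDate]
WITH (DISTRIBUTION = ROUND_ROBIN, CLUSTERED COLUMNSTORE INDEX)
AS SELECT * FROM [ext].[DimDate];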

To conclude, we now need to update statistics on our ContosoRetailDW SQL Data Warehouse. Without proper statistics, you will not get the performance that SQL Data Warehouse is designed to provide. SQL Data Warehouse does not automatically generate statistics on tables and columns, so you need to create them yourself. We will be using a stored procedure named dbo.prc_sqldw_create_stats to create statistics on all columns in a database (Manage statistics in SQL Data Warehouse).

To create statistics on all columns in the database with this procedure, simply call it:
exec prc_sqldw_create_stats;

Create statistics stored procedure – dbo.prc_sqldw_create_stats
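If you are curious what the procedure actually does: in essence it walks the catalog views, builds a CREATE STATISTICS statement for every column, and executes them one by one. Here is a heavily simplified sketch of that idea – the real procedure in the Microsoft documentation also skips columns that already have statistics and supports sampling options:

-- Heavily simplified illustration of the idea behind dbo.prc_sqldw_create_stats
CREATE TABLE #stats_ddl
WITH (DISTRIBUTION = ROUND_ROBIN)
AS
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS seq,
       'CREATE STATISTICS [stat_' + t.name + '_' + c.name + '] ON ['
       + s.name + '].[' + t.name + '] ([' + c.name + '])' AS create_stat_ddl
FROM sys.columns c
JOIN sys.tables  t ON c.object_id = t.object_id
JOIN sys.schemas s ON t.schema_id = s.schema_id;

DECLARE @i INT = 1, @n INT, @sql NVARCHAR(4000);
SELECT @n = MAX(seq) FROM #stats_ddl;

WHILE @i <= @n
BEGIN
    SELECT @sql = create_stat_ddl FROM #stats_ddl WHERE seq = @i;
    EXEC sp_executesql @sql;  -- run each generated CREATE STATISTICS statement
    SET @i = @i + 1;
END

DROP TABLE #stats_ddl;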

We are done! Explore, write some queries, scale compute resources up and down, connect with Power BI and create awesome data visualizations. Just remember to pause your SQL Data Warehouse when you are not using it.
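Scaling compute, for instance, is a single T-SQL statement run against the master database (DW200 below is just an example target); pausing and resuming is done from the Azure portal or with PowerShell:

-- Run against the master database to scale compute up or down
ALTER DATABASE ContosoRetailDW
MODIFY (SERVICE_OBJECTIVE = 'DW200');  -- example DWU target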
Enjoy!