Happy New Year! It has been a while my friends… here goes 😉
This post assumes you are familiar with or have already been introduced to Azure SQL Data Warehouse. If not, I strongly encourage you to read and follow up on the SQL Data Warehouse Documentation to get you going.

Optionally, if you would like a local on-premises copy of the ContosoRetailDW database – a fictitious retail demo dataset used for presenting Microsoft Business Intelligence products – you can download it here -> https://www.microsoft.com/en-ca/download/details.aspx?id=18279

As much as we love AdventureWorks, it is also worthwhile to explore and work with bigger data volumes to develop and test new concepts! The entire ContosoRetailDW database contains more than 34 million rows. So I decided to make it happen… ContosoRetailDW on Azure SQL Data Warehouse.

Before we start, the scripts (code) for this post are available here on GitHub -> https://github.com/sfrechette/contosoretaildw-azuresqldw

First, let’s start by creating a blank SQL Data Warehouse database named ContosoRetailDW (Create a SQL Data Warehouse).

[Image: azure_sqldw]

Next we need to create a slightly redesigned and modified ContosoRetailDW schema for the data to be imported and used in Azure SQL Data Warehouse. I have removed the following three columns from each table: ETLLoadID, LoadDate, and UpdateDate, as well as the columns of data type geography and geometry, which are not supported in SQL Data Warehouse.

After successfully creating the SQL Data Warehouse database, connect to your newly created blank database with Visual Studio (Connect to SQL Data Warehouse with Visual Studio) and run the following dsql script to create the schema for the ContosoRetailDW database.

For detailed information on table design -> Table design in SQL Data Warehouse

ContosoRetailDW DDL schema script – ContosoRetailDW_ddl.dsql
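To give a sense of what the DDL looks like, here is a minimal sketch of two of the tables – the distribution and index choices shown here are illustrative assumptions, so refer to ContosoRetailDW_ddl.dsql for the actual definitions:

-- Sketch: a dimension table distributed round-robin and a fact table
-- hash-distributed on ProductKey, both with clustered columnstore indexes.
-- Only a few of the columns are shown; distribution choices are assumptions.
CREATE TABLE dbo.DimProduct
(
    ProductKey      int             NOT NULL,
    ProductLabel    nvarchar(255)   NULL,
    ProductName     nvarchar(500)   NULL,
    BrandName       nvarchar(50)    NULL
)
WITH (DISTRIBUTION = ROUND_ROBIN, CLUSTERED COLUMNSTORE INDEX);

CREATE TABLE dbo.FactOnlineSales
(
    OnlineSalesKey  int             NOT NULL,
    DateKey         datetime        NOT NULL,
    StoreKey        int             NOT NULL,
    ProductKey      int             NOT NULL,
    SalesQuantity   int             NOT NULL,
    SalesAmount     money           NOT NULL
)
WITH (DISTRIBUTION = HASH(ProductKey), CLUSTERED COLUMNSTORE INDEX);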

Got Data!
We need to load some data! The GZip data files, 25 of them totalling approx. 413 MB, are located here on OneDrive -> http://1drv.ms/1J2vZ6Y. You will need to load these files into Azure Blob storage. James Serra has a great post to help you get started: Getting data into Azure Blob Storage. For the purpose of this exercise I created a blob storage container named ‘contosoretaildw’ with a ‘data’ folder and imported the GZip files using the AzCopy Command-Line Utility (Transfer data with the AzCopy Command-Line Utility), along the lines of the command sketched below.
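As a rough sketch – the local source path, storage account name, and key below are placeholders to replace with your own – the upload with the classic AzCopy utility looks something like this:

AzCopy /Source:C:\contosoretaildw\data /Dest:https://<storageaccount>.blob.core.windows.net/contosoretaildw/data /DestKey:<storage-account-key> /Pattern:"*.gz"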

[Image: blob_contoso]

We will be loading data from blob storage into SQL Data Warehouse using PolyBase (Load data with PolyBase in SQL Data Warehouse). The following script assumes we have already defined and created the schema for the ContosoRetailDW database. You could also load the data using CTAS (Create Table As Select), but only if you have not yet defined and created the ContosoRetailDW schema. The script for this scenario, PolybaseLoadCTAS.dsql, is available on GitHub.

Loading data from Azure Blob Storage – PolybaseLoadInsert.dsql
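The overall pattern looks like the following minimal sketch – the storage account, key, file locations, field delimiter, and external table name are placeholders/assumptions, and only one table is shown; see PolybaseLoadInsert.dsql for the full script:

-- Credential for the storage account holding the GZip files
-- (the identity string is arbitrary; the secret is the storage account key)
CREATE MASTER KEY;

CREATE DATABASE SCOPED CREDENTIAL AzureStorageCredential
WITH IDENTITY = 'user',
     SECRET = '<storage-account-key>';

-- External data source pointing at the 'contosoretaildw' container
CREATE EXTERNAL DATA SOURCE AzureBlobStorage
WITH (
    TYPE = HADOOP,
    LOCATION = 'wasbs://contosoretaildw@<storageaccount>.blob.core.windows.net',
    CREDENTIAL = AzureStorageCredential
);

-- Delimited text files, GZip-compressed (the field terminator is an assumption)
CREATE EXTERNAL FILE FORMAT GzipTextFile
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (FIELD_TERMINATOR = '|'),
    DATA_COMPRESSION = 'org.apache.hadoop.io.compress.GzipCodec'
);

-- External table over the files in the 'data' folder (columns abbreviated)
CREATE EXTERNAL TABLE dbo.DimProduct_ext
(
    ProductKey      int             NOT NULL,
    ProductLabel    nvarchar(255)   NULL,
    ProductName     nvarchar(500)   NULL,
    BrandName       nvarchar(50)    NULL
)
WITH (
    LOCATION = '/data/DimProduct/',
    DATA_SOURCE = AzureBlobStorage,
    FILE_FORMAT = GzipTextFile
);

-- Load into the internal table created by ContosoRetailDW_ddl.dsql
INSERT INTO dbo.DimProduct
SELECT * FROM dbo.DimProduct_ext;

-- Or, with the CTAS variant (when the internal table does not exist yet):
-- CREATE TABLE dbo.DimProduct
-- WITH (DISTRIBUTION = ROUND_ROBIN, CLUSTERED COLUMNSTORE INDEX)
-- AS SELECT * FROM dbo.DimProduct_ext;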

To conclude, we now need to update statistics on our ContosoRetailDW SQL Data Warehouse. Without proper statistics, you will not get the performance that SQL Data Warehouse is designed to provide. Tables and columns do not have statistics automatically generated by SQL Data Warehouse, so you need to create them yourself. We will be using a stored procedure named dbo.prc_sqldw_create_stats to create statistics on all columns in the database (Manage statistics in SQL Data Warehouse).

To create statistics on every column in the database with this procedure, simply call it:
exec prc_sqldw_create_stats;

dbo.prc_sqldw_create_stats
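The procedure essentially loops over the catalog views and issues a CREATE STATISTICS statement for each column. If you would rather create statistics manually on selected columns, the single-column form looks like this (the table and column choices below are just examples):

CREATE STATISTICS stat_FactOnlineSales_DateKey ON dbo.FactOnlineSales (DateKey);
CREATE STATISTICS stat_FactOnlineSales_ProductKey ON dbo.FactOnlineSales (ProductKey);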

We are done! Explore, write some queries, scale compute resources up and down, connect with Power BI, and create awesome data visualizations. Just remember to pause your SQL Data Warehouse when not using it.
Enjoy!

A short tale of… data vizes. It’s about Tableau, R, and Shiny: how to rebuild an existing and well-known Tableau visualization with R and provide interactive web analysis. We will be using the raw data from the sample workbook named “Regional” that comes with Tableau Desktop. The image below is a snapshot of the visualization that we will try to reproduce with R. You can also view and interact with this visualization on my Tableau Public profile page -> http://tabsoft.co/22IOMjD

[Image: Tableau_SATScore]

You can download a copy of the dataset in CSV format here on Dropbox.

Let’s start by recreating the same visualization (static) with R. Open RStudio and run code along the lines of the sketch below.
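A minimal ggplot2 sketch of the idea – the CSV file name and the column names admission_rate and sat_avg are assumptions, so adjust them to match the downloaded dataset:

# Static scatter plot of average SAT score vs. admission rate
# (file and column names are assumptions -- adjust to the actual CSV)
library(ggplot2)

college <- read.csv("college_admissions.csv", stringsAsFactors = FALSE)

ggplot(college, aes(x = admission_rate, y = sat_avg)) +
  geom_point(colour = "steelblue", alpha = 0.6) +
  geom_hline(yintercept = mean(college$sat_avg, na.rm = TRUE), linetype = "dashed") +
  labs(x = "Admission Rate", y = "Average SAT Score",
       title = "Average SAT Score vs. Admission Rate")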

The output will produce the following, a nice simple static plot… of course with no interactivity!
[Image: Rplot01]

Then comes Shiny by RStudio, a web application framework for R that turns your analyses into interactive web applications. Check out the Getting Started Guide for all the details.

Shiny apps have two important components:

  • a user-interface script (ui.R) which controls the layout and appearance of your app
  • a server script (server.R) which contains the instructions needed to build the app
You can actually run the app locally in RStudio using two scripts along the lines of the sketch below, then launching the app with shiny::runApp().
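A minimal sketch of the two scripts – the real app uses the ggvis package and has more controls, and the column names here are the same assumptions as above:

# ui.R -- layout and appearance of the app
library(shiny)

shinyUI(fluidPage(
  titlePanel("College Admissions"),
  sidebarLayout(
    sidebarPanel(
      sliderInput("rate", "Admission rate at most:",
                  min = 0, max = 1, value = 1, step = 0.05)
    ),
    mainPanel(plotOutput("satPlot"))
  )
))

# server.R -- instructions needed to build the app
library(shiny)
library(ggplot2)

college <- read.csv("college_admissions.csv", stringsAsFactors = FALSE)

shinyServer(function(input, output) {
  output$satPlot <- renderPlot({
    filtered <- college[college$admission_rate <= input$rate, ]
    ggplot(filtered, aes(x = admission_rate, y = sat_avg)) +
      geom_point(colour = "steelblue", alpha = 0.6) +
      labs(x = "Admission Rate", y = "Average SAT Score")
  })
})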

I have published this app on shinyapps.io for you to explore and interact with: https://sfrechette.shinyapps.io/college_admissions

[Image: shinyapps.io-college_admissions]

So from Tableau to R and Shiny we have almost* the same data visualization and interactivity.
*I am using the ggvis package, and it seems impossible to add a horizontal line to the plot for the average score… that feature is apparently not available yet!

Back to the Future – Temporal Tables in SQL Server 2016, a short presentation given at the Ottawa SQL Server User Group (Ottawa PASS Chapter) on Thursday, September 17, 2015.

SQL Server 2016 CTP2 introduced support for temporal tables, a database feature that provides built-in support for querying data stored in a table at any point in time, rather than only the data that is correct at the current moment.

Topics covered: What is a temporal table? Why temporal? How does it work? When to use it (use cases), and demos…
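To make the idea concrete, here is a minimal sketch of a system-versioned temporal table – the table and column names are just examples:

-- System-versioned temporal table: SQL Server maintains the history table
CREATE TABLE dbo.Employee
(
    EmployeeID  int            NOT NULL PRIMARY KEY CLUSTERED,
    Name        nvarchar(100)  NOT NULL,
    Position    nvarchar(100)  NOT NULL,
    ValidFrom   datetime2 GENERATED ALWAYS AS ROW START NOT NULL,
    ValidTo     datetime2 GENERATED ALWAYS AS ROW END NOT NULL,
    PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)
)
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.EmployeeHistory));

-- Query the table as it looked at a specific point in time
SELECT *
FROM dbo.Employee
FOR SYSTEM_TIME AS OF '2015-09-01T00:00:00';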

[Image: azurecloud]
Yes, another exploration and exercise in graph modeling. This time it’s about something I often use… the cloud: Microsoft Azure. I thought it would be interesting to model the available Azure cloud service offerings by category, region, and location. The inspiration and data come from the following web page: http://azure.microsoft.com/en-us/regions/#services. A trivial example, but useful for demonstrating and understanding the concepts. This is also a great example of how to store master data and discover new insights in existing data.

The Cypher, CSV files, and database backup can be found here -> https://github.com/sfrechette/azureservices-neo4j

You can also get a copy of the raw data in Excel here on OneDrive.

How to approach and model our graph

Let’s visualize it! A Region ‘East US’ is in a Location ‘Virginia’; that Region offers a Service ‘Stream Analytics’, and that Service belongs to a Category named ‘Analytics’. Also, a Service can be a parent or child of another Service. From that analysis we can create the following model.

Azure Cloud Services Graph Model:
[Image: graph_azure]

Graph model built using Arrows -> http://www.apcjones.com/arrows/

Importing the Data using Cypher

We will use Cypher’s LOAD CSV command to transform the CSV files and load their content into a graph structure.
You can import and run the entire Cypher script using the neo4j-shell by issuing the following command:
bin/neo4j-shell -path azureservices.db -file cypher/import.cql
* Make sure the Neo4j server is not running before you execute the script (bin/neo4j stop).

The Cypher script (cypher/import.cql) creates the indexes, constraints, nodes, and relationships for our graph; a condensed sketch of the pattern follows.
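The node labels and categories come straight from the model above, but the relationship types, CSV file names, and column headers in this sketch are assumptions – see the repository for the actual script:

// Uniqueness constraints (which also create indexes) on the main node labels
CREATE CONSTRAINT ON (r:Region) ASSERT r.name IS UNIQUE;
CREATE CONSTRAINT ON (s:Service) ASSERT s.name IS UNIQUE;
CREATE CONSTRAINT ON (c:Category) ASSERT c.name IS UNIQUE;
CREATE CONSTRAINT ON (l:Location) ASSERT l.name IS UNIQUE;

// Regions and their locations
LOAD CSV WITH HEADERS FROM "file:///csv/regions.csv" AS row
MERGE (r:Region {name: row.region})
MERGE (l:Location {name: row.location})
MERGE (r)-[:IN_LOCATION]->(l);

// Services, their categories, and the regions that offer them
LOAD CSV WITH HEADERS FROM "file:///csv/services.csv" AS row
MERGE (s:Service {name: row.service})
MERGE (c:Category {name: row.category})
MERGE (s)-[:IN_CATEGORY]->(c)
WITH s, row
MATCH (r:Region {name: row.region})
MERGE (r)-[:OFFERS]->(s);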

Once the script has executed successfully, start Neo4j and load the web console (http://localhost:7474).

Some Cypher queries
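For example – assuming the labels and relationship types sketched above – queries along these lines answer questions such as “which services are offered in the European regions?” or “where is Machine Learning available?”:

// Services offered in the European regions
MATCH (r:Region)-[:OFFERS]->(s:Service)
WHERE r.name CONTAINS 'Europe'
RETURN r.name AS region, collect(s.name) AS services;

// Regions where Machine Learning is available, and its category
MATCH (r:Region)-[:OFFERS]->(s:Service {name: 'Machine Learning'})-[:IN_CATEGORY]->(c:Category)
RETURN s.name AS service, c.name AS category, collect(r.name) AS regions;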

[Image: azure_europe]

[Image: azure_services]

[Image: azure_ml_cypher]

New to Neo4j and graph databases? Follow this link to get started: http://neo4j.com/developer/get-started/