Airplane_silhouette A quick exercise in graph modeling. I was recently inspired by one of Nicole White’s posts, in which she did something similar with Dallas Fort Worth Airport (DFW) data. So I decided to create a Neo4j graph database of the shopping places (boutiques, restaurants and services) at the Montreal-Pierre Elliott Trudeau International Airport (YUL). There isn’t any raw data to download from the site, so I will spare you the details of my scraping efforts and instead share my findings and results.

You can view and download the data here from Google Drive. The code, sample queries and database backup can be found on Github.

Here is a sample from the Data sheet:
yul_data_sheet

Essentially, from the raw data collected I was able to create individual sheets for the ‘entities’ and ‘relationships’, just like tables you would model in your favourite RDBMS, and from those I exported individual CSV files ready for import:

  • areas.csv
  • categories.csv
  • gatezones.csv
  • levels.csv
  • placecategorygatezone.csv
  • places.csv

But first let’s take a look at how we will approach and model our graph. FYI, at YUL airport there are no ‘Terminals’ but rather ‘Levels’…

Case in point: a Place ‘Tim Hortons’ belongs in a Category ‘Cafés’, and that Place is located at a GateZone ‘Gate 3’ in the Area ‘Restricted area – Canada’ on a certain Level ‘First Floor/Departures’. Pretty straightforward, and from that perspective we can create our model.

YUL Graph Model:
yul_propertygraph
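
To make the model concrete, here is a rough sketch of how the Tim Hortons example reads as Cypher once the import below has run (the labels, relationship types and property names match the import script; the query itself is only illustrative):

// One place with its category, gate zone, area and level
MATCH (p:Place {placeName: 'Tim Hortons'})-[:IN_CATEGORY]->(c:Category),
      (p)-[:AT_GATEZONE]->(g:GateZone)-[:IN_AREA]->(a:Area)-[:AT_LEVEL]->(l:Level)
RETURN p.placeName, c.categoryName, g.gateZoneName, a.areaName, l.levelName;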

Importing the Data using Cypher

With the extracted CSV files in hand, we will be using Cypher’s LOAD CSV command to transform and load the content into a graph structure.
You can import and run the entire Cypher script using the neo4j-shell by issuing the following command:
bin/neo4j-shell -path yul.db -file cypher/import.cypher

import.cypher

// Create indexes for faster lookup
CREATE INDEX ON :Category(categoryName);
CREATE INDEX ON :GateZone(gateZoneName);
CREATE INDEX ON :Area(areaName);
CREATE INDEX ON :Level(levelName);

// Create constraints
CREATE CONSTRAINT ON (a:Area) ASSERT a.areaId IS UNIQUE;
CREATE CONSTRAINT ON (c:Category) ASSERT c.categoryId IS UNIQUE;
CREATE CONSTRAINT ON (g:GateZone) ASSERT g.gateZoneId IS UNIQUE;
CREATE CONSTRAINT ON (l:Level) ASSERT l.levelId IS UNIQUE;
CREATE CONSTRAINT ON (p:Place) ASSERT p.placeId IS UNIQUE;


// Create places
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:data/places.csv" as row
CREATE (:Place {placeName: row.PlaceName, placeId: row.PlaceID});

// Create categories
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:data/categories.csv" as row
CREATE (:Category {categoryName: row.CategoryName, categoryId: row.CategoryID});

// Create gatezones
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:data/gatezones.csv" as row
CREATE (:GateZone {gateZoneName: row.GateZoneName, gateZoneId: row.GateZoneID});

// Create levels
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:data/levels.csv" as row
CREATE (:Level {levelName: row.LevelName, levelId: row.LevelID});

// Create areas
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:data/areas.csv" as row
CREATE (:Area {areaName: row.AreaName, areaId: row.AreaID});


// Create relationships: Place to Category
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:data/placecategorygatezone.csv" AS row
MATCH (place:Place {placeId: row.PlaceID})
MATCH (category:Category {categoryId: row.CategoryID})
MERGE (place)-[:IN_CATEGORY]->(category);

// Create relationships: Place to GateZone
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:data/placecategorygatezone.csv" AS row
MATCH (place:Place {placeId: row.PlaceID})
MATCH (gatezone:GateZone {gateZoneId: row.GateZoneID})
MERGE (place)-[:AT_GATEZONE]->(gatezone);

// Create relationships: GateZone to Area
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:data/gatezones.csv" AS row
MATCH (gatezone:GateZone {gateZoneId: row.GateZoneID})
MATCH (area:Area {areaId: row.AreaID})
MERGE (gatezone)-[:IN_AREA]->(area);

// Create relationships: Area to Level
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:data/areas.csv" AS row
MATCH (area:Area {areaId: row.AreaID})
MATCH (level:Level {levelId: row.LevelID})
MERGE (area)-[:AT_LEVEL]->(level);
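
As a quick sanity check (not part of the original import script, just a handy verification), you can count the nodes created per label:

// Verify how many nodes were created for each label
MATCH (n)
RETURN labels(n) AS Label, count(*) AS Count
ORDER BY Count DESC;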

Start Neo4j and load the web console (http://localhost:7474); we are now ready to issue some Cypher queries…

Find all places and their categories located at Gate 57 (with Area and Level)

MATCH (c:Category)<--(p:Place)-->(g:GateZone)-->(a:Area)-->(l:Level)
WHERE g.gateZoneName = 'Gate 57'
RETURN c, p, g, a, l;

gate57

Find Boutiques in Restricted area – International on First Floor/Departures

MATCH (p:Place)-[:IN_CATEGORY]->(c:Category),
      (p)-[:AT_GATEZONE]->(g:GateZone),
      (g)-[:IN_AREA]->(a:Area),
      (a)-[:AT_LEVEL]->(l:Level)
WHERE c.categoryName = 'Boutiques' AND a.areaName = 'Restricted area - International' AND l.levelName = 'First Floor/Departures'
RETURN p.placeName AS Name, c.categoryName AS Category, g.gateZoneName AS Gate, a.areaName AS Area, l.levelName AS Level

yul_resultquery02

Return count of places in each Category

MATCH (p:Place)-[:IN_CATEGORY]-(c:Category)
RETURN c.categoryName AS Category, collect(distinct p.placeName) as Place, count(p) AS Count
ORDER BY Count DESC

yul_result03

Take time to explore the data and try to write your own queries!
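
For example, here is one more query sketched against the same schema, counting places per area and level:

// Count places per area and level
MATCH (p:Place)-[:AT_GATEZONE]->(:GateZone)-[:IN_AREA]->(a:Area)-[:AT_LEVEL]->(l:Level)
RETURN l.levelName AS Level, a.areaName AS Area, count(DISTINCT p) AS Places
ORDER BY Places DESC;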

Don’t forget the online training available from Neo4j: Getting Started with Neo4j and the recently added Neo4j in Production (http://neo4j.com/graphacademy/online-training/)

The Microsoft Ignite 2015 Conference took place this past May 4 – 8 in Chicago. Unfortunately I was not able to attend, but I was certainly looking forward to all the session material being made available on Channel 9.

Some enhancements and new features coming in SQL Server 2016 were announced:
– Stretch Database
– Native JSON support
– PolyBase
– Enhancements to Master Data Services and SSIS
– Always Encrypted
– Operational Analytics
– In database analytics with R-Integration
– Azure SQL Data Warehouse

I strongly recommend you stay up to date by reading the following Data Platform Blogs and watching the Day 1 Microsoft Ignite Keynote.

The SQL Server 2016 public preview is coming this summer. To not be left out and to stay informed, please sign up here, and don’t forget to also sign up for the Power BI Preview.

So in no particular order I have quickly put together a list of sessions that are of interest to me and that will consume most of my upcoming evenings and weekends…


NoSQL on Microsoft Azure: An introduction
https://channel9.msdn.com/Events/Ignite/2015/BRK2555

Data Science with Microsoft Azure Machine Learning, Microsoft SQL Server and R
https://channel9.msdn.com/Events/Ignite/2015/BRK3550

What’s Next for Business Analytics in Microsoft Excel
https://channel9.msdn.com/Events/Ignite/2015/BRK2566

What’s Next for Visualizations in Microsoft Office
https://channel9.msdn.com/events/Ignite/2015/BRK3564

Microsoft BI Overview
https://channel9.msdn.com/Events/Ignite/2015/BRK2556

Overview of Advanced Analytics with Machine Learning and Data Orchestration
https://channel9.msdn.com/Events/Ignite/2015/BRK2569

Harnessing the Power of Hadoop: Cloud Scale with Microsoft Azure HDInsight
https://channel9.msdn.com/Events/Ignite/2015/BRK2557

In-Memory Technologies Overview for Microsoft SQL Server and Microsoft Azure
https://channel9.msdn.com/Events/Ignite/2015/BRK2563

Power BI Overview
https://channel9.msdn.com/Events/Ignite/2015/BRK2568

Microsoft Azure SQL Data Warehouse Overview
https://channel9.msdn.com/Events/Ignite/2015/BRK2571

Microsoft SQL Server BI Drill Down
https://channel9.msdn.com/Events/Ignite/2015/BRK2552

Microsoft Azure Machine Learning Marketplace: Real-World Experience to Solve Your Business Problems
https://channel9.msdn.com/Events/Ignite/2015/BRK2560

Microsoft Azure SQL Database: Overview and Customer Scenarios
https://channel9.msdn.com/Events/Ignite/2015/BRK2564

Next Steps in Polybase with Dr. David Dewitt
https://channel9.msdn.com/Events/Ignite/2015/BRK3565

SELECT Latest FROM Microsoft Azure DocumentDB
https://channel9.msdn.com/Events/Ignite/2015/BRK2551

Advanced Analytics: Navigating Your Way There
https://channel9.msdn.com/Events/Ignite/2015/BRK2567

Operational Analytics in SQL Server
https://channel9.msdn.com/Events/Ignite/2015/BRK4552

Big Data for the SQL Ninja
https://channel9.msdn.com/Events/Ignite/2015/BRK2550

Cloud-Based Machine Learning for the Developer
https://channel9.msdn.com/Events/Ignite/2015/BRK3560

Overview of Microsoft Analytics Platform System (APS)
https://channel9.msdn.com/Events/Ignite/2015/BRK1550

Build Hybrid Big Data Pipelines with Azure Data Factory and Azure HDInsight
https://channel9.msdn.com/Events/Ignite/2015/BRK3559

Microsoft Azure DocumentDB and Azure HDInsight: Better Together
https://channel9.msdn.com/Events/Ignite/2015/BRK3562

Real-Time Analytics at Scale for Internet of Things
https://channel9.msdn.com/Events/Ignite/2015/BRK3555

End-to-End Business Analytics Solution: Real-World Scenario & Demo
https://channel9.msdn.com/Events/Ignite/2015/BRK2559

Power BI for Developers
https://channel9.msdn.com/Events/Ignite/2015/BRK3551

Planning Your Big Data Architecture on Azure
https://channel9.msdn.com/Events/Ignite/2015/BRK2576

Optimizing Hadoop Using Microsoft Azure HDInsight
https://channel9.msdn.com/Events/Ignite/2015/BRK3556

Advanced Modeling and Calculations Using the Microsoft Power BI Designer
https://channel9.msdn.com/Events/Ignite/2015/BRK3573

Revolution Analytics Joins Microsoft: Strange Bedfellows or a New Power Tool for Advanced Analytics
https://channel9.msdn.com/Events/Ignite/2015/BRK3553

Azure SQL Data Warehouse: Deep Dive
https://channel9.msdn.com/Events/Ignite/2015/BRK3569

DNA of Data-Driven Organizations: Storytelling with R, Microsoft Azure Machine Learning and Power BI
https://channel9.msdn.com/Events/Ignite/2015/BRK2562

Scalable BI and Advanced Modeling with Microsoft SQL Server Analysis Services and Power Pivot
https://channel9.msdn.com/Events/Ignite/2015/BRK3561

Harness Predictive Customer Churn Models with Azure Machine Learning, Data Factory, and More
https://channel9.msdn.com/Events/Ignite/2015/BRK3570

What’s New in Master Data Services (MDS) and Integration Services (SSIS) in SQL Server
https://channel9.msdn.com/events/Ignite/2015/BRK2578

Microsoft SQL Server Unplugged
https://channel9.msdn.com/events/Ignite/2015/BRK1551

ColumnStore Index: Microsoft SQL Server 2014 and Beyond
https://channel9.msdn.com/events/Ignite/2015/BRK4556

Would you like to further explore and learn more about Neo4j… using the cloud? Provisioning a more powerful VM instance with multiple cores and additional memory, a sandbox development environment… Well, this post, “Creating a Neo4j Linux VM in Windows Azure”, is just right for you.

We will be installing the Neo4j Community 2.0.1 distribution, which runs on Ubuntu 12.04 LTS and is available on the VM Depot.

Follow these steps, which outline the basics of the setup.

From the Windows Azure Management Portal select “VIRTUAL MACHINES” then select “IMAGES” and then “BROWSE VM DEPOT”

azureneo4j01

 
Find the Neo4j Community 2.0.1 image under “UBUNTU”
Click the arrow to continue

azureneo4j04

 
Choose a storage account
Choose a region, create a new storage account and give it a name (for example: neo4jlinuxvm), then click the check to continue.

azureneo4j05

 
The VM image will be copied to the new storage account. Once it is finished, proceed to the next step.
When the status of the VM image shows “Pending registration”, click “Register”

azureneo4j06

azureneo4j07

 
Register and provide a name for your new VM image
Click the check to continue.

azureneo4j08

 
Create a new Virtual Machine (click the +) and select “FROM GALLERY”

azureneo4j09

 
Now we will create a new VM from an image.
Select “MY IMAGES” and the image you created earlier (in this case neo4j-linux-vm), then click the arrow to continue

azureneo4j10

 
Configuring your VM
Provide a VM name, size it appropriately, and create a user name and password (the credentials you will use when connecting to Ubuntu via SSH), then click the arrow to continue

azureneo4j11

 
Create a new cloud service and give it a name, and most importantly add a new endpoint named Neo4j with value 7474 for both the public and private port, then click the arrow to continue

azureneo4j12

 
Your new VM is being provisioned! This can take a couple of minutes…

azureneo4j13

 
When completed, go to the VM dashboard to see all the relevant information.
In this case we need the DNS name: neo4jlinuxvm.cloudapp.net

azureneo4j14

 
Open up your favourite browser and navigate to neo4jlinuxvm.cloudapp.net:7474
http://[DNSName]:7474

azureneo4j15

 
That’s it, you’re done! Now start graphing.

 
To connect to and manage your VM through SSH you will need a tool like PuTTY on Windows, or you can use Terminal on Mac OS X. In future posts I will go through how to upgrade to the latest version of Neo4j, modify Neo4j configuration settings, and upload CSV files and graph database backups.
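
For example, from Terminal on Mac OS X the connection looks something like this (substitute the user name you created when configuring the VM; the DNS name is the one from the dashboard above):

ssh your-username@neo4jlinuxvm.cloudapp.net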

Graphs are everywhere! In my efforts to spread the word about Neo4j to SQL Server professionals and show how to take a relational database schema and model it as a graph, AdventureWorks was a no-brainer! Let’s explore how to import a subset of the AdventureWorks database from SQL Server (RDBMS) into Neo4j (GraphDB).

But first, some prerequisites. You should at least have a basic understanding of what a property graph model is and have reviewed the following modeling guidelines. Download and install Neo4j to be able to follow along with the examples.

The AdventureWorks database is a fictitious company database that has existed since SQL Server 2005 as a means to show new functionality in each new version released. AdventureWorks has become a vital aspect of learning new features/constructs within SQL Server.

AdventureWorks2008

For our demonstration I have decided to choose a subset of tables and select only offline sales transactions (OnlineOrderFlag = 0).
As a result the following CSV files will be used to import data into our graph structure:

  • customers.csv
  • employees.csv
  • orders.csv
  • productcategories.csv
  • products.csv
  • productsubcategories.csv
  • vendorproduct.csv
  • vendors.csv

You can find all the necessary code and files here -> https://github.com/sfrechette/adventureworks-neo4j

If you still need to download a copy of the AdventureWorks sample database (I used the SQL Server 2014 version), you can find it here -> https://msftdbprodsamples.codeplex.com/

Developing a Graph Model

When deriving and defining a graph model from a relational model, you should keep the following base guidelines in mind:

  • A row is a node
  • A table name is a label name

For further understanding, read GraphDB vs. RDBMS. This article explores the differences between relational and graph databases and their data models. In addition, it explains how to integrate graph databases with relational databases and how to import data from a relational store.

With this dataset, this simple AdventureWorks graph data model serves as a foundation:

graphmodel_adventureworks

The key difference between a graph and a relational database is that relational databases work with sets while graph databases work with paths: relationships are first-class entities in a graph database and are semantically far stronger than the implied relationships reified at runtime in a relational store.
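
To see that difference in practice, here is a minimal sketch (using the node labels and relationship types created by the import below, with a placeholder customer name) of a question that would require several joins in SQL but is a single path traversal in Cypher:

// Which products did a given customer purchase? (placeholder name, purely illustrative)
MATCH (c:Customer {fullName: 'Some Customer'})-[:PURCHASED]->(:Order)-[r:PRODUCT]->(p:Product)
RETURN p.productName AS Product, sum(r.quantity) AS TotalQuantity;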

Importing the Data using Cypher

Now that we have extracted the data from the AdventureWorks database, we will be using Cypher’s LOAD CSV command to transform and load the content of these CSV files into a graph structure.

First we create specific indexes and constraints:

    // Create indexes for faster lookup
    CREATE INDEX ON :Category(categoryName);
    CREATE INDEX ON :SubCategory(subCategoryName);
    CREATE INDEX ON :Vendor(vendorName);
    CREATE INDEX ON :Product(productName);
    
    // Create constraints
    CREATE CONSTRAINT ON (o:Order) ASSERT o.orderId IS UNIQUE;
    CREATE CONSTRAINT ON (p:Product) ASSERT p.productId IS UNIQUE;
    CREATE CONSTRAINT ON (c:Category) ASSERT c.categoryId IS UNIQUE;
    CREATE CONSTRAINT ON (s:SubCategory) ASSERT s.subCategoryId IS UNIQUE;
    CREATE CONSTRAINT ON (e:Employee) ASSERT e.employeeId IS UNIQUE;
    CREATE CONSTRAINT ON (v:Vendor) ASSERT v.vendorId IS UNIQUE;
    CREATE CONSTRAINT ON (c:Customer) ASSERT c.customerId IS UNIQUE;
    

Run schema await (in the neo4j-shell) to wait for the indexes to come online, then create the nodes:

    // Create products
    USING PERIODIC COMMIT
    LOAD CSV WITH HEADERS FROM "file:data/products.csv" as row
    CREATE (:Product {productName: row.ProductName, productNumber: row.ProductNumber, productId: row.ProductID, modelName: row.ProductModelName, standardCost: row.StandardCost, listPrice: row.ListPrice});
    
    // Create vendors
    USING PERIODIC COMMIT
    LOAD CSV WITH HEADERS FROM "file:data/vendors.csv" as row
    CREATE (:Vendor {vendorName: row.VendorName, vendorNumber: row.AccountNumber, vendorId: row.VendorID, creditRating: row.CreditRating, activeFlag: row.ActiveFlag});
    
    // Create employees
    USING PERIODIC COMMIT
    LOAD CSV WITH HEADERS FROM "file:data/employees.csv" as row
    CREATE (:Employee {firstName: row.FirstName, lastName: row.LastName, fullName: row.FullName, employeeId: row.EmployeeID, jobTitle: row.JobTitle, organizationLevel: row.OrganizationLevel, maritalStatus: row.MaritalStatus, gender: row.Gender, territory: row.Territory, country: row.Country, group: row.Group});
    
    // Create customers
    USING PERIODIC COMMIT
    LOAD CSV WITH HEADERS FROM "file:data/customers.csv" as row
    CREATE (:Customer {firstName: row.FirstName, lastName: row.LastName, fullName: row.FullName, customerId: row.CustomerID});
    
    // Create categories
    USING PERIODIC COMMIT
    LOAD CSV WITH HEADERS FROM "file:data/productcategories.csv" as row
    CREATE (:Category {categoryName: row.CategoryName, categoryId: row.CategoryID});
    
    // Create sub-categories
    USING PERIODIC COMMIT
    LOAD CSV WITH HEADERS FROM "file:data/productsubcategories.csv" as row
    CREATE (:SubCategory {subCategoryName: row.SubCategoryName, subCategoryId: row.SubCategoryID});
    
    // Prepare orders
    USING PERIODIC COMMIT
    LOAD CSV WITH HEADERS FROM "file:data/orders.csv" AS row
    MERGE (order:Order {orderId: row.SalesOrderID}) ON CREATE SET order.orderDate =  row.OrderDate;
    

Create relationships:

    // Create relationships: Order to Product
    USING PERIODIC COMMIT
    LOAD CSV WITH HEADERS FROM "file:data/orders.csv" AS row
    MATCH (order:Order {orderId: row.SalesOrderID})
    MATCH (product:Product {productId: row.ProductID})
    MERGE (order)-[pu:PRODUCT]->(product)
    ON CREATE SET pu.unitPrice = toFloat(row.UnitPrice), pu.quantity = toFloat(row.OrderQty);
    
    // Create relationships: Order to Employee
    USING PERIODIC COMMIT
    LOAD CSV WITH HEADERS FROM "file:data/orders.csv" AS row
    MATCH (order:Order {orderId: row.SalesOrderID})
    MATCH (employee:Employee {employeeId: row.EmployeeID})
    MERGE (employee)-[:SOLD]->(order);
    
    // Create relationships: Order to Customer
    USING PERIODIC COMMIT
    LOAD CSV WITH HEADERS FROM "file:data/orders.csv" AS row
    MATCH (order:Order {orderId: row.SalesOrderID})
    MATCH (customer:Customer {customerId: row.CustomerID})
    MERGE (customer)-[:PURCHASED]->(order);
    
    // Create relationships: Product to Vendor
    USING PERIODIC COMMIT
    LOAD CSV WITH HEADERS FROM "file:data/vendorproduct.csv" AS row
    MATCH (product:Product {productId: row.ProductID})
    MATCH (vendor:Vendor {vendorId: row.VendorID})
    MERGE (vendor)-[:SUPPLIES]->(product);
    
    // Create relationships: Product to SubCategory
    USING PERIODIC COMMIT
    LOAD CSV WITH HEADERS FROM "file:data/products.csv" AS row
    MATCH (product:Product {productId: row.ProductID})
    MATCH (subcategory:SubCategory {subCategoryId: row.SubCategoryID})
    MERGE (product)-[:PART_OF_SUBCAT]->(subcategory);
    
    // Create relationships: SubCategory to Category
    USING PERIODIC COMMIT
    LOAD CSV WITH HEADERS FROM "file:data/productsubcategories.csv" AS row
    MATCH (subcategory:SubCategory {subCategoryId: row.SubCategoryID})
    MATCH (category:Category {categoryId: row.CategoryID})
    MERGE (subcategory)-[:PART_OF_CAT]->(category);
    
    // Create relationship for employee reporting structure
    USING PERIODIC COMMIT
    LOAD CSV WITH HEADERS FROM "file:data/employees.csv" AS row
    MATCH (employee:Employee {employeeId: row.EmployeeID})
    MATCH (manager:Employee {employeeId: row.ManagerID})
    MERGE (employee)-[:REPORTS_TO]->(manager);
    

You can also import and run the entire Cypher script using the neo4j-shell: bin/neo4j-shell -path adventure.db -file cypher/import.cypher

Once completed, you will be able to query and do things like this:

    cypher_subcat

Querying the Graph with Cypher

Let’s issue some Cypher queries against our newly created graph database:

    // List the product subcategories and categories provided by each supplier.
    MATCH (v:Vendor)-->(:Product)-->(s:SubCategory)-->(c:Category)
    RETURN v.vendorName as Vendor, collect(distinct s.subCategoryName) as SubCategories, collect(distinct c.categoryName) as Categories;
    

    cypher_aw01

    // Which employee had the highest cross-selling count of 'AWC Logo Cap' and which product?
    MATCH (p:Product {productName:'AWC Logo Cap'})<-[:PRODUCT]-(:Order)<-[:SOLD]-(employee),
          (employee)-[:SOLD]->(o2)-[:PRODUCT]->(other:Product)
    RETURN employee.fullName as Employee, other.productName as Product, count(distinct o2) as Count
    ORDER BY Count DESC
    LIMIT 5;
    

    cypher_aw02

    // What is the total quantity sold of a particular product?
    MATCH (o:Order)-[r:PRODUCT]->(p:Product {productName: "Long-Sleeve Logo Jersey, L"})
    RETURN sum(r.quantity) as TotalQuantitySold;
    

    cypher_aw03
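
As one more starting point (a sketch of my own, not from the results above), you could rank customers by total spend using the unitPrice and quantity properties stored on the PRODUCT relationships:

// Top 10 customers by total amount spent
MATCH (c:Customer)-[:PURCHASED]->(:Order)-[r:PRODUCT]->(:Product)
RETURN c.fullName AS Customer, sum(r.unitPrice * r.quantity) AS TotalSpent
ORDER BY TotalSpent DESC
LIMIT 10;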

Explore, learn Cypher and write your own queries.
Enjoy!