Graphs are everywhere! In my efforts to spread the word and inform SQL Server Professionals about Neo4j and learn how to take a relational database schema and model it as a graph… AdventureWorks was a no-brainer! Let’s explore how to import a subset of the AdventureWorks database from SQL Server (RDBMS) into Neo4j (GraphDB).

But first some prerequisites. You should at least have a basic understanding of what is a property graph model and have completed the following modeling guidelines. Download and install Neo4j to be able to follow along with the examples.

The AdventureWorks database is a fictitious company database that has existed since SQL Server 2005 as a means to show new functionality in each new version released. AdventureWorks has become a vital aspect of learning new features/constructs within SQL Server.


AdventureWorks2008

For our demonstration I have decided to choose a subset of tables and select only offline sales transactions (OnlineOrderFlag = 0).
As a result the following CSV files will be used to import data into our graph structure:

  • customers.csv
  • employees.csv
  • orders.csv
  • productcategories.csv
  • products.csv
  • producsubcategories.csv
  • vendorproduct.csv
  • vendors.csv
  • You can find all the necessary code and files here -> https://github.com/sfrechette/adventureworks-neo4j

    If you still need to download a copy of the AdventureWorks sample database (I used the SQL Server 2014 version) you can find them here -> https://msftdbprodsamples.codeplex.com/

    Developing a Graph Model
    When deriving and defining a graph model from a relational model, you should keep the following base guidelines in mind:

  • A row is a node
  • A table name is a label name
  • For further understanding read the following GraphDB vs. RDBMS. This article explores the differences between relational and graph databases and data models. In addition, it explains how to integrate graph databases with relational databases and how to import data from a relational store.

    With this dataset, this simple AdventureWorks graph data model serves as a foundation:

    graphmodel_adventureworks

    The key difference between a graph and relational database is that relational databases work with sets while graph databases work with paths and relationships are first-class entities in a graph database and are semantically far stronger than those implied relationships reified at runtime in a relational store.

    Importing the Data using Cypher

    Now that we have extracted data from the AdventureWorks database, will be using Cypher’s LOAD CSV command to transform and load the content of these CSV files into a graph structure.

    First we create specific indexes and constraints

    Schema await – now create nodes:

    Create relationships

    You can also import and run the entire Cypher script using the neo4j-shell: bin/neo4j-shell -path adventure.db -file cypher/import.cypher

    Once completed you will be able query and do stuff like this:

    cypher_subcat

    Querying the Graph with Cypher

    Let’s issue some cypher queries to our newly created graph database:

    cypher_aw01

    cypher_aw02

    cypher_aw03

    Explore, learn Cypher and write your own queries
    Enjoy!

    Tagged with →  
    • Michael Hunger

      Really great blog post, Stephane. Thanks so much for working on it.
      Would you also be able to share the Neo4j database as a zipped file?

    • Michael Hunger

      Testing with the Neo4j-Shell:

      Your first query runs in 7ms after warmup.

      for the second query, we have to help cypher a bit:

      MATCH (p:Product {productName:’AWC Logo Cap’})< -[:PRODUCT]-(:Order) WITH distinct employee
      MATCH (employee)-[:SOLD]->(o2)-[:PRODUCT]->(other:Product)
      RETURN employee.fullName as Employee, other.productName as Product, count(distinct o2) as Count
      ORDER BY Count DESC
      LIMIT 5;

      -> 183ms

      The last query takes 3 ms in the Neo4j-Shell after Warmup.

    • Pingback: From the Community: April 2015 - Neo4j Graph Database()

    • Omkar Poudel

      Really Nice work Stéphane.Thank you for your work in putting all things together.Really helped me setup and start working on neo4j.

      • stephanefrechette

        Thanks Omkar, glad you enjoyed!

    • Prashant Barve

      It is showing (no changes, no rows) when I am running these queries , my neo4j version is 3.0

    • Sachin Vairagi

      //select customer where customerId = 323
      MATCH (n:Customer {customerId : ‘323’}) RETURN n

      //select customer’s order where customerId = 323
      MATCH (n:Customer {customerId : ‘323’})-[r:PURCHASED]->(o:Order) RETURN n,o

      //select products purchased by customerId = 323
      MATCH (n:Customer {customerId : ‘323’})-[:PURCHASED]->()-[:PRODUCT]->(p:Product) RETURN p

      or

      MATCH (p:Product)(o2:Order)-[:PRODUCT]->(other:Product) RETURN other,o2

      //select products purchased by customerId = 323 and orderId = 50676
      MATCH (n:Customer {customerId : ‘323’})-[:PURCHASED]->(o2:Order {orderId : ‘50676’})-[:PRODUCT]->(other:Product) RETURN other,o2

      //select products where orderId = 50676
      MATCH (o:Order {orderId : ‘50676’})-[:PRODUCT]->(p:Product) RETURN p,o

      //select Category where productId = 721
      MATCH (p:Product {productId : ‘721’})-[:PART_OF_SUBCAT]->(sb:SubCategory) RETURN p,sb

      //select Products where subCategoryId = 22
      MATCH (sb:SubCategory {subCategoryId : ’22’})(sb:SubCategory) RETURN p,sb

      //select product and subcategory whose products purchased by customerId = 323
      MATCH (n:Customer {customerId : ‘323’})-[:PURCHASED]->()-[:PRODUCT]->(p:Product)-[:PART_OF_SUBCAT]->(sb:SubCategory) RETURN p,sb

      or
      //order by sell count
      MATCH (n:Customer {customerId : ‘323’})-[:PURCHASED]->()-[r:PRODUCT]->(p:Product)-[:PART_OF_SUBCAT]->(sb:SubCategory) RETURN p,sb,sum(r.quantity) as total ORDER BY total DESC

      // select product purchased by customerId = 323 where subcategoryId = 22
      MATCH (sb:SubCategory {subCategoryId : ’22’})<-[:PART_OF_SUBCAT]-(p:Product)<-[:PRODUCT]-(o:Order)-[:PURCHASED]-(c:Customer {customerId : '323'}) RETURN p

      or

      MATCH (p:Product)<-[:PRODUCT]-(o:Order)-[:PURCHASED]-(c:Customer {customerId : '323'})
      WHERE (:SubCategory {subCategoryId : '22'})<-[:PART_OF_SUBCAT]-(p) RETURN p

      // select product and their sell count purchased by customerId = 323 and subcategoryId = 22
      MATCH (sb:SubCategory {subCategoryId : '22'})<-[:PART_OF_SUBCAT]-(p:Product)(p:Product) RETURN p, sum(r.quantity) as TotalQuantitySold ORDER BY TotalQuantitySold DESC;

      //select total sell count of product where productId = 850
      MATCH (o:Order)-[r:PRODUCT]->(p:Product {productId : ‘850’}) RETURN p, sum(r.quantity) as TotalQuantitySold

      //select recommended product just from the same subcategory that the user has already purchased products from
      MATCH (p:Product),(sb:SubCategory {subCategoryId : ’22’})<-[:PART_OF_SUBCAT]-(p)
      WHERE NOT (sb)<-[:PART_OF_SUBCAT]-(p)<-[:PRODUCT]-(:Order)-[:PURCHASED]-(:Customer {customerId : '323'})
      RETURN p,sb LIMIT 5

      //select recommended product just from the same subcategory that the user has already purchased products from , order by total sell count
      MATCH (p:Product),(sb:SubCategory {subCategoryId : '22'})(p)
      WHERE NOT (sb)<-[:PART_OF_SUBCAT]-(p)<-[:PRODUCT]-(:Order)-[:PURCHASED]-(:Customer {customerId : '323'})
      RETURN p,sb,sum(r.quantity) as TotalQuantitySold ORDER by TotalQuantitySold DESC LIMIT 5

      //select recommended product just from the same subcategory that the user has already purchased products from , order by like count
      MATCH (p:Product),(sb:SubCategory {subCategoryId : '22'})<-[:PART_OF_SUBCAT]-(p)<-[:LIKE]-(:Customer)
      WHERE NOT (sb)<-[:PART_OF_SUBCAT]-(p)(p:Product) RETURN count(p.productId) as likes,p ORDER BY likes DESC

      • stephanefrechette

        Great stuff Sachin! thanks for sharing

        • Sachin Vairagi

          YW!+TY! Stephanefrechette