When I am on the hunt for data, either to prepare a demo or to help me learn new technologies, these are the resources and public dataset listings I consult:

[Some contain redundant links. If you use others, please share them and I will gladly add them to the list.]

Enjoy!

  • datalibre.ca – Open Data (contains a list of Canadian open data portals for cities, provinces and the federal government, plus international sources)
    http://datalibre.ca/links-resources/
    1. KDnuggets – Data: Government, State, City, Local and Public
      http://www.kdnuggets.com/datasets/government-local-public.html
    2. KDnuggets – Datasets for Data Mining and Data Science
      http://www.kdnuggets.com/datasets/index.html
    3. Kevin Chai’s Dataset list
      http://kevinchai.net/datasets
    4. Microsoft Azure Marketplace
      https://datamarket.azure.com/
    5. Yahoo! GeoPlanet Data
      https://developer.yahoo.com/geo/geoplanet/data/
    6. OSDC – Public Data Sets
      https://www.opensciencedatacloud.org/publicdata/
    7. figshare – store, share, discover research
      http://figshare.com/
    8. Quandl – Find and Use Data. Easily.
      https://www.quandl.com/
    9. Enigma – Navigate the world of public data
      http://enigma.io/
    10. Datahub – The easy way to get, use and share data
      http://datahub.io/
    11. Linked Data – Connect Distributed Data across the Web
      http://linkeddata.org/
    12. IAPR – Public datasets for machine learning
      http://homepages.inf.ed.ac.uk/rbf/IAPR/researchers/MLPAGES/mldat.htm
    13. CRAWDAD – A Community Resource for Archiving Wireless Data At Dartmouth
      http://crawdad.org/
    14. The R Datasets Package
      https://stat.ethz.ch/R-manual/R-devel/library/datasets/html/00Index.html
    15. R Datasets
      https://vincentarelbundock.github.io/Rdatasets/datasets.html
    16. inside-R – Finding Data on the Internet
      http://www.inside-r.org/howto/finding-data-internet
    17. Gephi sample datasets
      https://wiki.gephi.org/index.php/Datasets
    18. Stanford Large Network Dataset Collection
      http://snap.stanford.edu/data/
    19. Tableau Public – Sample Data Sets
      http://www.tableausoftware.com/public/community/sample-data-sets
    20. Tableau Public – Viz of the Day
      http://www.tableausoftware.com/public/community/viz-of-the-day

      My Meetup presentation at the Ottawa SQL Server User Group (Ottawa PASS Chapter) – Graph Databases for SQL Server Professionals

      Graph databases represent graph structures with nodes, edges and properties. Neo4j, an open-source graph database, is reliable and fast for managing and querying highly connected data. We will explore how to install and configure it, create nodes and relationships, query with the Cypher Query Language, import data, and use Neo4j in concert with SQL Server. The goal is to provide answers and insight, with visual diagrams, about the connected data you have in your SQL Server databases!

      Are you new to graph databases, or interested in exploring them, and would you like to know how to start modeling and importing data in Neo4j?

      There are several ways to get started: the neo4j-shell-tools, LOAD CSV, Batch Import and many more. I strongly recommend reading and exploring the following resources. However, if you want to start modeling quickly and easily import a small subset of data (fewer than 1,000 nodes/relationships), the spreadsheet approach is well suited.

      Let’s take a look at graph modeling and importing, the spreadsheet way.
      You can find the sample Excel spreadsheet here, and a Google Sheets version here.

      The sheet (worksheet) is composed of two parts:
      For nodes -> columns A, B and C hold the node data: a “Node” (the numeric id), a “Name”, and a “Label”.
      For relationships -> columns F, G, H, I and J hold the relationship data: a “From” (the node where the relationship starts), a “To” (the node where it ends), and a “Relationship Type”. Columns G and J show the names of the “From” and “To” nodes.

      So how do we create the required Cypher statements from these nodes and relationships? Simple formulas over the columns mentioned above generate the proper Cypher syntax.

      A closer look at the formula that generates the Cypher statement for a node:
      ="create (n"&A2&":"&C2&" {id:"&A2&", name:'"&SUBSTITUTE(B2, "'", "\'")&"'})"
      outputs
      create (n1:Character {id:1, name:'Cleopatra'})
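      The same node formula can be sketched outside the spreadsheet, for example to regenerate the statements from an exported CSV. This minimal Python sketch assumes each row is a tuple of (node id, name, label), mirroring columns A, B and C. The helper names `escape_cypher_string` and `node_statement` are hypothetical. Like SUBSTITUTE in the formula, it escapes single quotes with a backslash. It also escapes backslashes first, which the spreadsheet formula does not do.

```python
def escape_cypher_string(value: str) -> str:
    """Escape a value for use inside a single-quoted Cypher string literal."""
    # Escape backslashes first so the quote escapes added next are not doubled.
    return value.replace("\\", "\\\\").replace("'", "\\'")


def node_statement(node_id: int, name: str, label: str) -> str:
    """Build one CREATE clause for a node, like the formula shown above."""
    return f"create (n{node_id}:{label} {{id:{node_id}, name:'{escape_cypher_string(name)}'}})"


# Hypothetical rows mirroring columns A (Node), B (Name) and C (Label);
# the second name uses a straight apostrophe to show the escaping.
rows = [
    (1, "Cleopatra", "Character"),
    (25, "Asterix and the Chieftain's Shield", "Album"),
]

for row in rows:
    print(node_statement(*row))
```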

      The formula that generates the Cypher statement for a relationship:
      ="create n"&F2&"-[:"&H2&"]->n"&I2
      outputs
      create n1-[:FELLOW_COMPAGNON]->n2
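      A matching sketch for the relationship formula follows. It assumes rows of (from id, relationship type, to id) taken from columns F, H and I, and `relationship_statement` is a hypothetical helper. There is one deliberate difference from the spreadsheet: the sketch wraps the node variables in parentheses, giving `(n1)-[:FELLOW_COMPAGNON]->(n2)`. That form also works in Neo4j 2.x, whereas newer Neo4j releases no longer accept the bare `n1-[...]->n2` pattern. The sketch also rejects relationship types that are not plain identifiers instead of trying to quote them.

```python
import re

# Relationship types written without backticks must be plain identifiers.
PLAIN_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def relationship_statement(from_id: int, rel_type: str, to_id: int) -> str:
    """Build one CREATE clause for a relationship (columns F, H and I)."""
    if not PLAIN_IDENTIFIER.fullmatch(rel_type):
        raise ValueError(f"unsupported relationship type: {rel_type!r}")
    # Node variables are parenthesized so the clause parses in old and new Neo4j versions.
    return f"create (n{from_id})-[:{rel_type}]->(n{to_id})"


# Hypothetical rows mirroring columns F (From), H (Relationship Type) and I (To).
rows = [
    (1, "FELLOW_COMPAGNON", 2),
    (1, "CHARACTER_TYPE", 14),
]

for row in rows:
    print(relationship_statement(*row))
```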

      Now the easy part: highlight and copy both columns D and K (the generated node and relationship statements) into your Neo4j Browser, then execute them to generate your graph model.
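      Why copy both columns together? The pasted block runs as a single Cypher statement, and variables such as n1 exist only within the statement that defines them. The node clauses therefore have to come first, and the relationship clauses must run in the same statement. Here is a minimal sketch of that assembly step. It assumes the two columns have already been read into lists of strings, and `build_script` is a hypothetical helper.

```python
def build_script(node_clauses: list[str], relationship_clauses: list[str]) -> str:
    """Join the column D and column K clauses into one Cypher statement.

    Nodes come first so every variable (n1, n2, ...) is bound before a
    relationship clause refers to it. Stray semicolons and empty cells are
    dropped, and a single trailing semicolon keeps the block as one statement.
    """
    clauses = [
        c.strip().rstrip(";")
        for c in node_clauses + relationship_clauses
        if c.strip()
    ]
    return "\n".join(clauses) + ";"


script = build_script(
    ["create (n1:Character {id:1, name:'Cleopatra'})",
     "create (n2:Character {id:2, name:'Cesarion (Ptolemy XVI)'})"],
    ["create (n1)-[:FELLOW_COMPAGNON]->(n2)"],
)
print(script)
```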

      The complete Cypher statement generated from columns D and K, which you can copy and execute in your Neo4j Browser:

      create (n1:Character {id:1, name:'Cleopatra'})
      create (n2:Character {id:2, name:'Cesarion (Ptolemy XVI)'})
      create (n3:Character {id:3, name:'Julius Caesar'})
      create (n4:Character {id:4, name:'Edifis'})
      create (n5:Character {id:5, name:'Caius Fatuous'})
      create (n6:Character {id:6, name:'Odius Asparagus'})
      create (n7:Character {id:7, name:'Brutus'})
      create (n8:Character {id:8, name:'Cacofonix'})
      create (n9:Character {id:9, name:'Insalubrius'})
      create (n10:Character {id:10, name:'Ekonomikrisis'})
      create (n11:Character {id:11, name:'Porpus'})
      create (n12:CharacterType {id:12, name:'The Gauls'})
      create (n13:CharacterType {id:13, name:'The Romans'})
      create (n14:CharacterType {id:14, name:'The others'})
      create (n15:Citizenship {id:15, name:'Egyptian'})
      create (n16:Citizenship {id:16, name:'Gaul'})
      create (n17:Citizenship {id:17, name:'Phoenician'})
      create (n18:Citizenship {id:18, name:'Roman'})
      create (n19:Citizenship {id:19, name:'Roman/Egyptian'})
      create (n20:Album {id:20, name:'Asterix the Gaul'})
      create (n21:Album {id:21, name:'Asterix the Gladiator'})
      create (n22:Album {id:22, name:'Asterix and Cleopatra'})
      create (n23:Album {id:23, name:'Asterix in Britain'})
      create (n24:Album {id:24, name:'Asterix the Legionary'})
      create (n25:Album {id:25, name:'Asterix and the Chieftain’s Shield'})
      create (n26:Album {id:26, name:'Asterix in Spain'})
      create (n27:Album {id:27, name:'Asterix and the Roman Agent'})
      create (n28:Album {id:28, name:'The Mansions of the Gods'})
      create (n29:Album {id:29, name:'Asterix and the Laurel Wreath'})
      create (n30:Album {id:30, name:'Asterix and the Soothsayer'})
      create (n31:Album {id:31, name:'Asterix and Caesar’s Gift'})
      create (n32:Album {id:32, name:'Obelix and Co.'})
      create (n33:Album {id:33, name:'Asterix in Belgium'})
      create (n34:Album {id:34, name:'Asterix and the Black Gold'})
      create (n35:Album {id:35, name:'Asterix and Son'})
      create (n36:Album {id:36, name:'Asterix and the Magic Carpet'})
      create (n37:Album {id:37, name:'Asterix and the Secret Weapon'})
      create (n38:Album {id:38, name:'Asterix and Obelix all at Sea'})
      create (n39:Album {id:39, name:'Asterix and the Actress'})
      create (n40:Album {id:40, name:'Asterix and the class act'})
      create (n41:Album {id:41, name:'Asterix and Obelix’s Birthday'})
      create (n42:Album {id:42, name:'Asterix and the Picts'})
      create n1-[:FELLOW_COMPAGNON]->n2
      create n1-[:FELLOW_COMPAGNON]->n3
      create n1-[:FELLOW_COMPAGNON]->n4
      create n2-[:FELLOW_COMPAGNON]->n3
      create n3-[:FELLOW_COMPAGNON]->n5
      create n3-[:FELLOW_COMPAGNON]->n6
      create n3-[:FELLOW_COMPAGNON]->n7
      create n5-[:FELLOW_COMPAGNON]->n8
      create n5-[:FELLOW_COMPAGNON]->n9
      create n5-[:FELLOW_COMPAGNON]->n10
      create n5-[:FELLOW_COMPAGNON]->n11
      create n6-[:FELLOW_COMPAGNON]->n8
      create n7-[:FELLOW_COMPAGNON]->n2
      create n9-[:FELLOW_COMPAGNON]->n11
      create n1-[:CHARACTER_TYPE]->n14
      create n2-[:CHARACTER_TYPE]->n14
      create n3-[:CHARACTER_TYPE]->n13
      create n4-[:CHARACTER_TYPE]->n14
      create n5-[:CHARACTER_TYPE]->n13
      create n6-[:CHARACTER_TYPE]->n13
      create n7-[:CHARACTER_TYPE]->n13
      create n8-[:CHARACTER_TYPE]->n12
      create n9-[:CHARACTER_TYPE]->n13
      create n10-[:CHARACTER_TYPE]->n14
      create n11-[:CHARACTER_TYPE]->n14
      create n1-[:CITIZENSHIP]->n15
      create n2-[:CITIZENSHIP]->n19
      create n3-[:CITIZENSHIP]->n18
      create n4-[:CITIZENSHIP]->n15
      create n5-[:CITIZENSHIP]->n18
      create n6-[:CITIZENSHIP]->n18
      create n7-[:CITIZENSHIP]->n18
      create n8-[:CITIZENSHIP]->n16
      create n9-[:CITIZENSHIP]->n18
      create n10-[:CITIZENSHIP]->n17
      create n11-[:CITIZENSHIP]->n18
      create n1-[:APPEARS_IN]->n22
      create n1-[:APPEARS_IN]->n35
      create n1-[:APPEARS_IN]->n38
      create n1-[:APPEARS_IN]->n41
      create n3-[:APPEARS_IN]->n20
      create n3-[:APPEARS_IN]->n21
      create n3-[:APPEARS_IN]->n22
      create n3-[:APPEARS_IN]->n23
      create n3-[:APPEARS_IN]->n24
      create n3-[:APPEARS_IN]->n25
      create n3-[:APPEARS_IN]->n26
      create n3-[:APPEARS_IN]->n27
      create n3-[:APPEARS_IN]->n28
      create n3-[:APPEARS_IN]->n29
      create n3-[:APPEARS_IN]->n30
      create n3-[:APPEARS_IN]->n31
      create n3-[:APPEARS_IN]->n32
      create n3-[:APPEARS_IN]->n33
      create n3-[:APPEARS_IN]->n34
      create n3-[:APPEARS_IN]->n36
      create n3-[:APPEARS_IN]->n37
      create n3-[:APPEARS_IN]->n38
      create n3-[:APPEARS_IN]->n39
      create n3-[:APPEARS_IN]->n40
      create n3-[:APPEARS_IN]->n41
      create n3-[:APPEARS_IN]->n42
      create n2-[:APPEARS_IN]->n35
      create n4-[:APPEARS_IN]->n22
      create n4-[:APPEARS_IN]->n41
      create n5-[:APPEARS_IN]->n21
      create n6-[:APPEARS_IN]->n21
      create n7-[:APPEARS_IN]->n21
      create n7-[:APPEARS_IN]->n27
      create n7-[:APPEARS_IN]->n30
      create n7-[:APPEARS_IN]->n35
      create n9-[:APPEARS_IN]->n21
      create n10-[:APPEARS_IN]->n21
      create n10-[:APPEARS_IN]->n34
      create n10-[:APPEARS_IN]->n41
      create n11-[:APPEARS_IN]->n21;
      

      The result! A graph…
      [Image: the resulting graph model]

      In preparation for my presentation “Graph Databases for SQL Server Professionals” (details here), I thought it would be worthwhile to quickly demonstrate and walk through installing Neo4j on Windows and getting started creating graphs.

      Unless you use the Neo4j Windows installer, you’ll need a Java Virtual Machine installed on your computer; either OpenJDK 7 or Oracle Java 7 is recommended.

      Next, download the latest stable version of Neo4j for Windows from http://neo4j.org.

      Once the download completes, run the executable and follow the steps of the installation wizard.
      [For this demonstration, Neo4j 2.1.3 was installed on a Windows Server 2012 R2 VM.]

      Not quite done! Before you click Start, you need to change some default settings.

      Click on Settings…

      Then click the Edit… button for the Database Configuration

      In the neo4j.properties file, we want to enable auto-indexing, which is disabled by default.
      Simply remove the number sign (#) at the beginning of the following lines:

      node_auto_indexing=true
      node_keys_indexable=name,age
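      If you would rather script this edit than make it by hand, here is a minimal Python sketch that works on the file's text. It removes the leading # only from the two auto-indexing lines and leaves every other comment alone. The helper name `enable_auto_indexing` is hypothetical. Reading and writing the actual file is left out because its location depends on your installation.

```python
def enable_auto_indexing(properties_text: str) -> str:
    """Uncomment the node auto-indexing settings in neo4j.properties text."""
    keys = ("node_auto_indexing", "node_keys_indexable")
    lines = []
    for line in properties_text.splitlines():
        stripped = line.lstrip()
        candidate = stripped.lstrip("#").lstrip()
        # Only uncomment lines whose setting name is one of the two keys.
        if stripped.startswith("#") and candidate.split("=", 1)[0].strip() in keys:
            line = candidate
        lines.append(line)
    return "\n".join(lines) + "\n"


# Illustrative excerpt only, not the exact file shipped with Neo4j.
sample = "# Autoindexing\n#node_auto_indexing=true\n\n#node_keys_indexable=name,age\n"
print(enable_auto_indexing(sample))
```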

      Once you are done, the file should look similar to the following. Close the file (saving your changes), then close the Neo4j Settings.

      [Screenshot: the edited neo4j.properties file]

      Now you can click Start and browse to http://localhost:7474.

      You are now ready to explore Neo4j and create graphs. Work through the Getting Started guide and the Movie Graph sample.

      Some useful resources to explore:

      In the near future I will be posting more on Neo4j topics: graph data modeling, importing data, and Cypher.
      Enjoy!

      I have created Canadian city polygons for Tableau. They are either electoral districts or wards, and the source shapefiles come from the following cities’ open data portals:

      You can access these polygon point CSV files from my public Dropbox here: Polygons for Tableau
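      Here is what a polygon point file typically contains. Tableau draws a custom polygon from one row per vertex, with a shape identifier on Detail and a point-order field on Path, and the mark type set to Polygon. The sketch below assumes the vertices have already been extracted from the shapefile as (longitude, latitude) pairs, for example with a GIS tool. It writes them out in that one-row-per-vertex layout. The column names and helper functions are hypothetical and may differ from the Dropbox files, and the coordinates are made up.

```python
import csv
import io

FIELDS = ["ShapeId", "PointOrder", "Latitude", "Longitude"]


def polygon_rows(shapes):
    """Flatten {shape_id: [(lon, lat), ...]} into one row per vertex.

    PointOrder numbers the vertices so Tableau can draw each outline in sequence.
    """
    for shape_id, vertices in shapes.items():
        for order, (lon, lat) in enumerate(vertices, start=1):
            yield {"ShapeId": shape_id, "PointOrder": order, "Latitude": lat, "Longitude": lon}


def to_csv(shapes) -> str:
    """Render the vertex rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(polygon_rows(shapes))
    return buffer.getvalue()


# Made-up coordinates for illustration only; not actual ward boundaries.
sample = {"Ward A": [(-75.70, 45.42), (-75.69, 45.42), (-75.69, 45.43)]}
print(to_csv(sample))
```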

      City of Ottawa Wards

      [Map: City of Ottawa ward polygons]

      If you want to know how to create and prepare polygon maps from shapefiles to use in Tableau, I recommend the following articles:

      I will keep you posted on new files from other Canadian cities…