It’s been a while since my last post! How about analyzing NYC Citi Bike data with Azure Databricks? Azure Databricks is a fast, easy, and collaborative Apache Spark-based analytics platform. Designed in collaboration with Microsoft and the creators of Apache Spark, it combines the best of Databricks and Azure to help customers accelerate innovation with one-click setup, streamlined workflows, and an interactive workspace that enables collaboration between data scientists, data engineers, and business analysts. Azure Databricks is currently in Preview, so I recommend you start by exploring the Azure Databricks Preview documentation. The Quickstarts section shows you how to create a Databricks workspace, create an Apache Spark cluster within that workspace, and finally run a Spark job. Your best friend for this journey is the Azure Databricks Guide (Documentation); check out the Getting Started section.

Read Full Post →

Copying data to the Cloud from the command line: that is what this post is all about. Over the last couple of years we have seen vendors offer different object storage platforms and services in the Cloud. Cloud storage typically refers to a hosted object storage service. The following are considered the best-known and most widely used Cloud object storage platforms, and we are going to demonstrate how to copy data to each of them from the command line. I strongly recommend you read up on each of their offerings and register for a free subscription (listed in no particular order):
 

Now, for each platform, we will implement four specific actions (commands):

Read Full Post →

Finally, the 2017-2018 NHL hockey season has begun! But wait… **PANIC** my little thingy nhlplaybyplay-node for fetching games and play-by-play data does not work anymore… Hmmm, after some research it looked like I needed to make some updates and fixes to get it working again… Instead, I decided to create another Node.js app: nhlplaybyplay2-node.

Read Full Post →

Search for a String in Multiple Files

In support of an earlier post, Fetching NHL Play by Play game data, I was recently asked how one could quickly search for a specific string in multiple JSON files recursively. Well, if you are running macOS or Linux, grep is your best friend! In a nutshell, grep prints lines that contain a match for a pattern. The following is a sample grep and cut command that lists the games (files) that contain the string “Montréal Canadiens”:

[Want to try it out? You can download and extract the sample data, which contains all the play-by-play games from the 2016-2017 season.]
 
 
grep -H -R "Montréal Canadiens" /data/20162017/*.json | cut -d: -f1
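A possible variation, offered as a sketch rather than as the post's own method: the pipeline above prints a file's path once for every matching line, and the `*.json` glob only covers files directly inside the directory. Assuming you want each matching file listed exactly once, including files in subdirectories, `find` can hand the JSON files to `grep -l`. The function name `list_matching_files` is hypothetical and chosen only for illustration.

```shell
# list_matching_files DIR STRING  (hypothetical helper name)
# Prints, one per line and sorted, every *.json file under DIR
# (including subdirectories) that contains STRING.
list_matching_files() {
  # find walks DIR recursively and selects regular *.json files;
  # grep -l prints a file's name once, however many lines match;
  # grep -F treats STRING as a literal string, not a regex;
  # "--" keeps a STRING that starts with "-" from being read as an option.
  find "$1" -type f -name '*.json' -exec grep -l -F -- "$2" {} + | sort
}

# Example call, assuming the sample data was extracted to /data/20162017:
#   list_matching_files /data/20162017 "Montréal Canadiens"
```

Because `grep -l` stops reading a file at its first match, this may also be a little faster on large play-by-play files than printing every matching line and trimming the output with cut.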

Read Full Post →