Following up on a previous post, Fetching NHL Play by Play game data, where I created a Node.js app for fetching NHL Play by Play JSON game files. The next step was to parse the JSON files into CSV files for further exploring… say, analyzing with R, loading into an RDBMS, or visualizing!

So I added a new JavaScript file, convert.js, to the existing nhlplaybyplay-node app in the GitHub repo:

One important thing! I’m using jq, a lightweight and flexible command-line JSON processor. You can download it here or install it with Homebrew by issuing the following command: brew install jq


Read Full Post →

SQL Server vNext represents a major step toward making SQL Server a platform that offers a choice of development languages and data types, runs on-premises or in the cloud, and spans operating systems by bringing the power of SQL Server to Linux, Linux-based Docker containers, and Windows.

For all the downloads and code samples, and to stay informed, visit the following link: SQL Server v.Next Public Preview


I am sharing with you my experiences and gotchas from installing SQL Server on Linux on an existing Ubuntu VM I had been running for a while. I have since installed SQL Server on Linux on several other local VMs and in Azure without any issues, just by following the documentation.

Read Full Post →

Working on something interesting from a back-end and data visualization perspective, I needed to get all of the NHL Play by Play game statistic data files. So I wrote a Node.js application called nhlplaybyplay-node that provides the means for accessing and fetching the NHL schedule and Play by Play game data files, which are in JSON format.

Hope you’re familiar with this…
The following URL retrieves the complete 2016-2017 NHL season schedule
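The fetch itself is just an HTTP GET that returns JSON. As a rough sketch (the endpoint below is an assumption for illustration; the post's actual URL is in the full article, and nhlplaybyplay-node itself is written in Node.js):

```python
from urllib.request import urlopen

# Assumed schedule endpoint; the real URL used by the app may differ.
SCHEDULE_BASE = "https://statsapi.web.nhl.com/api/v1/schedule"

def schedule_url(season):
    # The NHL identifies a season as a concatenated pair of years,
    # e.g. "20162017" for the 2016-2017 season (assumed query format).
    return "%s?season=%s" % (SCHEDULE_BASE, season)

# The actual fetch is a plain GET returning the season schedule as JSON:
# schedule_json = urlopen(schedule_url("20162017")).read()
```

The schedule response lists the game IDs, which is what makes it possible to then fetch each game's Play by Play file.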

Read Full Post →

In-memory distributed processing for large datasets… How do you connect to SQL Server using Apache Spark? The Spark documentation covers the basics of the API and DataFrames, but there is a lack of information and examples on how to actually get this feature to work.


First, what is Apache Spark? Apache Spark is an open source big data processing framework built around speed, ease of use, and sophisticated analytics. It is a fast, general processing engine compatible with Hadoop data. It can run in Hadoop clusters through YARN or in Spark’s standalone mode, and it can process data in HDFS, HBase, Cassandra, Hive, and any Hadoop InputFormat. It is designed to perform both batch processing (similar to MapReduce) and newer workloads like streaming, interactive queries, and machine learning.
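The connection itself goes through Spark's JDBC data source. Here is a hedged PySpark-style sketch; the host, database, table, and credential names are placeholders, and it assumes the Microsoft JDBC driver jar is on the classpath (e.g. via spark-submit --jars):

```python
def jdbc_url(host, database, port=1433):
    # Standard SQL Server JDBC connection URL format.
    return "jdbc:sqlserver://%s:%d;databaseName=%s" % (host, port, database)

def read_table(spark, host, database, table, user, password):
    # Read a SQL Server table into a Spark DataFrame via the JDBC
    # data source. "spark" is an existing SparkSession; all other
    # arguments are placeholders for your environment.
    return (spark.read.format("jdbc")
            .option("url", jdbc_url(host, database))
            .option("dbtable", table)
            .option("user", user)
            .option("password", password)
            .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
            .load())
```

Once loaded, the result is an ordinary DataFrame, so the rest of the Spark API (filters, joins, SQL queries) applies unchanged.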

Read Full Post →