Skip to content

Code for running queries on a dataset about taxi rides in Azure (Azure functions + Event Hub)

Notifications You must be signed in to change notification settings

lefosg/azure-taxi-data-analytics

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

This repo includes all the code for an Azure function app which processes the dataset in this repo. Datasets are uploaded to a Blob Storage. The Azure function Blob trigger fire up, and do some calculations on that specific dataset. Streaming of mini-batches is also supported through an Event Hub which is set up. The Even Hub Trigger starts handling the mini-batch stream.

  • preprocess_dataset.py x1 x2 x3: gets the first x1 lines from the dataset and produces another csv called output.csv (locally). After that, it creates x2 batches of x3 lines each. Empirically, due to limitations of Event Hub, 20 lines per mini-batch csv file is the maximum size supported.
  • upload.py: supports two oprations 'upload.py single name_of_file.csv' which uploads the dataset to the Blob Storage (and the Blob trigger is invoked), and 'upload.py batch' which pulls the minibatch files from a directory created by preprocess_dataset.py and streams them to the Event Hub.
  • function_appy.py: code that is executed in the azure function. Includes 4 queries for the dataset uploaded, and the two triggers (Blob and Event Hub)

Results of the calculations are stored to Blob Storage, and some specific results are saved to Redis Cache.

About

Code for running queries on a dataset about taxi rides in Azure (Azure functions + Event Hub)

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages