A MapReduce implementation that analyzes flight delay data using the "in-mapper combining with flush-when-full" design pattern

marcochang1028/Flight-Delay-Analysis-by-MapReduce

Analysis-of-Flight-Delay-Data-by-MapReduce

Objective

This project is a MapReduce implementation that analyzes flight delay data and applies the in-mapper combining design pattern with flush-when-full. It covers two tasks:

  1. Calculate the average departure delay and the average arrival delay (in minutes) over all scheduled flights. The source file Delay.java is in the Program folder.
  2. Output all combinations of airline name and year such that the percentage P of scheduled flights whose departures were at least 31 minutes late (among all scheduled flights of that airline in that year) is at least 50%. The source file Late.java is in the Program folder.
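
To illustrate the design pattern used by the first task, here is a minimal, Hadoop-free sketch of in-mapper combining with flush-when-full. It is not the code from Delay.java: the class name `InMapperCombinerSketch`, the `maxBufferedKeys` threshold, the `emitted` list (a stand-in for the mapper output) and the record shape (a key, a total delay in minutes and a flight count) are all assumptions made for the example. In a real Hadoop job, `map` and `cleanup` would belong to a Mapper subclass and each flushed partial would be written to the job context instead.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, Hadoop-free illustration of in-mapper combining with flush-when-full.
class InMapperCombinerSketch {
    /** Partial aggregate for one key: total delay minutes and flight count. */
    static final class Partial {
        final String key;
        double delaySum;
        long flights;
        Partial(String key) { this.key = key; }
    }

    private final int maxBufferedKeys;                    // flush threshold (assumed)
    private final Map<String, Partial> buffer = new HashMap<>();
    /** Stand-in for mapper output; a Hadoop mapper would write to its context here. */
    final List<Partial> emitted = new ArrayList<>();

    InMapperCombinerSketch(int maxBufferedKeys) {
        this.maxBufferedKeys = maxBufferedKeys;
    }

    /** Called once per input record: aggregate in memory instead of emitting right away. */
    void map(String key, double totalDelayMinutes, long flights) {
        Partial p = buffer.computeIfAbsent(key, Partial::new);
        p.delaySum += totalDelayMinutes;
        p.flights += flights;
        if (buffer.size() >= maxBufferedKeys) {
            flush();                                      // buffer is full: emit and start over
        }
    }

    /** Called once at the end of the input split: emit whatever is still buffered. */
    void cleanup() {
        flush();
    }

    private void flush() {
        emitted.addAll(buffer.values());
        buffer.clear();
    }

    /** Reduce step: merge the partials per key and divide the sum by the count. */
    static Map<String, Double> reduce(List<Partial> partials) {
        Map<String, double[]> totals = new HashMap<>();
        for (Partial p : partials) {
            double[] t = totals.computeIfAbsent(p.key, k -> new double[2]);
            t[0] += p.delaySum;
            t[1] += p.flights;
        }
        Map<String, Double> averages = new HashMap<>();
        for (Map.Entry<String, double[]> e : totals.entrySet()) {
            averages.put(e.getKey(), e.getValue()[0] / e.getValue()[1]);
        }
        return averages;
    }
}
```

Emitting (sum, count) pairs rather than per-mapper averages keeps the final result correct, because an average of averages is generally not the overall average. The flush threshold bounds the mapper's memory use at the cost of occasionally emitting more than one partial per key.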

Data

The dataset consists of the monthly "Full Analysis with Arrival-Departure Split" files (in CSV format) from January 2011 to August 2017. It can be downloaded from the Data folder of this repository or from the UK flight punctuality data source.

Program

As mentioned above, the source files Delay.java and Late.java are in the Program folder, and the code is documented with comments.
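
As a rough illustration of the selection rule in the second task (not the actual code in Late.java), the sketch below keeps every airline and year combination whose share of late departures is at least 50%. The class name `LateFilterSketch`, the method `selectMostlyLate`, the tab-separated key format and the `{lateFlights, totalFlights}` array layout are assumptions made for the example. It presumes the counts have already been aggregated per key, for instance by the in-mapper combining step.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the final filtering step for the "late departures" task.
class LateFilterSketch {
    /**
     * counts maps an assumed "airline<TAB>year" key to {lateFlights, totalFlights},
     * where lateFlights counts departures that were at least 31 minutes late.
     * Returns the keys whose late share P is at least 50%, in sorted order.
     */
    static List<String> selectMostlyLate(Map<String, long[]> counts) {
        List<String> selected = new ArrayList<>();
        // Copy into a TreeMap so the output order is deterministic.
        for (Map.Entry<String, long[]> e : new TreeMap<String, long[]>(counts).entrySet()) {
            long late = e.getValue()[0];
            long total = e.getValue()[1];
            // P >= 50% is equivalent to 2 * late >= total, which avoids rounding issues.
            if (total > 0 && 2 * late >= total) {
                selected.add(e.getKey());
            }
        }
        return selected;
    }
}
```

Comparing integer counts instead of a floating-point percentage means a combination sitting exactly on the 50% boundary is always included, and keys with no scheduled flights are skipped rather than causing a division by zero.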

Running Test

  1. Download the dataset from the Data folder and unzip it.
  2. Download the JAR file UKFlightAnalysis.jar from the Jar folder and run the following commands:
hadoop jar [jar folder path]/UKFlightAnalysis.jar org.marco.Delay [data folder]/ [output folder]/
hadoop jar [jar folder path]/UKFlightAnalysis.jar org.marco.Late [data folder]/ [output folder]/

For example, [jar folder path] could be ~/CC/Jar, [data folder] could be ~/CC/input, and [output folder] could be ~/CC/outputDelay. Hadoop jobs normally fail if the output directory already exists, so use a different [output folder] for each command.

Remarks

Feel free to contact me at [email protected]
