This project is a MapReduce implementation for analyzing flight delay data. It applies the in-mapper combining design pattern with flush-when-full. There are two tasks in this project:
- Calculate the average delay (in minutes) of the departures of all scheduled flights and the average delay of the arrivals of all scheduled flights. The source file Delay.java is in the Program folder.
- Output all combinations of airline name and year such that the percentage P of scheduled flights whose departures were at least 31 minutes late (among all scheduled flights of that airline in that year) is at least 50%. The source file Late.java is in the Program folder.
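As a rough illustration of in-mapper combining with flush-when-full, the sketch below keeps partial (sum, count) pairs in a map inside the mapper and emits them only when the buffer reaches a size limit or the input ends. It is plain Java with no Hadoop dependency so that it runs on its own. The class name, the `maxKeys` threshold, the `emitter` callback (standing in for Hadoop's `context.write`), the "departure"/"arrival" keys, and the whole-minute delays are all assumptions made for the example, not the code in Delay.java.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

// Plain-Java illustration of in-mapper combining with flush-when-full.
// All names here are hypothetical; this is not the code from Delay.java.
class InMapperCombinerSketch {
    private final int maxKeys; // assumed flush threshold: distinct keys held in memory
    private final Map<String, long[]> partials = new HashMap<>(); // key -> {sum, count}
    private final BiConsumer<String, long[]> emitter; // stands in for context.write(...)

    InMapperCombinerSketch(int maxKeys, BiConsumer<String, long[]> emitter) {
        this.maxKeys = maxKeys;
        this.emitter = emitter;
    }

    // Plays the role of Mapper.map(): aggregate locally instead of emitting per record.
    void map(String key, long delayMinutes) {
        long[] acc = partials.computeIfAbsent(key, k -> new long[2]);
        acc[0] += delayMinutes; // running sum of delays
        acc[1] += 1;            // running record count
        if (partials.size() >= maxKeys) {
            flush(); // buffer is full: emit partial results and free the memory
        }
    }

    // Plays the role of Mapper.cleanup(): emit whatever is still buffered.
    void cleanup() {
        flush();
    }

    private void flush() {
        partials.forEach(emitter);
        partials.clear();
    }

    // Reducer side: merge the partial {sum, count} pairs of one key into an average.
    static double average(List<long[]> partialsForKey) {
        long sum = 0;
        long count = 0;
        for (long[] p : partialsForKey) {
            sum += p[0];
            count += p[1];
        }
        return count == 0 ? 0.0 : (double) sum / count;
    }

    public static void main(String[] args) {
        // Collect emitted pairs per key, as the shuffle phase would.
        Map<String, List<long[]>> shuffled = new HashMap<>();
        InMapperCombinerSketch mapper = new InMapperCombinerSketch(2,
                (key, pair) -> shuffled.computeIfAbsent(key, k -> new ArrayList<>()).add(pair));
        mapper.map("departure", 10);
        mapper.map("departure", 20);
        mapper.map("arrival", 5);    // second distinct key: buffer is full, flush
        mapper.map("departure", 30);
        mapper.cleanup();            // final flush
        shuffled.forEach((key, pairs) ->
                System.out.println(key + " average delay: " + average(pairs)));
    }
}
```

Emitting sums and counts rather than averages keeps the reducer correct, because averages of partial averages are not the overall average. The flush threshold bounds the mapper's memory use at the cost of emitting a few more intermediate pairs.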
The dataset consists of the monthly Full Analysis with Arrival-Departure Split files (in CSV format) from January 2011 to August 2017. It can be downloaded from the Data folder or from the UK flight punctuality data page.
As mentioned above, the program files Delay.java and Late.java are in the Program folder, and the code is well documented with comments.
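For orientation before reading Late.java, the selection rule of the second task can be sketched as below. This is a minimal sketch that assumes the late and scheduled flight counts have already been aggregated per airline and year. The class and method names are hypothetical, and the sample counts are made up purely to show the boundary case at exactly 50%.

```java
// Hypothetical helper illustrating the "at least 50% of departures at least
// 31 minutes late" rule; not taken from Late.java.
class LateShareSketch {
    static final double THRESHOLD_PERCENT = 50.0;

    // Percentage P of scheduled flights that departed at least 31 minutes late.
    static double latePercentage(long lateFlights, long scheduledFlights) {
        if (scheduledFlights <= 0) {
            return 0.0; // no scheduled flights: treat the share as zero
        }
        return 100.0 * lateFlights / scheduledFlights;
    }

    // An (airline, year) pair is output only when P reaches the threshold.
    static boolean qualifies(long lateFlights, long scheduledFlights) {
        return latePercentage(lateFlights, scheduledFlights) >= THRESHOLD_PERCENT;
    }

    public static void main(String[] args) {
        // Made-up counts: {late flights, scheduled flights}
        long[][] samples = { {6, 10}, {4, 10}, {5, 10} };
        for (long[] s : samples) {
            System.out.println(s[0] + "/" + s[1] + " late -> P = "
                    + latePercentage(s[0], s[1]) + "%, output: " + qualifies(s[0], s[1]));
        }
    }
}
```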
- Download the dataset from the Data folder and unzip it.
- Download the JAR file UKFlightAnalysis.jar from the Jar folder and run the following commands:
hadoop jar [jar folder path]/UKFlightAnalysis.jar org.marco.Delay [data folder]/ [output folder]/
hadoop jar [jar folder path]/UKFlightAnalysis.jar org.marco.Late [data folder]/ [output folder]/
For example, [jar folder path] could be ~/CC/Jar, [data folder] could be ~/CC/input, and [output folder] could be ~/CC/outputDelay.
Feel free to contact me via [email protected]