FlugHafen pipeline - first commit #138

Open
wants to merge 32 commits into
base: main
32 commits
53fb38e
FlugHafen pipeline - first commit
Crazz-Zaac Oct 26, 2024
14afdf4
exercise 2 initial commit
Crazz-Zaac Nov 3, 2024
71c382d
Project plan inital commit
Crazz-Zaac Nov 3, 2024
d72f740
ex 3 initial commit
Crazz-Zaac Nov 13, 2024
b3c8dee
minor change
Crazz-Zaac Nov 13, 2024
6a6507a
update to project plan
Crazz-Zaac Nov 13, 2024
f40dfd7
update to project plan
Crazz-Zaac Nov 13, 2024
e9454dc
update to project plan
Crazz-Zaac Nov 13, 2024
3976fad
Update project-plan-example.md
Crazz-Zaac Nov 14, 2024
57d181b
Update project-plan-example.md
Crazz-Zaac Nov 14, 2024
d7b0bd8
Update project-plan-example.md
Crazz-Zaac Nov 14, 2024
de98368
exercise 3 initial commit
Crazz-Zaac Nov 18, 2024
b93c460
removed python code, with complete jv code
Crazz-Zaac Nov 19, 2024
c60b392
code revision and documentation
Crazz-Zaac Nov 19, 2024
2f5ba0e
report pdf
Crazz-Zaac Nov 27, 2024
6fd741d
exercise 4 first commit
Crazz-Zaac Dec 1, 2024
917bd69
fixing data transform
Crazz-Zaac Dec 4, 2024
6bc38c3
test cases
Crazz-Zaac Dec 5, 2024
ae92cb8
rounded off temperature, validated month
Crazz-Zaac Dec 5, 2024
a43772a
data unit fixed
Crazz-Zaac Dec 5, 2024
382dc6a
test cases fixed
Crazz-Zaac Dec 5, 2024
c988083
exercise 5 initial commit
Crazz-Zaac Dec 5, 2024
5d6db94
exercise 5 adding validation
Crazz-Zaac Dec 5, 2024
cd12db8
workflow and requirements added
Crazz-Zaac Dec 11, 2024
30fdcdd
requirements updated
Crazz-Zaac Dec 11, 2024
5d143db
minor changes
Crazz-Zaac Dec 17, 2024
7fd19bb
minor changes
Crazz-Zaac Dec 17, 2024
5d185fc
minor changes
Crazz-Zaac Dec 17, 2024
2510641
minor changes
Crazz-Zaac Dec 19, 2024
a3849a8
experiments for the report
Crazz-Zaac Jan 9, 2025
08e5063
final report of the experiment
Crazz-Zaac Jan 9, 2025
ef80e88
minor changes
Crazz-Zaac Jan 9, 2025
31 changes: 31 additions & 0 deletions .github/workflows/project_feedback.yml
@@ -0,0 +1,31 @@
name: Run Tests
run-name: ${{ github.actor }} is running tests

on:
  push:
    branches:
      - main

jobs:
  test:
    runs-on: ubuntu-20.04

    steps:
      - uses: actions/checkout@v2

      - name: Set up Python 3.11
        uses: actions/setup-python@v2
        with:
          python-version: 3.11

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt

      # project and exercise feedbacks
      - name: Make test executable
        run: chmod +x /home/runner/work/made-template/made-template/project/tests.sh

      - name: Run project tests
        run: /home/runner/work/made-template/made-template/project/tests.sh
7 changes: 6 additions & 1 deletion .gitignore
@@ -1,3 +1,8 @@
.DS_Store
/data/*
!/data/.gitkeep
.mypy_cache/
temp_dir/
project/__pycache__/
*.pyc
*.sqlite
74 changes: 74 additions & 0 deletions exercises/exercise1.jv
@@ -0,0 +1,74 @@
pipeline FlugHafen{

//1. The FlugHafen pipeline connects its blocks via pipes to move data from a CSV file
// on the web into a SQLite file sink.
FlugHafenHttpExtractor
-> FlugHafenTextFileInterpreter;

//2. The FlugHafenTextFileInterpreter output is used as input for the FlugHafenCsvFileInterpreter
// block which is then used as input for the FlugHafenDataSelector block.
FlugHafenTextFileInterpreter
-> FlugHafenCsvFileInterpreter
// -> FlugHafenDatabaseWriter
-> FlugHafenDataSelector
-> FlugHafenTableInterpreter
-> FlugHafenLoader;

//3. The FlugHafenHttpExtractor block is of type HttpExtractor and the URL is specified.
block FlugHafenHttpExtractor oftype HttpExtractor {
// URL of the data source
url: "https://opendata.rhein-kreis-neuss.de/api/explore/v2.1/catalog/datasets/rhein-kreis-neuss-flughafen-weltweit/exports/csv?lang=en&timezone=Europe%2FBerlin&use_labels=true&delimiter=%3B";
}

//4. The FlugHafenTextFileInterpreter block is of type TextFileInterpreter.
block FlugHafenTextFileInterpreter oftype TextFileInterpreter { }

//5. Since we only need a specific range of the data, we use the CellRangeSelector block.
block FlugHafenDataSelector oftype CellRangeSelector {
// Keep columns A through I for all rows
select: range A1:I*;
}

//6. The FlugHafenCsvFileInterpreter block is of type CSVInterpreter and the delimiter is specified.
block FlugHafenCsvFileInterpreter oftype CSVInterpreter {
// Specify the separator as a semicolon for the CSV
delimiter: ';';
}

// block FlugHafenDatabaseWriter oftype DatabaseWriter {
// // The name of the database
// database: "flughafen.db";
// // The name of the table
// table: "flughafen";
// }

//7. The FlugHafenTableInterpreter block is of type TableInterpreter and the necessary columns are specified.
block FlugHafenTableInterpreter oftype TableInterpreter {
// The first row contains the header
header: true;
// The columns of the table
columns: [
"Lfd. Nummer" oftype integer,
"Name des Flughafens" oftype text,
"Ort" oftype text,
"Land" oftype text,
"IATA" oftype text,
"ICAO" oftype text,
"Latitude" oftype decimal,
"Longitude" oftype decimal,
"Altitude" oftype integer,

];
}

//8. Finally the FlugHafenLoader block is of type SQLiteLoader and the table name and file name are specified.
block FlugHafenLoader oftype SQLiteLoader {
// The name of the table
table: "airports";
// The name of the file
file: "airports.sqlite";
}


}
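
Review note (a sketch, not part of the PR): the stages of the FlugHafen pipeline can be mirrored in plain Python to sanity-check the intended result. The snippet below parses a semicolon-delimited CSV, keeps the first nine columns (the A to I range), casts them to the declared types, and loads them into an in-memory SQLite table. The two sample rows, the `SAMPLE` constant, and the `load_airports` helper are hypothetical and only illustrate the shape of the data; the header names are assumed to match the column list in the TableInterpreter block.

```python
import csv
import io
import sqlite3

# Hypothetical sample in the assumed shape of the source CSV (semicolon-delimited,
# with one extra column "Zeitzone" that the A..I selection should drop).
SAMPLE = (
    "Lfd. Nummer;Name des Flughafens;Ort;Land;IATA;ICAO;Latitude;Longitude;Altitude;Zeitzone\n"
    "1;Goroka;Goroka;Papua New Guinea;GKA;AYGA;-6.081689;145.391881;5282;10\n"
    "2;Madang;Madang;Papua New Guinea;MAG;AYMD;-5.207083;145.7887;20;10\n"
)


def load_airports(text: str, conn: sqlite3.Connection) -> int:
    """Parse the CSV, keep columns A..I, and write them to an 'airports' table."""
    rows = list(csv.reader(io.StringIO(text), delimiter=";"))
    header = rows[0][:9]                 # header row, columns A..I
    body = [r[:9] for r in rows[1:]]     # data rows, columns A..I
    column_list = ", ".join(f'"{name}"' for name in header)
    conn.execute(f"CREATE TABLE airports ({column_list})")
    # Cast each cell to the type declared in the pipeline's column list.
    typed = [
        (int(r[0]), r[1], r[2], r[3], r[4], r[5], float(r[6]), float(r[7]), int(r[8]))
        for r in body
    ]
    conn.executemany("INSERT INTO airports VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)", typed)
    return len(typed)


conn = sqlite3.connect(":memory:")
inserted = load_airports(SAMPLE, conn)
```

Running the same queries against the `airports.sqlite` file produced by the pipeline would be one way to compare row counts and column names, assuming the file exists locally.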

85 changes: 85 additions & 0 deletions exercises/exercise2.jv
@@ -0,0 +1,85 @@
pipeline TreePlanting{

//1. The TreePlanting pipeline connects its blocks via pipes to move data from a CSV file
// on the web into a SQLite file sink.
TreePlantingHttpExtractor
-> TreePlantingTextFileInterpreter;

//2. The TreePlantingTextFileInterpreter output is used as input for the TreePlantingCsvFileInterpreter
// block which is then used as input for the TreePlantingBaumartDeutschDeleter block.
TreePlantingTextFileInterpreter
-> TreePlantingCsvFileInterpreter
// -> TreePlantingDatabaseWriter
-> TreePlantingBaumartDeutschDeleter
-> TreePlantingTableInterpreter
-> TreePlantingLoader;

//3. The TreePlantingHttpExtractor block is of type HttpExtractor and the URL is specified.
block TreePlantingHttpExtractor oftype HttpExtractor {
// URL of the data source
url: "https://opendata.rhein-kreis-neuss.de/api/v2/catalog/datasets/stadt-neuss-herbstpflanzung-2023/exports/csv";
}

//4. The TreePlantingTextFileInterpreter block is of type TextFileInterpreter.
block TreePlantingTextFileInterpreter oftype TextFileInterpreter { }

//6. The TreePlantingCsvFileInterpreter block is of type CSVInterpreter and the delimiter is specified.
block TreePlantingCsvFileInterpreter oftype CSVInterpreter {
// Specify the separator as a semicolon for the CSV
delimiter: ';';
}

//5. The TreePlantingBaumartDeutschDeleter block is of type ColumnDeleter and the column to be deleted is specified.
block TreePlantingBaumartDeutschDeleter oftype ColumnDeleter {
// Delete column E (the German tree-species column, going by the block name)
delete: [column E];
}

//7. The TreePlantingTableInterpreter block is of type TableInterpreter and the necessary columns are specified.
block TreePlantingTableInterpreter oftype TableInterpreter {
// The first row contains the header
header: true;
// The columns of the table
columns: [
"lfd_nr" oftype integer,
"stadtteil" oftype Vogelsang,
"standort" oftype text,
"baumart_botanisch" oftype text,
"id" oftype GeoCoordinate,
"baumfamilie" oftype text,

];
}

//8. Finally the TreePlantingLoader block is of type SQLiteLoader and the table name and file name are specified.
block TreePlantingLoader oftype SQLiteLoader {
// The name of the table
table: "trees";
// The name of the file
file: "trees.sqlite";
}


//9. Value type for "stadtteil": text restricted by the VogelsangStadteil constraint.
valuetype Vogelsang oftype text {
constraints: [
// only allow column values that start with "Vogelsang"
VogelsangStadteil
];
}

//10. Value type for "id": text restricted by the Geopoints constraint.
valuetype GeoCoordinate oftype text {
constraints: [
// only allow column values that match the pattern of a geo coordinate
Geopoints
];
}

// ".*" follows the literal so the full prefix "Vogelsang" is required; "g*" alone would also accept "Vogelsan".
constraint VogelsangStadteil on text: value matches(/^Vogelsang.*/);

// Two decimal numbers with 1 to 3 integer digits each, separated by a comma and optional whitespace.
constraint Geopoints on text: value matches(/^\d{1,3}\.\d+,\s*\d{1,3}\.\d+$/);


}
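
Review note (a sketch, not part of the PR): the two text constraints are easy to get subtly wrong, so it can help to try the patterns outside Jayvee. The snippet below compares two prefix patterns, `^Vogelsang*` (where `*` quantifies only the final `g`) and `^Vogelsang.*`, and exercises the geo-coordinate pattern. It assumes search-style matching as in Python's `re.search`; Jayvee's own matching semantics are not verified here. The `accepts` helper and all sample strings are hypothetical.

```python
import re

# "*" applies only to the final "g", so this accepts anything starting with "Vogelsan".
G_STAR = re.compile(r"^Vogelsang*")
# Requires the complete prefix "Vogelsang", followed by anything.
PREFIX = re.compile(r"^Vogelsang.*")
# Geo-coordinate pattern: "<1-3 digits>.<digits>, <1-3 digits>.<digits>".
GEOPOINT = re.compile(r"^\d{1,3}\.\d+,\s*\d{1,3}\.\d+$")


def accepts(pattern: re.Pattern, value: str) -> bool:
    """Return True if the pattern is found in the value (search semantics)."""
    return pattern.search(value) is not None


# Illustrative inputs only; none of these are taken from the dataset.
prefix_checks = {
    "Vogelsang Bols-Siedlung": (accepts(G_STAR, "Vogelsang Bols-Siedlung"),
                                accepts(PREFIX, "Vogelsang Bols-Siedlung")),
    "Vogelsan": (accepts(G_STAR, "Vogelsan"), accepts(PREFIX, "Vogelsan")),
}
geo_checks = {
    "51.2, 6.7": accepts(GEOPOINT, "51.2, 6.7"),
    "51,2; 6,7": accepts(GEOPOINT, "51,2; 6,7"),
    "1234.5, 6.7": accepts(GEOPOINT, "1234.5, 6.7"),
}
```

The only input on which the two prefix patterns disagree is the truncated `"Vogelsan"`, which is exactly the case a "starts with Vogelsang" rule is meant to reject.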

92 changes: 92 additions & 0 deletions exercises/exercise3.jv
@@ -0,0 +1,92 @@
pipeline WorldBank {

// 1. Block to extract an XLSX file from the web
block WorldBankHttpExtractor oftype HttpExtractor {
// URL of the source file containing the data
url: "https://thedocs.worldbank.org/en/doc/7d852628d96b9411d43e5d36d5dff941-0050062022/original/Graphs-Chapter-5-02082022.xlsx";
}

// 2. Block to interpret the downloaded file as an XLSX workbook
block WorldBankTextXLSXInterpreter oftype XLSXInterpreter { }

// 3. Block to select the specific sheet "Figure S5.1.2" from the workbook
block DataCellSheetpicker oftype SheetPicker {
sheetName: "Figure S5.1.2";
}

// 4. Block to specify the cell range of interest in the selected sheet
block WorldBankRangeSelector oftype CellRangeSelector {
select: range P2:S45;
}

// 5. Block to rename columns for clarity and standardization
block NameHeaderWriter oftype CellWriter {
at: range A1:D1;
write: ["Country Code", "Economy", "GDP per Capita", "Bond Issuance Share"];
}

// 6. Block to interpret and filter the "Bond Issuance Share" column data
block BondIssuanceTableInterpreter oftype TableInterpreter {
columns: [
"Country Code" oftype CountryCodeAlpha3,
"Bond Issuance Share" oftype BetweenZeroAndOne,
];
}

// 7. Block to interpret and filter the "GDP per Capita" column data
block GDPTableInterpreter oftype TableInterpreter {
columns: [
"Country Code" oftype CountryCodeAlpha3,
"GDP per Capita" oftype PositiveDecimal,
];
}

// 8. Block to load "Bond Issuance Share" data into a SQLite table
block BondIssuanceTableLoader oftype SQLiteLoader {
table: "bondIssuance"; // Table name in the SQLite database
file: "country-stats.sqlite"; // SQLite database file
}

// 9. Block to load "GDP per Capita" data into a SQLite table
block GDPTableLoader oftype SQLiteLoader {
table: "gdpPerCapita"; // Table name in the SQLite database
file: "country-stats.sqlite"; // SQLite database file
}

// 10. Value type to enforce positive decimal values
valuetype PositiveDecimal oftype decimal {
constraints: [
OnlyPositiveDecimal // Constraint: Values must be > 0
];
}

// 11. Value type to enforce decimal values between 0 and 1
valuetype BetweenZeroAndOne oftype decimal {
constraints: [
BetweenZeroAndOneConstraint // Constraint: 0 <= value <= 1
];
}

// 12. Constraints definitions
constraint OnlyPositiveDecimal on decimal: value > 0;
constraint BetweenZeroAndOneConstraint on decimal: value >= 0 and value <= 1;

// Pipeline connections
WorldBankHttpExtractor
-> WorldBankTextXLSXInterpreter;

WorldBankTextXLSXInterpreter
-> DataCellSheetpicker
-> WorldBankRangeSelector
-> NameHeaderWriter;

// Bond Issuance pipeline
NameHeaderWriter
-> BondIssuanceTableInterpreter
-> BondIssuanceTableLoader;

// GDP per Capita pipeline
NameHeaderWriter
-> GDPTableInterpreter
-> GDPTableLoader;
}
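
Review note (a sketch, not part of the PR): the WorldBank pipeline fans one sheet out into two tables, each keeping only rows whose value satisfies its constraint. The snippet below imitates that with plain Python predicates. It assumes rows with an invalid value are dropped from the affected table only, and it leaves out the `CountryCodeAlpha3` check. The sample rows and helper names are hypothetical.

```python
# Hypothetical rows in the column order written by NameHeaderWriter:
# (Country Code, Economy, GDP per Capita, Bond Issuance Share)
ROWS = [
    ("AAA", "Example A", 1200.5, 0.25),
    ("BBB", "Example B", 0.0, 0.40),     # GDP is not > 0
    ("CCC", "Example C", 3100.0, 1.20),  # share is outside [0, 1]
]


def only_positive(value: float) -> bool:
    """Mirror of the OnlyPositiveDecimal constraint: value > 0."""
    return value > 0


def between_zero_and_one(value: float) -> bool:
    """Mirror of BetweenZeroAndOneConstraint: 0 <= value <= 1."""
    return 0 <= value <= 1


# Each output table is filtered independently, so a row can be valid for one
# table and rejected for the other.
gdp_per_capita = [(code, gdp) for code, _, gdp, _ in ROWS if only_positive(gdp)]
bond_issuance = [(code, share) for code, _, _, share in ROWS if between_zero_and_one(share)]
```

With these sample rows, `BBB` is missing only from `gdp_per_capita` and `CCC` only from `bond_issuance`, which is why the two SQLite tables can end up with different row counts.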