The goal of this project is to develop a cloud-native pipelining (CNP) service that facilitates the analysis and management of project data. A project can request a data pipeline and storage, and then load its data into the pipeline. The loaded raw data is stored in a storage bucket, and the insights gained from the raw data are stored in a database. The end user is provided with a frontend to upload data and to access and analyse the stored data.
- Docker installed and configured
docker --version
~ Docker version 24.0.6
docker compose version
~ Docker Compose version v2.23.0-desktop.1
- Amazon S3 Bucket (https://aws.amazon.com/en/s3/)
- Create and set up an S3 bucket for file storage
- Keycloak
- HTTPS/SSL is required for Keycloak to work (see README.md).
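The Docker prerequisite above can be checked from a script before going further. The following is a minimal sketch, not part of the repository: the helper names `require_cmd` and `check_docker` are hypothetical, and it only confirms that the Docker CLI and the Compose plugin respond, not that they match the versions listed.

```shell
# Return success if the given command is on PATH; otherwise report it on stderr.
require_cmd() {
  command -v "$1" >/dev/null 2>&1 || {
    echo "missing prerequisite: $1" >&2
    return 1
  }
}

# Check the Docker CLI first, then the Compose plugin (only if the CLI exists).
check_docker() {
  require_cmd docker && docker compose version >/dev/null 2>&1
}
```

Usage: `check_docker && echo "Docker and Compose are available"`.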
Clone the repo:
git clone https://github.com/amosproj/amos2023ws04-pipeline-manager.git
Navigate to the main root folder using:
cd amos2023ws04-pipeline-manager
Because the backend app uses secrets, copy the template files to an .env file and a client_secrets.json file:
cp src/backend/.env.template src/backend/.env
cp src/backend/client_secrets.template.json src/backend/client_secrets.json
Then configure the environment variables in these files for your AWS and Apache Airflow connections.
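Before building, it can help to confirm that no variable in the copied file was left blank. The sketch below assumes that the template uses plain `KEY=value` lines and leaves unset values empty; the function name `empty_keys` is hypothetical and not part of the repository.

```shell
# Print every key in an env file whose value is empty (for example "KEY=").
# Comment lines and blank lines never match, so they are skipped.
empty_keys() {
  grep -E '^[A-Za-z_][A-Za-z0-9_]*=[[:space:]]*$' "$1" | cut -d= -f1
}
```

Usage: `empty_keys src/backend/.env` prints one key per line, and prints nothing when every variable has a value.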
To build the images:
docker compose build
Then, to get the system up and running, execute the following:
docker compose up -d # in detached mode
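Once the containers are started in detached mode, their state is not shown in the terminal. A minimal sketch for checking on the stack and shutting it down, using the standard `docker compose ps` and `docker compose down` subcommands; the wrapper names `stack_status` and `stack_down` and the `DOCKER` override variable are hypothetical conveniences, not part of the repository:

```shell
# DOCKER can be overridden (for example for a dry run); defaults to the docker CLI.
DOCKER="${DOCKER:-docker}"

# Show the state of each service defined in the compose file.
stack_status() { "$DOCKER" compose ps; }

# Stop and remove the containers started by "docker compose up".
stack_down() { "$DOCKER" compose down; }
```

Run both from the repository root so that Compose finds the project's compose file.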
Deployment Pipeline Functionality
- An IT staff member can roll out an instance of a CNP at the request of a project.
- The deployment pipeline provides the entire infrastructure consisting of frontend, backend, data pipeline, storage, and database so that a project member can then work with the CNP.
- The deployed frontend is accessible from the internet.
Frontend Functionality
- A user can upload their data through the provisioned CNP and select a suitable data pipeline for it.
- A user can search and retrieve relevant information from their CNP project and its associated data.
- A user can check and control the status of the CNP project.
Backend Functionality
The backend takes care of the orchestration of the described software components, with the following rough process flow:
- Access control to the project and project data using an appropriate IAM system
- Receiving the data provided via the frontend
- Forwarding the data to the data pipeline
- Transfer of the prepared data and raw data to the storage system and database
Please see the SD wiki for how to create personal tickets/issues for the project.
Issue creation guidelines for SDs
- @keldami
- @krutarth4
- @bhanuPrakashMa
- @sravanthidatla78
- @ingunnaf
- @CAgcoder
- @elementator
- @lalitha2395