Rebase of an old project
This is a project for internal use, demonstrating the combined use of dbt, Snowflake, and Azure DevOps.
Data comes from Azure Blob Storage and/or local files and will be published to PowerBI.
The pipeline is triggered by a commit to the main branch.
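A minimal sketch of that trigger in azure-pipelines.yml; the install and run steps are assumptions about how dbt ends up on the agent (see the backlog item below about a dedicated agent pool), and the profiles location is a placeholder:

```yaml
# azure-pipelines.yml -- minimal sketch; assumes dbt is pip-installed on a
# hosted agent and a profiles.yml is checked into the repo root
trigger:
  branches:
    include:
      - main

pool:
  vmImage: 'ubuntu-latest'

steps:
  - script: pip install dbt-snowflake
    displayName: 'Install dbt with the Snowflake adapter'
  - script: dbt run --profiles-dir .
    displayName: 'Run dbt transformations'
```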
At this moment I would like to start building a prototype data pipeline to process a small batch of data.
According to my plan, it should involve:
• Snowflake – data warehousing
• dbt – data transformation
• Azure DevOps – orchestration & automation
• GitHub – version control & repo hosting
• Ideally, the data would be ingested from Blob Storage and published via PowerBI.
• Trigger the pipeline periodically (a scheduled-trigger sketch follows this list).
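For the periodic trigger, Azure Pipelines supports cron schedules; a sketch, assuming a daily batch run (the cadence itself is a placeholder):

```yaml
# Scheduled trigger sketch -- cron times are UTC; the daily cadence is an assumption
schedules:
  - cron: '0 6 * * *'        # every day at 06:00 UTC
    displayName: 'Daily batch run'
    branches:
      include:
        - main
    always: true             # run even when main has no new commits
```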
Backlog / future improvements:
- Ingest data from Azure Blob Storage (see the external-stage sketch after this list);
- Create a dedicated role in Snowflake reserved exclusively for dbt use (see the role sketch below);
- Ingest streaming data (see the Snowpipe sketch below);
- Containerize the data pipeline (possibly even as a microservice);
- Embed SonarQube into the pipeline;
- Build an agent pool for the data engineering team only (to avoid installing dbt on every run);
- Host the dbt docs on a separate port (outside of the agent);
- Talk to the architects and security team about connecting Snowflake to PowerBI (production environment);
- Use a linked service to connect to dbt Cloud (instead of installing dbt on my agent);
- Use an Azure DevOps agent or a Function App to execute scripts? (weigh the pros/cons)
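For the Blob Storage ingestion item, a Snowflake sketch using a storage integration, an external stage, and COPY INTO; the integration name, container URL, file format, and table names are all assumptions:

```sql
-- Sketch only: the integration, URL, and object names below are placeholders
CREATE STORAGE INTEGRATION IF NOT EXISTS azure_blob_int
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = 'AZURE'
  ENABLED = TRUE
  AZURE_TENANT_ID = '<tenant-id>'
  STORAGE_ALLOWED_LOCATIONS = ('azure://myaccount.blob.core.windows.net/raw-data/');

CREATE STAGE IF NOT EXISTS raw_stage
  STORAGE_INTEGRATION = azure_blob_int
  URL = 'azure://myaccount.blob.core.windows.net/raw-data/'
  FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1);

-- Load the staged files into a landing table for dbt to transform
COPY INTO raw.landing_table
  FROM @raw_stage
  ON_ERROR = 'CONTINUE';
```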
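For the dedicated dbt role, a minimal grant sketch; the warehouse, database, schema, user, and role names are placeholders, and the grants are kept to roughly what dbt needs:

```sql
-- Sketch only: names are placeholders; grants limited to what dbt typically needs
CREATE ROLE IF NOT EXISTS dbt_role;
GRANT USAGE ON WAREHOUSE transform_wh TO ROLE dbt_role;
GRANT USAGE ON DATABASE analytics TO ROLE dbt_role;
GRANT USAGE, CREATE TABLE, CREATE VIEW ON SCHEMA analytics.staging TO ROLE dbt_role;
GRANT SELECT ON ALL TABLES IN SCHEMA raw.landing TO ROLE dbt_role;

-- Dedicated service user for the pipeline, so dbt never runs as a person
CREATE USER IF NOT EXISTS dbt_user DEFAULT_ROLE = dbt_role;
GRANT ROLE dbt_role TO USER dbt_user;
```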
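For the streaming-ingestion item, Snowpipe with auto-ingest is one option; a sketch reusing the hypothetical stage above. The notification integration name is an assumption, and the Azure Event Grid setup it points to must be configured separately:

```sql
-- Sketch only: assumes the raw_stage above plus an Azure Event Grid
-- notification integration ('AZURE_EVENT_INT') configured separately
CREATE PIPE IF NOT EXISTS raw_pipe
  AUTO_INGEST = TRUE
  INTEGRATION = 'AZURE_EVENT_INT'
  AS
  COPY INTO raw.landing_table
    FROM @raw_stage
    ON_ERROR = 'CONTINUE';
```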