-
Serverless Application Model (SAM) for Data Professionals
- AWS Lambda provides serverless computing capabilities, and it can be used for validation or light processing/transformation of data. Moreover, through its integration with more than 140 AWS services, it facilitates building complex systems that employ event-driven architectures. There are many ways to build serverless applications, and one of the most efficient is to use a specialised framework such as the AWS Serverless Application Model (SAM) or the Serverless Framework. In this post, I’ll demonstrate how to build a serverless data processing application using SAM.
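To give a flavour of the "validation or light transformation" use case, here is a minimal sketch of a Python Lambda handler. It is not the code from the post: the event shape (a `records` list), the required fields `id` and `amount`, and the helper `validate_record` are all hypothetical and chosen only for illustration.

```python
import json

# Hypothetical schema, assumed for illustration only (not from the post).
REQUIRED_FIELDS = ("id", "amount")


def validate_record(record: dict) -> list:
    """Return a list of validation error messages for one record."""
    errors = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in record]
    amount = record.get("amount")
    if amount is not None and not isinstance(amount, (int, float)):
        errors.append("amount must be numeric")
    return errors


def lambda_handler(event: dict, context: object) -> dict:
    """Split incoming records into valid and invalid, normalising valid ones."""
    valid, invalid = [], []
    # The "records" key is an assumed event shape, not a Lambda convention.
    for record in event.get("records", []):
        errors = validate_record(record)
        if errors:
            invalid.append({"record": record, "errors": errors})
        else:
            # Light transformation: store the amount as a float.
            valid.append({**record, "amount": float(record["amount"])})
    return {
        "statusCode": 200,
        "body": json.dumps({"valid": valid, "invalid": invalid}),
    }
```

The handler can be called locally with a plain dictionary, for example `lambda_handler({"records": [{"id": 1, "amount": 3}]}, None)`, which makes this kind of function easy to unit test before it is packaged with SAM.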
-
Kafka Connect for AWS Services Integration - Aiven OpenSearch Sink Connector
- We discuss how to develop a data pipeline from Apache Kafka into OpenSearch. In part 1, the pipeline is developed locally using Docker; in the next post, it is deployed on AWS.
-
Setup Local Development Environment for Apache Flink and Spark Using EMR Container Images
- In this post, we will discuss how to set up a local development environment for Apache Flink and Spark using the EMR container images. For Flink, a custom Docker image will be created, which downloads the required connector JAR files into the Flink library folder, fixes process startup issues, and updates Hadoop configurations for Glue Data Catalog integration. For Spark, instead of creating a custom image, the EMR image is used to launch the Spark container, and the required configuration updates are added at runtime via volume-mapping. After illustrating the environment setup, we will discuss a solution where data ingestion/processing is performed in real time using Apache Flink and the processed data is consumed by Apache Spark for analysis.
-
Data Build Tool (dbt) Pizza Shop Demo
- The data build tool (dbt) is a popular data transformation tool for data warehouse development. Moreover, it can be used for data lakehouse development thanks to open table formats such as Apache Iceberg, Apache Hudi and Delta Lake. In this series of posts, we discuss practical data warehouse/lakehouse examples including ETL orchestration with Apache Airflow.