• Challenge

    The customer faced several challenges:

    • Lack of automated deployments for Composer/Dataflow and DBT.

    • Absence of unit tests and data quality checks.

    • Manual setup of development environments leading to inefficiencies.

  • Solution

    CI/CD Implementation:

    • Set up automated pipelines using Bitbucket Pipelines.
    • Automated deployments for Cloud Composer, Dataflow, and DBT projects.
    • Enabled unit tests and data quality checks to be integrated into the CI pipelines.
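
    As an illustration of the kind of deployment step such a pipeline can run, the sketch below uploads Airflow DAG files to the Composer environment's GCS bucket. The script, the COMPOSER_DAG_BUCKET variable, and the local dags/ directory are assumptions for the example, not the project's actual setup.

        # deploy_dags.py -- illustrative sketch, not the project's actual script.
        import os
        import pathlib

        from google.cloud import storage  # pip install google-cloud-storage

        def deploy_dags(local_dir: str = "dags") -> None:
            # Composer watches the dags/ prefix of its environment bucket;
            # copying a file there is enough to (re)deploy the DAG.
            bucket_name = os.environ["COMPOSER_DAG_BUCKET"]  # assumed variable
            bucket = storage.Client().bucket(bucket_name)
            for path in pathlib.Path(local_dir).glob("*.py"):
                bucket.blob(f"dags/{path.name}").upload_from_filename(str(path))
                print(f"deployed {path.name}")

        if __name__ == "__main__":
            deploy_dags()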

    Development Environment Setup:

    • Configured automated deployments to the dev and prod environments from feature branches (see the sketch below).
    • Established processes requiring minimal manual setup during testing.
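
    A minimal sketch of the branch-to-environment routing, assuming Bitbucket's built-in BITBUCKET_BRANCH variable; the project names are placeholders:

        # select_env.py -- hypothetical helper; project names are placeholders.
        import os

        def target_project(branch: str) -> str:
            # main deploys to prod; feature branches deploy to dev.
            return "customer-prod" if branch == "main" else "customer-dev"

        if __name__ == "__main__":
            print(target_project(os.environ.get("BITBUCKET_BRANCH", "")))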

    Documentation and Training:

    • Provided comprehensive documentation of the implemented solutions.
    • Conducted training sessions for the Softonic team on the new processes and tools.

    Project completion: Q4 2024
  • Outcome

    The implemented solution resulted in:

    • Fully automated deployments, significantly reducing manual efforts.

    • Enhanced data quality through integrated unit tests and validation.

    • Streamlined development environment setup, reducing errors and improving efficiency.

    • Improved agility, enabling faster iteration and deployment of data workflows.

    This CI/CD pipeline transformation has equipped the customer’s data engineering team with an efficient, scalable, automated workflow that enables seamless deployments and reliable data delivery.

  • Technology
    • Orchestration: Airflow running in Cloud Composer

    • Data Warehouse: BigQuery

    • Transformations: Migrating to DBT

    • Ingestion: Dataflow jobs deployed via GCS templates

    • Processing: Python jobs running on GCE VMs

    • Reporting: QlikSense

    • Version Control & CI/CD: Bitbucket Pipelines

How does it work?

1. Data Sources
  • Cloud databases
  • On-premise database
  • Excel files with "pretty" formatting
  • CSV files

2. Python Script
  • Processing of Excel files with formatting
  • Conversion to *.csv

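  A minimal sketch of this step, assuming pandas and openpyxl; the header offset and file names are illustrative:

      # excel_to_csv.py -- illustrative sketch; header_row and file names
      # are assumptions about the "pretty" report layout.
      import pandas as pd  # pip install pandas openpyxl

      def excel_to_csv(src: str, dst: str, header_row: int = 3) -> None:
          # Skip decorative title rows, then drop the fully empty rows and
          # columns left behind by merged cells and blank padding.
          df = pd.read_excel(src, header=header_row, engine="openpyxl")
          df = df.dropna(how="all").dropna(axis=1, how="all")
          df.to_csv(dst, index=False)

      if __name__ == "__main__":
          excel_to_csv("report.xlsx", "report.csv")
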
3. Linux Pipeline
  • Data filtering

4. Staging
  • Staging schema data load

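  The flow does not name the database engine; given MDS and Power BI downstream, the sketch below assumes a SQL Server staging schema reached through SQLAlchemy, with placeholder connection details and table names:

      # load_staging.py -- minimal sketch; connection string, schema and
      # table names are placeholders, not the project's real ones.
      import pandas as pd
      from sqlalchemy import create_engine  # pip install sqlalchemy pyodbc

      ENGINE = create_engine(
          "mssql+pyodbc://user:password@dwh-server/dwh"
          "?driver=ODBC+Driver+17+for+SQL+Server"
      )

      def load_to_staging(csv_path: str, table: str) -> None:
          df = pd.read_csv(csv_path)
          # Full reload of the staging table; incremental logic is omitted.
          df.to_sql(table, ENGINE, schema="staging",
                    if_exists="replace", index=False)
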
5. Aggregation / MDS
  • Data aggregation at the month level
  • Populating intermediate fact tables
  • Loading MD data marts
  • Data transfer to MDS

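  A sketch of the month-level rollup in pandas; the column names (date, product, amount) are assumptions:

      # aggregate_monthly.py -- illustrative; column names are assumed.
      import pandas as pd

      def aggregate_monthly(df: pd.DataFrame) -> pd.DataFrame:
          # Truncate each date to the first day of its month, then sum
          # the measure per product and month.
          df = df.copy()
          df["month"] = (
              pd.to_datetime(df["date"]).dt.to_period("M").dt.to_timestamp()
          )
          return df.groupby(["month", "product"], as_index=False)["amount"].sum()
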
6. MDS
  • MD enrichment by user
  • Entry of MD required for calculations: courses, units, conversion rates
  • Triggering the continuation of the data flow

7. DWH Loading
  • Calculation and loading of data marts from fact tables and MDS user data
  • Recording the load log and any errors that occurred, with their reasons

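  A minimal sketch of the load-and-log pattern this step describes; the etl.load_log table, the dm schema, and the connection details are hypothetical:

      # load_datamart.py -- sketch of "load the mart, record the outcome";
      # all object names here are hypothetical.
      import datetime

      import pandas as pd
      from sqlalchemy import create_engine, text

      ENGINE = create_engine(
          "mssql+pyodbc://user:password@dwh-server/dwh"
          "?driver=ODBC+Driver+17+for+SQL+Server"
      )

      def log_load(mart: str, status: str, reason: str = "") -> None:
          # One row per load attempt: what ran, how it ended, and why.
          with ENGINE.begin() as conn:
              conn.execute(
                  text(
                      "INSERT INTO etl.load_log (mart, status, reason, logged_at) "
                      "VALUES (:mart, :status, :reason, :ts)"
                  ),
                  {"mart": mart, "status": status, "reason": reason,
                   "ts": datetime.datetime.now()},
              )

      def load_mart(mart: str, query: str) -> None:
          try:
              df = pd.read_sql(query, ENGINE)
              df.to_sql(mart, ENGINE, schema="dm",
                        if_exists="replace", index=False)
              log_load(mart, "ok")
          except Exception as exc:
              log_load(mart, "error", str(exc))  # keep the reason in the log
              raise
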
8. Power BI
  • Power BI dataset refresh
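
  The refresh can be triggered through the Power BI REST API; the sketch below assumes an Azure AD access token is already available, and the workspace and dataset IDs are placeholders:

      # refresh_dataset.py -- sketch; token acquisition is out of scope here.
      import requests

      def refresh_dataset(token: str, group_id: str, dataset_id: str) -> None:
          # POST .../refreshes queues an asynchronous dataset refresh.
          url = (
              f"https://api.powerbi.com/v1.0/myorg/groups/{group_id}"
              f"/datasets/{dataset_id}/refreshes"
          )
          resp = requests.post(url, headers={"Authorization": f"Bearer {token}"})
          resp.raise_for_status()  # 202 Accepted means the refresh was queued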