Pipeline Development:
- Build and maintain data pipelines using Azure Databricks and Azure Data Factory.
- Implement ingestion and transformation logic across Bronze and Silver layers.
- Support batch and incremental processing patterns.
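The incremental pattern above is typically driven by a persisted high-watermark: each run picks up only rows modified since the last run, then advances the watermark. A minimal sketch in plain Python (in a real pipeline the filter would be a PySpark/ADF query predicate, and the watermark would live in a metadata table; the row shape and `modified_at` field here are illustrative assumptions):

```python
from typing import Dict, List, Tuple

def incremental_batch(rows: List[Dict], last_watermark: int) -> Tuple[List[Dict], int]:
    """Select only rows modified after the stored watermark, and
    return the new watermark to persist for the next run."""
    new_rows = [r for r in rows if r["modified_at"] > last_watermark]
    # If nothing new arrived, keep the old watermark unchanged.
    new_watermark = max((r["modified_at"] for r in new_rows), default=last_watermark)
    return new_rows, new_watermark

# Hypothetical source rows; modified_at is an epoch-style timestamp.
source = [
    {"id": 1, "modified_at": 100},
    {"id": 2, "modified_at": 205},
    {"id": 3, "modified_at": 310},
]

batch, wm = incremental_batch(source, last_watermark=200)
# Only ids 2 and 3 pass the watermark filter; the next run starts from 310.
```

The same shape works for full batch loads by setting the watermark to zero.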
Curated Layer Logic:
- Implement hydration, merge, and upsert logic using Delta Lake.
- Ensure curated datasets meet data quality and business requirements.
- Handle late-arriving data and incremental updates.
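In Delta Lake the merge/upsert above is expressed with `MERGE INTO` (or `DeltaTable.merge` in PySpark), usually with a timestamp condition so late-arriving data cannot overwrite fresher state. The semantics can be sketched in plain Python (the `id`/`event_ts` field names are illustrative assumptions, not a fixed schema):

```python
from typing import Dict, List

def upsert(target: Dict[int, Dict], incoming: List[Dict]) -> Dict[int, Dict]:
    """Merge incoming records into target, keyed by business key.
    An incoming record overwrites an existing row only when its
    event timestamp is newer, so a late-arriving (older) record
    cannot clobber fresher state; unmatched keys are inserted."""
    for row in incoming:
        current = target.get(row["id"])
        if current is None or row["event_ts"] > current["event_ts"]:
            target[row["id"]] = row
    return target

# Hypothetical current state and incoming batch.
target = {1: {"id": 1, "event_ts": 50, "status": "open"}}
incoming = [
    {"id": 1, "event_ts": 40, "status": "stale"},  # late-arriving, older: ignored
    {"id": 2, "event_ts": 60, "status": "new"},    # no match: inserted
]
state = upsert(target, incoming)
```

The timestamp guard corresponds to a `whenMatchedUpdate` condition such as `source.event_ts > target.event_ts` in the Delta merge builder.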
Performance & Storage Optimization:
- Optimize Delta Lake tables for performance and cost.
- Select and tune appropriate storage formats (Parquet, Delta).
- Apply partitioning, compaction, and file sizing strategies.
- Tune Spark jobs for large-scale data processing.
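The compaction and file-sizing work above boils down to rewriting many small files into fewer files near a target size (what `OPTIMIZE` or a repartition-and-rewrite does). A minimal sketch of the sizing arithmetic, assuming a 128 MB target as an illustrative default:

```python
import math
from typing import List

def compaction_target_files(file_sizes_bytes: List[int], target_mb: int = 128) -> int:
    """Given the current file sizes in a table or partition, compute
    how many output files a compaction rewrite should produce so that
    each file lands near the target size."""
    total = sum(file_sizes_bytes)
    target = target_mb * 1024 * 1024
    # At least one output file, rounding up so no file exceeds the target much.
    return max(1, math.ceil(total / target))

# 1000 small 1 MB files compact down to 8 files at a 128 MB target.
n = compaction_target_files([1024 * 1024] * 1000, target_mb=128)
```

The resulting count is what you would pass to `repartition(n)` before rewriting, or rely on `OPTIMIZE` to achieve automatically.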
Downstream & DWH Collaboration:
- Work closely with DWH and BI teams to support downstream consumption.
- Provide optimized datasets for Synapse and reporting workloads.
- Support data validation and reconciliation with Gold layer outputs.
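Reconciliation against Gold outputs usually means comparing row counts and summed measures between layers. A minimal sketch, assuming a single numeric measure column named `amount` (illustrative; real checks would run as SQL against both layers):

```python
from typing import Dict, List

def reconcile(silver_rows: List[Dict], gold_rows: List[Dict],
              measure: str = "amount") -> Dict:
    """Compare row counts and a summed measure between two layers
    and return a small pass/fail report."""
    report = {
        "count_silver": len(silver_rows),
        "count_gold": len(gold_rows),
        "sum_silver": sum(r[measure] for r in silver_rows),
        "sum_gold": sum(r[measure] for r in gold_rows),
    }
    report["counts_match"] = report["count_silver"] == report["count_gold"]
    report["sums_match"] = report["sum_silver"] == report["sum_gold"]
    return report

# Hypothetical layer extracts: counts match, sums do not.
silver = [{"id": 1, "amount": 10}, {"id": 2, "amount": 20}]
gold = [{"id": 1, "amount": 10}, {"id": 2, "amount": 25}]
report = reconcile(silver, gold)
```

A failed check like this would trigger the validation/troubleshooting work described elsewhere in the role.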
Engineering Best Practices:
- Implement basic CI/CD practices for data pipelines.
- Follow coding standards, documentation, and version control practices.
- Support production troubleshooting and performance tuning.
Experience:
- 6–8 years of experience in data engineering.
- Strong hands-on experience building pipelines on Azure.
- Experience working with large datasets and distributed processing.
Technical Skills:
- Strong proficiency in PySpark.
- Hands-on experience with Azure Databricks.
- Strong experience with Azure Data Factory.
- Deep knowledge of Delta Lake tuning and optimization.
- Experience with storage optimization (Parquet, Delta).
- Strong SQL skills for transformation and validation.
Tools & Practices:
- Experience with Git and basic CI/CD pipelines.
- Familiarity with data quality and validation techniques.
- Experience working in Agile delivery models.
Soft Skills:
- Strong analytical and problem-solving skills.
- Ability to work independently on complex pipelines.
- Good communication and collaboration skills.
Nice to Have:
- Experience supporting Synapse Dedicated SQL Pool.
- Exposure to streaming or near real-time pipelines.
- Familiarity with data governance or metadata tools.