Pipeline Development:
  • Build and maintain data pipelines using Azure Databricks and Azure Data Factory. 
  • Implement ingestion and transformation logic across Bronze and Silver layers. 
  • Support batch and incremental processing patterns. 
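The incremental batch pattern above is often implemented with Databricks SQL's COPY INTO, which skips files it has already loaded. A minimal sketch — the table name and landing path are illustrative, not part of this role's actual environment:

```sql
-- Idempotent incremental load into a Bronze table: COPY INTO tracks
-- previously ingested files, so reruns do not duplicate data.
-- bronze.sales_raw and the ADLS path are invented example names.
COPY INTO bronze.sales_raw
FROM 'abfss://landing@storageaccount.dfs.core.windows.net/sales/'
FILEFORMAT = PARQUET
COPY_OPTIONS ('mergeSchema' = 'true');
```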
Curated Layer Logic: 
  • Implement hydration, merge, and upsert logic using Delta Lake. 
  • Ensure curated datasets meet data quality and business requirements. 
  • Handle late-arriving data and incremental updates. 
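Merge/upsert logic of the kind described above is typically expressed with Delta Lake's MERGE INTO; late-arriving rows match on the business key and replace stale values. A sketch with invented table, key, and column names:

```sql
-- Upsert change records from Silver into a curated Delta table.
-- The updated_at guard keeps an older late-arriving row from
-- overwriting a newer one already in the target.
MERGE INTO curated.customers AS t
USING silver.customer_updates AS s
  ON t.customer_id = s.customer_id
WHEN MATCHED AND s.updated_at > t.updated_at THEN
  UPDATE SET *
WHEN NOT MATCHED THEN
  INSERT *;
```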
Performance & Storage Optimization:
  • Optimize Delta Lake tables for performance and cost. 
  • Select and tune appropriate storage formats (Parquet / Delta). 
  • Apply partitioning, compaction, and file sizing strategies. 
  • Tune Spark jobs for large-scale data processing. 
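One common shape for the compaction and file-sizing work listed above, assuming a Delta table named curated.sales (illustrative) with frequent filtering on customer_id:

```sql
-- Compact small files and co-locate rows on a common filter column,
-- reducing the number of files scanned per query.
OPTIMIZE curated.sales ZORDER BY (customer_id);

-- Reclaim storage from data files no longer referenced by the
-- transaction log (168 hours is the default retention window).
VACUUM curated.sales RETAIN 168 HOURS;
```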
Downstream & DWH Collaboration: 
  • Work closely with DWH and BI teams to support downstream consumption. 
  • Provide optimized datasets for Synapse and reporting workloads. 
  • Support data validation and reconciliation with Gold layer outputs. 
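Reconciliation against Gold layer outputs often starts with row-count and control-total checks between layers; a sketch, assuming hypothetical silver.orders and gold.orders tables with an amount column:

```sql
-- Side-by-side row counts and a control total; mismatches flag
-- records dropped or duplicated between Silver and Gold.
SELECT
  (SELECT COUNT(*)    FROM silver.orders) AS silver_rows,
  (SELECT COUNT(*)    FROM gold.orders)   AS gold_rows,
  (SELECT SUM(amount) FROM silver.orders) AS silver_amount,
  (SELECT SUM(amount) FROM gold.orders)   AS gold_amount;
```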
Engineering Best Practices:
  • Implement basic CI/CD practices for data pipelines. 
  • Follow coding standards, documentation, and version control practices. 
  • Support production troubleshooting and performance tuning. 
Experience:
  • 6–8 years of experience in data engineering. 
  • Strong hands-on experience building pipelines on Azure. 
  • Experience working with large datasets and distributed processing. 
Technical Skills:
  • Strong proficiency in PySpark. 
  • Hands-on experience with Azure Databricks. 
  • Strong experience with Azure Data Factory. 
  • Deep knowledge of Delta Lake tuning and optimization. 
  • Experience with storage optimization (Parquet, Delta). 
  • Strong SQL skills for transformation and validation. 
Tools & Practices:
  • Experience with Git and basic CI/CD pipelines. 
  • Familiarity with data quality and validation techniques. 
  • Experience working in Agile delivery models. 
Soft Skills:
  • Strong analytical and problem-solving skills. 
  • Ability to work independently on complex pipelines. 
  • Good communication and collaboration skills. 
Nice to Have:
  • Experience supporting Synapse Dedicated SQL Pool. 
  • Exposure to streaming or near real-time pipelines. 
  • Familiarity with data governance or metadata tools. 