Job Description:
• Design and implement enterprise-scale data pipelines using Databricks on AWS, leveraging both cluster-based and serverless compute paradigms
• Architect and maintain medallion architecture (Bronze/Silver/Gold) data lakes and lakehouses
• Develop and optimize Delta Lake tables for ACID transactions and efficient data management
• Build and maintain real-time and batch data processing workflows
• Create reusable, modular data transformation logic using dbt to ensure data quality and consistency across the organization
• Develop complex Python applications for data ingestion, transformation, and orchestration
• Write optimized SQL queries and implement performance tuning strategies for large-scale datasets
• Implement comprehensive data quality checks, testing frameworks, and monitoring solutions
• Design and implement CI/CD pipelines for automated testing, deployment, and rollback of data artifacts
• Configure and optimize Databricks clusters, job scheduling, and workspace management
• Implement version control best practices using Git and collaborative development workflows
• Partner with data analysts, data scientists, and business stakeholders to understand requirements and deliver solutions
• Mentor junior engineers and promote best practices in data engineering
• Document technical designs, data lineage, and operational procedures
• Participate in code reviews and contribute to team knowledge sharing
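To make the "data quality checks" responsibility above concrete, here is a minimal sketch of a record-level validation gate in pure Python. The rule names and record shape are hypothetical; a real Databricks pipeline would express these as dbt tests, Delta Live Tables expectations, or a framework such as Great Expectations.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical quality rules for illustration only — in production these
# would live in dbt tests or DLT expectations, not ad-hoc lambdas.
@dataclass
class Rule:
    name: str
    check: Callable[[dict], bool]

RULES = [
    Rule("id_present", lambda r: r.get("id") is not None),
    Rule("amount_non_negative", lambda r: r.get("amount", 0) >= 0),
]

def validate(records: list[dict]) -> dict[str, list[dict]]:
    """Split records into passed/failed — a Bronze-to-Silver style gate."""
    passed, failed = [], []
    for r in records:
        if all(rule.check(r) for rule in RULES):
            passed.append(r)
        else:
            failed.append(r)
    return {"passed": passed, "failed": failed}

result = validate([{"id": 1, "amount": 10.0}, {"id": None, "amount": -5}])
```

The same pass/fail split is what quarantine tables in a medallion architecture implement: clean rows promote to Silver, failures land in an errors table for triage.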
Requirements:
• 5+ years of experience in data engineering roles
• Expert-level proficiency in Databricks (Unity Catalog, Delta Live Tables, Workflows, SQL Warehouses)
• Strong understanding of cluster configuration, optimization, and serverless SQL compute
• Advanced SQL skills including query optimization, data layout strategies (e.g., partitioning, Z-ordering), and performance tuning
• Production experience with dbt (models, tests, documentation, macros, packages)
• Proficient in Python for data engineering (PySpark, pandas, data validation libraries)
• Hands-on experience with Git workflows (branching strategies, pull requests, code reviews)
• Proven track record implementing CI/CD pipelines (e.g., Jenkins, GitLab CI)
• Working knowledge of Snowflake architecture and migration patterns
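One of the performance-tuning levers behind the SQL requirements above is partition pruning: when a table is laid out by a partition column, a filter on that column lets the engine skip whole directories. The toy model below simulates that in pure Python; the table layout and predicate are assumptions for illustration, not Databricks APIs.

```python
from datetime import date

# Toy date-partitioned table: each key stands in for a partition directory.
partitions = {
    date(2024, 1, 1): [{"id": 1}, {"id": 2}],
    date(2024, 1, 2): [{"id": 3}],
    date(2024, 1, 3): [{"id": 4}, {"id": 5}],
}

def scan(pred):
    """Prune partitions whose key fails the predicate, then 'read' the rest."""
    rows, partitions_scanned = [], 0
    for day, part in partitions.items():
        if not pred(day):
            continue  # pruned: no I/O for this partition
        partitions_scanned += 1
        rows.extend(part)
    return rows, partitions_scanned

rows, scanned = scan(lambda d: d >= date(2024, 1, 2))
```

Here the filter touches 2 of 3 partitions; on a real Delta table the same effect comes from partitioning or Z-ordering on commonly filtered columns.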
Cost Management Responsibilities:
• Monitoring and analyzing Databricks Unit (DBU) consumption and cloud infrastructure costs
• Implementing cost optimization strategies including cluster right-sizing, autoscaling configurations, and spot instance usage
• Optimizing job scheduling to leverage off-peak pricing and minimize idle cluster time
• Establishing cost allocation tags and chargeback models for different teams and projects
• Conducting regular cost reviews and providing recommendations for efficiency improvements
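The cost-monitoring duties above can be sketched numerically. The calculator below uses illustrative per-DBU rates (assumed values only; actual Databricks pricing varies by workload type, tier, and cloud region) to show why cluster right-sizing, such as moving a scheduled job off an all-purpose cluster onto jobs compute, is a standard optimization.

```python
# Illustrative USD per DBU-hour rates — assumptions for this sketch,
# not Databricks' published pricing.
DBU_RATES = {"jobs_compute": 0.15, "all_purpose": 0.55, "sql_serverless": 0.70}

def estimate_cost(workload: str, dbus_per_hour: float, hours: float) -> float:
    """Estimate spend for one run: rate * DBU consumption rate * duration."""
    return DBU_RATES[workload] * dbus_per_hour * hours

# Right-sizing comparison: the same 4-hour, 8-DBU/hr job on two compute types.
all_purpose = estimate_cost("all_purpose", 8.0, 4.0)   # 0.55 * 8 * 4
jobs = estimate_cost("jobs_compute", 8.0, 4.0)         # 0.15 * 8 * 4
savings = all_purpose - jobs
```

Multiplied across daily schedules and teams, differences of this size are what cost allocation tags and chargeback reviews are meant to surface.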