Design, implement, and maintain scalable data pipelines using the Databricks Lakehouse Platform, with a strong focus on Apache Spark, Delta Lake, and Unity Catalog.
Lead the development of batch and streaming data workflows that power analytics, machine learning, and business intelligence use cases.
Collaborate with data scientists, architects, and business stakeholders to translate complex data requirements into robust, production-grade solutions.
Optimize the performance and cost-efficiency of Databricks clusters and jobs, leveraging tools such as Photon, Auto Loader, and Databricks Workflows.
Establish and enforce best practices for data quality, governance, and security within the Databricks environment.
Mentor junior engineers and contribute to the evolution of the team’s Databricks expertise.
Job Specifications:
Deep hands-on experience with Databricks on Azure, AWS, or GCP, including Spark (PySpark/Scala), Delta Lake, and MLflow.
Strong programming skills in Python or Scala, and experience building CI/CD pipelines (e.g., GitHub Actions, Azure DevOps).
Solid understanding of distributed computing, data modeling, and performance tuning in cloud-native environments.
Familiarity with orchestration tools (e.g., Databricks Workflows, Airflow) and infrastructure-as-code (e.g., Terraform).
A proactive mindset, strong communication skills, and a passion for building scalable, reliable data systems.