We are seeking a Senior Big Data Engineer who thrives in a fast-paced, AI-focused environment and has a strong background in managing structured and unstructured data pipelines. You will be instrumental in building and scaling our data lake architecture, supporting a system designed to fuel intelligent AI agents for data collection, labeling, and analytical reasoning. This includes integrating vector databases and optimizing retrieval-augmented generation (RAG) workflows deployed on AWS Bedrock and other AI stacks.
- Design and implement scalable ingestion pipelines for structured and unstructured data using AWS and Databricks Unity Catalog.
- Build and maintain high-throughput ETL/ELT pipelines with Apache Airflow and Databricks.
- Design and manage data models, storage, and indexing strategies in PostgreSQL and RDS, ensuring compatibility with AI retrieval systems.
- Integrate and manage vector databases to support fast semantic and embedding-based search in RAG pipelines.
- Collaborate with AI engineers to ensure seamless compatibility with LangGraph and LangSmith agent systems.
- Implement robust data validation, lineage, and governance systems using Unity Catalog.
- Optimize performance across distributed compute environments (Databricks, EC2).
- Deploy and maintain Lambda-based microservices for scalable, real-time data ingestion and enrichment.