Falcon Smart IT

Data Engineer

⭐ - Featured Role | Apply directly with Data Freelance Hub
This role is for a Data Engineer on a contract-to-hire basis, offering remote work. Key skills include Databricks, Spark, Python, and SQL. Experience with AI/ML workflows and cloud platforms (AWS, Azure, GCP) is essential. Certifications in Databricks are preferred.
🌎 - Country
United States
💱 - Currency
$ USD
💰 - Day rate
Unknown
🗓️ - Date
October 24, 2025
🕒 - Duration
Unknown
🏝️ - Location
Remote
📄 - Contract
Unknown
🔒 - Security
Unknown
📍 - Location detailed
United States
🧠 - Skills detailed
#Dataflow #Data Engineering #ADLS (Azure Data Lake Storage) #GitLab #Langchain #BI (Business Intelligence) #Azure Repos #ML (Machine Learning) #Storage #Apache Spark #DevOps #AWS (Amazon Web Services) #AI (Artificial Intelligence) #Azure #Data Integration #GCP (Google Cloud Platform) #Delta Lake #Data Science #MLflow #Databricks #AWS Glue #Kafka (Apache Kafka) #Hugging Face #Scala #Python #Transformers #Spark (Apache Spark) #Data Quality #Computer Science #Data Lake #S3 (Amazon Simple Storage Service) #Security #SQL (Structured Query Language) #Azure DevOps #Data Processing #REST API #Version Control #ETL (Extract, Transform, Load) #Azure Data Factory #Model Deployment #GitHub #Data Storage #Cloud #Data Lakehouse #GIT #ADF (Azure Data Factory) #REST (Representational State Transfer) #Deployment #Data Pipeline
Role description
Job Title: Data Engineer
Location: Remote
Job Type: Contract to Hire

Job Description:

Overview:
We are looking for a highly skilled Data Engineer with deep expertise in Databricks and a strong understanding of AI/ML workflows. This role is central to building and optimizing scalable data platforms that support advanced analytics and machine learning initiatives. You will work closely with data engineers, data scientists, ML engineers, and business stakeholders to deliver intelligent, data-driven solutions.

Key Responsibilities:
• Design, develop, and maintain scalable data pipelines using Apache Spark on Databricks (see the pipeline sketches after this description).
• Build and manage Delta Lake architectures for efficient data storage and retrieval.
• Implement robust ETL/ELT workflows using Databricks notebooks, SQL, and Python.
• Collaborate with AI/ML teams to operationalize models within the Databricks environment.
• Optimize data workflows for performance, reliability, and cost-efficiency on cloud platforms (AWS, Azure, or GCP).
• Ensure data quality, lineage, and governance using tools such as Unity Catalog and MLflow.
• Develop CI/CD pipelines for data and ML workflows using Databricks Repos and Git integrations.
• Monitor and troubleshoot production data pipelines and model deployments.

Primary Skills/Experience:
• Strong hands-on experience with Databricks, including Spark, Delta Lake, and MLflow.
• Proficiency in Python, SQL, and distributed data processing.
• Experience with cloud-native data services (e.g., AWS Glue, Azure Data Factory, GCP Dataflow).
• Familiarity with the machine learning lifecycle and with integrating models into data pipelines.
• Understanding of data warehousing, data lakehouse architecture, and real-time streaming (Kafka, Spark Structured Streaming).
• Experience with version control, CI/CD, and infrastructure-as-code tools.
• Excellent communication and collaboration skills.

Secondary Skills/Experience:
• Databricks certifications (e.g., Databricks Certified Data Engineer Associate/Professional).
• Experience with feature engineering and feature stores in Databricks.
• Exposure to MLOps practices and tools.
• Bachelor's or Master's degree in Computer Science, Data Engineering, or a related field.
• Prior experience using Databricks for scalable AI and BI solutions, integrating well-known large language models (Anthropic, LLaMA, Gemini) to enhance data-driven insights, and building agentic AI workflows that automate complex decision-making.

Tech Stack:
• Core Tools: Databricks (Spark, Delta Lake, MLflow, Notebooks); Python and SQL; Apache Spark (via Databricks); Delta Lake (for lakehouse architecture)
• Cloud Platforms: Azure, AWS, or GCP; cloud storage (ADLS, S3, GCS)
• Data Integration: Kafka or Event Hubs (streaming); Auto Loader (Databricks file ingestion); REST APIs
• AI/ML: MLflow (model tracking/deployment); Hugging Face Transformers; LangChain / LlamaIndex (LLM integration); LLMs: Anthropic Claude, Meta LLaMA, Google Gemini
• DevOps: Git (GitHub, GitLab, Azure Repos); Databricks Repos; CI/CD: GitHub Actions, Azure DevOps
• Security & Governance: Unity Catalog; RBAC

Education: Bachelor's degree in Computer Science or similar.
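To make the pipeline responsibilities above more concrete, here is a minimal sketch of a batch ETL step on Databricks with PySpark and Delta Lake: read raw JSON, apply light cleaning, and write a partitioned Delta table. The paths and column names (event_id, event_ts) are illustrative assumptions, not details taken from the role description.

```python
# Minimal batch ETL sketch: raw JSON -> cleaned, partitioned Delta table.
# All paths and column names below are hypothetical.
from pyspark.sql import SparkSession, functions as F

# On Databricks this returns the notebook's existing Spark session.
spark = SparkSession.builder.getOrCreate()

raw = spark.read.format("json").load("/mnt/raw/events/")   # hypothetical landing zone

cleaned = (
    raw.dropDuplicates(["event_id"])                       # hypothetical business key
       .withColumn("event_date", F.to_date("event_ts"))    # hypothetical timestamp column
       .filter(F.col("event_date").isNotNull())
)

(
    cleaned.write.format("delta")
    .mode("overwrite")
    .partitionBy("event_date")
    .save("/mnt/curated/events_delta")                     # hypothetical curated path
)
```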
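For the data-integration side (Auto Loader and Structured Streaming into Delta), an ingestion job might look like the sketch below. It assumes a Databricks runtime, where Auto Loader's cloudFiles source and the `spark` session are available; every path and table name is a placeholder for illustration.

```python
# Incremental file ingestion with Databricks Auto Loader (cloudFiles) into a Delta table.
# Requires a Databricks runtime; `spark` is the session Databricks provides in notebooks.
# Paths, schema location, and the target table name are hypothetical.
stream = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "/mnt/checkpoints/events_schema")
    .load("/mnt/raw/events/")
)

query = (
    stream.writeStream.format("delta")
    .option("checkpointLocation", "/mnt/checkpoints/events")
    .trigger(availableNow=True)      # process all currently available files, then stop
    .toTable("bronze.events")        # hypothetical bronze-layer table
)
query.awaitTermination()
```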
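And for operationalizing models, a minimal MLflow tracking run is sketched below: train a model, log parameters and metrics, and log the model artifact so it can later be registered and served from Databricks. The scikit-learn classifier, metric, and run name are placeholder assumptions; the real models and pipelines would come from the AI/ML teams mentioned above.

```python
# Minimal MLflow tracking sketch: train a placeholder model, log params/metrics,
# and log the model artifact for later registration/serving.
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

with mlflow.start_run(run_name="rf-baseline"):            # hypothetical run name
    model = RandomForestClassifier(n_estimators=100, random_state=42)
    model.fit(X_train, y_train)

    accuracy = accuracy_score(y_test, model.predict(X_test))
    mlflow.log_param("n_estimators", 100)
    mlflow.log_metric("accuracy", accuracy)
    mlflow.sklearn.log_model(model, artifact_path="model")
```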