

DigiTran Technologies Inc.
Databricks SME
Featured Role | Apply directly with Data Freelance Hub
This role is for a Databricks SME with a contract length of "unknown," offering a pay rate of "unknown," and requires expertise in data ingestion, de-duplication, and tagging. Key skills include Azure, Databricks, and data pipeline management. A bachelor's degree and 13 years of experience are required.
Country
United States
Currency
$ USD
Day rate
Unknown
Date
May 8, 2026
Duration
Unknown
Location
Unknown
Contract
Unknown
Security
Unknown
Location detailed
Raleigh, NC
Skills detailed
#Hadoop #Teradata #GIT #Data Ingestion #Classification #Continuous Deployment #Databases #ETL (Extract, Transform, Load) #Data Lake #Tableau #Compliance #Cloud #ML (Machine Learning) #Apache NiFi #Monitoring #Kafka (Apache Kafka) #Deployment #NLP (Natural Language Processing) #Data Management #Data Integrity #Spark (Apache Spark) #SQL (Structured Query Language) #Metadata #Apache Kafka #Migration #Data Pipeline #Scala #Azure #Documentation #Scripting #Oracle #Visualization #Datasets #Python #ADF (Azure Data Factory) #Data Catalog #NiFi (Apache NiFi) #SAS #Batch #Perl #DevOps #Data Governance #Data Engineering #Ruby #S3 (Amazon Simple Storage Service) #C++ #Azure Data Factory #Databricks #Apache Spark #Qlik #Azure cloud #Strategy
Role description
Description:
We are seeking a Data Engineer to support our client with data ingestion, de-duplication, and data tagging for the migration of a large-scale data environment into Databricks.
The ideal candidate will also bring hands-on expertise in end-to-end data pipeline management, including data ingestion from diverse sources, de-duplication of large-scale datasets, and data tagging to support downstream analytics, governance, and machine learning workflows.
Roles and Responsibilities (including but not limited to)
• Design, develop, and maintain scalable data ingestion pipelines to onboard structured, semi-structured, and unstructured data from batch and streaming sources (e.g., APIs, databases, flat files, message queues) into the Azure/Databricks environment.
• Implement de-duplication strategies across large-scale datasets using deterministic and probabilistic matching techniques to ensure data integrity and reduce redundancy within the Data Lake.
• Develop and enforce data tagging frameworks to classify, label, and annotate datasets with appropriate metadata (e.g., sensitivity, source, domain, lineage) to support data governance, discoverability, and compliance requirements.
• Assist with operationalizing deployments and supporting cloud services for ETL operations, including standardizing and automating processes and workflows, creating documentation and knowledge articles, and assisting operations staff who have limited cloud experience.
• Deliver written and oral presentations to high-level CIO management on the status of current efforts.
• Apply skills and experience in business management, systems engineering, operations research, and management engineering, typically with specialization in a particular technology or business application; keep abreast of technological developments and industry trends.
• Assist with deployment, configuration, and management of the Azure cloud environment.
• Assist with migrating existing ETL jobs into the Azure/Databricks cloud environment.
• Share optimizations and efficiencies with the larger team and management.
• Automate solutions to repetitive problems and tasks.
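To illustrate the deterministic and probabilistic matching the de-duplication responsibility calls for, here is a minimal sketch in plain Python using the standard library's `difflib` (the field names, sample records, and 0.9 similarity threshold are illustrative assumptions; at the scale described this logic would run on Spark rather than in a single process):

```python
from difflib import SequenceMatcher

def dedup(records, key_fields=("email",), name_field="name", threshold=0.9):
    """De-duplicate records in two passes: a deterministic exact match on
    normalized key fields, then a probabilistic (fuzzy) match on a name field."""
    seen_keys = set()
    kept = []
    for rec in records:
        # Deterministic pass: exact match on normalized key fields
        key = tuple(rec.get(f, "").strip().lower() for f in key_fields)
        if key in seen_keys:
            continue
        # Probabilistic pass: fuzzy name similarity against records already kept
        name = rec.get(name_field, "").lower()
        if any(SequenceMatcher(None, name, k.get(name_field, "").lower()).ratio() >= threshold
               for k in kept):
            continue
        seen_keys.add(key)
        kept.append(rec)
    return kept

records = [
    {"name": "Jane Doe",   "email": "jane@example.com"},
    {"name": "Jane  Doe",  "email": "JANE@example.com"},   # duplicate key (case)
    {"name": "Jane Do",    "email": "j.doe@example.com"},  # fuzzy name duplicate
    {"name": "John Smith", "email": "john@example.com"},
]
print(len(dedup(records)))  # → 2
```

In practice the probabilistic pass would use blocking or locality-sensitive hashing so each record is not compared against every kept record.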
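Similarly, automated classification for data tagging often begins with rule-based inspection of sampled values. The sketch below is a hypothetical, simplified example; the patterns, tag names, and sensitivity labels are assumptions for illustration, not the client's actual tagging schema:

```python
import re

# Hypothetical tagging rules: map a detected value type to a sensitivity tag.
PATTERNS = {
    "email":  re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "us_ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
}

def tag_column(values):
    """Return metadata tags for a column based on its sampled values."""
    tags = {"sensitivity": "public"}
    for label, pattern in PATTERNS.items():
        if any(pattern.match(str(v)) for v in values):
            # Flag the column as restricted and record what was detected
            tags.update({"sensitivity": "restricted", "detected_type": label})
            break
    return tags

print(tag_column(["jane@example.com", "john@example.com"]))
# → {'sensitivity': 'restricted', 'detected_type': 'email'}
```

A production framework would persist these tags to a data catalog (e.g., Azure Purview) so governance and lineage tooling can consume them.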
Basic Qualifications
• Bachelor's degree and 13 years of experience. A degree from an accredited college/university in the applicable field of services is preferred. In lieu of a college degree, four additional years of relevant experience are required; if the degree is not in the applicable field, four additional years of related experience are required.
• 3+ years of demonstrated experience designing and implementing data ingestion pipelines using tools such as Azure Data Factory, Apache Kafka, Apache NiFi, Spark Structured Streaming, or equivalent technologies.
• 3+ years of experience applying de-duplication techniques at scale, including record linkage, fuzzy matching, and entity resolution across structured and unstructured datasets.
• 3+ years of hands-on experience with data tagging and metadata management, including the use of tagging schemas, data catalogs (e.g., Azure Purview, Apache Atlas), and automated classification tools to support data governance and lineage tracking.
• 3+ years of demonstrated experience working with unstructured data.
• 2+ years of experience using Databricks or other Spark-based platforms.
• Fluency in at least one scripting language (Python, Perl, Ruby, or equivalent).
Desired Skills:
• Integration of Git in continuous deployment and experience with DevOps monitoring tools.
• Experience with one or more of the following products and technologies: SAS, Python, C++, Hadoop, SQL database/coding, Teradata, Oracle, Amazon S3, Apache Spark, machine learning, natural language processing, and visualization tools such as Tableau, Strategy, and Qlik.
• Strong skills and experience in cloud operations support in Azure.