

VBeyond Corporation
Data Scientist with NLP
⭐ - Featured Role | Apply direct with Data Freelance Hub
This role is for a Data Scientist with NLP expertise in Houston, TX, on a long-term contract. Key skills include NLP, AWS Bedrock, and healthcare experience. Proficiency in Python, SQL, and big data technologies is required.
🌎 - Country
United States
💱 - Currency
$ USD
-
💰 - Day rate
Unknown
-
🗓️ - Date
January 6, 2026
🕒 - Duration
Unknown
-
🏝️ - Location
On-site
-
📄 - Contract
Unknown
-
🔒 - Security
Unknown
-
📍 - Location detailed
Houston, TX
-
🧠 - Skills detailed
#MySQL #Spark (Apache Spark) #Python #Langchain #Scala #PostgreSQL #Data Storage #AWS EMR (Amazon Elastic MapReduce) #Storage #AWS (Amazon Web Services) #ML (Machine Learning) #PySpark #AI (Artificial Intelligence) #Documentation #ETL (Extract, Transform, Load) #SQL (Structured Query Language) #Scripting #NLP (Natural Language Processing) #Deep Learning #FHIR (Fast Healthcare Interoperability Resources) #Data Science #Databases #Data Framework #Programming #Big Data #Automated Testing
Role description
Job Description
Job Title: Data Scientist
Location: Houston, TX (5 days onsite per week)
Type of Employment: Contract
Duration: Long-term
Must have: NLP, AWS Bedrock, and healthcare experience
Mandatory skills:
• Expertise in fine-tuning using AWS Nova.
• Proficiency in Python and scripting languages for NLP and machine learning development.
• Hands-on experience with large language models and agentic workflow tools such as LangGraph.
• Strong understanding of clinical NLP techniques and experience with machine learning and deep learning models.
• Expertise in SQL and big data technologies including AWS EMR and Spark/pySpark.
• Practical knowledge of AWS services, especially AWS Bedrock for generative AI applications.
• Experience with relational databases such as PostgreSQL or MySQL.
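To illustrate the AWS Bedrock requirement above, here is a minimal, hypothetical sketch of preparing a request body for a Bedrock text-generation call. The message/inferenceConfig shape, the parameter values, and the prompt are assumptions for illustration only; a real call would send this body through boto3's "bedrock-runtime" client (e.g. `client.invoke_model(...)`), which requires AWS credentials and a model ID.

```python
import json

def build_bedrock_request(prompt: str, max_tokens: int = 512,
                          temperature: float = 0.2) -> str:
    """Serialize a prompt into a JSON request body for a Bedrock call.

    The payload shape here is an assumption for illustration; check the
    target model's documented request schema before using it.
    """
    body = {
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
        "inferenceConfig": {"maxTokens": max_tokens, "temperature": temperature},
    }
    return json.dumps(body)

# Build (but do not send) a request for a clinical-text summarization prompt.
request_body = build_bedrock_request(
    "Summarize the clinical note below in two sentences: ..."
)
```

Keeping the payload construction separate from the network call makes this piece unit-testable without AWS access.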
Good-to-have skills:
• Familiarity with generative AI applications in healthcare and related use cases.
• Understanding of healthcare data standards and terminologies such as HL7, FHIR, and CCDA.
• Experience in creating detailed documentation, user manuals, and technical specifications.
• Background in automated testing and validation frameworks for NLP outputs.
• Ability to collaborate effectively with cross-functional teams, including engineering and product.
• Exposure to LangChain or similar frameworks for building intelligent agent workflows.
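Regarding the healthcare data standards mentioned above, the sketch below shows reading a minimal FHIR Patient resource with only the Python standard library. The field names (`resourceType`, `name[].family`, `name[].given`) follow the FHIR specification; the sample data and the helper function are invented for illustration.

```python
import json

# A minimal, invented FHIR R4 Patient resource (JSON).
sample = """
{
  "resourceType": "Patient",
  "id": "example-1",
  "name": [{"family": "Doe", "given": ["Jane"]}]
}
"""

def patient_display_name(resource_json: str) -> str:
    """Return 'Given Family' from the first name entry of a Patient resource."""
    resource = json.loads(resource_json)
    if resource.get("resourceType") != "Patient":
        raise ValueError("expected a Patient resource")
    name = resource["name"][0]
    given = " ".join(name.get("given", []))
    return f'{given} {name.get("family", "")}'.strip()

print(patient_display_name(sample))  # Jane Doe
```

In practice a dedicated FHIR library would handle validation and the many optional fields; this sketch only demonstrates the resource shape.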
Responsibilities:
• Analyze and process clinical textual data using AI-powered NLP techniques and advanced machine learning models.
• Modify and improve current workflows by incorporating cutting-edge machine learning and deep learning algorithms, including leveraging large language models (LLMs) and tools like LangGraph for complex AI agentic workflows in healthcare contexts.
• Develop NLP modules within the NLP development team using programming or scripting languages such as Python.
• Conduct pre-processing and quality analysis for textual data inputs and validate performance of NLP outputs.
• Create systematic testing procedures, error-checking mechanisms, and user manuals for NLP modules.
• Build infrastructure for optimal extraction, transformation, and loading of data from diverse sources including MCP servers, using SQL and AWS big data frameworks such as EMR and Spark/pySpark.
• Collaborate with Engineering teams to ensure scalable and efficient data workflows using SQL and AWS big data technologies.
• Apply working knowledge of AWS services, particularly AWS Bedrock, to develop generative AI applications.
• Utilize relational databases such as PostgreSQL or MySQL for data storage and retrieval in NLP and AI workflows.
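The testing and error-checking responsibility above can be sketched as a small validation harness for NLP entity-extraction output. The entity schema (label, character span, surface text) and the allowed-label set are invented for illustration, not taken from the posting.

```python
# Allowed entity labels: an assumption for this sketch.
ALLOWED_LABELS = {"MEDICATION", "DOSAGE", "DIAGNOSIS"}

def validate_entities(text: str, entities: list) -> list:
    """Return human-readable validation errors for extracted entities.

    An empty list means all entities passed. Checks: known label,
    in-bounds span, and span text matching the source text.
    """
    errors = []
    for i, ent in enumerate(entities):
        if ent.get("label") not in ALLOWED_LABELS:
            errors.append(f"entity {i}: unknown label {ent.get('label')!r}")
        start, end = ent.get("start", -1), ent.get("end", -1)
        if not (0 <= start < end <= len(text)):
            errors.append(f"entity {i}: span ({start}, {end}) out of bounds")
        elif text[start:end] != ent.get("text"):
            errors.append(f"entity {i}: span text does not match source")
    return errors

note = "Patient started on metformin 500 mg daily."
good = [{"label": "MEDICATION", "start": 19, "end": 28, "text": "metformin"}]
print(validate_entities(note, good))  # []
```

Structural checks like these are cheap to run on every model output and catch span-offset bugs before downstream evaluation.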
