VBeyond Corporation

Data Scientist with NLP

This role is for a Data Scientist with NLP in Houston, TX, on a long-term contract. Key skills include NLP, AWS Bedrock, and healthcare experience. Proficiency in Python, SQL, and big data technologies is required.
🌎 - Country
United States
💱 - Currency
$ USD
💰 - Day rate
Unknown
🗓️ - Date
January 6, 2026
🕒 - Duration
Unknown
🏝️ - Location
On-site
📄 - Contract
Unknown
🔒 - Security
Unknown
📍 - Location detailed
Houston, TX
🧠 - Skills detailed
#MySQL #Spark (Apache Spark) #Python #Langchain #Scala #PostgreSQL #Data Storage #AWS EMR (Amazon Elastic MapReduce) #Storage #AWS (Amazon Web Services) #ML (Machine Learning) #PySpark #AI (Artificial Intelligence) #Documentation #ETL (Extract, Transform, Load) #SQL (Structured Query Language) #Scripting #NLP (Natural Language Processing) #Deep Learning #FHIR (Fast Healthcare Interoperability Resources) #Data Science #Databases #Data Framework #Programming #Big Data #Automated Testing
Role description
Job Description

Job Title: Data Scientist
Location: Houston, TX (5 days onsite per week)
Type of Employment: Contract
Duration: Long term
Must have: NLP, AWS Bedrock, and healthcare experience

Mandatory skills:
• Expertise in fine-tuning using AWS Nova.
• Proficiency in Python and scripting languages for NLP and machine learning development.
• Hands-on experience with large language models and agentic workflow tools such as LangGraph.
• Strong understanding of clinical NLP techniques and experience with machine learning and deep learning models.
• Expertise in SQL and big data technologies, including AWS EMR and Spark/PySpark.
• Practical knowledge of AWS services, especially AWS Bedrock for generative AI applications.
• Experience with relational databases such as PostgreSQL or MySQL.

Good to have skills:
• Familiarity with generative AI applications in healthcare and related use cases.
• Understanding of healthcare data standards and terminologies such as HL7, FHIR, and CCDA.
• Experience in creating detailed documentation, user manuals, and technical specifications.
• Background in automated testing and validation frameworks for NLP outputs.
• Ability to collaborate effectively with cross-functional teams, including engineering and product.
• Exposure to LangChain or similar frameworks for building intelligent agent workflows.

Responsibilities:
• Analyze and process clinical textual data using AI-powered NLP techniques and advanced machine learning models.
• Modify and improve current workflows by incorporating cutting-edge machine learning and deep learning algorithms, including large language models (LLMs) and tools like LangGraph for complex agentic AI workflows in healthcare contexts.
• Develop NLP modules within the NLP development team using programming or scripting languages such as Python.
• Conduct pre-processing and quality analysis of textual data inputs and validate the performance of NLP outputs.
• Create systematic testing procedures, error-checking mechanisms, and user manuals for NLP modules.
• Build infrastructure for optimal extraction, transformation, and loading of data from diverse sources, including MCP servers, using SQL and AWS big data frameworks such as EMR and Spark/PySpark.
• Collaborate with Engineering teams to ensure scalable and efficient data workflows using SQL and AWS big data technologies.
• Apply working knowledge of AWS services, particularly AWS Bedrock, to develop generative AI applications.
• Utilize relational databases such as PostgreSQL or MySQL for data storage and retrieval in NLP and AI workflows.