

Robert Half
100% Remote 3+ Month Reinforcement Learning Engineer (Pricing Intelligence) Contract
Featured Role | Apply direct with Data Freelance Hub
This role is a 100% remote, 3+ month contract for a Reinforcement Learning Engineer focused on pricing intelligence. Key requirements include hands-on RL experience, proficiency in Python and ML frameworks, and familiarity with AWS SageMaker.
Country: United States
Currency: $ USD
Day rate: 640
Date: October 23, 2025
Duration: More than 6 months
Location: Unknown
Contract: Unknown
Security: Unknown
Location detailed: United States
Skills detailed: #ML (Machine Learning) #AWS (Amazon Web Services) #Data Science #Python #Scala #PyTorch #AWS SageMaker #Deployment #TensorFlow #SageMaker #Reinforcement Learning #MLflow #Automation
Role description
Overview:
Our client is building next-generation pricing intelligence capabilities that leverage reinforcement learning (RL) to optimize decisions in dynamic, real-world environments. This role is ideal for someone who thrives at the intersection of applied machine learning, experimentation, and scalable deployment, not just theory.
What You'll Do:
• Design, train, and evaluate reinforcement learning agents to solve complex optimization problems in pricing and decision automation.
• Develop end-to-end RL pipelines, from environment design and reward shaping to policy evaluation and tuning.
• Implement and iterate on algorithms such as PPO, DQN, and contextual bandits using frameworks like PyTorch, TensorFlow Agents, RLlib, or Stable Baselines.
• Deploy and monitor models in production via AWS SageMaker, building automated training, testing, and CI/CD workflows.
• Collaborate cross-functionally with data scientists, engineers, and business stakeholders to bring research concepts into measurable business impact.
What We're Looking For:
• Proven hands-on experience applying reinforcement learning in real or simulated environments, beyond coursework or research papers.
• Strong understanding of RL principles: reward functions, exploration vs. exploitation, policy optimization, and convergence behavior.
• Proficiency in Python and at least one major ML framework (PyTorch, TensorFlow, JAX, etc.).
• Experience with AWS SageMaker or similar platforms for model training, deployment, and experimentation.
• Familiarity with experimentation tracking tools (e.g., MLflow, Weights & Biases) and MLOps best practices.
• Bonus: experience with optimization, pricing models, or applied decision systems.






