

Apar Technologies
Staff Machine Learning Engineer
⭐ - Featured Role | Apply direct with Data Freelance Hub
This role is for a Staff Machine Learning Engineer; the contract length and pay rate are not specified. Key requirements include 10+ years of engineering experience, expertise in AWS, and proficiency in PyTorch and Hugging Face Transformers.
🌎 - Country
United States
💱 - Currency
$ USD
-
💰 - Day rate
Unknown
-
🗓️ - Date
November 22, 2025
🕒 - Duration
Unknown
-
🏝️ - Location
Unknown
-
📄 - Contract
Unknown
-
🔒 - Security
Unknown
-
📍 - Location detailed
San Jose, CA
-
🧠 - Skills detailed
#Transformers #SageMaker #S3 (Amazon Simple Storage Service) #VPC (Virtual Private Cloud) #ECR (Elastic Container Registry) #AI (Artificial Intelligence) #Compliance #C++ #REST (Representational State Transfer) #Security #Deployment #AWS (Amazon Web Services) #Datasets #EC2 #MLflow #Licensing #IAM (Identity and Access Management) #Regression #PyTorch #AutoScaling #Java #ML (Machine Learning) #Python #Observability #Cloud #Batch #Hugging Face #ETL (Extract, Transform, Load)
Role description
What you’ll do (Responsibilities)
• Own the technical roadmap for Verilog/RTL‑focused LLM capabilities—from model selection and adaptation to evaluation, deployment, and continuous improvement.
• Lead a hands‑on team of applied scientists/engineers: set direction, unblock technically, review designs/code, and raise the bar on experimentation velocity and reliability.
• Fine‑tune and customize models using state‑of‑the‑art techniques (LoRA/QLoRA, PEFT, instruction tuning, preference optimization/RLAIF) with robust HDL‑specific evals:
• Compile‑/lint‑/simulate‑based pass rates, pass@k for code generation, constrained decoding to enforce syntax, and “does‑it‑synthesize” checks.
• Design privacy‑first ML pipelines on AWS:
• Training/customization and hosting using Amazon Bedrock (including Anthropic models) where appropriate; SageMaker (or EKS + KServe/Triton/DJL) for bespoke training needs.
• Artifacts in S3 with KMS CMKs; isolated VPC subnets & PrivateLink (including Bedrock VPC endpoints), IAM least‑privilege, CloudTrail auditing, and Secrets Manager for credentials.
• Enforce encryption in transit and at rest, data minimization, and no public egress for customer/RTL corpora.
• Stand up dependable model serving: Bedrock model invocation where it fits, and/or low‑latency self‑hosted inference (vLLM/TensorRT‑LLM), autoscaling, and canary/blue‑green rollouts.
• Build an evaluation culture: automatic regression suites that run HDL compilers/simulators, measure behavioral fidelity, and detect hallucinations/constraint violations; model cards and experiment tracking (MLflow/Weights & Biases).
• Partner deeply with hardware design, CAD/EDA, Security, and Legal to source/prepare datasets (anonymization, redaction, licensing), define acceptance gates, and meet compliance requirements.
• Drive productization: integrate LLMs with internal developer tools (IDEs/plug‑ins, code review bots, CI), retrieval (RAG) over internal HDL repos/specs, and safe tool‑use/function‑calling.
• Mentor & uplevel: coach ICs on LLM best practices, reproducible training, critical paper reading, and building secure‑by‑default systems.
What you’ll bring (Minimum qualifications)
• 10+ years total engineering experience with 5+ years in ML/AI or large‑scale distributed systems; 3+ years working directly with transformers/LLMs.
• Proven track record shipping LLM‑powered features in production and leading ambiguous, cross‑functional initiatives at Staff level.
• Deep hands‑on skill with PyTorch, Hugging Face Transformers/PEFT/TRL, distributed training (DeepSpeed/FSDP), parameter‑efficient and quantized fine‑tuning (LoRA/QLoRA), and constrained/grammar‑guided decoding.
• AWS expertise to design and defend secure enterprise deployments, including:
• Amazon Bedrock (model selection, Anthropic model usage, model customization, Guardrails, Knowledge Bases, Bedrock runtime APIs, VPC endpoints)
• SageMaker (Training, Inference, Pipelines), S3, EC2/EKS/ECR, VPC/Subnets/Security Groups, IAM, KMS, PrivateLink, CloudWatch/CloudTrail, Step Functions, Batch, Secrets Manager.
• Strong software engineering fundamentals: testing, CI/CD, observability, performance tuning; Python a must (bonus for Go/Java/C++).
• Demonstrated ability to set technical vision and influence across teams; excellent written and verbal communication for execs and engineers.






