Expert Prompt Curators for Advanced AI Evaluation Dataset

⭐ - Featured Role | Apply direct with Data Freelance Hub

This role is for "Expert Prompt Curators for Advanced AI Evaluation Dataset," a remote contract position lasting approximately 2 months and paying from $70.00 per hour. It requires advanced expertise in a specialized subject and experience in academic research or test question design.

🌎 - Country

United States

💱 - Currency

$ USD

💰 - Day rate

560

🗓️ - Date discovered

August 23, 2025

🕒 - Project duration

1 to 3 months

🏝️ - Location type

Remote

📄 - Contract type

1099 Contractor

🔒 - Security clearance

Unknown

📍 - Location detailed

United States

🧠 - Skills detailed

#Scripting #Data Management #AI (Artificial Intelligence) #Big Data #Shell Scripting #Supervised Learning #Programming #Automation #NLP (Natural Language Processing) #SAS #Data Storage #Hadoop #Unsupervised Learning #Cloud #SQL (Structured Query Language) #Spark (Apache Spark) #ETL (Extract, Transform, Load) #Looker #AWS (Amazon Web Services) #Bash #Data Mining #Unix #Python #Databases #Deployment #Visualization #Storage #ML (Machine Learning) #Java #Talend #R #Data Integrity

Role description

We are collaborating with a leading AI research lab to develop a next-generation evaluation dataset for frontier AI models. We are seeking experts with advanced domain knowledge across diverse fields to design extremely challenging prompts that cannot be solved by existing AI systems without internet search or browsing capabilities. The goal is to create a benchmark dataset that pushes the limits of current AI reasoning and retrieval. This is a short-term research engagement with significant impact on AI evaluation.

1. Key Responsibilities
- Create original, expert-level prompts that require tool use (e.g., search, browse, or code execution).
- Ensure prompts are objective, self-contained, and yield clear, unambiguous answers.
- Test prompts against advanced AI models and document failures/successes.
- Provide reasoning steps and solutions for each prompt.
- Classify prompts into subject domains for dataset organization.
- Collaborate with reviewers for expert validation and prompt refinement.

2. Ideal Qualifications
- Advanced academic or professional expertise in a specialized subject (STEM, law, finance, history, cultural studies, etc.).
- Strong ability to design precise, high-difficulty questions requiring deep knowledge and external references.
- Experience in academic research, benchmarking, or test question design preferred.
- Attention to detail and ability to provide concise reasoning explanations.
- Familiarity with AI models and their limitations is a plus.

3. More About the Opportunity
- Remote and asynchronous: set your own hours.
- Expected commitment: ~10–20 hours/week.
- Project duration: ~2 months, with possible extensions based on dataset needs.
- Opportunity to contribute to high-impact AI safety and evaluation research.

4. Compensation & Contract Terms
- Competitive hourly compensation based on expertise.
- Independent contractor engagement.
- Payments for services rendered processed weekly via Stripe Connect.

5. Application Process
- Submit your resume or CV highlighting your subject matter expertise.
- Complete a brief questionnaire about your background and areas of specialization.
- Selected applicants may be asked to draft a short test prompt.
- You’ll receive follow-up within a few days regarding next steps.

accessible to the public.

Duties
- Manage and oversee the acquisition, preservation, and exhibition of collections.
- Conduct research on artifacts and artworks to provide context and enhance educational outreach.
- Develop engaging exhibitions that highlight key themes and narratives within the collection.
- Collaborate with cross-functional teams to integrate data analytics into curatorial practices.
- Utilize machine learning frameworks for data mining and analysis of collection trends.
- Design databases for efficient storage and retrieval of collection information.
- Implement ETL processes to ensure data integrity across various platforms.
- Train models using Python, R, or Java for predictive analysis related to collections.
- Engage in public programming and educational initiatives to promote understanding of the collections.

Skills
- Proficiency in programming languages such as Python, R, Java, C, and SQL.
- Experience with machine learning techniques including unsupervised learning, model training, and deployment.
- Familiarity with big data technologies such as Hadoop and Spark.
- Knowledge of analytics tools like Looker and SAS for data visualization and reporting.
- Understanding of linked data principles and quantum engineering concepts is a plus.
- Competence in using AWS for cloud-based solutions related to data storage and processing.
- Ability to work with ETL tools such as Talend for effective data management.
- Strong analytical skills with experience in natural language processing (NLP) applications.
- Excellent communication skills for collaboration with diverse teams.
- Familiarity with Bash (Unix shell) scripting for automation tasks.
Join us in this exciting opportunity to shape the future of our collections through innovative curation practices!

Apply directly here: https://work.mercor.com/jobs/list_AAABmM1FTTCh3dQzwTZAzowg?referralCode=99606272-4450-4208-a14e-f618cb2fca0e&utm_source=referral&utm_medium=share&utm_campaign=job_referral

Job Type: Contract
Pay: From $70.00 per hour
Expected hours: 10 – 20 per week
Work Location: Remote
