**Role Overview**
Mercor is collaborating with a leading AI research organization to engage experts capable of developing challenging Q&A pairs derived from academic papers on arXiv. This short-term project aims to enhance benchmark datasets by crafting questions designed to surpass the capabilities of state-of-the-art AI models. Experts will leverage their deep knowledge of specialized research domains to create analytically demanding and insightful prompts for evaluating advanced AI systems.
**Key Responsibilities**
- Select and thoroughly analyze academic papers from arXiv in your area of expertise.
- Craft sophisticated Q&A pairs that test advanced AI models’ conceptual, analytical, and synthesis abilities.
- Clearly document detailed metadata, including paper references, question categories, difficulty justifications, and task completion times.
- Evaluate model-generated answers, iteratively refining questions until leading AI models fail to answer them correctly.
**Ideal Qualifications**
- Advanced degree (PhD or equivalent) or significant professional experience in a specialized research field (e.g., computational biology, quantum physics, or deep learning).
- Proven familiarity with academic literature, preprint repositories such as arXiv, and research publication standards.
- Strong analytical reasoning, meticulous attention to detail, and clarity in technical writing.
- Prior experience with benchmark design or AI evaluation preferred.
- h-index of 10 or higher.
**More About the Opportunity**
- Expected commitment: approximately 10–20 hours per week.
- Estimated duration: approximately 2–3 months, beginning with a small pilot of 5 sample tasks in week 1.
**Application Process**
- Submit your CV/resume highlighting relevant research expertise.
- Complete a short assessment demonstrating your ability to craft challenging, analytically complex questions.
- Mercor will follow up within one week regarding next steps.