The Effective Altruism Opportunities Board
Work on the world's most pressing problems. Browse jobs, fellowships, internships, courses, and more at high-impact organisations.
Research Scientist, Safety Post-Training
Scale
San Francisco, CA / New York, NY
3 weeks ago
Salary
$216,000–$270,000 USD
Routes to impact
Direct high impact on an important cause
Skill-building & building career capital
Learning about important cause areas
Description
Conduct research on post-training and interpretability techniques to improve frontier AI safety and robustness.
- Design RLHF and post-training safety evaluation pipelines
- Study deceptive or unsafe model behaviors using interpretability tools
- Translate findings into safety standards and evaluation benchmarks
- Collaborate across policy, engineering, and research teams
This text was generated by AI.
Related opportunities
Mathematical Scientist, AI Safety Research
LawZero
Montreal, Canada
3 weeks ago
Researcher, AI Cognition Initiative (Technical Focus)
Rethink Priorities
Remote
3 weeks ago
Research Scientist/Engineer (Science of Scheming)
Apollo Research
London, United Kingdom
3 months ago
Research Scientist/Engineer (Evaluations)
Apollo Research
London, United Kingdom
3 months ago
Research Scientist, Manipulation Evaluations
Apart Research
Remote (Europe preferred)
3 days ago
Research Engineer, Scalable Interpretability
Transluce
San Francisco, CA
4 days ago
Associate Machine Learning Engineer, Secure AI Lab
Carnegie Mellon University
Pittsburgh, PA | Arlington, VA
6 days ago
Expression of Interest, Red Team
AI Safety Ideas (AISI)
London, UK
2 weeks ago