The Effective Altruism Opportunities Board
Work on the world's most pressing problems. Browse jobs, fellowships, internships, courses, and more at high-impact organisations.
Research Scientist, Safety Post-Training
Scale
San Francisco, CA / New York, NY
Today
Routes to impact
Direct high impact on an important cause
Skill-building & building career capital
Learning about important cause areas
Description
Conduct research on post-training and interpretability techniques to improve frontier AI safety and robustness.
- Design RLHF and post-training safety evaluation pipelines
- Study deceptive or unsafe model behaviors using interpretability tools
- Translate findings into safety standards and evaluation benchmarks
- Collaborate across policy, engineering, and research teams
This text was generated by AI.
Related opportunities
Research Scientist/Engineer (Science of Scheming)
Apollo Research
London, United Kingdom
2 months ago
Research Scientist/Engineer (Evaluations)
Apollo Research
London, United Kingdom
2 months ago
Team Member, Search and AI Evaluations
National Institute of Standards and Technology (NIST)
Gaithersburg, MD
Yesterday
Director, Evaluations
LawZero
Montreal, Canada
Yesterday
Team Member, Model Policy
OpenAI
San Francisco, CA
2 days ago
Member of Technical Staff
CivAI
Berkeley, USA
2 weeks ago
Data Scientist
Center for Security and Emerging Technology (CSET)
Washington, USA
2 weeks ago
Researcher, Misalignment Research
OpenAI
San Francisco, USA
3 weeks ago