Analysis · Risk · Intelligence
Adversarial Evaluation of Large Language Models
Project type
AI Safety & Red Teaming
Date
April 2026
Location
Remote
Conducted adversarial evaluations of advanced large language models (LLMs) across manipulation, persuasion, trust and safety, and behavioral-risk scenarios. Developed realistic evaluation scenarios and structured testing frameworks to identify model vulnerabilities, failure modes, and harmful influence patterns under multiple prompting conditions. Applied rigorous annotation, calibration, and peer-review methods to support large-scale AI safety evaluations and improve the reliability of model risk assessments.