
Adversarial Evaluation of Large Language Models

Project type

AI Safety & Red Teaming

Date

April 2026

Location

Remote

Conducted adversarial evaluations of advanced large language models (LLMs), covering manipulation, persuasion, trust & safety, and behavioral risk scenarios. Developed realistic evaluation scenarios and structured testing frameworks to identify model vulnerabilities, failure modes, and harmful influence patterns across multiple prompting conditions. Applied annotation, calibration, and peer-review methods to support large-scale AI safety evaluations and improve the reliability of model risk assessments.
