Coursera
Secure AI: Red-Teaming & Safety Filters

Enjoy unlimited growth with a year of Coursera Plus for $199 (regularly $399). Save now.

Coursera

Secure AI: Red-Teaming & Safety Filters

Brian Newman
Starweaver

Instructors: Brian Newman

Included with Coursera Plus

Learn more

Gain insight into a topic and learn the fundamentals.
Intermediate level

Recommended experience

3 hours to complete
Flexible schedule
Learn at your own pace
Gain insight into a topic and learn the fundamentals.
Intermediate level

Recommended experience

3 hours to complete
Flexible schedule
Learn at your own pace

What you'll learn

  • Design red-teaming scenarios to identify vulnerabilities and attack vectors in large language models using structured adversarial testing.

  • Implement content-safety filters to detect and mitigate harmful outputs while maintaining model performance and user experience.

  • Evaluate and enhance LLM resilience by analyzing adversarial inputs and developing defense strategies to strengthen overall AI system security.

Details to know

Shareable certificate

Add to your LinkedIn profile

Recently updated!

December 2025

Assessments

1 assignmentÂą

AI Graded see disclaimer
Taught in English

See how employees at top companies are mastering in-demand skills

 logos of Petrobras, TATA, Danone, Capgemini, P&G and L'Oreal

There are 3 modules in this course

This module introduces participants to the systematic creation and execution of red-teaming scenarios targeting large language models. Students learn to identify common vulnerability categories including prompt injection, jailbreaking, and data extraction attacks. The module demonstrates how to design realistic adversarial scenarios that mirror real-world attack patterns, using structured methodologies to probe LLM weaknesses. Hands-on demonstrations show how red-teamers simulate malicious user behavior to uncover security gaps before deployment.

What's included

4 videos2 readings1 peer review

This module covers the design, implementation, and evaluation of content-safety filters for LLM applications. Participants explore multi-layered defense strategies including input sanitization, output filtering, and behavioral monitoring systems. The module demonstrates how to configure safety mechanisms that balance security with functionality, and shows practical testing methods to validate filter effectiveness against sophisticated bypass attempts. Real-world examples illustrate the challenges of maintaining robust content filtering while preserving user experience.

What's included

3 videos1 reading1 peer review

This module focuses on comprehensive resilience testing and systematic improvement of AI system robustness. Students learn to conduct thorough security assessments that measure LLM resistance to adversarial inputs, evaluate defense mechanism effectiveness, and identify areas for improvement. The module demonstrates how to establish baseline security metrics, implement iterative hardening processes, and validate improvements through continuous testing. Participants gain skills in developing robust AI systems that maintain integrity under real-world adversarial conditions.

What's included

4 videos1 reading1 assignment2 peer reviews

Instructors

Brian Newman
Coursera
3 Courses928 learners

Offered by

Coursera

Why people choose Coursera for their career

Felipe M.
Learner since 2018
"To be able to take courses at my own pace and rhythm has been an amazing experience. I can learn whenever it fits my schedule and mood."
Jennifer J.
Learner since 2020
"I directly applied the concepts and skills I learned from my courses to an exciting new project at work."
Larry W.
Learner since 2021
"When I need courses on topics that my university doesn't offer, Coursera is one of the best places to go."
Chaitanya A.
"Learning isn't just about being better at your job: it's so much more than that. Coursera allows me to learn without limits."

Frequently asked questions

Âą Some assignments in this course are AI-graded. For these assignments, your data will be used in accordance with Coursera's Privacy Notice.