An easy-to-use Python framework to generate adversarial jailbreak prompts.
Official repository for the paper "ALERT: A Comprehensive Benchmark for Assessing Large Language Models’ Safety through Red Teaming"
[ICLR 2025] Official implementation for "SafeWatch: An Efficient Safety-Policy Following Video Guardrail Model with Transparent Explanations"
Restore safety in fine-tuned language models through task arithmetic (see the sketch at the end of this list).
SEALGuard: Safeguarding the Multilingual Conversations in Southeast Asian Languages for LLM Software Systems
LiveSecBench, a safety evaluation benchmark for large language models in Chinese-language scenarios. The framework combines a dynamic question bank, model-vs-model matchups, and an objective scoring pipeline to continuously track model performance across core dimensions such as ethics, legality, factuality, privacy, adversarial robustness, and reasoning safety.
Red-team framework for discovering alignment failures in frontier language models.
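The "task arithmetic" entry above refers to editing model weights directly rather than retraining. As a rough illustration only (the function and parameter names here are hypothetical and not that repository's API), restoring safety can be sketched as moving a fine-tuned model back toward an aligned reference model along a task vector:

# Minimal sketch of safety restoration via task arithmetic (assumed setup:
# two state dicts with identical keys; alpha is an assumed blending scale).
import torch


def restore_safety(finetuned_sd, aligned_sd, alpha=0.5):
    """Blend fine-tuned weights with an aligned reference model."""
    restored = {}
    for name, w_ft in finetuned_sd.items():
        # Task vector pointing from the fine-tuned weights toward the aligned model.
        task_vector = aligned_sd[name] - w_ft
        restored[name] = w_ft + alpha * task_vector
    return restored


if __name__ == "__main__":
    # Toy demonstration on random tensors standing in for model weights.
    ft = {"layer.weight": torch.randn(4, 4)}
    aligned = {"layer.weight": torch.randn(4, 4)}
    print(restore_safety(ft, aligned, alpha=0.5)["layer.weight"].shape)

Larger alpha pulls the model further toward the aligned reference, trading off fine-tuned task performance against restored safety behavior.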