Plan-bound authorization architecture for governing privileged effects in untrusted computational agents.
Updated Mar 30, 2026
Chrome extension PoC for AI training data poisoning via silent network interception. Inverts subscribe→unsubscribe, like→dislike, accept→reject while preserving UX.
Not new AI, but accountable and auditable AI
Fork for my contributions on Trojans. Comprehensive course materials covering ML safety, robustness, and AI alignment.
Technical whitepaper on runtime governance for autonomous AI systems.