Survey of Small Language Models from Penn State, ...
Updated Nov 6, 2025
Deep Fact Validation
Real-time trustworthiness evaluation and safety interception for AI agents. Semantic analysis, safe alternative suggestions, multi-step attack chain detection, and LLM-as-Judge.
Provides web credibility models (Likert scale) to assign a trustworthiness score to a given website.
[USENIX Security 2025] Topic-FlipRAG: Topic-Orientated Adversarial Opinion Manipulation Attacks to Retrieval-Augmented Generation Models
A matrix clarifying the definitions of trustworthiness characteristics and the relationships between them across AI/ML standards.
A list of tools and methods for building trustworthy software following TrustOps principles.
This study explores the vulnerabilities of the Pathology Language-Image Pretraining (PLIP) model, a vision-language foundation model for medical AI, under targeted attacks such as the PGD adversarial attack.
In this paper, we introduce SAShA, a new attack strategy that leverages semantic features extracted from a knowledge graph to strengthen attacks against standard CF models. We performed an extensive experimental evaluation to investigate whether SAShA is more effective than baseline attacks against CF models by ta…
Squeeze your model with pressure prompts to see if its behavior leaks.
Trustworthiness Monitoring & Assessment Framework
Codes and Datasets for our WSDM 2022 Paper: "MTLTS: A Multi-Task Framework To Obtain Trustworthy Summaries From Crisis-Related Microblogs"
Proof-Carrying Numbers (PCN): Trust is earned only by proof — the absence of a verification mark communicates uncertainty.
Visualization and embedding of large datasets using various Dimensionality Reduction (DR) techniques such as t-SNE, UMAP, PaCMAP & IVHD. Includes custom metrics to assess DR quality, with a complete explanation and workflow.
CodeGenLink is a Visual Studio Code extension that interacts with GitHub Copilot Chat to generate code, analyze its origin, and identify the associated license.
Independent continuation of a project from AstonHack 2017
Emotion architecture derived from Reddit comments: rater behavior, semantic clusters, and contradiction mapping in GoEmotions.
Website for health data science at KDD 2021
Which LLM do you actually trust? Blind-test 100+ AI models with truth scoring and reasoning failure classification. No branding, no marketing — just data.
Secure and trustworthy mobile AI.