Securing Chatbot Systems with LLM Guard

March 2025
AI Safety • LLM Security Evaluation
Securing Chatbot Systems with LLM Guard
Prompt Injection Evaluation
Securing Chatbot Systems with LLM Guard

Objective

To evaluate the integration of LLM Guard with TinyLLaMA-1.1B-Chat in mitigating prompt injection attacks and enhancing chatbot security using adversarial prompt datasets.

Tools & Technologies

Python, LLM Guard, TinyLLaMA-1.1B, xTRam1 Dataset, Hugging Face Transformers, PyTorch, Google Colab

View on GitHub

Implementation & Results

Integrated TinyLLaMA-1.1B-Chat with LLM Guard as a middleware scanner to block adversarial prompts.

Used xTRam1 Safe-Guard Prompt Injection Dataset for robust training and evaluation.

Implemented response-level filtering using a toxicity detector to capture risky outputs from false negatives.

Achieved 94.6% accuracy, 99.8% precision, and 0.94 AUC—confirming production-grade reliability.

Demonstrated layered security as a practical alternative to retraining foundational models like GPT-4.