Securing Chatbot Systems with LLM Guard
About Project
Objective
To evaluate the integration of LLM Guard with TinyLLaMA-1.1B-Chat in mitigating prompt injection attacks and enhancing chatbot security using adversarial prompt datasets.
Tools & Technologies
Python, LLM Guard, TinyLLaMA-1.1B, xTRam1 Dataset, Hugging Face Transformers, PyTorch, Google Colab
View on GitHub
Implementation & Results
Integrated TinyLLaMA-1.1B-Chat with LLM Guard as a middleware scanner to block adversarial prompts.
Used xTRam1 Safe-Guard Prompt Injection Dataset for robust training and evaluation.
Implemented response-level filtering using a toxicity detector to capture risky outputs from false negatives.
Achieved 94.6% accuracy, 99.8% precision, and 0.94 AUC—confirming production-grade reliability.
Demonstrated layered security as a practical alternative to retraining foundational models like GPT-4.