Detect Prompt Injection validator detects prompt injection attack attempts in LLM inputs. Prompt injection attack is a type of security attack which manipulates the LLM input to modify its behavior.
The two primary types of prompt injection attacks are:
Direct injection: Directly entering malicious prompts in LLM.
Indirect injection: Subtle manipulation of LLM inputs to influence model behavior.
The detect prompt injection validator scans LLM inputs to identify vulnerabilities and classify them into:
0: No injection detected
1: Injection detected
This ensures that the inputs are properly sanitized before they’re processed by LLMs and impact their functioning.