Pythia can be used during LLM development to track model performance, or integrated into existing workflows. Find your use case in the resource list below and get started with Pythia today.
How it Works
Pythia is available as an API and is designed to identify hallucinations and generate a detailed report highlighting the performance of your LLM.
Once integrated with an LLM, Pythia captures relevant keywords and concepts from a user query and searches the knowledge base for related information. Once the system retrieves relevant sources, it filters and aggregates them based on each source's relevance to the user query.
Finally, the LLM formulates a comprehensive response for the user based on the user query and the retrieved information.
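The retrieve-filter-aggregate flow above can be sketched in a few lines. The keyword extraction, overlap scoring, and relevance threshold below are illustrative assumptions, not Pythia's actual implementation:

```python
# Minimal sketch of the query -> retrieve -> filter/aggregate flow.
# Scoring and thresholds are toy assumptions for illustration only.

def extract_keywords(query):
    """Capture relevant keywords from the user query (naive tokenization)."""
    stopwords = {"the", "a", "an", "is", "of", "what", "how"}
    return {w.lower().strip("?.,") for w in query.split()} - stopwords

def retrieve(query, knowledge_base):
    """Score each knowledge-base source by keyword overlap with the query."""
    keywords = extract_keywords(query)
    scored = []
    for source in knowledge_base:
        words = {w.lower().strip("?.,") for w in source.split()}
        overlap = len(keywords & words) / max(len(keywords), 1)
        scored.append((source, overlap))
    return scored

def filter_and_aggregate(scored, threshold=0.3):
    """Keep only sufficiently relevant sources, most relevant first."""
    relevant = [(s, r) for s, r in scored if r >= threshold]
    relevant.sort(key=lambda x: x[1], reverse=True)
    return [s for s, _ in relevant]

kb = ["Paris is the capital of France", "Berlin is the capital of Germany"]
context = filter_and_aggregate(retrieve("What is the capital of France?", kb))
```

The aggregated `context` would then be handed to the LLM alongside the user query to ground the final response.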
Resources
Introduction to Pythia
Read the documentation
Request your API key
API key authentication
Learn to integrate Pythia into your workflows with the Python SDK
Pythia prompt examples
Get assistance on Pythia Subreddit
Key Concepts
Knowledge Triplets
Pythia extracts claims from both the references and the LLM response in the form of knowledge triplets. Knowledge triplets divide a sentence into <subject, predicate, object> format. This captures the relationships between words in a sentence and preserves the user's context.
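The triplet format can be illustrated with a toy extractor. The splitting heuristic below is a simplified assumption; Pythia's actual claim extraction is more sophisticated:

```python
# Illustrative sketch of the <subject, predicate, object> triplet format.
# Splitting on a known linking verb is a toy heuristic, not Pythia's method.
from typing import NamedTuple, Optional

class Triplet(NamedTuple):
    subject: str
    predicate: str
    object: str

def extract_triplet(sentence, predicates):
    """Split a simple sentence into a triplet at the first known predicate."""
    words = sentence.rstrip(".").split()
    for i, w in enumerate(words):
        if w in predicates:
            return Triplet(" ".join(words[:i]), w, " ".join(words[i + 1:]))
    return None  # no recognized predicate found

triplet = extract_triplet("Paris is the capital of France.", {"is"})
# -> Triplet(subject="Paris", predicate="is", object="the capital of France")
```

Representing both the references and the response as triplets makes claim-level comparison straightforward, as described in the next section.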
Claim Comparison and Categorization
Pythia compares the claims in an LLM response with those in the references and categorizes them by how far the response deviates from the references. These categories include:
Entailment
Claims present in both the response and the references are categorized as Entailment. These claims indicate accurate output.
Contradiction
Claims present in the LLM response but contradicted by the references are flagged as Contradiction.
Missing facts
Claims present in the references but absent from the LLM response are flagged as Missing facts. Missing facts represent gaps in the LLM response.
Neutral
Claims present in the LLM response that are neither contradicted nor confirmed by the references are Neutral claims. Neutral claims can be further verified against a broader knowledge base to evaluate the LLM's trustworthiness.
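The four categories above can be sketched as a comparison over triplets. Here, matching a reference claim on subject and predicate but with a different object counts as a contradiction; this matching rule is an illustrative assumption, not Pythia's exact logic:

```python
# Hedged sketch of the four-way categorization over (subject, predicate,
# object) claim tuples. The matching rule is a simplifying assumption.

def categorize(response_claims, reference_claims):
    refs_by_key = {(s, p): o for s, p, o in reference_claims}
    result = {"entailment": [], "contradiction": [], "missing": [], "neutral": []}
    for s, p, o in response_claims:
        ref_obj = refs_by_key.get((s, p))
        if ref_obj is None:
            result["neutral"].append((s, p, o))      # nothing to confirm or deny
        elif ref_obj == o:
            result["entailment"].append((s, p, o))   # response agrees with references
        else:
            result["contradiction"].append((s, p, o))  # response disagrees
    covered = {(s, p) for s, p, _ in response_claims}
    # Reference claims the response never addressed are missing facts.
    result["missing"] = [c for c in reference_claims if (c[0], c[1]) not in covered]
    return result

refs = [("Paris", "is", "capital of France"), ("Eiffel Tower", "is in", "Paris")]
resp = [("Paris", "is", "capital of France"), ("Paris", "population", "10 million")]
categorized = categorize(resp, refs)
```

In this example the first response claim is an entailment, the population claim is neutral, and the Eiffel Tower claim is a missing fact.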
Aggregate Report
After categorizing claims, Pythia generates an aggregate report highlighting LLM performance. The report includes the percentage contribution of each category to the LLM response, helping diagnose potential issues in an LLM.
Real-time Monitoring
When integrated into AI workflows, Pythia detects hallucinations in real time, allowing developers and researchers to spot and address LLM shortcomings as they arise.
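A real-time check could slot into a workflow as a thin wrapper around the model call. The `check_claims` placeholder below stands in for whatever validation your setup provides; it is not Pythia's actual API:

```python
# Hypothetical real-time monitoring wrapper. check_claims is a placeholder
# scoring function, NOT Pythia's API; names and thresholds are assumptions.
import logging

def check_claims(response, references):
    """Placeholder: return an accuracy score in [0, 1] via substring overlap."""
    return 1.0 if any(response in r or r in response for r in references) else 0.0

def monitored_generate(generate, query, references, threshold=0.8):
    """Wrap an LLM call and log when the response scores below threshold."""
    response = generate(query)
    score = check_claims(response, references)
    if score < threshold:
        logging.warning("Possible hallucination (score=%.2f): %s", score, response)
    return response, score

response, score = monitored_generate(
    lambda q: "Paris",                       # stand-in for a real LLM call
    "What is the capital of France?",
    ["Paris is the capital of France"],
)
```

Wrapping generation this way lets shortcomings surface in logs or alerts the moment they occur, rather than in offline evaluation.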