The Pythia dashboard lets you generate and download a comprehensive report to track AI hallucinations over time and maintain audit trails.
The Pythia AI reliability report provides predictive insights into LLM performance and suggests optimizations for improving the LLM.
For each user query, the report lists the Pythia metrics and summarizes LLM performance through an accuracy score and the categories assigned to the claims in the LLM response:
Accuracy
The accuracy metric represents the proportion of factually correct claims in the LLM response.
Contradiction
Contradiction claims are claims generated by the LLM that conflict with the references. These highlight potential hallucinations.
Entailment
Entailment claims are claims that appear in both the LLM response and the references.
Neutral
Claims generated by the LLM that the references neither contradict nor confirm are flagged as Neutral claims.
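
To make the relationship between the categories and the accuracy metric concrete, here is a minimal Python sketch. It is not Pythia's API: the names ClaimLabel and summarize_claims are hypothetical, and the sketch assumes that only Entailment claims count as factually correct, so accuracy is the number of Entailment claims divided by the total number of claims. Pythia's actual formula may differ.

```python
from collections import Counter
from enum import Enum


class ClaimLabel(Enum):
    """Hypothetical labels mirroring the report's claim categories."""
    ENTAILMENT = "entailment"
    CONTRADICTION = "contradiction"
    NEUTRAL = "neutral"


def summarize_claims(labels):
    """Count claims per category and compute an illustrative accuracy score."""
    counts = Counter(labels)
    total = len(labels)
    # Assumption: only entailed claims are treated as factually correct.
    accuracy = counts[ClaimLabel.ENTAILMENT] / total if total else 0.0
    return {
        "entailment": counts[ClaimLabel.ENTAILMENT],
        "contradiction": counts[ClaimLabel.CONTRADICTION],
        "neutral": counts[ClaimLabel.NEUTRAL],
        "accuracy": accuracy,
    }


# Example: one LLM response broken into four labeled claims.
example_labels = [
    ClaimLabel.ENTAILMENT,
    ClaimLabel.ENTAILMENT,
    ClaimLabel.NEUTRAL,
    ClaimLabel.CONTRADICTION,
]
summary = summarize_claims(example_labels)
print(summary)
# {'entailment': 2, 'contradiction': 1, 'neutral': 1, 'accuracy': 0.5}
```

Under this assumption, a response with two Entailment claims, one Neutral claim, and one Contradiction claim scores an accuracy of 0.5, and the single Contradiction is the claim to review first as a potential hallucination.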