A Guide To Integrating Pythia With Text Summarizers

Haziqa Sajid

Jul 26, 2024

Learn to integrate Pythia with text summarizers using Wisecube Python SDK and detect AI hallucinations in real time.

Text summarization tools use Natural Language Processing (NLP) to condense long, complex text into shorter summaries while preserving the key takeaways. There are two main types of text summarization: extractive and abstractive. Extractive summarization selects the most important sentences from the input text and combines them into a summary, while abstractive summarization generates new sentences by paraphrasing the input text.

Like any other AI technology, text summarizers can hallucinate. These hallucinations can be in the form of made-up text or missed details. Therefore, they must be spotted as soon as they occur to improve AI performance over time. Wisecube’s Pythia monitors text summarizers for continuous hallucination detection and analysis. This lets stakeholders know when the AI system exceeds the acceptable inaccuracy rate and take timely corrective actions. 

In this guide, we’ll integrate Wisecube Pythia with a text summarizer using the Wisecube Python SDK. 

Integrating Pythia with Text Summarizer for Hallucination Detection

The Wisecube Python SDK provides a simple interface for integrating Pythia with text summarizers. The steps are as follows:

  1. Getting an API key

Before you begin hallucination detection, you need a Wisecube API key. To get your unique API key, fill out the API key request form with your email address and the purpose of the API request. 

  2. Installing Wisecube

Once you receive your API key, install the Wisecube Python SDK in your Python environment. Run the following command in your terminal to install Wisecube:

pip install wisecube
  3. Authenticating Wisecube API Key

Next, you must authenticate your API key to use Pythia for online hallucination monitoring. Run the following code to do so:

from wisecube_sdk.client import WisecubeClient

API_KEY = "YOUR_API_KEY"
client = WisecubeClient(API_KEY).client
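
Hard-coding the key is fine for a quick test, but it is easy to leak that way. As a minimal sketch, the key could instead be read from an environment variable. The variable name WISECUBE_API_KEY and the helper load_api_key are assumptions made for this example and are not part of the Wisecube SDK.

```python
import os


def load_api_key(var_name="WISECUBE_API_KEY", environ=None):
    """Return the API key stored in an environment variable.

    WISECUBE_API_KEY is a hypothetical variable name chosen for this sketch.
    """
    environ = os.environ if environ is None else environ
    key = environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first.")
    return key


# Usage, following the authentication snippet (requires the Wisecube SDK):
# from wisecube_sdk.client import WisecubeClient
# client = WisecubeClient(load_api_key()).client
```

Passing a dictionary as environ makes the helper easy to check without touching the real environment.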
  4. Developing a Text Summarizer

For this tutorial, we’re using the NLTK library in Python to build a text summarizer that accepts web URLs as inputs to generate a webpage summary. However, you can integrate Pythia with any text summarizer, regardless of its framework and purpose.

pip install beautifulsoup4
pip install lxml
pip install nltk

from urllib.request import urlopen
from bs4 import BeautifulSoup
  
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import sent_tokenize
import heapq
from collections import Counter

nltk.download('punkt')  # newer NLTK releases may also need nltk.download('punkt_tab')
nltk.download('stopwords')

def summarize(text, num_sentences=5):
    """Summarizes the given text by extracting the most important sentences.

    Args:
        text: The text to be summarized.
        num_sentences: The number of sentences to include in the summary (default: 5).

    Returns:
        A string containing the summarized text.
    """
    # Preprocess the text: split it into sentences and drop stop words
    sentences = sent_tokenize(text)
    stop_words = set(stopwords.words('english'))
    filtered_sentences = [
        [word for word in sentence.lower().split() if word not in stop_words]
        for sentence in sentences
    ]

    # Count how often each word occurs across all sentences
    word_counts = Counter()
    total_words = 0
    for sentence in filtered_sentences:
        word_counts.update(sentence)
        total_words += len(sentence)

    # Score each sentence by summing a TF-IDF-style weight for its words
    sentence_scores = {i: 0 for i in range(len(filtered_sentences))}
    for i, sentence in enumerate(filtered_sentences):
        for word in sentence:
            tf = word_counts[word] / total_words  # Term frequency across the whole text
            # Number of sentences containing the word. Note that this is a
            # sentence frequency, not a true inverse document frequency.
            idf = sum(word in s for s in filtered_sentences)
            sentence_scores[i] += tf * idf

    # Select top scoring sentences for summary
    summary_sentences = heapq.nlargest(num_sentences, sentence_scores, key=sentence_scores.get)
    summary = ' '.join([sentences[i] for i in summary_sentences])

    return summary
   
def summarize_from_web(url, num_sentences=5):
    """Summarizes the text content of a webpage.

    Args:
        url: The URL of the webpage to summarize.
        num_sentences: The number of sentences to include in the summary (default: 5).

    Returns:
        A string containing the summarized text from the webpage, or None if
        unable to access the webpage.
    """
    try:
        html_content = urlopen(url).read()
        soup = BeautifulSoup(html_content, features="html.parser")

        # Identify the main content area (adjust based on website structure)
        paragraphs = soup.find_all('p')  # Change this selector based on the website's content structure

        # Combine paragraphs into a single string
        text = ' '.join([paragraph.get_text() for paragraph in paragraphs])
        summary = summarize(text, num_sentences)
        # Send the source text and the summary to Pythia; the response is used in the next step
        accuracy = client.ask_pythia([text], summary, "")
        return summary
    except Exception as e:
        print(f"Error accessing webpage: {e}")
        return None
      
def get_summary(url):
    summary = summarize_from_web(url)

    if summary:
        print(f"Summary: {summary}")
    else:
        print("Failed to summarize the webpage.")

get_summary("https://en.wikipedia.org/wiki/Ancient_Rome")
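
To make the scoring inside summarize() easier to follow, here is a small self-contained sketch of the same idea that runs without NLTK. The regex tokenizer, the tiny STOP_WORDS set, and the function names are assumptions made for illustration. Unlike summarize(), this version returns the selected sentences in their original order.

```python
import heapq
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not NLTK's list).
STOP_WORDS = {"the", "a", "is", "of", "and", "in", "to"}


def score_sentences(sentences):
    """Score each sentence by summing term frequency times sentence frequency."""
    filtered = [
        [w for w in re.findall(r"[a-z']+", s.lower()) if w not in STOP_WORDS]
        for s in sentences
    ]
    counts = Counter(w for words in filtered for w in words)
    total = sum(counts.values())
    scores = {}
    for i, words in enumerate(filtered):
        scores[i] = sum(
            (counts[w] / total) * sum(w in other for other in filtered)
            for w in words
        )
    return scores


def top_sentences(sentences, n):
    """Return the n highest-scoring sentences, kept in document order."""
    scores = score_sentences(sentences)
    best = heapq.nlargest(n, scores, key=scores.get)
    return [sentences[i] for i in sorted(best)]


sentences = ["Rome was a city.", "Rome grew into an empire.", "Cats sleep."]
print(top_sentences(sentences, 2))
```

Because each sentence's score is a sum over its words, longer sentences tend to score higher. Dividing by sentence length is one possible adjustment if that bias matters for your content.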
  5. Using Pythia to Detect Hallucinations

Now, we can use Pythia to detect hallucinations in the summarized text in real time. To do this, we save the ask_pythia() response to the accuracy variable and return its accuracy metric from the summarize_from_web function alongside the summary. Finally, the get_summary function unpacks the summary and accuracy and prints both.

Note that we’ve stored only the accuracy from Pythia's response so that the user can see the accuracy of the AI system along with the summarized text. However, you can also display other metrics like Contradiction and Entailment depending on your needs.

Modify the summarize_from_web and get_summary functions to return accuracy as shown below:

def summarize_from_web(url, num_sentences=5):
    try:
        html_content = urlopen(url).read()
        soup = BeautifulSoup(html_content, features="html.parser")

        paragraphs = soup.find_all('p')

        text = ' '.join([paragraph.get_text() for paragraph in paragraphs])
        summary = summarize(text, num_sentences)
        # Ask Pythia to check the summary against the source text
        accuracy = client.ask_pythia([text], summary, "")
        return summary, accuracy['data']['askPythia']['metrics']['accuracy']
    except Exception as e:
        print(f"Error accessing webpage: {e}")
        # Return a pair so the caller can always unpack two values
        return None, None

def get_summary(url):
    summary, accuracy = summarize_from_web(url)

    if summary:
        print(f"Summary: {summary}")
        print(f"Accuracy: {accuracy:.4f}")
    else:
        print("Failed to summarize the webpage.")

The final Pythia output can be seen in the screenshot below. This output includes the summary and accuracy of the summarized text.
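
If you want to display more than the accuracy, a small helper can pull several metrics from the response at once. The sketch below assumes the response has the nested shape used in the code above. The "entailment" and "contradiction" key names and all values in the mock response are assumptions for illustration, so check them against an actual ask_pythia() response.

```python
def extract_metrics(response, names=("accuracy",)):
    """Pull selected metrics out of an ask_pythia()-style response dict.

    Missing keys yield None instead of raising an exception.
    """
    node = response
    for key in ("data", "askPythia", "metrics"):
        node = node.get(key) if isinstance(node, dict) else None
    metrics = node if isinstance(node, dict) else {}
    return {name: metrics.get(name) for name in names}


# Mock response mirroring the shape used in the tutorial code.
# The "entailment" and "contradiction" keys and all values are made up.
mock_response = {
    "data": {
        "askPythia": {
            "metrics": {"accuracy": 0.91, "entailment": 0.88, "contradiction": 0.03}
        }
    }
}

print(extract_metrics(mock_response, ("accuracy", "contradiction")))
# {'accuracy': 0.91, 'contradiction': 0.03}
```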

Benefits of Using Pythia with Text Summarizers

Pythia offers numerous benefits when integrated into your workflows, helping you continually improve your AI systems. Each AI system has distinct requirements and functionalities, and Pythia is designed to handle them efficiently. Recently, Pythia helped a healthcare client achieve an LLM accuracy of 98.8% through continuous monitoring, a billion-scale knowledge graph, and a systematic checker error (SCErr) metric. Below are some reasons why Pythia is a must-have for your text summarization tools:

Reliable Summaries

Pythia extracts claims from the input text in the form of knowledge triplets and verifies them against the summarized text. If the AI tool fails to include crucial information or generates made-up claims, Pythia sorts these errors into the relevant hallucination categories and creates an audit report. The audit report reveals the strengths and weaknesses of your system so you can debug it promptly.
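
To give a rough feel for triplet-based checking, here is a toy sketch that compares (subject, predicate, object) claims from a summary with claims from the source text. This is not Pythia's implementation. The function name, the exact-match comparison, and the "neutral" and "missing" labels are assumptions made for illustration only.

```python
def compare_triplets(reference, summary):
    """Bucket summary claims against reference claims (toy exact-match logic)."""
    reference = set(reference)
    summary = set(summary)

    # Index reference objects by (subject, predicate) to spot conflicting objects.
    ref_objects = {}
    for subj, pred, obj in reference:
        ref_objects.setdefault((subj, pred), set()).add(obj)

    report = {"entailment": [], "contradiction": [], "neutral": [], "missing": []}
    for triplet in sorted(summary):
        subj, pred, _ = triplet
        if triplet in reference:
            report["entailment"].append(triplet)      # claim supported by the source
        elif (subj, pred) in ref_objects:
            report["contradiction"].append(triplet)   # same relation, different object
        else:
            report["neutral"].append(triplet)         # claim not found in the source
    report["missing"] = sorted(reference - summary)    # source claims the summary omitted
    return report


reference = [("Rome", "founded_in", "753 BC"), ("Rome", "located_in", "Italy")]
summary = [
    ("Rome", "founded_in", "509 BC"),
    ("Rome", "located_in", "Italy"),
    ("Rome", "ruled_by", "Caesar"),
]
print(compare_triplets(reference, summary))
```

A real checker has to extract the triplets from free text and match paraphrases, which exact comparison cannot do.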

Real-time Monitoring

Pythia monitors Large Language Models (LLMs) against relevant references in real time. The real-time insights are available for your review in the Pythia dashboard, representing AI performance through relevant visualizations. Real-time monitoring allows immediate action when the LLM exceeds acceptable hallucination limits.
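
On the application side, acting on an acceptable hallucination limit can be as simple as tracking recent accuracy scores and flagging when their average drops too low. The sketch below is one way to do that. The AccuracyMonitor class, the threshold, and the window size are hypothetical choices for this example and are unrelated to the Pythia dashboard.

```python
from collections import deque


class AccuracyMonitor:
    """Track recent accuracy scores and flag when the rolling mean is too low."""

    def __init__(self, threshold=0.9, window=20):
        self.threshold = threshold          # minimum acceptable mean accuracy (hypothetical)
        self.scores = deque(maxlen=window)  # keeps only the most recent scores

    def rolling_mean(self):
        return sum(self.scores) / len(self.scores)

    def record(self, accuracy):
        """Store a score and return True when the rolling mean is below the threshold."""
        self.scores.append(accuracy)
        return self.rolling_mean() < self.threshold


monitor = AccuracyMonitor(threshold=0.8, window=3)
for score in (0.9, 0.9, 0.3):
    if monitor.record(score):
        print(f"Alert: rolling accuracy {monitor.rolling_mean():.2f} is below the threshold")
```

Each accuracy value returned by summarize_from_web could be passed to record() to decide when a summary needs human review.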

Safe Outputs

Pythia uses multiple input and output validators to protect user queries and LLM responses against bias, data leakage, and nonsensical outputs. These validators operate with each Pythia call, ensuring safe interactions between an LLM and a user. This increases user trust in AI and the company's reputation. 

Contact us today to detect hallucinations in your text summarization tools in real time and build user trust and company reputation.
