Hi Barb,
I just tested the default prompt used for enriching chunks with the LLM, and it’s working fine on my side. I’m using XO11 and created a new Search AI app configured to crawl our docs.kore.ai site for content.
After running a test chat to generate some chunks, I used one to test the enrichment prompt. I did, however, lower the temperature from the default (which is quite high) to 0.3, since this task benefits more from consistency than creativity.
Here’s the chunk value I used when testing the prompt:
Skip to content. Batch Testing, DialogGPT: Batch Testing is a testing framework for evaluating and validating the accuracy of intent detection in an AI Agent. It systematically tests understanding across Dialogs, FAQs, Knowledge (Search AI), and Conversation Intents. It supports multiple model configurations and provides metrics for both development and production environments. Unlike static testing, Batch Testing replicates the complete DialogGPT runtime pipeline for authentic performance insights.
Key features: End to End Pipeline Testing, Model Configuration Flexibility, Granular Performance Insights, and Lifecycle Support.
Supported Conversation Types: Single Intent, Multi Intent, Small Talk, Conversation Intent, No Intent, Ambiguous Intent, Answer Generation.
How it Works: (1) Query Rephrasing (if enabled), (2) Chunk Qualification from Dialogs, FAQs, and Search Index, (3) Semantic Similarity Matching based on thresholds, (4) LLM Processing for intent identification and fulfillment.
The framework enables validation of specific Conversation Intent Types including Hold, Restart, Refuse, End, Agent Transfer, and Repeat. During execution, results display expected vs detected intents to identify mismatches.
Expected intent format: ConversationIntent, Hold, ConversationIntent, Restart, ConversationIntent, Refuse, ConversationIntent, End, ConversationIntent, AgentTransfer, ConversationIntent, Repeat.
When importing via JSON or CSV, use the same format. For Quick Entry mode, select from the predefined list when fulfillment type is set to Conversation Intent.
Access Batch Testing under Automation AI > Virtual Assistant > Testing > Regression Testing > Batch Testing.
Steps: (1) Create a Test Suite via upload or manual entry. Each suite includes utterance, expected intent, linked app, and fulfillment type.
(2) To upload: navigate to Automation AI > Virtual Assistant > Testing > Regression Testing > Batch Testing, click +New Test Suite, enter name and description, upload file, add to suite, then click Create Suite.
Note: Do not use quotes around the text value, or you’ll get a JSON parsing error.
Please try to replicate my results. If you encounter an error, take a screenshot and share it here (making sure to remove any confidential information).
Observations and Suggestions:
The default enrichment prompt is working, but it’s overly minimal. It currently says:
“You are an expert at extracting the Metadata from this particular text. Extract the metadata from this text.”
That instruction technically works but doesn’t leverage what the LLM Stage is designed to do. According to the documentation, the LLM Stage allows the model to refine, update, or enrich chunks by generating additional metadata, deriving contextual tags, or updating fields based on the content. For example, if the chunk is a policy document, the LLM could automatically assign attributes like topic, category, or keywords, improving retrieval accuracy and ranking.
To take advantage of that capability, it helps to define what “enrichment” means for your use case. For example:
“You are an expert content analyzer. Review the provided text and generate structured metadata fields such as Title, Topic, Keywords, Summary, and Category. Return the output as a JSON object with these fields.”
This gives the model clear expectations, improves output consistency, and better aligns with the purpose of the enrichment stage.
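For instance, running that prompt against the Batch Testing chunk above should return something along these lines (the exact wording and values will vary by model; this is only meant to show the shape of the output):

{
  "Title": "Batch Testing in DialogGPT",
  "Topic": "Intent detection testing for AI Agents",
  "Keywords": ["Batch Testing", "DialogGPT", "intent detection", "regression testing", "Conversation Intents"],
  "Summary": "Describes the Batch Testing framework, which replicates the DialogGPT runtime pipeline to validate intent detection across Dialogs, FAQs, Search AI, and Conversation Intents.",
  "Category": "Testing documentation"
}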
I’d also recommend lowering the temperature further, to around 0.1–0.2, for this type of operation. You’ll get more stable and reproducible enrichments, which makes test results much easier to compare.
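If you want to sanity-check the prompt and temperature outside the platform before changing the LLM Stage configuration, here’s a rough sketch using the OpenAI Python SDK. Treat it as illustrative only: the model name is a placeholder, the API key setup is assumed, and the actual LLM Stage in XO11 makes this call for you with its own configuration.

from openai import OpenAI

# Illustrative sketch only: the model name and prompt wording are assumptions,
# not the exact configuration used by the XO11 LLM Stage.
client = OpenAI()  # expects OPENAI_API_KEY in the environment

ENRICHMENT_PROMPT = (
    "You are an expert content analyzer. Review the provided text and "
    "generate structured metadata fields such as Title, Topic, Keywords, "
    "Summary, and Category. Return the output as a JSON object with these fields."
)

chunk_text = "Batch Testing is a testing framework for evaluating ..."  # paste the chunk value here

response = client.chat.completions.create(
    model="gpt-4o-mini",                       # placeholder model
    temperature=0.2,                           # low temperature for reproducible enrichment
    response_format={"type": "json_object"},   # ask for strict JSON back
    messages=[
        {"role": "system", "content": ENRICHMENT_PROMPT},
        {"role": "user", "content": chunk_text},
    ],
)

print(response.choices[0].message.content)     # the enriched metadata as JSON

Running it a few times at temperature 0.2 versus the default should make the consistency difference obvious.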
In short, the LLM Stage can do more than simple metadata extraction. With a more descriptive prompt, it becomes a lightweight content classifier that improves retrieval and ranking rather than just annotating chunks.