
Multimodal File Analysis

This page demonstrates how to write Scenario tests where the user provides files (PDF, CSV, etc.) as part of the conversation and the agent must parse and respond appropriately.

Understanding Test Fixtures

Before diving into code, let's talk about test fixtures — the sample files you'll use to test your agent.

What Are Test Fixtures?

Test fixtures are pre-prepared files (PDFs, CSVs, images, etc.) that serve as controlled inputs for your tests. Think of them as the props on your testing stage: just as actors rehearse with the same props each time, your agent should be tested against consistent, well-defined files.

Why Fixtures Matter

  1. Reproducibility: Tests run the same way every time with the same inputs
  2. Coverage: Different fixtures test different scenarios (edge cases, happy paths, error conditions)
  3. Documentation: Fixtures serve as examples of what your agent should handle
  4. Debugging: When a test fails, you can inspect the exact file that caused the issue

Organizing Your Test Fixtures

Create a dedicated fixtures folder in your test directory to store all your test files, and organize it by file type (e.g., fixtures/pdfs/, fixtures/csvs/, fixtures/images/). A small helper sketch follows the list of principles below.

Key principles:
  • One folder per file type: Separate PDFs, CSVs, images, etc.
  • Descriptive names: The filename should indicate what scenario it tests (e.g., financial-report-2024-q1.pdf, employee-database-small.csv)
  • Multiple variants: Have several examples for each scenario type (e.g., multiple financial reports, different-sized datasets)
  • Include edge cases: Empty files, corrupted files, unusually large files
  • Version fixtures: Keep different versions if testing historical behavior
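
To make these conventions concrete, here is a minimal sketch of a path helper that resolves fixtures by type folder and fails fast when a file is missing. The fixture() helper and the folder layout are illustrative conventions of this guide, not part of the Scenario API:

from pathlib import Path

# Root of the fixtures tree, e.g. tests/fixtures/
FIXTURES_DIR = Path(__file__).parent / "fixtures"


def fixture(kind: str, name: str) -> Path:
    """Resolve a fixture by type folder and filename."""
    path = FIXTURES_DIR / kind / name
    if not path.exists():
        raise FileNotFoundError(f"Missing test fixture: {path}")
    return path


# Usage:
pdf_path = fixture("pdfs", "financial-report-2024-q1.pdf")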

Adding Files to Scenario Messages

Files are included in scenario messages using the OpenAI ChatCompletionMessageParam format. You can pass file content as base64-encoded data using the file type with file_data:

import base64
from pathlib import Path
 
# Encode file to base64
file_content = Path("/path/to/document.pdf").read_bytes()
base64_data = base64.b64encode(file_content).decode()
 
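# Attach the encoded file alongside a text prompt in a single user message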
scenario.message({
    "role": "user",
    "content": [
        {"type": "text", "text": "Please summarize this document."},
        {
            "type": "file",
            "file": {
                "filename": "document.pdf",
                "file_data": f"data:application/pdf;base64,{base64_data}",
            },
        },
    ],
})

Example: PDF Summarization

Test that your agent can read a PDF document and provide a meaningful summary.

import base64
from pathlib import Path
import pytest
import scenario
 
# Path to test fixtures
FIXTURES_DIR = Path(__file__).parent / "fixtures"
PDF_PATH = FIXTURES_DIR / "sample_report.pdf"
 
 
@pytest.mark.asyncio
async def test_pdf_summarization():
    """Test that the agent can summarize a PDF document."""
 
    result = await scenario.run(
        name="PDF Summarization",
        description="Test that the agent can read a PDF document and provide a concise summary of its contents.",
        agents=[
            YourAgentAdapter(),
            scenario.UserSimulatorAgent(),
            scenario.JudgeAgent(
                criteria=[
                    "Agent provides a clear summary of the PDF contents",
                    "Summary captures the key information from the document",
                    "Response is well-organized and easy to read",
                ]
            ),
        ],
        script=[
            scenario.message({
                "role": "user",
                "content": [
                    {"type": "text", "text": "Please summarize this PDF document for me."},
                    {
                        "type": "file",
                        "file": {
                            "filename": "sample_report.pdf",
                            "file_data": f"data:application/pdf;base64,{base64.b64encode(PDF_PATH.read_bytes()).decode()}",
                        },
                    },
                ],
            }),
            scenario.agent(),
            scenario.judge(),
        ],
    )
 
    assert result.success, f"Scenario failed: {result.reasoning}"

Example: CSV Data Analysis

Test that your agent can parse a CSV file and answer questions about the data.

import base64
from pathlib import Path
import pytest
import scenario
 
FIXTURES_DIR = Path(__file__).parent / "fixtures"
CSV_PATH = FIXTURES_DIR / "employee_database.csv"
 
 
@pytest.mark.asyncio
async def test_csv_employee_analysis():
    """Test that the agent can analyze employee CSV and provide insights."""
 
    result = await scenario.run(
        name="Employee Database Analysis",
        description="Test that the agent can process an employee database CSV and provide accurate statistics about the workforce.",
        agents=[
            YourAgentAdapter(),
            scenario.UserSimulatorAgent(),
            scenario.JudgeAgent(
                criteria=[
                    "Agent identifies the total number of employees",
                    "Agent mentions the different departments present",
                    "Agent provides relevant statistics or insights about the data",
                ]
            ),
        ],
        script=[
            scenario.message({
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Please analyze this employee database and give me a summary. How many employees are there? What departments exist?",
                    },
                    {
                        "type": "file",
                        "file": {
                            "filename": "employee_database.csv",
                            "file_data": f"data:text/csv;base64,{base64.b64encode(CSV_PATH.read_bytes()).decode()}",
                        },
                    },
                ],
            }),
            scenario.agent(),
            scenario.judge(),
        ],
    )
 
    assert result.success, f"Scenario failed: {result.reasoning}"

Scaling Your Tests: Multiple Files

As your test suite grows, you'll want to test your agent against multiple files for the same scenario. This ensures your agent works consistently across different variations of similar content.

Pattern 1: Looping Over Multiple Fixtures

When you have several files that test the same capability (e.g., multiple financial reports), you can loop over them to run the same test scenario with each file. This is excellent for comprehensive coverage.

import base64
from pathlib import Path
import pytest
import scenario
 
FIXTURES_DIR = Path(__file__).parent / "fixtures" / "pdfs"
 
# Define multiple PDF fixtures for the same scenario type
FINANCIAL_REPORTS = [
    "financial-report-2024-q1.pdf",
    "financial-report-2024-q2.pdf",
    "financial-report-2024-q3.pdf",
    "financial-report-2024-q4.pdf",
]
 
 
# Loop over each financial report using pytest parametrize
@pytest.mark.asyncio
@pytest.mark.parametrize("filename", FINANCIAL_REPORTS)
async def test_extract_revenue_from_multiple_reports(filename: str):
    """Test revenue extraction across multiple quarterly reports."""
 
    pdf_path = FIXTURES_DIR / filename
    file_content = pdf_path.read_bytes()
    base64_data = base64.b64encode(file_content).decode()
 
    result = await scenario.run(
        name=f"Financial Analysis - {filename}",
        description=f"Test revenue extraction from {filename}",
        agents=[
            YourAgentAdapter(),
            scenario.UserSimulatorAgent(),
            scenario.JudgeAgent(
                criteria=[
                    "Agent correctly identifies total revenue figures",
                    "Agent mentions the reporting period",
                    "Agent provides key financial metrics",
                ]
            ),
        ],
        script=[
            scenario.message({
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "What was the total revenue in this financial report?",
                    },
                    {
                        "type": "file",
                        "file": {
                            "filename": filename,
                            "file_data": f"data:application/pdf;base64,{base64_data}",
                        },
                    },
                ],
            }),
            scenario.agent(),
            scenario.judge(),
        ],
    )
 
    assert result.success, f"Scenario failed for {filename}: {result.reasoning}"

Benefits of looping:
  • Tests all your fixtures automatically — add a new PDF and it's instantly tested
  • Identifies which specific files cause failures
  • Ensures consistent behavior across variations

Pattern 2: Random Selection for Diverse Testing

Sometimes you want to test against one random file from a collection to keep test runs fast while still ensuring variety over time. This is useful for large fixture sets or in CI/CD pipelines.

import base64
import random
from pathlib import Path
import pytest
import scenario
 
FIXTURES_DIR = Path(__file__).parent / "fixtures" / "pdfs"
 
LEGAL_CONTRACTS = [
    "employment-contract-2023.pdf",
    "employment-contract-2024.pdf",
    "vendor-agreement.pdf",
    "service-level-agreement.pdf",
    "non-disclosure-agreement.pdf",
]
 
 
@pytest.mark.asyncio
async def test_identify_key_terms_random_contract():
    """Test contract analysis with a randomly selected legal document."""
 
    # Pick a random contract from the collection
    random_contract = random.choice(LEGAL_CONTRACTS)
    pdf_path = FIXTURES_DIR / random_contract
 
    print(f"Testing with: {random_contract}")  # Helps with debugging
 
    result = await scenario.run(
        name=f"Legal Analysis - {random_contract}",
        description="Test that agent can identify key contractual terms from legal documents",
        agents=[
            YourAgentAdapter(),
            scenario.UserSimulatorAgent(),
            scenario.JudgeAgent(
                criteria=[
                    "Agent identifies the type of legal document",
                    "Agent extracts key dates or terms",
                    "Agent mentions parties involved",
                ]
            ),
        ],
        script=[
            scenario.message({
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Please review this contract and tell me the key terms.",
                    },
                    {
                        "type": "file",
                        "file": {
                            "filename": random_contract,
                            "file_data": f"data:application/pdf;base64,{base64.b64encode(pdf_path.read_bytes()).decode()}",
                        },
                    },
                ],
            }),
            scenario.agent(),
            scenario.judge(),
        ],
    )
 
    assert result.success, f"Scenario failed for {random_contract}: {result.reasoning}"

Benefits of random selection:
  • Faster test runs (only one file per test)
  • Still ensures variety across multiple test executions
  • Good for CI/CD where you want quick feedback

Pro tip: For comprehensive testing, use looping in your full test suite and random selection in quick smoke tests or during development.
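
One way to combine the two patterns is an environment-variable switch, sketched below. The SMOKE_TEST variable name and the test body are assumptions of this example, not Scenario or pytest built-ins:

import os
import random

import pytest

ALL_REPORTS = [
    "financial-report-2024-q1.pdf",
    "financial-report-2024-q2.pdf",
    "financial-report-2024-q3.pdf",
    "financial-report-2024-q4.pdf",
]

# Full suite: parametrize over every fixture.
# Smoke run (SMOKE_TEST=1): pick one random fixture for fast feedback.
PARAMS = [random.choice(ALL_REPORTS)] if os.getenv("SMOKE_TEST") == "1" else ALL_REPORTS


@pytest.mark.asyncio
@pytest.mark.parametrize("filename", PARAMS)
async def test_extract_revenue(filename: str):
    ...  # same body as the looping example in Pattern 1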

Advanced: Cross-File Comparison and Multi-File Scenarios

Real users often need to work with multiple files simultaneously — comparing reports, aggregating data across documents, or validating consistency between sources. Testing these scenarios ensures your agent can synthesize information across multiple files.

Why Test Multi-File Scenarios?

  1. Context Management: Validates that your agent can track information from multiple sources without confusion
  2. Information Synthesis: Tests the agent's ability to combine and compare data across documents
  3. Real-World Relevance: Users commonly upload multiple related files (e.g., "Compare Q1 and Q2 results")
  4. Complex Reasoning: Requires higher-level analysis than single-file processing

Providing Multiple Files in a Single Message

To send multiple files at once, simply include multiple file objects in the content array:

scenario.message({
    "role": "user",
    "content": [
        {
            "type": "text",
            "text": "Compare these two quarterly reports and tell me which had higher revenue.",
        },
        {
            "type": "file",
            "file": {
                "filename": "q1-report.pdf",
                "file_data": f"data:application/pdf;base64,{base64.b64encode(q1_path.read_bytes()).decode()}",
            },
        },
        {
            "type": "file",
            "file": {
                "filename": "q2-report.pdf",
                "file_data": f"data:application/pdf;base64,{base64.b64encode(q2_path.read_bytes()).decode()}",
            },
        },
    ],
})

Other multi-file scenarios to test:
  • Aggregation: "Here are 3 invoices — what's the total amount across all of them?" (sketched after this list)
  • Cross-validation: "Analyze this CSV and this PDF together — do the numbers match?"
  • Data enrichment: "Use this price list CSV to calculate totals for the items in this invoice PDF"
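
For instance, the aggregation case could be scripted as follows. This is a sketch: the invoice filenames and FIXTURES_DIR are hypothetical, and the base64 and Path imports are the same as in the earlier examples:

invoice_names = ["invoice-001.pdf", "invoice-002.pdf", "invoice-003.pdf"]

scenario.message({
    "role": "user",
    "content": [
        {
            "type": "text",
            "text": "Here are 3 invoices. What's the total amount across all of them?",
        },
        # One file part per invoice, built the same way as the single-file examples
        *[
            {
                "type": "file",
                "file": {
                    "filename": name,
                    "file_data": f"data:application/pdf;base64,{base64.b64encode((FIXTURES_DIR / name).read_bytes()).decode()}",
                },
            }
            for name in invoice_names
        ],
    ],
})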

Use scenario.JudgeAgent() with criteria like:

  • "Agent correctly identifies which report had higher revenue"
  • "Agent mentions specific numbers from both files"
  • "Agent doesn't confuse data between the two documents"

Real-World Example

For a complete, production-ready example, see the langwatch/multimodal-ai repository. It includes:

  • Full test scenarios for PDF and CSV analysis
  • AgentAdapter implementation for file handling
  • Organized fixture files and test structure
  • LangWatch instrumentation for observability

Ready to build your own? Start with better-agents to create production-ready AI agents with built-in testing, monitoring, and safety features.

Related Guides