How Vinny uses LangWatch Scenario to slash manual testing and accelerate customer onboarding

AI-powered agent simulations help Vinny’s team validate chatbot configurations in minutes instead of hours

Company: Vinny (askvinny.com) - AI-powered property management assistant

Industry: PropTech / Property Management

Use case: End-to-end agent simulation testing during customer onboarding and ongoing QA

LangWatch product: Scenario (agent simulations)

Key integrations: LangWatch MCP Server, production database (read-only replica), Slack

What are the key results with LangWatch?

Hours → minutes: testing cycle reduction

Non-technical: team members run tests

Production-grade: test scenario realism

About Vinny

Vinny is an AI-powered property management assistant that automates leasing, renewals, arrears, and maintenance communication for property teams across the UK, EU, and US. Operating across channels including voice, WhatsApp, email, and web portals, Vinny handles everything from qualifying leads and scheduling viewings to triaging maintenance requests and tracking compliance. The platform currently manages over 1,200 properties with commercial agreements for an additional 2,500 units.

Because every property portfolio is different — unique buildings, pricing structures, FAQs, and tenant requirements — Vinny’s onboarding process involves significant configuration for each customer. Getting these configurations right is critical: a misconfigured chatbot that returns wrong pricing or mishandles a maintenance request can erode tenant trust and customer confidence.

The AI agent challenge: Manual testing couldn’t keep up

Before adopting LangWatch Scenario, Vinny’s quality assurance process was almost entirely manual. Each time a new customer was onboarded or a configuration change was made, a customer success manager had to personally step into the role of a tenant or landlord and have test conversations with the chatbot to verify it was responding correctly.

This manual approach presented several problems. It was time-consuming, with each customer success manager spending significant time simulating conversations for every configuration change. It was difficult to scale, since as the portfolio grew, the number of unique configurations multiplied. The coverage was limited because manual testers could only explore a fraction of possible conversation paths. And bug validation was slow — when a customer reported an issue, the team had to manually reproduce it, implement a fix, and then manually verify the fix again.

“Previously, one of our customer success managers was spinning up the bot and having those conversations manually. It was taking a lot of time, and we couldn’t scale it.”
— Vinny Engineering Team

The solution: LangWatch Scenario with deep production context

Vinny’s team built an end-to-end testing workflow powered by LangWatch Scenario that fundamentally changed how they validate their AI assistant. The solution is notable not just for its automation, but for the depth of real-world context it brings to every test.

A Slack-first workflow for non-technical teams

Vinny created a dedicated Slack channel where sales, onboarding, and customer success team members — none of whom need to be technical — can request end-to-end test scenarios. A team member simply describes the scenario for the organization they want to test, and the system handles the rest.

“We’ve created a Slack channel dedicated to end-to-end testing for our sales team, our onboarding teams, and non-technical persons. They just describe the scenario for the organization they want to test, and we spin up the agent to generate and run scenarios.”
— Vinny Engineering Team

Production-grade test scenarios

What makes Vinny’s implementation particularly powerful is how deeply the scenario generation agent is plugged into real production context. The system connects to three critical data sources to ensure every test scenario mirrors real-world conditions.

First, it uses the LangWatch MCP Server to review previous conversation traces and understand how the chatbot has performed historically. Second, it connects to a read-only replica of the production database to fetch organization-specific settings, including available buildings, pricing structures, and custom FAQs. Third, it lets the agent query traces directly for debugging, so scenario generation can account for known issues and edge cases.
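Conceptually, the context assembly step can be sketched as follows. This is a minimal illustration, not Vinny's actual code: the fetch functions are hypothetical stand-ins for the LangWatch MCP Server call, the read-only replica query, and the trace lookup, and all field names are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class ScenarioContext:
    """Bundle of production context fed to the scenario-generation agent."""
    recent_traces: list[dict] = field(default_factory=list)  # via LangWatch MCP Server
    org_settings: dict = field(default_factory=dict)         # via read-only DB replica
    known_issues: list[str] = field(default_factory=list)    # via trace debugging queries


def fetch_recent_traces(org_id: str) -> list[dict]:
    # Placeholder for an MCP call returning recent conversation traces.
    return [{"trace_id": "t-1", "summary": "tenant asked for studio pricing"}]


def fetch_org_settings(org_id: str) -> dict:
    # Placeholder for a read-only query against the production replica.
    return {"buildings": ["Elm Court"], "pricing": {"studio": 1200}, "faqs": ["Pets allowed?"]}


def fetch_known_issues(org_id: str) -> list[str]:
    # Placeholder for querying traces flagged during debugging.
    return ["pricing mismatch reported for studio units"]


def build_context(org_id: str) -> ScenarioContext:
    """Assemble everything the generation agent needs for one organization."""
    return ScenarioContext(
        recent_traces=fetch_recent_traces(org_id),
        org_settings=fetch_org_settings(org_id),
        known_issues=fetch_known_issues(org_id),
    )
```

The key design point is that generation never starts from a blank prompt: every scenario is grounded in that organization's real buildings, pricing, and prior conversations.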

“To spin up a scenario, we connect through the LangWatch MCP to see previous traces. We connect to our read-only replica of the production database to fetch the organization settings: buildings available, pricing, FAQs. We also allow the agent to query our traces for debugging. The agent that generates scenarios is plugged into a lot of context to generate the best real-world conversations for us.”
— Rory, CTO

Intelligent scenario generation

Using all of this context, the system leverages Claude to generate realistic test scenarios as structured JSON, complete with judge criteria for automated evaluation. The team maintains a library of static base scenarios for each organization, which the AI agent uses as a foundation to create targeted ad hoc scenarios for specific parts of the system. Each generated scenario can be reviewed and triggered directly, and the results are automatically pushed to the LangWatch platform for record-keeping and reuse.
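To make this concrete, here is a hypothetical example of what one generated scenario might look like, with a minimal validation gate in front of the test runner. The field names are illustrative assumptions, not LangWatch's or Vinny's actual schema.

```python
# Hypothetical example of a generated scenario; field names are illustrative.
generated_scenario = {
    "name": "Prospective tenant asks for studio pricing",
    "persona": "tenant",
    "description": (
        "A prospective tenant messages on WhatsApp asking about studio "
        "availability and monthly rent at Elm Court."
    ),
    "judge_criteria": [
        "Agent quotes the pricing configured for the organization",
        "Agent offers to schedule a viewing",
        "Agent does not invent buildings that are not in the portfolio",
    ],
}

REQUIRED_FIELDS = {"name", "persona", "description", "judge_criteria"}


def validate_scenario(scenario: dict) -> bool:
    """Reject malformed generations before they reach the test runner."""
    return REQUIRED_FIELDS <= scenario.keys() and bool(scenario["judge_criteria"])
```

Because the judge criteria ship inside the scenario itself, every run produces an automated pass/fail verdict rather than relying on a human reading the transcript.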

The workflow in action

The end-to-end flow works seamlessly across Vinny’s toolchain. A team member describes a test scenario in the dedicated Slack channel. The system spins up an agent that pulls context from LangWatch traces, the production database, and Datadog logs. Based on this context and existing static scenarios, the agent generates tailored test scenarios with evaluation criteria. The team reviews the generated scenarios and triggers them with a single button click. Vinny’s chatbot then engages with a second agent that simulates a realistic tenant or landlord conversation. Results are logged back to Slack and recorded in LangWatch’s simulation run history for future reference.
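The steps above can be sketched as one orchestration function. Every helper here is a stub standing in for the real integration (Slack, the LangWatch MCP Server, the production replica, Claude, and the simulation run itself); nothing below is Vinny's actual implementation.

```python
# Hypothetical stand-ins for the real integrations described above.
def gather_context(org_id: str) -> dict:
    # Stands in for pulling traces, org settings, and logs.
    return {"buildings": ["Elm Court"], "known_issues": []}


def generate_scenarios(request: str, context: dict) -> list[dict]:
    # In production this step calls Claude; here we return one canned scenario.
    return [{"name": request, "judge_criteria": ["quotes configured pricing"]}]


def simulate(scenario: dict) -> dict:
    # Stands in for a simulation run between Vinny's chatbot and a
    # simulated tenant/landlord agent, judged against the criteria.
    return {"name": scenario["name"], "success": True}


def handle_slack_request(org_id: str, request: str) -> dict:
    """One Slack message in, one summarized result out (also logged to LangWatch)."""
    context = gather_context(org_id)
    scenarios = generate_scenarios(request, context)
    runs = [simulate(s) for s in scenarios]
    return {"org": org_id, "passed": all(r["success"] for r in runs), "runs": runs}
```

The non-technical interface falls out naturally from this shape: the only input a team member supplies is the free-text request, and everything else is derived from production context.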

“We log the results into our Slack channels, and it’s also visible in LangWatch. If we go to simulations run history, we can see the results of all the testing that’s happened. It’s pretty powerful.”
— Vinny Engineering Team

The real impact of LangWatch

Faster onboarding validation

During customer onboarding, there is a substantial amount of configuration required to get Vinny set up for each property portfolio. Every building is different, and every customer’s requirements are slightly different. Now, instead of manually testing each configuration, the team sets up what they believe is correct, asks the agent to spin up scenarios based on that customer’s specific needs, and instantly identifies any gaps or misconfigurations.

Concrete bug validation

When a customer reports an issue — for example, that a conversation should have done X but did Y — the team can implement what they believe is a fix, spin up a targeted test scenario, and provide the customer with concrete evidence that the issue has been resolved.

“Customer says, ‘hey, this conversation should have done X, but it did Y.’ We can put in what we think is a fix, spin up a test, and then give them really concrete evidence that this has been fixed, so they don’t need to worry about it anymore.”
— Vinny Engineering Team

A real example: Fixing pricing errors overnight

In a recent case, one of Vinny’s customer organizations was experiencing incorrect pricing being returned by the chatbot. The team generated targeted test scenarios based on the issue, identified that the root cause was a configuration mismatch in the customer’s property management system (PMS), and asked the customer to align their configuration. The next day, they re-ran the scenarios and confirmed everything was working correctly — a turnaround that would have taken significantly longer with manual testing.

“We had an issue with one organization where Vinny was returning wrong pricing. We generated tests based on it, asked them to align the configuration on their side, and the next day we ran the scenarios again — everything was working fine. It’s working in production and it’s used by our managers.”
— Vinny Engineering Team

Why LangWatch Scenario

The Vinny team highlighted several reasons why LangWatch Scenario has become central to their workflow. The MCP Server integration allows their scenario generation agent to pull real conversation history, making tests grounded in actual usage patterns. The simulation run history provides a persistent record of all tests, enabling the team to rerun scenarios, compare results over time, and maintain a growing regression test suite. The platform is flexible enough to support Vinny’s Slack-first, non-technical workflow while also providing the depth that engineers need for debugging. And the judge criteria in generated scenarios enable automated pass/fail evaluation, removing subjectivity from the QA process.

“It’s pretty powerful, and we are really, really happy to use this tool from you guys.”
— Mateusz, AI Engineer

Looking ahead

Vinny’s implementation of LangWatch Scenario demonstrates how AI-native companies can leverage agent simulations not just as a testing tool, but as a core part of their customer success workflow. By combining deep production context, intelligent scenario generation, and a non-technical interface, Vinny has transformed what was once a manual, time-consuming process into an automated pipeline that delivers faster onboarding, quicker bug resolution, and greater confidence for both their team and their customers.

As Vinny continues to scale across the UK, EU, and US markets, LangWatch Scenario will scale with them, ensuring that every new customer’s chatbot is validated thoroughly before going live, and that every reported issue is resolved with evidence-backed confidence. Next on the roadmap: extending the same approach to testing Vinny’s voice agents with LangWatch.

Learn more about LangWatch Scenario at scenario.langwatch.ai