What happens when two engineering teams just... talk

Manouk Draisma
Last week we hosted the first of what we're planning to make a regular thing: an engineering get-together with Altura, one of our customers. No agenda beyond learning from each other. And it ended up being one of the more useful evenings we've had in a while.
The initiative came from Lorenzo, VP of Engineering at Altura. He'd seen it work before: back when he was in Australia, two completely unrelated engineering teams would meet up regularly. Different companies, different stacks, same genuine curiosity about how the other side was solving problems. He never forgot how useful it was. With AI moving as fast as it is now, he figured it was time to revive the format. We were immediately on board.
Martijn's talk: the real story behind choosing a tool
Martijn, AI engineer at Altura, opened the evening with something we don't often get to hear — a completely candid account of how they ended up using LangWatch.
He started at the beginning: joining Altura and finding essentially zero visibility into what the LLMs were doing. Customer tickets would come in, engineers would try to reproduce issues, and there was simply nothing to go on. No traces, no logs worth anything, just a live stream in Azure if you were lucky. The phrase he used was: "What the fuck is going on?" — and he described it less as a single dramatic incident and more as a slow accumulation of that exact feeling.
His first fix was Langfuse. Practical choice, fast to set up, one import swap and everything got captured. But as the complexity grew — multi-step workflows, different industries with totally different vocabularies and requirements — observability alone stopped being enough. The question shifted from "what happened?" to "is this actually good?" And that's where things got hard.
He talked about the challenge of translating technical metrics into something leadership could act on. Having six or seven metrics for a single pipeline component isn't useful for an OKR. And an LLM judge running in the background for five months that nobody looks at isn't an evaluation strategy — it's theater. (He mentioned, almost in passing, that he had turned those off that very morning.)
What drew him to LangWatch over the alternatives, he said, was that we're opinionated. Not in an arrogant way, but in a "here is a path, here is a structure, here is what good looks like" way. In a domain where almost nobody agrees on best practices and most AI engineers are, as he put it, "figuring it out as they go," that direction has real value. The MCP integration mattered too — by the time they were evaluating tools, Claude Code had become central to how Altura builds.
For our team, this was genuinely illuminating. We know why customers choose us in the abstract. Hearing it traced through the full story — the messy reality before, the failed attempts, the actual decision — is a different kind of knowledge.
Rogerio's talk: how we build (and stay sane doing it)
Rogerio then walked through how the LangWatch engineering team works — specifically how we try to move fast with AI without losing our minds or our codebase.
The core of it was BDD: writing behavior-driven specs before any implementation. Not as bureaucracy, but as a forcing function for clear thinking. Three letters are enough: tell the model to write the BDD specs first and it already knows exactly what format to use. Each scenario is atomic, token-light, and unambiguous. Together the scenarios become the living documentation of what the system actually does, committed to Git and updated as behavior changes.
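To make the spec-first idea concrete, here is a minimal sketch in Python, assuming the common Given/When/Then scenario format. It is not taken from the LangWatch codebase: the scenario, the `evaluate` function, and the `EvalResult` type are hypothetical and exist only for illustration.

```python
# Scenario: Evaluation is skipped for empty model output
#   Given a trace whose model output is empty
#   When the evaluation step runs
#   Then the result is marked "skipped"
#   And no score is recorded

from dataclasses import dataclass
from typing import Optional


@dataclass
class EvalResult:
    """Hypothetical result type, invented for this example."""
    status: str
    score: Optional[float] = None


def evaluate(output: str) -> EvalResult:
    """Hypothetical evaluation step, written after the scenario above."""
    if not output.strip():
        # Empty or whitespace-only output: nothing to score.
        return EvalResult(status="skipped")
    # Placeholder score for illustration only, not a real metric.
    return EvalResult(status="scored", score=1.0)


def test_evaluation_is_skipped_for_empty_output():
    # Given a trace whose model output is empty
    output = ""
    # When the evaluation step runs
    result = evaluate(output)
    # Then the result is marked "skipped" and no score is recorded
    assert result.status == "skipped"
    assert result.score is None
```

The scenario comes first and stays short. The test and the implementation only exist to satisfy it, which is what keeps each scenario atomic and easy to update when behavior changes.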
He showed Kanban Code — the open-source macOS app he built to manage multiple parallel Claude sessions, with each card representing an agent working on a branch, moving through states from backlog to done. The insight behind it wasn't the tool itself but the principle: when you have five agents running simultaneously, your cognitive load isn't about the code anymore, it's about knowing where everything is and what needs your attention.
The discussion that followed was honest and wide-ranging. How do you handle the parts of code review that actually matter versus the parts you can let go? How do you keep specs up to date without it becoming its own overhead? What's the right architecture when you don't want to over-specify but also don't want the AI to make a mess? There were no clean answers. There rarely are. But the conversation itself — two teams comparing notes on problems they're both living with — was exactly the point.
Why we're making this a monthly thing
The AI engineering space is accumulating hard-won knowledge fast. Most of it lives in Slack channels, late-night experiments, and conversations between people who happened to sit next to each other. The industry is moving too quickly for any one team to stay current by reading Twitter.
Lorenzo's instinct was right. The format works. You learn different things from talking to people you trust and respect than you learn from conference talks or documentation. You hear the failures, the detours, the things that almost worked. That's the stuff that actually changes how you build.
We're planning to run this monthly — rotating hosts, rotating topics, sometimes guest speakers from outside both teams. If you're an engineering team building seriously with AI and want to be part of the conversation, reach out.

