
How are you testing LLM behavior in production? Looking for real workflows

Hey everyone, I've been building AI-first products and integrating LLMs into production systems for a while. At some point I needed more confidence in what I was shipping and started looking into automated evals — couldn't find anything that integrated cleanly with Playwright and Vitest, so I ended up writing some lightweight extensions for internal use. Now I'm not sure whether to open source them or just delete them — depends on whether this is actually a problem other people have. But first —

Flagged via r/QualityAssurance.