How are you testing LLM behavior in production? Looking for real workflows
What happened
Hey everyone, I've been building AI-first products and integrating LLMs into production systems for a while. At some point I needed more confidence in what I was shipping and started looking into automated evals — couldn't find anything that integrated cleanly with Playwright and Vitest, so I ended up writing some lightweight extensions for internal use. Now I'm not sure whether to open source them or just delete them — depends on whether this is actually a problem other people have. But first —
Business impact
Flagged via r/QualityAssurance.
Sources
- How are you testing LLM behavior in production? Looking for real workflows (r/QualityAssurance)