Speaker

Manoj Kumar

Biography

Manoj is a seasoned professional and adept technologist. He currently works at Planit as Director of NextGen Solutions. He enjoys exploring the entire software development lifecycle and is especially interested in solving problems in Software Quality, Digital Transformation, Human-Computer Interaction, and Cloud Computing.

Having worked in enterprises, fintech, and early-stage startups for over 16 years, Manoj brings a wealth of experience. Notably, he is a contributor to the Selenium project and serves on its project leadership committee.

He genuinely believes that sharing knowledge and experiences strengthens our community. Manoj is a member and Distinguished Speaker of the ACM and the IEEE Computer Society, and has given keynote addresses and technical talks on software engineering and testing at international conferences in over 15 countries.

Manoj has previously worked at startups like Applitools and LambdaTest and has been a part of digital transformation programs at leading companies such as ThoughtWorks, Wipro, and IAG, among others.

Presentation

Trust, but Verify: Quality Engineering for AI Systems

Artificial Intelligence is rapidly moving into real-world systems, yet the way we test software has barely changed.

Traditional testing assumes that a given input should produce a predictable and correct output. AI systems challenge this assumption. Large language models and AI-driven workflows produce probabilistic responses that may vary between runs and depend heavily on prompts, context, and retrieved information.

This creates a new challenge for testers. When multiple responses may be acceptable, how do we define quality and verify system behaviour?

This talk explores why traditional testing approaches struggle with AI systems and how quality engineering must evolve. Using examples from AI-assisted development tools and retrieval-based systems, we examine common failure patterns across model behaviour, context retrieval, and orchestration layers.

The session introduces practical evaluation techniques that testers can apply to AI systems. Instead of relying only on deterministic assertions, teams can use behavioural metrics and reference-based evaluation methods such as semantic similarity, grounding checks, BLEU, and ROUGE to measure response quality, consistency, and factual alignment.
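As a concrete illustration of the reference-based methods named above (a minimal sketch, not material from the talk itself), the following assumes the sentence-transformers, rouge-score, and nltk packages; the example texts and the 0.8 threshold are illustrative only:

```python
# Reference-based evaluation of an LLM response: compare a candidate
# answer against a known-good reference using three complementary metrics.
from sentence_transformers import SentenceTransformer, util
from rouge_score import rouge_scorer
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = "The refund is processed within five business days."
candidate = "Refunds are completed in five business days."

# Semantic similarity: embed both texts and compare with cosine similarity.
model = SentenceTransformer("all-MiniLM-L6-v2")
ref_emb, cand_emb = model.encode([reference, candidate], convert_to_tensor=True)
similarity = util.cos_sim(ref_emb, cand_emb).item()

# ROUGE-L: longest-common-subsequence overlap with the reference.
rouge = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
rouge_l = rouge.score(reference, candidate)["rougeL"].fmeasure

# BLEU: n-gram precision against the reference (smoothed for short texts).
bleu = sentence_bleu(
    [reference.split()], candidate.split(),
    smoothing_function=SmoothingFunction().method1,
)

print(f"semantic similarity: {similarity:.2f}")
print(f"ROUGE-L F1:          {rouge_l:.2f}")
print(f"BLEU:                {bleu:.2f}")

# Instead of an exact-match assertion, gate on a similarity threshold.
assert similarity >= 0.8, "response drifted too far from the reference"
```

The final assertion captures the shift in mindset: rather than demanding one exact string, the test accepts any response that stays semantically close to a known-good reference.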

Attendees will leave with a clear mental model for testing AI systems and practical evaluation techniques they can begin applying in real AI-powered products.

What will the audience get out of this talk?

By the end of this session, attendees will:

  • Understand why traditional testing does not work well for AI systems
  • Learn a testing taxonomy for AI systems across the model, prompt, retrieval, and orchestration layers
  • Understand how to test systems with probabilistic outputs and engineer confidence (see the sketch after this list)
  • Learn practical evaluation methods such as semantic similarity, grounding checks, BLEU, and ROUGE
  • Leave with a practical approach for testing real AI and LLM-based systems
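
To make the probabilistic-outputs point concrete, here is a minimal sketch (not the speaker's method): sample the same prompt repeatedly and assert a pass rate rather than a single deterministic output. The `generate` callable is a hypothetical stand-in for whatever model call the system under test exposes, and the run count and 0.9 threshold are illustrative:

```python
# Testing a probabilistic system by sampling it repeatedly and asserting
# a pass rate instead of a single exact output.
from typing import Callable

def passes(response: str) -> bool:
    # Property-style check: acceptable answers must mention the key fact,
    # even though their exact wording may differ between runs.
    return "five business days" in response.lower()

def pass_rate(generate: Callable[[str], str], prompt: str, runs: int = 20) -> float:
    # Sample the same prompt repeatedly; each run may return a different
    # but still acceptable response.
    results = [passes(generate(prompt)) for _ in range(runs)]
    return sum(results) / runs

def test_refund_answer(generate: Callable[[str], str]) -> None:
    rate = pass_rate(generate, "How long does a refund take?")
    # Engineer confidence statistically: tolerate occasional misses,
    # but fail the build if quality degrades across the sample.
    assert rate >= 0.9, f"only {rate:.0%} of sampled responses were acceptable"
```

Treating quality as a rate over a sample is what lets a test suite engineer confidence around outputs that legitimately vary between runs.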