After more than a decade in product management, drafting, iterating on, and owning roadmaps, scaling products and platforms, and launching enterprise tools, I assumed moving into the world of AI would be a familiar path. I brought my usual toolkit: feature prioritization, cross-functional coordination, and adoption metrics. But as I took my first steps into AI-powered and generative tools, especially early autonomous agents, I quickly realized this wasn’t business as usual.
These systems were fast, flexible, and capable of producing surprisingly useful outputs in seconds; they were also unpredictable. They didn’t just deliver answers; they often invented them. They hallucinated. They adapted. And they sometimes responded with confidence even when they were far off the mark. As a product manager, I found myself asking a question I rarely had to ask before: Can I trust the responses these systems generate and base my work on them?
That was my wake-up call: you can’t measure AI the same way you measure traditional software. Traditional key performance indicators (KPIs) such as reliability, throughput, and error rates only scratch the surface. Speed and utility are only part of the equation. Trust, reliability, and guardrails take on a whole new meaning when the product is generative and probabilistic by nature.
So, how do we measure something so fluid, so human-adjacent?
That question became the catalyst for my deep dive into AI evaluation. It led me to research the field, study industry practices, and ultimately write “KPIs for AI Agents and Generative AI: A Rigorous Framework for Evaluation and Accountability,” published in the International Journal of Scientific Research and Modern Technology in 2024.
In this post, I go beyond the paper and share how that framework changed my own thinking, why it matters, and the practical playbook I now use as a product manager navigating AI.
Foundational Lens for Thinking About AI Products
As I studied AI evaluation practices across the industry, I came to understand that AI systems can’t be measured purely through technical performance. Instead, they need to be evaluated across multiple dimensions that reflect not just how well they function, but how responsibly they behave.
Through that learning, the following five-dimensional KPI framework has become a foundational lens for me when thinking about AI products (a minimal scorecard sketch follows the list):
- Model Quality—Accuracy, reliability, and creativity
- System Performance—Efficiency, scalability, resilience
- Business Impact—ROI, productivity, market relevance
- Human-AI Interaction—Usability, trust, and user adoption
- Ethical and Environmental Considerations—Fairness, explainability, sustainability, and ethical drift
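To make the framework concrete for myself, I sometimes sketch it as a simple scorecard. The snippet below is illustrative only: the class name, the 0-to-1 scores, and the weights are my own assumptions, not something prescribed by the paper.

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class AIScorecard:
    # Each score is a 0-to-1 rating a team assigns after an evaluation pass.
    model_quality: float          # accuracy, reliability, creativity
    system_performance: float     # efficiency, scalability, resilience
    business_impact: float        # ROI, productivity, market relevance
    human_ai_interaction: float   # usability, trust, adoption
    ethics_environment: float     # fairness, explainability, sustainability, drift

    # Illustrative weights; a real team would negotiate these per product.
    weights: Dict[str, float] = field(default_factory=lambda: {
        "model_quality": 0.25,
        "system_performance": 0.15,
        "business_impact": 0.25,
        "human_ai_interaction": 0.20,
        "ethics_environment": 0.15,
    })

    def overall(self) -> float:
        # A weighted composite is a conversation starter, not a verdict.
        return sum(getattr(self, dim) * w for dim, w in self.weights.items())

candidate = AIScorecard(
    model_quality=0.82,
    system_performance=0.91,
    business_impact=0.70,
    human_ai_interaction=0.64,
    ethics_environment=0.77,
)
print(f"Composite score: {candidate.overall():.2f}")  # prints 0.76
```

The point of a sketch like this is less the single number and more the conversation it forces: every dimension has to be scored, so none of them can quietly be ignored.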
This framework completely changed how I think about product evaluation. In traditional software, things either work or they don’t: it’s binary, predictable. But AI is different. It learns, it adapts, and sometimes it behaves in ways you don’t expect. This shift forced me to move beyond checking boxes like uptime and accuracy and to start thinking about trust, fairness, and long-term impact. The framework has become my go-to guide as I continue learning and finding my footing in the world of AI product development.
Flying Blind
As I dug deeper into AI, one thing became obvious: if we don’t have a solid way to measure these systems, we’re flying blind.
Without clear ways to measure AI, we risk two extremes: overhyping capabilities or deploying systems irresponsibly. This framework creates a shared language across product, engineering, compliance, and leadership, keeping everyone aligned on building AI that’s not just powerful, but trustworthy and human-centered.
My Playbook for PMs Moving Into AI
Stepping into AI as a product manager is exciting, but it’s a different game. The rules, the risks, and even the definitions of success are constantly evolving. Here’s the checklist I wish I had when I started:
- Redefine success: Go beyond accuracy. Focus on trust and ethics.
- Make metrics business-relevant: Translate technical KPIs into outcomes leadership understands.
- Put users first: Track adoption, satisfaction, and real engagement.
- Embed ethics early: Partner with legal, measure fairness and explainability from the start.
- Monitor in real time: Watch for bias, drift, and trust breakdowns, not just latency (see the sketch after this list).
- Be ready for trade-offs: AI is full of them. Document decisions clearly.
- Keep evolving: What works in a prototype won’t hold in production. Adjust constantly.
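To show what the “monitor in real time” item can look like in practice, here is a minimal sketch of a rolling trust-signal monitor. The class name, window size, baseline rate, and drop threshold are illustrative assumptions on my part, not part of the published framework.

```python
from collections import deque

class TrustMonitor:
    """Rolling check on a user-trust signal (e.g., thumbs-up rate on AI responses)."""

    def __init__(self, baseline_rate: float, window: int = 500, drop_threshold: float = 0.10):
        self.baseline_rate = baseline_rate        # positive-feedback rate at launch
        self.drop_threshold = drop_threshold      # how far below baseline triggers an alert
        self.feedback = deque(maxlen=window)      # 1 = positive, 0 = negative

    def record(self, positive: bool) -> None:
        self.feedback.append(1 if positive else 0)

    def drifting(self) -> bool:
        # Wait for a full window before judging, then compare to the launch baseline.
        if len(self.feedback) < self.feedback.maxlen:
            return False
        current_rate = sum(self.feedback) / len(self.feedback)
        return (self.baseline_rate - current_rate) > self.drop_threshold

# Hypothetical wiring into a feedback pipeline:
monitor = TrustMonitor(baseline_rate=0.78)
# monitor.record(user_clicked_thumbs_up)
# if monitor.drifting():
#     alert_the_team("Trust signal dropped well below the launch baseline")
```

The specific signal matters less than the habit: pick a trust proxy, baseline it at launch, and watch it with the same discipline you already apply to latency.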
Measuring the Unmeasurable
AI isn’t just code anymore. It learns, adapts, and evolves. That means we need a new way to measure success, one that goes beyond speed and accuracy and asks the harder questions:
- Can people trust what AI produces?
- Is AI fair and responsible?
- Is AI solving real problems or just showing off?
We’re not throwing out traditional KPIs; rather, we’re building on them. Because with AI, it’s not just about performance anymore. It’s about principles.
This is the next chapter in product management. And in AI, the way we measure success truly matters.
Vivek Sunkara is a Technology Product Manager at Citi, transforming Risks & Controls data into actionable insights that drive strategic growth. A BCS Member, IEEE Senior Member, IETE Fellow, and ACM professional member, he is an ‘AI-first’ product leader focused on building products and emotionally resonant user experiences.



