Welcome back to Real Talk AI, where we separate hype from reality in the world of AI and technology. I’m Esther Katz—deep tech strategist, marketer, and AI optimist—and today, I had the privilege of sitting down with Maria Sukhareva, Principal AI Expert at Siemens.
I just returned from a few days in South Florida. It was supposed to be relaxing, but what stuck with me wasn’t the beach—it was the iguanas. Everywhere. On rooftops, across sidewalks, lounging by the pool like they ran the place. It made me think about the state of AI startups right now. So many of them seem to have appeared overnight, crawling into every business pitch and product roadmap, whether they belong or not.
That’s what makes Maria’s work so essential. She operates in the high-stakes, zero-margin-for-error world of industrial AI, where hype doesn’t fly and failures cost real money. In our conversation, she shares how to properly vet AI startups, how to choose the right consulting partner, and why so many enterprise AI efforts fail before they even begin.
EK:
Maria, welcome. Let’s start with the basics. How did you get into AI safety and implementation?
Maria Sukhareva (MS):
Thank you, Esther. I’ve been working with AI for a while, and over time I naturally gravitated toward evaluating AI systems from both a technical and organizational risk perspective. At Siemens, the environments we operate in—industrial automation, infrastructure, energy—are very complex. The consequences of a failure aren’t just lost revenue. They can involve safety hazards, reputational damage, even legal liability.
That means we have to ask hard questions upfront. Not just “Does it work?” but “Can it be trusted in this environment?” and “What does it break when it breaks?”
EK:
Let’s go straight into the first topic: how to vet an AI startup before you invest. What’s the first thing you look at?
MS:
I usually start by asking: What part of the stack do they own? Are they building core models? Do they rely on OpenAI or Hugging Face models? Or are they simply fine-tuning and wrapping models into SaaS?
If they don’t own any part of the AI pipeline and can’t articulate the value of their tuning or data, that’s a red flag. It means they are vulnerable to shifts in API pricing, platform policy, and performance regressions that are out of their control.
EK:
And when founders claim “We’ve built our own proprietary models”?
MS:
I immediately ask: On what data? If they say “on our clients’ data,” I ask what preprocessing pipeline they built, what evaluation metrics they’re using, and whether they’re measuring performance drift.
Sometimes, founders confuse having a database with having a training dataset. Just storing data doesn’t make it ready for modeling.
EK:
Let’s say the startup has some traction. How do you evaluate their tech maturity?
MS:
Good question. I often look at:
Versioning: Do they have model version control? Not just GitHub for code, but for model weights and pipelines.
Monitoring: What does their model monitoring look like in production? Are they measuring performance over time or just during dev?
Evaluation suite: Do they have a benchmark that reflects real user scenarios—or just a cherry-picked accuracy number from an academic dataset?
Most startups overfit to the demo and underestimate how much work goes into reliability post-deployment.
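To make Maria’s monitoring point concrete: one common way to check whether production data has drifted away from what a model was evaluated on is the Population Stability Index (PSI). This is a minimal sketch, not any particular vendor’s implementation; the 0.2 threshold is a widely used heuristic, not a standard.

```python
import math

def psi(reference, production, bins=10):
    """Population Stability Index between two score samples.
    PSI > 0.2 is a common heuristic threshold for significant drift."""
    lo = min(min(reference), min(production))
    hi = max(max(reference), max(production))
    edges = [lo + (hi - lo) * i / bins for i in range(bins + 1)]
    edges[-1] += 1e-9  # make the last bin include the maximum value

    def bin_fractions(sample):
        counts = [0] * bins
        for x in sample:
            for i in range(bins):
                if edges[i] <= x < edges[i + 1]:
                    counts[i] += 1
                    break
        n = len(sample)
        # floor empty bins at a small epsilon so the log doesn't blow up
        return [max(c / n, 1e-6) for c in counts]

    p = bin_fractions(reference)
    q = bin_fractions(production)
    return sum((pi - qi) * math.log(pi / qi) for pi, qi in zip(p, q))
```

A team that is “measuring performance over time” should be able to show you something like this running on a schedule, with alerts wired to the result, rather than a single accuracy number from development.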
EK:
Any deal-breakers?
MS:
Yes: when a founder refuses to show model failure modes. I always ask, “When does your model break?” If they say “It doesn’t,” I know they don’t understand the real-world nature of machine learning. Every model breaks. Mature teams can talk about it without flinching.
EK:
Beautiful. Let's shift to the second topic: choosing the right AI engineering or consulting partner. What’s your first move when evaluating a vendor?
MS:
I ignore their slide deck and start by asking for examples of past implementations—real ones. I want to know:
What was the client context?
What were the inputs and outputs?
What didn’t work?
What did they learn?
A lot of firms show generic case studies with buzzwords like “optimization” or “NLP engine” but no specifics. If they can’t explain a project from end to end in plain language, I don’t trust them.
EK:
What do you make of AI agencies that promise to do everything—prompt engineering, LLM app development, data pipelines?
MS:
That’s another red flag. If a firm claims to “do everything,” it usually does nothing well. The field is evolving too fast for that, and specialization matters. Someone building retrieval-augmented generation (RAG) systems is in a very different domain than someone fine-tuning computer vision models for factory inspection.
I often ask, “What do you say no to?” If they can’t answer, I walk away.
EK:
What about evaluating partners for safety and compliance?
MS:
That’s huge. In regulated environments, you can’t just throw an AI model into production without thinking about traceability, auditability, and fallback strategies.
I look for:
Can the partner provide explanations for model outputs?
Do they implement human-in-the-loop safeguards?
Do they have logs for decision making and error cases?
And importantly: how do they handle data privacy and model retraining when data distributions shift?
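The traceability and human-in-the-loop requirements Maria lists can be boiled down to a simple pattern: every model decision gets an auditable record, and low-confidence outputs are routed to a person instead of being silently accepted. Here is an illustrative sketch; the field names, the 0.8 threshold, and the in-memory store are all made up for the example.

```python
import hashlib
import json
import time

def log_decision(record_store, model_version, features, prediction,
                 confidence, review_threshold=0.8):
    """Append an auditable record for one model decision.

    Outputs below the confidence threshold are flagged for human
    review rather than silently accepted."""
    entry = {
        "timestamp": time.time(),
        "model_version": model_version,
        # hash the raw input so the decision is traceable without
        # storing sensitive feature values in the log itself
        "input_hash": hashlib.sha256(
            json.dumps(features, sort_keys=True).encode()).hexdigest(),
        "prediction": prediction,
        "confidence": confidence,
        "needs_human_review": confidence < review_threshold,
    }
    record_store.append(entry)
    return entry
```

The point is not this exact code, it’s that a credible partner can show you where these records live, who reviews the flagged cases, and how the log ties each output back to a specific model version.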
EK:
Let's zoom in on one recurring issue: lots of people say they’re “doing AI,” but they’re really doing glorified automation or analytics. What’s your take?
MS:
Exactly. Many so-called “AI projects” are just conditional logic or dashboards with auto-tagging. That’s not AI.
To me, AI implies adaptivity. If the system can’t update, learn, or generalize—it’s not intelligent. And that matters because if you deploy it in the wild and it can’t handle novelty, it will fail hard.
EK:
Say a mid-sized enterprise wants to bring in AI. How should they start?
MS:
I recommend starting with a single, high-impact workflow that has clear inputs and outputs. Something repetitive but valuable. Document what you know. Then bring in a partner to evaluate feasibility—not to build the whole thing from day one.
The mistake I see most often is companies going for “AI transformation” instead of “AI experiment.” That leads to waste.
EK:
And finally, how do you see the future of AI adoption in enterprise?
MS:
I think we’ll see more toolchain maturity—like how DevOps evolved. Right now, MLOps is still catching up. But soon, companies will stop chasing models and start investing in infrastructure:
Better data contracts.
Evaluation frameworks.
Reliable observability.
Feedback loops between business and ML teams.
In the long run, it’s not the smartest model that wins. It’s the most maintainable system.
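A “data contract” in Maria’s sense can be as simple as the producing team declaring field types and allowed ranges, and the ML pipeline rejecting batches that violate them instead of quietly training on bad data. The sketch below is illustrative; the field names and bounds are invented for the example.

```python
# Declared by the data-producing team; checked by the ML pipeline.
CONTRACT = {
    "sensor_id":   {"type": str},
    "temperature": {"type": float, "min": -40.0, "max": 150.0},
    "pressure":    {"type": float, "min": 0.0},
}

def violations(record, contract=CONTRACT):
    """Return a list of contract violations for one record (empty = valid)."""
    errors = []
    for field, rules in contract.items():
        if field not in record:
            errors.append(f"{field}: missing")
            continue
        value = record[field]
        if not isinstance(value, rules["type"]):
            errors.append(f"{field}: expected {rules['type'].__name__}")
            continue
        if "min" in rules and value < rules["min"]:
            errors.append(f"{field}: below minimum {rules['min']}")
        if "max" in rules and value > rules["max"]:
            errors.append(f"{field}: above maximum {rules['max']}")
    return errors
```

Checks like this are the unglamorous infrastructure Maria is pointing at: they make failures visible at the boundary between teams, which is exactly where enterprise AI projects tend to rot.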
EK:
Maria, thank you. This was an incredible masterclass in AI realism. Before we go, where can people find you or learn more about your work?
MS:
You can find me on LinkedIn. I post occasionally about AI systems thinking, and I’m always happy to connect with people serious about responsible AI.
If you’ve ever been pitched by an AI startup or stared at a vendor deck full of acronyms and wondered what’s actually real—Maria just gave you the manual.
Too many decision-makers are still falling for AI theater. Don’t be one of them.
If this resonated, subscribe to Real Talk AI and share this episode with a colleague evaluating vendors or exploring AI for their team.
Until next time,
Esther Katz