Wes Herzik

№ 01 Experiment · Agent Advisory · Live

Governance for autonomous agents. Verdicts, with citations.

A governance experiment that evaluates an autonomous agent and returns a stoplight verdict with the regulatory clauses behind it.

Problem

Enterprises are moving quickly to deploy autonomous agents, but governance has not kept pace. Most organizations still lack the operating model, inventory, and control structure needed to evaluate agents before they affect customers, employees, or business-critical decisions. The result is a widening gap between AI ambition and enterprise readiness.

The gap

The regulatory and risk frameworks already exist, including the EU AI Act, NIST AI RMF, ISO/IEC 42001, and the Colorado AI Act. The harder problem is translating those frameworks into practical decisions at the point where an agent is being designed, approved, or launched. Teams need a way to understand what rules apply, what controls are required, and what evidence a reviewer can verify.

What it does

Agent Advisory evaluates a proposed or deployed agent and returns a clear governance verdict: launch, launch with conditions, or do not launch as described. Each verdict is tied to supporting regulatory clauses and control expectations. If the system cannot cite a source with confidence, it returns "insufficient information" rather than inventing an answer.
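The verdict rule described above can be sketched in a few lines. This is a minimal illustration, not Agent Advisory's actual code: the `Finding` type, the `decide` function, the severity labels, and the clause identifiers are all hypothetical. It assumes a finding counts as supported only when every clause it cites resolves in the corpus.

```python
from dataclasses import dataclass, field
from enum import Enum


class Verdict(Enum):
    LAUNCH = "launch"
    LAUNCH_WITH_CONDITIONS = "launch with conditions"
    DO_NOT_LAUNCH = "do not launch as described"
    INSUFFICIENT_INFORMATION = "insufficient information"


@dataclass
class Finding:
    """One governance concern raised during review (hypothetical shape)."""
    summary: str
    severity: str  # assumed labels: "blocking" or "conditional"
    citations: list[str] = field(default_factory=list)


def decide(findings: list[Finding], corpus: set[str]) -> Verdict:
    """Return a verdict, declining to answer when a citation cannot be resolved."""
    for finding in findings:
        # No citations, or a clause missing from the corpus, means the finding is unsupported.
        if not finding.citations or any(c not in corpus for c in finding.citations):
            return Verdict.INSUFFICIENT_INFORMATION
    if any(f.severity == "blocking" for f in findings):
        return Verdict.DO_NOT_LAUNCH
    if findings:
        return Verdict.LAUNCH_WITH_CONDITIONS
    return Verdict.LAUNCH


# Placeholder clause identifiers, not real regulatory references.
corpus = {"clause-a", "clause-b"}
example = [Finding("Limited human review", "conditional", ["clause-a"])]
print(decide(example, corpus).value)  # launch with conditions
```

The design choice worth noting is the order of checks: the citation test runs first, so an unsupported finding can never be masked by a confident-looking verdict.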

How it works

The system uses a sequence of specialized reasoning agents to assess classification, autonomy, control architecture, charter synthesis, and runtime incident response. A curated regulatory corpus supports clause-level retrieval, while a durable workflow runtime manages retries, timeouts, and audit logging. Each verdict is designed to be reproducible, reviewable, and exportable. The system also includes signed verdicts, an offline verifier, and an evaluation harness that tests for unsupported citations before results reach the user.
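As a rough sketch of the flow described above, the stages can be modeled as functions run in order, with bounded retries and an audit entry for every attempt. Everything named here is an assumption for illustration: `run_pipeline`, the stage callables, and the log format are hypothetical, and a real durable workflow runtime would also persist state and enforce timeouts, which this sketch leaves out.

```python
from typing import Callable

Stage = Callable[[dict], dict]


def run_pipeline(
    stages: list[tuple[str, Stage]],
    state: dict,
    max_attempts: int = 3,
) -> tuple[dict, list[dict]]:
    """Run stages in order, retrying failures and recording every attempt."""
    audit_log: list[dict] = []
    for name, stage in stages:
        for attempt in range(1, max_attempts + 1):
            try:
                # Each stage reads the accumulated state and returns new keys to merge in.
                state = {**state, **stage(state)}
                audit_log.append({"stage": name, "attempt": attempt, "status": "ok"})
                break
            except Exception as exc:
                audit_log.append(
                    {"stage": name, "attempt": attempt, "status": "error", "detail": str(exc)}
                )
        else:
            # The for-else branch runs only when no attempt succeeded.
            raise RuntimeError(f"stage {name!r} failed after {max_attempts} attempts")
    return state, audit_log


# Stage names follow the text; the returned values are placeholders.
stages = [
    ("classification", lambda s: {"risk_class": "placeholder"}),
    ("autonomy", lambda s: {"autonomy_level": "placeholder"}),
]
final_state, log = run_pipeline(stages, {"agent": "example"})
```

Logging failed attempts as well as successful ones is what makes the run reviewable afterward: the log shows not only what was decided but how many tries each stage needed.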

Examples

Three seeded agents show how the system evaluates different risk profiles.

YELLOW. A bank wire-fraud triage assistant may trigger high-risk AI obligations, GDPR Article 22 considerations, and model risk validation requirements.
RED. A global supply-chain inventory rebalancer can place thousands of binding orders per day, with significant financial exposure and limited human review.
GREEN. A healthcare appointment optimizer has a narrow scope, strong human oversight, and minimal data exposure.

What it is

I built Agent Advisory as a working experiment that treats agentic governance as an operational system rather than a policy document. It is not a commercial product. It demonstrates how governance can move from abstract policy to clear decision rights, ownership, and evidence-backed verdicts at the moment an agent is designed, approved, or launched.

Demo

Try Agent Advisory agent-charter.vercel.app →

Add your own agent, or evaluate one of the seeded examples to see how the verdict is formed.

Status: Live and ongoing. Five specialized agents run in the demo workflow.
Built: End-to-end across architecture, prompts, regulatory retrieval, evaluation, signing protocol, and product surfaces.
Verifiable: Each verdict is tied to cited clauses and exported with a tamper-evident signature. The bundle includes an offline verifier so a reviewer can confirm the result outside the app.
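To make the idea of a tamper-evident verdict concrete, here is a minimal sketch using only Python's standard library. It is a stand-in under stated assumptions, not the app's signing protocol: it uses HMAC-SHA256 with a shared secret because that ships with Python, whereas a verifier meant for outside reviewers would more plausibly rely on a public-key signature. The function names and bundle fields are hypothetical.

```python
import hashlib
import hmac
import json


def canonical_bytes(verdict: dict) -> bytes:
    """Serialize deterministically so the same verdict always yields the same bytes."""
    return json.dumps(verdict, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_verdict(verdict: dict, key: bytes) -> dict:
    """Return an export bundle holding the verdict and its signature."""
    signature = hmac.new(key, canonical_bytes(verdict), hashlib.sha256).hexdigest()
    return {"verdict": verdict, "signature": signature}


def verify_bundle(bundle: dict, key: bytes) -> bool:
    """Recompute the signature offline and compare in constant time."""
    expected = hmac.new(key, canonical_bytes(bundle["verdict"]), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, bundle["signature"])


key = b"demo-key-not-for-production"  # assumption: a shared secret for illustration only
bundle = sign_verdict({"verdict": "launch with conditions", "citations": ["clause-a"]}, key)
untouched_ok = verify_bundle(bundle, key)

# Changing any field after signing changes the bytes, so verification fails.
bundle["verdict"]["verdict"] = "launch"
tampered_ok = verify_bundle(bundle, key)
```

Canonical serialization matters here: without sorted keys and fixed separators, two equivalent verdicts could produce different bytes and fail verification for reasons unrelated to tampering.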

№ 02 Experiment · Agentic Discovery Engine · Live

From prompt to evidence-weighted direction.

A multi-agent discovery pipeline that turns an early question into a structured, evidence-backed starting point for product and strategy teams.

Problem

Discovery is one of the most valuable phases of product work, but it's also one of the easiest to skip or do inconsistently. How well it's done often depends on the team, the timeline, and what evidence is available when decisions need to be made.

The gap

It's not that teams don't have the right methods. They already use things like Double Diamond, Jobs-to-be-Done, Lean Discovery, and internal playbooks. The real challenge is putting those methods into action quickly, right when a team is still shaping an opportunity. Early prompts need to be turned into testable directions with sufficient evidence and signals to support better decisions about what to do next.

Output

The Agentic Discovery Engine provides a weighted direction, including a working hypothesis, the evidence supporting it, constraints, risks, and a confidence check. It's not a final brief or a recommendation. Instead, it's a structured starting point you can build on, challenge, or extend, with every section traceable to the evidence behind it.

How it works

Four reasoning agents each have a job to do. The Hypothesis Framer turns your prompt into a testable claim. The Evidence Hunter finds external signals. The Feasibility and Risk Assessor looks at constraints and potential blockers. The Synthesizer puts it all together, making confidence a core part of the output instead of an afterthought.
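The hand-off between the four agents can be pictured as a chain of functions, each consuming what the previous one produced. The sketch below is illustrative only: the callables are stubs, and the field names, the evidence threshold, and the confidence labels are assumptions rather than the engine's actual logic.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Direction:
    """Structured starting point (hypothetical field names)."""
    hypothesis: str
    evidence: list[str] = field(default_factory=list)
    risks: list[str] = field(default_factory=list)
    confidence: str = "low"
    flags: list[str] = field(default_factory=list)


def synthesize(hypothesis: str, evidence: list[str], risks: list[str], min_evidence: int = 2) -> Direction:
    """Combine upstream outputs and make confidence an explicit field."""
    direction = Direction(hypothesis=hypothesis, evidence=evidence, risks=risks)
    if len(evidence) < min_evidence:
        # Thin evidence is flagged rather than presented as certainty.
        direction.flags.append("limited evidence")
    else:
        direction.confidence = "moderate" if risks else "high"
    return direction


def run_discovery(
    prompt: str,
    frame: Callable[[str], str],
    hunt: Callable[[str], list[str]],
    assess: Callable[[str, list[str]], list[str]],
) -> Direction:
    """Run the four roles in order: frame, hunt, assess, synthesize."""
    hypothesis = frame(prompt)
    evidence = hunt(hypothesis)
    risks = assess(hypothesis, evidence)
    return synthesize(hypothesis, evidence, risks)


# Stub agents stand in for the real reasoning steps.
direction = run_discovery(
    "Should we add usage alerts?",
    frame=lambda p: f"Hypothesis for: {p}",
    hunt=lambda h: ["signal one"],
    assess=lambda h, e: ["integration effort"],
)
print(direction.confidence, direction.flags)  # low ['limited evidence']
```

Passing the agents in as callables keeps the orchestration separate from the reasoning, so each role can be swapped or tested on its own.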

What it is

Agentic Discovery Engine is a working experiment in turning ambiguity into structured direction using agents, evidence, and confidence signals. It is not a commercial product or deployed enterprise system. It is a demonstration of human-agent interaction in early decision work: people stay in the lead, the agents do the legwork, and confidence is treated as a first-class output rather than a footnote.

Demo

Run a discovery prompt agentic-discovery-engine.vercel.app →

Returns a weighted direction with cited evidence in minutes.

Status: Live and ongoing. Four agents currently run in the workflow.
Confidence: Surfaced directly in the output. Areas with limited evidence are flagged instead of being presented as certainty.
Traceability: Each section links back to its supporting evidence, making the output easier to review, challenge, and build on.

№ 03 Enterprise Fieldwork

The hidden middle.

Why enterprise AI keeps stalling at the agent layer

Some organizations have an AI strategy. Many have a wall of pilots. Almost nobody has mapped the layer that should sit underneath both: the operational capabilities AI is actually supposed to act on. Workflows, decisions, escalations, knowledge handoffs, the connective tissue of how work moves. That layer isn't missing. Most of the time, it's hidden.

The pattern I want to describe shows up almost everywhere I've seen enterprise AI attempted, across multiple companies and industries.

In a large enough organization, different teams routinely solve the same problem without realizing it. A workflow team on one product rebuilds the same escalation pattern that another team on another product just shipped under a different name. Multiply that across every product and servicing group in a large enterprise, and you stop seeing dozens of unrelated issues. You start seeing one issue: nobody can see the whole picture, so everyone keeps reinventing the parts.

In one engagement, I ran this exercise directly. A cross-functional group mapped the work at the capability level rather than the workflow level and identified 30 macro capabilities, 26 of which were shared across the products in scope. The shared capabilities clustered into three rough layers: data and intelligence, user interaction, and process orchestration and governance. The structure isn't novel. What mattered was having an evidence-backed frame that leaders could plan against.

The inventory wasn't the point. The point was the shift in how teams plan once the middle is no longer hidden. You can have a real conversation about what gets designed once and reused, what gets governed centrally, and what gets owned where. Without that visibility, every team's build brings its own version of the same patterns, governance, controls, and escalation paths. You end up with the same sprawl, just more confident about it.

This matters more than it used to. Hidden middles were tolerable when humans were doing all the work, because humans are flexible. A person can absorb six variants of the same escalation pattern across six products and just figure it out. AI agents and automation can sometimes muddle through, but not reliably, not auditably, and not at scale. They need named, stable, shared capabilities to operate against. An agent that's supposed to handle escalation can't do that dependably if escalation lives as six undocumented variants under six different names. The hidden middle becomes the ceiling on how far AI can scale inside an enterprise. The models can be excellent. If the capabilities underneath them aren't visible, there is no coherent place for the agents to land.

This is the operating-model layer of AI strategy. It is the part most AI programs skip. Strategy teams don't think it's their job. Engineering teams don't think it's their job. The middle stays hidden, builds multiply, three years pass, and someone eventually asks why the AI investment isn't compounding. This is why.

№ 04 Platform Fieldwork

The evidence has to reach the decision.

On timing customer evidence so it can still change the build

It is easy to agree that teams should understand customers before they build. That is not the hard part.

The harder question is whether customer evidence reaches the decisions that actually shape the work: what gets funded, what gets cut, what gets sequenced first, what risks are accepted, what gets measured, and what assumptions become too expensive to revisit.

In enterprise builds, evidence can exist and still miss the moment that matters. A team learns something true about the customer, but the roadmap has already moved. Engineering has already made tradeoffs. Leaders have already gotten used to the plan's shape. The work has started to organize itself around the assumptions already in front of it.

By then, the evidence may still be useful. But it is no longer early enough to change the important decisions.

That is the problem.

Not a lack of insight. A timing problem.

A few years ago, I worked on a cloud services platform where we had a chance to run the work differently. The platform was the management interface that customers used to operate their cloud services. It helped them understand what they owned, what they were using, what needed attention, and where they needed to act.

The next phase of work was meant to make the platform smarter: prediction, in-context suggestions, better alerts, and automation across parts of the operation that support the customer.

The team did not need a generic readout on customer needs. It needed evidence close enough to the work to change product direction while that direction was still forming.

That distinction mattered.

We spoke with customers, employees, and internal experts who shaped the customer experience. We tested specific questions about key moments in the customer lifecycle. But the important choice was not the method. It was where the evidence landed.

The findings were not held for a final report after the plan had hardened. They showed up in the conversations where roadmap, scope, and product direction were still being decided.

That changed what the team could see.

Reporting was fragmented across products that customers expected to manage as one environment. Cost visibility was weak at the exact moment customers were trying to justify spending inside their own companies. Usage monitoring did not answer the capacity questions customers were actually asking. Trust was being lost during adoption and renewal, when the platform needed to be most credible.

One customer walked us through how she managed her cloud spend and said:

"This is the part where I just open a spreadsheet and do it myself, because the platform doesn't really help here."

That sentence mattered because it was not a complaint. It was the work.

She was showing us the point at which the platform stopped helping, and the customer built her own operating layer outside it. The issue was not a missing chart or a better screen. It was a decision the product had not earned the right to support.

Because that evidence arrived while decisions were still open, the work moved.

Predictive right-sizing pointed more directly at the capacity decisions customers were struggling with. Smarter notifications were aimed at moments where trust was already fragile. Reporting work had to account for how customers understood their environment across products, not how the company organized those products internally.

The work changed because the evidence reached a decision.

That is the part worth carrying forward. Not the interview count. Not the sprint structure. Not the research plan.

The structural choice.

Customer evidence is not valuable because it exists somewhere in the building. It is valuable when it reaches the people making decisions, while those decisions can still change.

That sounds obvious until the build starts moving.

Once funding is approved, scope is set, teams are staffed, and dates are committed, evidence has to work much harder to have any effect. At that point, it is no longer competing with an idea. It is competing with momentum.

This is why "getting closer to the customer" is not specific enough. The question is: closer to which decision, at what moment, and with enough force to change the work?

If the evidence only informs the team after the plan is set, it becomes context. If it arrives while the plan is being made, it can change the product, the sequencing, the risk model, and the measures of success.

That was the lesson from the cloud platform work.

The team did not need more customer empathy. It needed customer evidence in the room where decisions were being made.

The fix is not more research. It is not another discovery phase. It is a build rhythm in which customer evidence informs decisions that shape the work before those decisions become too expensive to change.

Start the conversation.

If something here resonates, whether it's a live experiment, a piece of fieldwork, or the missing-middle work itself, I'd like to hear what you're working on.

01 — The form

Send a note

A short message is fine. Add your email only if you'd like a reply.

02 — The open channel

Or reach me on LinkedIn

The surest channel. Connect there if you'd rather skip the form, or if the conversation should run in the open.

linkedin.com/in/williamherzik →

The missing middle between AI strategy and operating reality.