VerifyAX tests AI agents in realistic simulations and tells you exactly how they perform, giving you the clarity to act on what matters.
| Term | Meaning |
| --- | --- |
| Agent | The AI system you want to verify — registered with a name, endpoint, and type (A2A or API). Treated as a black box. |
| Simulation | A multi-turn, multi-agent test scenario with objectives. Reusable across agents. |
| Skill Tags | Labels describing what a simulation tests (e.g. `negotiation`, `deception_detection`). Drive both generation and scoring. |
| Verification Run | One execution of an agent against a simulation — produces a transcript and metadata. |
| Evaluation | The scoring step after a run — structured metrics plus a report. |
| Batch | A group of related simulations generated, run, and evaluated together. |
| Aggregated Evaluation | Cross-run comparison showing mean scores, robustness metrics, and per-run breakdowns. |
| Credits | The currency for running verifications. Cost depends on model, token volume, and repetitions — see [Credits](#credits) for details. |
| Workspace | An isolated environment within your organization. All data is workspace-scoped. |
Getting Started
Sign in at the VerifyAX login page. First-time users see a four-step onboarding guide that introduces the platform, shows what you can do (connect agents, build simulations, launch verifications), and lets you choose your starting path — connect your own agent or explore the Agent Catalogue.
You can reopen the onboarding guide at any time from the Welcome page (**Tips → Show onboarding guide**).
After onboarding, you land on the **Welcome page** — your home hub. It shows three summary cards for the current workspace:
- **Agents** — how many agents are connected, with links to view all or connect a new one
- **Simulations** — how many simulations exist, with links to view all or create one
- **Reports** — how many completed runs are available, with links to view all or start a new run
Use these cards to jump straight into the step you need. The **Tips** section below the cards links to the API documentation, the Roles & Permissions guide, and the onboarding walkthrough.
From here, the typical path is: connect an agent, create a simulation, run a verification, and view the report. The next section walks through each step.
Navigation
Sidebar
Collapsible left sidebar, organized by section. Visibility is role-driven — missing items mean your role doesn't grant access.
| Section | Pages |
| --- | --- |
| Verification | Agents Registry, Simulations, Workbench, Dashboard, Reports |
| Settings | API Keys, Members & Workspaces, Account & Billing, Usage, Audit Log |
| Documentation | API Documentation, Roles and Permissions |
The sidebar footer shows your **credit balance** and a **Buy Credits** button (see [Credits](#credits)).
Workspace Selector
Top of sidebar. Switch between organizations and workspaces. All data is scoped to the selected workspace.
User Menu
Click your avatar. Access **Profile**, **Edit Organisation**, **Help & Support**, **Feedback**, and **Log out**. Shows your current workspace role.
Generating a Report: Step by Step
This is the core workflow. Every report starts with an agent and a simulation, brought together in the Workbench.
| Step | Action |
| --- | --- |
| 1 | Connect Agent |
| 2 | Create Simulation |
| 3 | Run in Workbench |
| 4 | View Report |
Step 1 — Connect an Agent
Go to **Agents Registry** and add the agent you want to verify. You have two options:
Option A: Connect your own agent
Open the **Connect Agent** section. Choose the connection protocol:
- **A2A** (Agent-to-Agent) — use this if your agent implements the A2A protocol and publishes an agent card. Provide a base URL and agent card path. Configure authentication, rate limits, and behaviour options (full-context mode, message history, proactive first round).
- **API** (REST) — use this if your agent exposes a standard HTTP/JSON endpoint (any REST API that can receive a message and return a response). Provide session and message curl commands and tell VerifyAX where to find the session ID and assistant reply in the response. See the in-app **API Documentation → Connect Agents** guide for full details, and the sketch below for the general shape.
If you're unsure which to pick: most custom-built agents use **API**; agents built on frameworks that support the A2A standard use **A2A**.
Both protocols let you **Test Connection** before saving to confirm the agent is reachable.
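To make the API option concrete, here is a minimal sketch of what the two curl commands might look like. Everything in it is an assumption for illustration (the host, the auth header, the `session_id` and `reply` field names, and the `{{...}}` placeholder syntax); use whatever your agent actually exposes and whatever substitution syntax the in-app Connect Agents guide specifies:

```sh
# Hypothetical session command: VerifyAX would run this once per conversation.
curl -X POST https://your-agent.example.com/sessions \
  -H "Authorization: Bearer $AGENT_TOKEN" \
  -H "Content-Type: application/json"
# Example response: {"session_id": "abc123"}
# -> map the session ID to the response field "session_id"

# Hypothetical message command: run once per simulation turn.
# {{sessionId}} and {{message}} are illustrative placeholders only.
curl -X POST "https://your-agent.example.com/sessions/{{sessionId}}/messages" \
  -H "Authorization: Bearer $AGENT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"message": "{{message}}"}'
# Example response: {"reply": "Hello! How can I help?"}
# -> map the assistant reply to the response field "reply"
```

The two field mappings are what let VerifyAX drive an otherwise black-box agent: it only needs to know how to open a session and where the reply lives in your response JSON.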
Option B: Deploy from the Agent Catalogue
Open the **Agent Catalogue** section. Browse or search pre-built agent templates. Each card shows the agent's description, credits per 1M tokens, publisher, and available tools.
Select one or more agents, optionally customise **tools** and **system prompt** per agent, then click **Deploy Agents**. Deployed agents appear immediately in your My Agents table.
Step 2 — Create a Simulation
Go to **Simulations** and set up the test scenarios. Again, two options:
Option A: Generate with AI
Open the **Create Simulation** section. Configure:
| Setting | Description |
| --- | --- |
| Mode | **Single** (one simulation) or **Batch** (2–50 simulations generated at once) |
| Simulation Type | **Multi-Agent** (info exchange) or **1-to-1 interaction** (interview) |
| Name | Display name for the simulation or batch |
| Skill Tags | Pick from the tag library — filter by category, search by name. In Single mode, drag tags into the **Selected tags** zone. In Batch mode, drag tags into **Sampling pool** (randomly sampled per scenario) and **Always included** (present in every scenario). |
| Tags per Simulation | Batch only — how many tags each generated scenario should have (up to 5 for Multi-Agent, up to 2 for 1-to-1) |
| Context Prompt | Optional free text (up to 500 characters) to steer generation. Use the **example prompt** link for inspiration. |
Review the **estimated credits** (see [Credits](#credits)) and click **Generate**. The simulation appears in your table with a loading status while it generates.
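To see how the batch tag settings combine (example values only): with `deception_detection` in **Always included**, four other tags in the **Sampling pool**, and **Tags per Simulation** set to 3, each generated scenario would include `deception_detection` plus tags sampled at random from the pool, up to the three-tag total (assuming always-included tags count toward that total). A context prompt such as "focus on refund disputes with an escalating customer" (purely illustrative) would then steer all of those scenarios toward that situation.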
Option B: Pick from the Simulation Catalogue
Browse pre-built simulation templates in a paginated grid. Each card shows the simulation name, description, and skill tags. Select a template and create a new simulation from it.
Step 3 — Launch a Verification Run
Go to the **Workbench** and configure the run:
- **Select an Agent** — pick one agent from the dropdown (connected or catalogue). No agent yet? Click **Add new agent** to go to the Registry.
- **Select Simulations** — choose one or more simulations. Individual simulations and batch groups both appear in the list. See [Batch Simulations](#batch-simulations) for how batches work here.
- **Set Repetitions** (1–10) — each repetition runs the full simulation independently, helping you measure consistency.
- **Review Credits** — the Workbench shows estimated credits for the run, pending operations, and your remaining balance (see [Credits](#credits)). The Run button is disabled if you don't have enough credits.
- **Click Run** — the verification starts. If a very similar run already exists (same agent and simulation, no updates since), a dialog asks you to confirm before running again.
When you select multiple simulations, they form a **run group** — this enables [aggregated evaluation](#aggregated-evaluation-batch-reports) across all of them.
Step 4 — View the Report
Go to **Reports** and click any completed run to open its evaluation report. You can also reach reports from the **Runs History** section in the Workbench.
Each report contains:
| Section | What you see |
| --- | --- |
| Executive Summary | Overall score (out of 5), total tags tested, duration, date, agent and simulation names |
| Areas for Improvement | Collapsible section with actionable suggestions per skill tag |
| Tag Performance Summary | Table showing each tag's score, description, and number of tester agents |
| Aspects | Detailed cards per evaluated aspect — score justification per skill tag, plus the full conversation transcript with file previews |
Report actions:
- **Print Report** — opens a print-friendly version in a new tab (includes the executive summary, tag scores, improvement suggestions, score justifications, and full conversation transcripts)
- **View Batch** — opens the aggregated evaluation view (when the run is part of a batch group)
Key Features
Agent Catalogue
The Agent Catalogue in the **Agents Registry** provides ready-to-use agent templates you can deploy without building your own.
- **Search and sort** — filter by name, description, or publisher; sort by Most Popular, Newest, or credits (low/high)
- **Preview** — each card shows input/output credits per 1M tokens, publisher, model, and available tools with their defaults
- **Customise before deploying** — toggle individual tools on/off and override the system prompt per agent
- **Multi-deploy** — select and deploy several agents in one action
Simulation Catalogue
The Simulation Catalogue in **Simulations** offers pre-built scenario templates:
- **Paginated grid** with search
- **Tag preview** — each card shows up to 3 skill tags with a "+N" overflow
- **One-click creation** — select a template and create a workspace simulation from it
Batch Simulations
Batch simulations let you generate and manage groups of related scenarios together.
Creating a batch: In the simulation generator, switch to **Batch** mode. Set the number of simulations (2–50), configure your tag sampling pool and always-included tags, and generate. The entire batch appears as a single row in your Simulations table, with a count badge showing how many scenarios it contains. Expand the row to see individual members.
Running a batch: In the Workbench, batch groups appear as one selectable row. Selecting a batch selects all its member simulations. The bottom bar shows the batch name with "(N scenarios)". Click the expand icon to open the **Selected simulations** modal, where you can review each member or remove the entire batch.
Evaluating a batch: When a batch run completes, each member gets its own individual report. Additionally, a **View Batch** link appears on each member report, taking you to the aggregated evaluation.
Aggregated Evaluation (Batch Reports)
When you run multiple simulations as a group (a batch or multi-select), the platform generates an aggregated evaluation that lets you compare results across all runs.
The aggregated view shows:
| Section | What you see |
| --- | --- |
| Executive Summary | Narrative summary of the batch |
| KPIs | Mean overall score, success rate %, total runs, tags evaluated |
| Tag Robustness Metrics | Per-tag table with mean score, standard deviation, and min–max range — showing how consistently the agent performs across scenarios |
| Individual Run Results | Per-run breakdown with average grade, tag scores, and a link to open each member's full report |
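As an illustration of how to read the robustness table (numbers invented): a tag with a mean score of 4.2 and a standard deviation of 0.3 across ten runs reflects stable behaviour, while the same mean with a standard deviation of 1.5 means individual runs swung widely, something the mean alone would hide.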
You can also trigger a comparison from the **Runs History** table by selecting multiple completed runs.
Dashboard & Agent Comparison
The **Dashboard** provides workspace-wide trends and agent comparison.
Workspace overview:
- Metric cards: Agents, Runs, Simulations, Tags, Remaining Credits
- **Leaderboard** — agents ranked by average total score with a horizontal bar chart
- **Spending** — credit usage over time
Agent cards: each agent shows its average score (donut chart), trend indicator (Improving / Declining / Stable), sparkline of recent run scores, and average credits per test.
Agent drill-down — click any agent card to see:
- Per-tag performance bars (cross-agent comparison)
- Per-tag performance over time
- Verification run history
- Usage breakdown
Refresh controls: choose "last N runs per agent" (5–200) or a date period (Today, Last week, Last month, or custom range).
Verification Pages
Agents Registry
| Action | Role |
| --- | --- |
| Test Connection | Viewer or above |
| Edit | Editor or above |
| Delete | Workspace Admin |
Search, sort (by name, date, status), filter by type and date range. **Test Connections** tests all agents at once. Failed tests show error details.
Simulations
| Action | Role |
| --- | --- |
| Edit | Editor or above |
| Copy | Editor or above |
| Retry (failed generation) | Editor or above |
| Delete (single or bulk) | Editor or above |
Simulations with existing runs cannot be deleted.
Workbench
The Workbench is the launch pad for verification runs and the central place to track their progress.
Run configuration — select an agent, one or more simulations (including batches), set repetitions, and review the credit estimate before running. Full details in [Step 3 — Launch a Verification Run](#step-3--launch-a-verification-run).
Runs History — a table of all past and in-progress runs for the current workspace. For each run you can see:
- Run status (queued, running, completed, failed)
- Agent and simulation names
- Date started
- Evaluation score (when complete)
Click any row to open the full evaluation report. Search and filter to find specific runs.
| Action | Role |
| --- | --- |
| View runs | Viewer or above |
| Start a run | Editor or above |
Credits
Credits are the currency used to run verifications and generate simulations. Every operation that invokes the platform's AI engine consumes credits.
What affects cost:
- **Model** — the underlying model used by the agent or evaluator
- **Token volume** — the length and complexity of the simulation (more turns and richer context = more tokens)
- **Repetitions** — each repetition is charged independently
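A purely illustrative calculation (rates invented, not VerifyAX pricing): if an agent's model were billed at 10 credits per 1M tokens and a simulation consumed roughly 200K tokens per pass, a run with 3 repetitions would cost about 10 × 0.2 × 3 = 6 credits. The Workbench and simulation generator show the real estimate before you commit.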
Where you see credits:
- **Sidebar footer** — your current balance at a glance
- **Workbench** — estimated cost before you run, plus pending operations and remaining balance
- **Simulation generator** — estimated cost before you generate
- **Agent Catalogue** — input/output credits per 1M tokens per agent template
- **Dashboard** — average credits per test on each agent card
Managing credits:
- Purchase credits via **Account & Billing → Buy Credits** (Stripe checkout)
- Quick top-up from the **Buy Credits** button in the sidebar
- Track spend over time in **Usage** (date-range filter, daily debit chart, organization ledger)
> Credit pricing and balance management are available to org Owners and Admins. If your only org role is User, billing pages are hidden — ask an Owner or Admin for access.
Settings
API Keys
| Action | Role |
| --- | --- |
| View | Viewer or above |
| Create | Editor or above |
| Revoke / delete | Workspace Admin |
Members & Workspaces
Invite members, assign org and workspace roles, and create and manage workspaces. Org Owners and Admins only.
Account & Billing
Credit balance, purchase credits (Stripe checkout), transaction history. Org Owners and Admins only — hidden for org-only Users.
Usage
Date-range filter, daily debit chart, organization ledger. Org Owners and Admins only.
Audit Log
History of workspace actions. Org Owners and Admins only.
Profile
Edit your name, account info, and organization details. Create a new organization.
Roles & Permissions
Two-layer model: **organization role** + **workspace role**.
Organization Roles
| Role | Access |
| --- | --- |
| Owner | Full control, can delete the org |
| Admin | Same as Owner minus org deletion |
| User | Day-to-day seat — no billing, usage, or audit log access |
Workspace Roles
| Role | Access |
| --- | --- |
| Admin | Full CRUD on all workspace resources |
| Editor | Create and edit — cannot delete agents or API keys |
| Viewer | Read-only |
Quick Reference
| Task | Role |
| --- | --- |
| View dashboards and reports | Any workspace role |
| Browse agents and runs | Viewer or above |
| Create / edit agents and runs | Editor or above |
| Delete agents or API keys | Workspace Admin |
| Run or edit simulations | Editor or above |
| Invite members / change roles | Org Owner or Admin |
| View usage and spend | Org Owner or Admin |
| Buy credits | Org Owner or Admin + Workspace Admin |
| Audit logs | Org Owner or Admin |
| Delete organization | Org Owner only |
Troubleshooting
Agent connection test fails
The platform shows "Agent Connection Unsuccessful" with a prompt to review your settings. Common causes:
- The endpoint URL is unreachable from VerifyAX (firewall, VPN, or the agent is not running)
- Authentication credentials are missing or expired
- For A2A agents: the agent card path is incorrect (default is `/.well-known/agent-card.json`)
- For API agents: the curl command is malformed, or the response field mappings don't match the actual response structure
Open the agent's edit panel, correct the settings, and run **Test Connection** again.
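For A2A agents specifically, a quick way to rule out card-path problems is to fetch the card yourself from a machine with comparable network access (the host below is a placeholder):

```sh
# Fetch the agent card directly; an error or 404 here usually means the
# base URL or card path configured in VerifyAX is wrong.
curl -s https://your-agent.example.com/.well-known/agent-card.json
```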
Simulation generation stuck in loading
AI-generated simulations can take a few minutes. If a simulation stays in "loading" for an unusually long time:
- Refresh the Simulations page — the status updates on page load
- If it shows as **failed**, use the **Retry** action on the row to re-trigger generation
- Check that you had sufficient credits when you generated — generation consumes credits upfront
Run fails to start or errors out
- **"Insufficient credits"** — your balance is too low for the estimated cost. Top up via **Account & Billing → Buy Credits**.
- **Run button is disabled** — either no agent is selected, no simulation is selected, or the credit estimate exceeds your balance.
- **Run starts but fails** — the agent may have become unreachable during the run. Go to the Agents Registry, run **Test Connection**, and fix any issues before re-running.
Simulation cannot be deleted
- Simulations with existing runs cannot be deleted — this is by design to preserve report history.
- Recently generated simulations have a brief cooldown before deletion is allowed — wait a few minutes and try again.
Missing pages or actions in the sidebar
Both your organization role and workspace role control what you see. If a page or button is missing, check your roles on the **Members & Workspaces** page. See [Roles & Permissions](#roles--permissions) for details. In particular, if your only org role is **User**, Usage and Billing pages are hidden regardless of workspace role.
Page appears empty or data looks stale
Try refreshing the page. If the problem persists, sign out and sign back in. If it continues, submit a request via **Help & Support**.
Help & Support
Getting help — click your avatar in the sidebar and select **Help & Support**. This opens a form where you can describe your issue or question. Submit the form and the team will aim to respond within 48 hours.
Sharing feedback — select **Feedback** from the same user menu to submit product suggestions, feature requests, or general comments.
Self-service resources
- **Tips** — every major page includes a Tips section with contextual guidance relevant to that screen
- **API Documentation** — in-app reference and how-to guides for programmatic access (under Documentation in the sidebar)
- **Roles and Permissions** — in-app guide explaining what each role can do (under Documentation in the sidebar)
- **Onboarding Guide** — reopen the platform walkthrough any time from the Welcome page
For anything not covered in this guide, reach out via **Help & Support** in the user menu.