VerifyAX tests AI agents in realistic simulations and tells you exactly how they perform, giving you the clarity to act on what matters.
| Term | Meaning |
| --- | --- |
| Agent | The AI system you want to verify — registered with a name, endpoint, and type (A2A or API). Treated as a black box. |
| Simulation | A multi-turn, multi-agent test scenario with objectives. Reusable across agents. |
| Skill Tags | Labels describing what a simulation tests (e.g. `negotiation`, `deception_detection`). Drive both generation and scoring. |
| Verification Run | One execution of an agent against a simulation — produces a transcript and metadata. |
| Evaluation | The scoring step after a run — structured metrics plus a report. |
| Batch | A group of related simulations generated, run, and evaluated together. |
| Aggregated Evaluation | Cross-run comparison showing mean scores, robustness metrics, and per-run breakdowns. |
| Credits | The currency for running verifications. Cost depends on model, token volume, and repetitions — see [Credits](#credits) for details. |
| Workspace | An isolated environment within your organization. All data is workspace-scoped. |
Getting Started
Sign in at the VerifyAX login page. First-time users see a four-step onboarding guide that introduces the platform, shows what you can do (connect agents, build simulations, launch verifications), and lets you choose your starting path — connect your own agent or explore the Agent Catalogue.
You can reopen the onboarding guide at any time from the Welcome page (**Tips → Show onboarding guide**).
After onboarding, you land on the **Welcome page** — your home hub. It shows three summary cards for the current workspace:
- **Agents** — how many agents are connected, with links to view all or connect a new one
- **Simulations** — how many simulations exist, with links to view all or create one
- **Reports** — how many completed runs are available, with links to view all or start a new run
Use these cards to jump straight into the step you need. The **Tips** section below the cards links to the API documentation, the Roles & Permissions guide, and the onboarding walkthrough.
From here, the typical path is: connect an agent, create a simulation, run a verification, and view the report. The next section walks through each step.
Navigation
Sidebar
Collapsible left sidebar, organized by section. Visibility is role-driven — missing items mean your role doesn't grant access.
| Section | Pages |
| --- | --- |
| Verification | Agents Registry, Simulations, Workbench, Dashboard, Reports |
| Settings | API Keys, Members & Workspaces, Account & Billing, Usage, Audit Log |
| Documentation | API Documentation, Roles and Permissions |
The sidebar footer shows your **credit balance** and a **Buy Credits** button (see [Credits](#credits)).
Workspace Selector
Top of sidebar. Switch between organizations and workspaces. All data is scoped to the selected workspace.
User Menu
Click your avatar. Access **Profile**, **Edit Organisation**, **Help & Support**, **Feedback**, and **Log out**. Shows your current workspace role.
Generating a Report: Step by Step
This is the core workflow. Every report starts with an agent and a simulation, brought together in the Workbench.
| Step | Action |
| --- | --- |
| 1 | Connect Agent |
| 2 | Create Simulation |
| 3 | Run in Workbench |
| 4 | View Report |
Step 1 — Connect an Agent
Go to **Agents Registry** and add the agent you want to verify. You have two options:
Option A: Connect your own agent
Open the **Connect Agent** section. Choose the connection protocol:
- **A2A** (Agent-to-Agent) — use this if your agent implements the A2A protocol and publishes an agent card. Provide a base URL and agent card path. Configure authentication, rate limits, and behaviour options (full-context mode, message history, proactive first round).
- **API** (REST) — use this if your agent exposes a standard HTTP/JSON endpoint (any REST API that can receive a message and return a response). Provide session and message curl commands and tell VerifyAX where to find the session ID and assistant reply in the response. See the in-app **API Documentation → Connect Agents** guide for full details, and the sketch below for the general shape.
If you're unsure which to pick: most custom-built agents use **API**; agents built on frameworks that support the A2A standard use **A2A**.
Both protocols let you **Test Connection** before saving to confirm the agent is reachable.
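To make the API option concrete, here is a minimal sketch of what the two curl commands might look like. Everything in it is an assumption for illustration (the host, the auth header, the `session_id` and `reply` field names, and the `{{...}}` placeholder syntax); use whatever your agent actually exposes and whatever substitution syntax the in-app Connect Agents guide specifies:

```sh
# Hypothetical session command: VerifyAX would run this once per conversation.
curl -X POST https://your-agent.example.com/sessions \
  -H "Authorization: Bearer $AGENT_TOKEN" \
  -H "Content-Type: application/json"
# Example response: {"session_id": "abc123"}
# -> map the session ID to the response field "session_id"

# Hypothetical message command: run once per simulation turn.
# {{sessionId}} and {{message}} are illustrative placeholders only.
curl -X POST "https://your-agent.example.com/sessions/{{sessionId}}/messages" \
  -H "Authorization: Bearer $AGENT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"message": "{{message}}"}'
# Example response: {"reply": "Hello! How can I help?"}
# -> map the assistant reply to the response field "reply"
```

The two field mappings are what let VerifyAX drive an otherwise black-box agent: it only needs to know how to open a session and where the reply lives in your response JSON.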
Option B: Deploy from the Agent Catalogue
Open the **Agent Catalogue** section. Browse or search pre-built agent templates. Each card shows the agent's description, credits per 1M tokens, publisher, and available tools.
Select one or more agents, optionally customise **tools** and **system prompt** per agent, then click **Deploy Agents**. Deployed agents appear immediately in your My Agents table.
Step 2 — Create a Simulation
Go to **Simulations** and set up the test scenarios. Again, two options:
Option A: Generate with AI
Open the **Create Simulation** section. Configure:
| Setting | Description |
| --- | --- |
| Mode | **Single** (one simulation) or **Batch** (2–50 simulations generated at once) |
| Simulation Type | **Multi-Agent** (info exchange) or **1-to-1 interaction** (interview) |
| Name | Display name for the simulation or batch |
| Skill Tags | Pick from the tag library — filter by category, search by name. In Single mode, drag tags into the **Selected tags** zone. In Batch mode, drag tags into **Sampling pool** (randomly sampled per scenario) and **Always included** (present in every scenario). |
| Tags per Simulation | Batch only — how many tags each generated scenario should have (up to 5 for Multi-Agent, up to 2 for 1-to-1) |
| Context Prompt | Optional free text (up to 500 characters) to steer generation. Use the **example prompt** link for inspiration. |
Review the **estimated credits** (see [Credits](#credits)) and click **Generate**. The simulation appears in your table with a loading status while it generates.
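To see how the batch tag settings combine (example values only): with `deception_detection` in **Always included**, four other tags in the **Sampling pool**, and **Tags per Simulation** set to 3, each generated scenario would include `deception_detection` plus tags sampled at random from the pool, up to the three-tag total (assuming always-included tags count toward that total). A context prompt such as "focus on refund disputes with an escalating customer" (purely illustrative) would then steer all of those scenarios toward that situation.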
Option B: Pick from the Simulation Catalogue
Browse pre-built simulation templates in a paginated grid. Each card shows the simulation name, description, and skill tags. Select a template and create a new simulation from it.
Step 3 — Launch a Verification Run
Go to the **Workbench** and configure the run:
- **Select an Agent** — pick one agent from the dropdown (connected or catalogue). No agent yet? Click **Add new agent** to go to the Registry.
- **Select Simulations** — choose one or more simulations. Individual simulations and batch groups both appear in the list. See [Batch Simulations](#batch-simulations) for how batches work here.
- **Set Repetitions** (1–10) — each repetition runs the full simulation independently, helping you measure consistency.
- **Review Credits** — the Workbench shows estimated credits for the run, pending operations, and your remaining balance (see [Credits](#credits)). The Run button is disabled if you don't have enough credits.
- **Click Run** — the verification starts. If a very similar run already exists (same agent and simulation, no updates since), a dialog asks you to confirm before running again.
When you select multiple simulations, they form a **run group** — this enables [aggregated evaluation](#aggregated-evaluation-batch-reports) across all of them.
Step 4 — View the Report
Go to **Reports** and click any completed run to open its evaluation report. You can also reach reports from the **Runs History** section in the Workbench.
Each report contains:
| Section | What you see |
| --- | --- |
| Executive Summary | Overall score (out of 5), total tags tested, duration, date, agent and simulation names |
| Areas for Improvement | Collapsible section with actionable suggestions per skill tag |
| Tag Performance Summary | Table showing each tag's score, description, and number of tester agents |
| Aspects | Detailed cards per evaluated aspect — score justification per skill tag, plus the full conversation transcript with file previews |
Report actions:
- **Print Report** — opens a print-friendly version in a new tab (includes the executive summary, tag scores, improvement suggestions, score justifications, and full conversation transcripts)
- **View Batch** — opens the aggregated evaluation view (when the run is part of a batch group)
Key Features
Agent Catalogue
The Agent Catalogue in the **Agents Registry** provides ready-to-use agent templates you can deploy without building your own.
- **Search and sort** — filter by name, description, or publisher; sort by Most Popular, Newest, or credits (low/high)
- **Preview** — each card shows input/output credits per 1M tokens, publisher, model, and available tools with their defaults
- **Customise before deploying** — toggle individual tools on/off and override the system prompt per agent
- **Multi-deploy** — select and deploy several agents in one action
Simulation Catalogue
The Simulation Catalogue in **Simulations** offers pre-built scenario templates:
- **Paginated grid** with search
- **Tag preview** — each card shows up to 3 skill tags with a "+N" overflow
- **One-click creation** — select a template and create a workspace simulation from it
Batch Simulations
Batch simulations let you generate and manage groups of related scenarios together.
Creating a batch: In the simulation generator, switch to **Batch** mode. Set the number of simulations (2–50), configure your tag sampling pool and always-included tags, and generate. The entire batch appears as a single row in your Simulations table, with a count badge showing how many scenarios it contains. Expand the row to see individual members.
Running a batch: In the Workbench, batch groups appear as one selectable row. Selecting a batch selects all its member simulations. The bottom bar shows the batch name with "(N scenarios)". Click the expand icon to open the **Selected simulations** modal, where you can review each member or remove the entire batch.
Evaluating a batch: When a batch run completes, each member gets its own individual report. Additionally, a **View Batch** link appears on each member report, taking you to the aggregated evaluation.
Aggregated Evaluation (Batch Reports)
When you run multiple simulations as a group (a batch or multi-select), the platform generates an aggregated evaluation that lets you compare results across all runs.
The aggregated view shows:
| Section | What you see |
| --- | --- |
| Executive Summary | Narrative summary of the batch |
| KPIs | Mean overall score, success rate %, total runs, tags evaluated |
| Tag Robustness Metrics | Per-tag table with mean score, standard deviation, and min–max range — showing how consistently the agent performs across scenarios |
| Individual Run Results | Per-run breakdown with average grade, tag scores, and a link to open each member's full report |
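As an illustration of how to read the robustness table (numbers invented): a tag with a mean score of 4.2 and a standard deviation of 0.3 across ten runs reflects stable behaviour, while the same mean with a standard deviation of 1.5 means individual runs swung widely, something the mean alone would hide.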
You can also trigger a comparison from the **Runs History** table by selecting multiple completed runs.
Dashboard & Agent Comparison
The **Dashboard** provides workspace-wide trends and agent comparison.
Workspace overview:
- Metric cards: Agents, Runs, Simulations, Tags, Remaining Credits
- **Leaderboard** — agents ranked by average total score with a horizontal bar chart
- **Spending** — credit usage over time
Agent cards: each agent shows its average score (donut chart), trend indicator (Improving / Declining / Stable), sparkline of recent run scores, and average credits per test.
Agent drill-down — click any agent card to see:
- Per-tag performance bars (cross-agent comparison)
- Per-tag performance over time
- Verification run history
- Usage breakdown
Refresh controls: choose "last N runs per agent" (5–200) or a date period (Today, Last week, Last month, or custom range).
Verification Pages
Agents Registry
| Action | Role |
| --- | --- |
| Test Connection | Viewer or above |
| Edit | Editor or above |
| Delete | Workspace Admin |
Search, sort (by name, date, status), filter by type and date range. **Test Connections** tests all agents at once. Failed tests show error details.
Simulations
| Action | Role |
| --- | --- |
| Edit | Editor or above |
| Copy | Editor or above |
| Retry (failed generation) | Editor or above |
| Delete (single or bulk) | Editor or above |
Simulations with existing runs cannot be deleted.
Workbench
The Workbench is the launch pad for verification runs and the central place to track their progress.
Run configuration — select an agent, one or more simulations (including batches), set repetitions, and review the credit estimate before running. Full details in [Step 3 — Launch a Verification Run](#step-3--launch-a-verification-run).
Runs History — a table of all past and in-progress runs for the current workspace. For each run you can see:
- Run status (queued, running, completed, failed)
- Agent and simulation names
- Date started
- Evaluation score (when complete)
Click any row to open the full evaluation report. Search and filter to find specific runs.
| Action | Role |
| --- | --- |
| View runs | Viewer or above |
| Start a run | Editor or above |
Credits
Credits are the currency used to run verifications and generate simulations. Every operation that invokes the platform's AI engine consumes credits.
What affects cost:
- **Model** — the underlying model used by the agent or evaluator
- **Token volume** — the length and complexity of the simulation (more turns and richer context = more tokens)
- **Repetitions** — each repetition is charged independently
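A purely illustrative calculation (rates invented, not VerifyAX pricing): if an agent's model were billed at 10 credits per 1M tokens and a simulation consumed roughly 200K tokens per pass, a run with 3 repetitions would cost about 10 × 0.2 × 3 = 6 credits. The Workbench and simulation generator show the real estimate before you commit.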
Where you see credits:
- **Sidebar footer** — your current balance at a glance
- **Workbench** — estimated cost before you run, plus pending operations and remaining balance
- **Simulation generator** — estimated cost before you generate
- **Agent Catalogue** — input/output credits per 1M tokens per agent template
- **Dashboard** — average credits per test on each agent card
Managing credits:
- Purchase credits via **Account & Billing → Buy Credits** (Stripe checkout)
- Quick top-up from the **Buy Credits** button in the sidebar
- Track spend over time in **Usage** (date-range filter, daily debit chart, organization ledger)
> Credit pricing and balance management are available to org Owners and Admins. If your only org role is User, billing pages are hidden — ask an Owner or Admin for access.
Settings
API Keys
| Action | Role |
| --- | --- |
| View | Viewer or above |
| Create | Editor or above |
| Revoke / delete | Workspace Admin |
Members & Workspaces
Invite members, assign org and workspace roles, and create and manage workspaces. Org Owners and Admins only.
Account & Billing
Credit balance, purchase credits (Stripe checkout), transaction history. Org Owners and Admins only — hidden for org-only Users.
Usage
Date-range filter, daily debit chart, organization ledger. Org Owners and Admins only.
Audit Log
History of workspace actions. Org Owners and Admins only.
Profile
Edit your name, account info, and organization details. Create a new organization.
Roles & Permissions
Two-layer model: **organization role** + **workspace role**.
Organization Roles
| Role | Access |
| --- | --- |
| Owner | Full control, can delete the org |
| Admin | Same as Owner minus org deletion |
| User | Day-to-day seat — no billing, usage, or audit log access |
Workspace Roles
| Role | Access |
| --- | --- |
| Admin | Full CRUD on all workspace resources |
| Editor | Create and edit — cannot delete agents or API keys |
| Viewer | Read-only |
Quick Reference
| Task | Role |
| --- | --- |
| View dashboards and reports | Any workspace role |
| Browse agents and runs | Viewer or above |
| Create / edit agents and runs | Editor or above |
| Delete agents or API keys | Workspace Admin |
| Run or edit simulations | Editor or above |
| Invite members / change roles | Org Owner or Admin |
| View usage and spend | Org Owner or Admin |
| Buy credits | Org Owner or Admin + Workspace Admin |
| Audit logs | Org Owner or Admin |
| Delete organization | Org Owner only |
Troubleshooting
Agent connection test fails
The platform shows "Agent Connection Unsuccessful" with a prompt to review your settings. Common causes:
- The endpoint URL is unreachable from VerifyAX (firewall, VPN, or the agent is not running)
- Authentication credentials are missing or expired
- For A2A agents: the agent card path is incorrect (default is `/.well-known/agent-card.json`)
- For API agents: the curl command is malformed, or the response field mappings don't match the actual response structure
Open the agent's edit panel, correct the settings, and run **Test Connection** again.
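For A2A agents specifically, a quick way to rule out card-path problems is to fetch the card yourself from a machine with comparable network access (the host below is a placeholder):

```sh
# Fetch the agent card directly; an error or 404 here usually means the
# base URL or card path configured in VerifyAX is wrong.
curl -s https://your-agent.example.com/.well-known/agent-card.json
```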
Simulation generation stuck in loading
AI-generated simulations can take a few minutes. If a simulation stays in "loading" for an unusually long time:
- Refresh the Simulations page — the status updates on page load
- If it shows as **failed**, use the **Retry** action on the row to re-trigger generation
- Check that you had sufficient credits when you generated — generation consumes credits upfront
Run fails to start or errors out
- **"Insufficient credits"** — your balance is too low for the estimated cost. Top up via **Account & Billing → Buy Credits**.
- **Run button is disabled** — either no agent is selected, no simulation is selected, or the credit estimate exceeds your balance.
- **Run starts but fails** — the agent may have become unreachable during the run. Go to the Agents Registry, run **Test Connection**, and fix any issues before re-running.
Simulation cannot be deleted
- Simulations with existing runs cannot be deleted — this is by design to preserve report history.
- Recently generated simulations have a brief cooldown before deletion is allowed — wait a few minutes and try again.
Missing pages or actions in the sidebar
Both your organization role and workspace role control what you see. If a page or button is missing, check your roles on the **Members & Workspaces** page. See [Roles & Permissions](#roles--permissions) for details. In particular, if your only org role is **User**, Usage and Billing pages are hidden regardless of workspace role.
Page appears empty or data looks stale
Try refreshing the page. If the problem persists, sign out and sign back in. If it continues, submit a request via **Help & Support**.
Help & Support
Getting help — click your avatar in the sidebar and select **Help & Support**. This opens a form where you can describe your issue or question. Submit the form and the team will aim to respond within 48 hours.
Sharing feedback — select **Feedback** from the same user menu to submit product suggestions, feature requests, or general comments.
Self-service resources
- **Tips** — every major page includes a Tips section with contextual guidance relevant to that screen
- **API Documentation** — in-app reference and how-to guides for programmatic access (under Documentation in the sidebar)
- **Roles and Permissions** — in-app guide explaining what each role can do (under Documentation in the sidebar)
- **Onboarding Guide** — reopen the platform walkthrough any time from the Welcome page
For anything not covered in this guide, reach out via **Help & Support** in the user menu.