How to test AI agents before deployment

Unlike traditional software, AI agents impact the real world, and they do so with minimal human supervision. A malfunctioning AI agent can cause enormous and irreparable harm to the company that deploys it. It can enter into contracts with other companies and individuals which compromise its owner’s IP. It can share the personal data of clients with bad actors. It can simply give its owners money away – at scale.

Todays AI agents don’t learn and grow in the way that children do. The LLMs they are based on are not plastic in that way. But they can behave in ways that their developers did not anticipate. So it is vital that organisations deploying AI agents test them thoroughly before deploying them.

The way to do this is to place the agent in a simulation of a real-world scenario – the kind of environment that the agent will be operating in when deployed. Inside that simulation, you can evaluate the agent’s behaviour against pre-defined expectations, and you can identify risks and failure modes.

Here is a practical, step-by-step approach to this kind of pre-deployment testing.

1. Define the agent´s purpose, tasks, and desired behaviours. Specify its success criteria.

You can´t test what you haven't specified. Start by documenting the agent´s purpose and the tasks it is is supposed to fulfil. Define how it is supposed to achieve its tasks, including what tools it is expected to use. As far as possible and reasonable, list the actions that it should never take, and explain how it should handle ambiguous situations. This list can never be complete, as a fully comprehensive list of what not to do would be infinite. But it can and should include the most common failure modes for the type of agent being deployed.

These specifications should include functional requirements (e.g., "the agent should answer billing questions accurately"), safety constraints (e.g., "the agent must never reveal one customer´s data to another customer"), and the behaviour expected when the agent is uncertain how to proceed (e.g., "if your confidence in the next course of action is low, escalate to a human").

These specifications should be expressed as concretely as possible, not as vague principles. For example, "the agent should be helpful to clients" is less testable than "if a user asks about our returns policy, the agent should provide a link to the returns page of our website".

2. Assemble a collection of tasks, enquiries, and prompts that the agent will face when deployed.

This collection should include common requests, adversarial inputs, edge cases, and multi-step scenarios. You can categorise these inputs into buckets, like straightforward, ambiguous, out-of-scope, adversarial, and multi-step, and check that there is a reasonable number of inputs in each bucket. If you have historical data from existing agents, mine that for any unusual requests that have caused failures in the past.

3. Test your agent in simulations inside a sandboxed environment.

You should test your agent in an environment that mirrors the environment that it will be deployed in as closely as possible. This is important because the agent should not be aware that it is being tested. The environment should include any APIs, databases, or tools the agent will be expected to access and use. You want the agent to carry out its normal activities exactly as if it was in deployment, but in a sandboxed environment where it cannot cause any damage to you or your clients.

Creating simulations inside simulated environments like this is a complex and expensive process, and most organisations will use pre-existing service like Verify AX from Conscium.

4. The agent should be tested against a range of criteria.

An AI agent can succeed or fail across a range of metrics. Can it access the tools and APIs that it needs to do its job? Does it retrieve the correct information in a timely manner? Can it read and evaluate the information it is provided? Is it persistent when trying to obtain information from an interlocutor who is confused about what is required, or has a reason to withhold some or all of the required information? Does it comply will all relevant policies? Can it distinguish between information that it can share with interlocutors, and information which must not be disclosed to particular agents and people? Can it resist attempts by interlocutors to persuade it to perform tasks that are out of scope?

Typically, a simulation will involve three or four of these tests, each of which will involve an interaction with another agent. The verification will culminate in a report which includes the full transcripts of the exchanges between the agent being tested and the other agents it interacts with. The report will provide a score for each of the tests, an explanation of why the agent succeeded or failed at each test, and suggestions for how the agent could be improved.

5. Tests should include adversarial interactions.

The verification process must include interactions that try to induce the agent to behave in inappropriate and harmful ways. This kind of red-teaming is the best way to discover the agent’s failure modes ahead of time, in a safe environment.

Examples of adversarial interactions include prompt injection attempts, requests for prohibited content, attempts to manipulate the agent into taking unintended actions, and inputs designed to confuse its reasoning. The verification process must document every failure and indicate whether it requires a fix, or constitutes an acceptable risk.

6. Tests should include multi-step interactions.

The work carried out by AI agents typically involves conversations with multiple steps, and workflows with multiple activities. Verification must test complete journeys, not just individual steps. For example, testing a customer support agent involves simulating entire conversations with users, from initial greetings through problem diagnosis, to resolution, and may well have to deal with the user changing their mind halfway through the process. The agent must maintain context correctly, must not contradict itself, and must be able to handle interruptions or topic changes gracefully.

7. Test your agent under pressure.

As far as possible, the tests should simulate the pressures that the agent will face in deployment. They should detect delays, timeouts, and degradation while in use. For agents in particularly sensitive roles, tests should be duplicated, in case the agent works perfectly for the first user, but behaves differently in successive sessions.

8. Run new tests every time a parameter changes.

Each time you update the agent's base model, add tools, or change its configuration in any way, you should re-verify it. Seemingly minor changes can alter an agent’s behaviour in unexpected ways. Re-testing should be an automatic consequence of changes to the agent’s make-up, and its scores in each test should be compared to check for performance drift.

Testing AI agents is not a one-off exercise, but an ongoing discipline. Verification is a living product, continuously expanding as new failure modes are suggested or discovered, as user needs evolve, and as the agent's capabilities change. Thorough testing reduces reportable incidents, builds trust with stakeholders, and lets you deploy agents with confidence rather than hope.