This was just moved from another location to be in shared-core.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e86592388e
2 issues found across 48 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="packages/types/src/sdk/featureFlag.ts">
<violation number="1" location="packages/types/src/sdk/featureFlag.ts:6">
P2: The feature-flag identifier was renamed to `AI_TESTS`, which conflicts with the documented `AI_EVALS` rollout gate and can leave evals disabled when operators enable the documented flag.</violation>
</file>
<file name="packages/builder/src/pages/builder/workspace/[application]/agent/[agentId]/tests.svelte">
<violation number="1" location="packages/builder/src/pages/builder/workspace/[application]/agent/[agentId]/tests.svelte:94">
P2: Guard async suite-load responses against agent switches; otherwise stale responses can overwrite state with another agent’s tests.</violation>
</file>
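Here is a minimal sketch of one way to add that guard. It assumes the page keeps the loaded suite in component state and reloads it whenever the agent ID changes. It is plain TypeScript rather than Svelte, and `SuiteLoader`, `fetchSuite`, and `TestSuite` are hypothetical names, not the component's actual code. Each load takes a request token, and only the newest request may write state.

```typescript
// Hypothetical types and names for illustration only.
interface TestSuite {
  agentId: string
  cases: string[]
}

type FetchSuite = (agentId: string) => Promise<TestSuite>

class SuiteLoader {
  suite: TestSuite | null = null
  // Incremented on every load; only the newest request may write state.
  private latestRequest = 0
  private readonly fetchSuite: FetchSuite

  constructor(fetchSuite: FetchSuite) {
    this.fetchSuite = fetchSuite
  }

  async load(agentId: string): Promise<void> {
    const requestId = ++this.latestRequest
    const result = await this.fetchSuite(agentId)
    // A newer load started while this one was in flight (for example, the
    // user switched agents), so this response is stale and is dropped.
    if (requestId !== this.latestRequest) {
      return
    }
    this.suite = result
  }
}
```

The token check does not cancel the stale request. If the fetch layer accepts a signal, pairing the check with an `AbortController` would also stop the network call.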
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review, or fix all with cubic.
1 issue found across 20 files (changes from recent commits).
<file name="packages/shared-core/src/agentTests.ts">
<violation number="1" location="packages/shared-core/src/agentTests.ts:82">
P2: `exact_match` is not actually exact: it lowercases and normalizes whitespace before comparison, so non-identical responses can pass.</violation>
</file>
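To illustrate the distinction, here is a hypothetical pair of comparators, not the `shared-core` implementation. `exactMatch` does the strict equality the name promises. `normalizedMatch` mirrors the lowercase-and-collapse-whitespace behavior described above.

```typescript
// Hypothetical helpers for illustration; not the shared-core code.
function normalize(text: string): string {
  // Lowercase, collapse runs of whitespace to one space, trim the ends.
  return text.toLowerCase().replace(/\s+/g, " ").trim()
}

// Strict comparison: what the name `exact_match` implies.
function exactMatch(actual: string, expected: string): boolean {
  return actual === expected
}

// Lenient comparison: the behavior the review says the current code has.
function normalizedMatch(actual: string, expected: string): boolean {
  return normalize(actual) === normalize(expected)
}
```

For example, `"Hello  World"` and `"hello world"` differ under `exactMatch` but match under `normalizedMatch`. If the lenient behavior is what users want, exposing it under a separate reviewer type would keep `exact_match` strict. That naming is only a suggestion; the PR does not define it.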
3 issues found across 9 files (changes from recent commits).
<file name="packages/shared-core/src/agentTests.ts">
<violation number="1" location="packages/shared-core/src/agentTests.ts:137">
P2: Whitespace-only reviewer content now passes required validation because the check no longer trims input.</violation>
</file>
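Here is a minimal sketch of the trimmed check. It assumes a reviewer carries its text in a `content` field. `ReviewerInput` and `validateRequiredContent` are hypothetical names used for illustration.

```typescript
// Hypothetical reviewer shape; the field name is an assumption.
interface ReviewerInput {
  content?: string | null
}

// Returns an error message, or null when the content is acceptable.
function validateRequiredContent(reviewer: ReviewerInput): string | null {
  // Trim before checking so "   " is rejected just like "".
  const content = (reviewer.content ?? "").trim()
  return content.length > 0 ? null : "Reviewer content is required"
}
```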
<file name="packages/server/src/sdk/workspace/ai/tests/crud.ts">
<violation number="1" location="packages/server/src/sdk/workspace/ai/tests/crud.ts:42">
P2: Keep the name fallback here; `trim()` on an untrusted request field can throw if the case arrives without a name.</violation>
<violation number="2" location="packages/server/src/sdk/workspace/ai/tests/crud.ts:45">
P2: Preserve reviewer IDs when saving the suite; otherwise runs can emit `reviewerId: undefined` for persisted reviewers.</violation>
</file>
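Here is a sketch that covers both points, using hypothetical request and storage shapes (`TestCaseRequest`, `StoredTestCase`, `toStoredCase`). The name falls back to an empty string before `trim()`, and existing reviewer IDs are carried through. Generating IDs for brand-new reviewers through an injected `newId` function is an assumption; the review does not specify it.

```typescript
// Hypothetical request and storage shapes for illustration.
interface ReviewerRequest {
  id?: string
  type: string
  config?: unknown
}

interface TestCaseRequest {
  name?: string | null
  reviewers?: ReviewerRequest[]
}

interface StoredReviewer {
  id: string
  type: string
  config?: unknown
}

interface StoredTestCase {
  name: string
  reviewers: StoredReviewer[]
}

function toStoredCase(
  input: TestCaseRequest,
  newId: () => string
): StoredTestCase {
  return {
    // Fall back to "" so a missing name cannot make trim() throw.
    name: (input.name ?? "").trim(),
    reviewers: (input.reviewers ?? []).map(reviewer => ({
      // Keep a persisted ID; only new reviewers get a generated one.
      id: reviewer.id ?? newId(),
      type: reviewer.type,
      config: reviewer.config,
    })),
  }
}
```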
1 issue found across 10 files (changes from recent commits).
<file name="packages/server/src/sdk/workspace/ai/tests/run.ts">
<violation number="1" location="packages/server/src/sdk/workspace/ai/tests/run.ts:384">
P1: Returning the raw run object drops persistence for test executions, so run history cannot be reconstructed after the request finishes.</violation>
</file>
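Here is a minimal sketch of persisting the run before returning it, assuming an async store interface. `TestRun`, `RunStore`, `InMemoryRunStore`, and `finishRun` are hypothetical and stand in for whatever persistence layer the server actually uses.

```typescript
// Hypothetical run shape and store for illustration.
interface TestRun {
  id: string
  agentId: string
  startedAt: string
  results: { caseId: string; passed: boolean }[]
}

interface RunStore {
  save(run: TestRun): Promise<void>
  listByAgent(agentId: string): Promise<TestRun[]>
}

// In-memory stand-in; a real store would write to the database.
class InMemoryRunStore implements RunStore {
  private runs: TestRun[] = []

  async save(run: TestRun): Promise<void> {
    this.runs.push(run)
  }

  async listByAgent(agentId: string): Promise<TestRun[]> {
    return this.runs.filter(run => run.agentId === agentId)
  }
}

async function finishRun(store: RunStore, run: TestRun): Promise<TestRun> {
  // Write the run before responding so history survives the request.
  await store.save(run)
  return run
}
```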
Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>
Made-with: Cursor
Description
This PR adds the ability to run evaluations of several types against your agent instructions.
You can use the following methods.
All tests run against the instructions in the configuration tab as they are at the moment the run starts.
It's currently gated behind the `AI_TESTS` feature flag.
Screenshots
Launchcontrol
Summary by cubic
Adds agent tests with a new Tests tab to create cases, run them, and review verdicts, final responses, and tool usage. The feature is behind `AI_TESTS` and includes UI, API, run pipeline, validation, and logs updates.
New Features
- … `AI_TESTS` is enabled.
- … `@budibase/shared-core` … `REVIEWERS` helpers with input validation.
- … `frontend-core` client: `GET`/`PUT` `/api/agent/:agentId/tests`, `POST` `/api/agent/:agentId/tests/run` (optional `caseId`); the server validates inputs and returns 403 when the feature is disabled.
Bug Fixes
Written for commit eada0c1. Summary will update on new commits.
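The endpoints listed in the summary can be sketched as plain request descriptors. This is hypothetical and is not the `frontend-core` client. `RequestSpec` and the builder functions are illustrative. Sending `caseId` in the JSON body is an assumption, and so is treating an omitted `caseId` as "run every case".

```typescript
// Hypothetical request builders; not the frontend-core client.
type HttpMethod = "GET" | "PUT" | "POST"

interface RequestSpec {
  method: HttpMethod
  url: string
  body?: string
}

function testsUrl(agentId: string): string {
  return `/api/agent/${encodeURIComponent(agentId)}/tests`
}

// GET /api/agent/:agentId/tests
function fetchSuiteRequest(agentId: string): RequestSpec {
  return { method: "GET", url: testsUrl(agentId) }
}

// PUT /api/agent/:agentId/tests
function saveSuiteRequest(agentId: string, suite: unknown): RequestSpec {
  return { method: "PUT", url: testsUrl(agentId), body: JSON.stringify(suite) }
}

// POST /api/agent/:agentId/tests/run; omitting caseId is assumed to run all cases.
function runTestsRequest(agentId: string, caseId?: string): RequestSpec {
  const payload = caseId === undefined ? {} : { caseId }
  return {
    method: "POST",
    url: `${testsUrl(agentId)}/run`,
    body: JSON.stringify(payload),
  }
}
```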