How to interpret results
Aucert produces structured test results with confidence scores, severity classifications, and visual evidence. Here's how to read them.
Confidence scores
Every finding includes a confidence score between 0.0 and 1.0. This represents how certain the AI model is about its observation.
| Score range | Meaning | Typical action |
|---|---|---|
| 0.95–1.0 | Very high confidence | Almost certainly a real bug; fix it |
| 0.85 to below 0.95 | High confidence | Likely a real bug; a quick manual check is recommended |
| 0.70 to below 0.85 | Medium confidence | Possible issue; manual investigation needed |
| Below 0.70 | Low confidence | May be a false positive; verify before acting |
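As a rough illustration of the table above, the sketch below maps a score to its band. The function name `confidence_band` is hypothetical and not part of Aucert, and it assumes that a score sitting exactly on a boundary (0.85 or 0.95) belongs to the higher band.

```python
def confidence_band(score: float) -> str:
    """Map a confidence score in [0.0, 1.0] to a band name from the table."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"confidence must be between 0.0 and 1.0, got {score}")
    # Assumption: a boundary value (0.95, 0.85, 0.70) counts toward the higher band.
    if score >= 0.95:
        return "very high"
    if score >= 0.85:
        return "high"
    if score >= 0.70:
        return "medium"
    return "low"


print(confidence_band(0.712))
```

A helper like this can be handy when post-processing results, for example to label findings in a dashboard with the same wording the table uses.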
Tip: Set confidence_threshold in aucert.config.yaml to match your team's tolerance for false positives. Start at 0.85 (the default) and adjust based on your false positive rate. See Configure project for threshold tuning guidance.
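One way to ground that adjustment is to measure the false positive rate at a few candidate thresholds, using findings your team has already reviewed by hand. The sketch below is not an Aucert feature: the function name and the `(score, is_real_bug)` tuple shape are assumptions, and the sample values are made up purely for illustration.

```python
def false_positive_rate(reviewed, threshold):
    """Share of findings at or above `threshold` that a reviewer marked as not a real bug.

    `reviewed` is a list of (confidence_score, is_real_bug) tuples.
    Returns None when no finding reaches the threshold.
    """
    kept = [is_real for score, is_real in reviewed if score >= threshold]
    if not kept:
        return None
    return kept.count(False) / len(kept)


# Hypothetical, hand-labelled sample (illustrative values only).
reviewed = [
    (0.97, True), (0.91, True), (0.88, False), (0.86, True),
    (0.78, False), (0.74, True), (0.66, False),
]

for threshold in (0.70, 0.85, 0.95):
    rate = false_positive_rate(reviewed, threshold)
    kept = sum(1 for score, _ in reviewed if score >= threshold)
    print(f"threshold {threshold:.2f}: {kept} findings kept, false positive rate {rate:.0%}")
```

Raising the threshold usually lowers the false positive rate but also hides more real bugs, so it helps to look at both numbers before changing the setting.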
What affects confidence?
| Factor | Effect on confidence |
|---|---|
| Clear pass/fail signal | High: after login, the app shows either the home screen or an error |
| Ambiguous UI state | Lower: a loading spinner that may or may not be transitioning |
| Animation in progress | Lower: screenshot captured mid-transition |
| Complex visual comparison | Lower: subtle layout shift or color difference |
Severity levels
Bug reports are classified into four severity levels:
| Severity | Description | Examples |
|---|---|---|
| Critical | App crash, data loss, security vulnerability | ANR, uncaught exception, data corruption |
| High | Major feature broken, user flow blocked | Login fails, checkout stuck, navigation dead-end |
| Medium | UI issue, minor functional problem | Wrong text displayed, layout shift, slow transition |
| Low | Cosmetic issue, minor UX concern | Truncated label, off-brand color, minor alignment |
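If you triage reports in a script, the four levels can be treated as an ordered scale. The sketch below sorts reports by severity and then by confidence. The `SEVERITY_RANK` mapping, the `triage_order` function, and the flat dictionary shape are all assumptions made for illustration; they are not part of Aucert's API.

```python
# Lower rank means more severe, following the order of the table above.
SEVERITY_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3}


def triage_order(reports):
    """Sort reports most severe first; within a severity, highest confidence first."""
    return sorted(
        reports,
        key=lambda r: (SEVERITY_RANK[r["severity"].lower()], -r["confidence"]),
    )


# Hypothetical reports, shaped as simple dictionaries for this example.
reports = [
    {"title": "Truncated label", "severity": "Low", "confidence": 0.93},
    {"title": "Checkout stuck", "severity": "High", "confidence": 0.71},
    {"title": "Login fails", "severity": "High", "confidence": 0.96},
    {"title": "Uncaught exception", "severity": "Critical", "confidence": 0.88},
]

for report in triage_order(reports):
    print(report["severity"], report["confidence"], report["title"])
```

Sorting on severity before confidence keeps a lower-confidence crash ahead of a high-confidence cosmetic issue, which matches how the levels are described here.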
Bug report structure
Each bug report includes:
BUG-001: Checkout loading spinner does not resolve
──────────────────────────────────────────────────
Severity: High
Confidence: 71.2%
Test scenario: Cart → Checkout → Payment
Reproduction steps:
  1. Add item to cart
  2. Tap "Checkout" button
  3. Wait for payment form
Expected: Payment form displayed within 3 seconds
Actual: Loading spinner persisted after 5 second timeout
Screenshots:
  Step 2: [checkout-button-tap.png]
  Step 3: [loading-spinner-stuck.png]
Device context:
  Emulator: Pixel 7 API 34
  OS: Android 14
  Screen: 1080x2400 (420 dpi)
Key fields explained
| Field | Purpose |
|---|---|
| Reproduction steps | Exact actions to trigger the issue — useful for developers |
| Expected vs actual | What the AI expected to see vs what it observed |
| Screenshots | Visual evidence at each step — the primary diagnostic tool |
| Device context | Emulator configuration so you can reproduce locally |
JSON output format
When using --output json, results are structured for programmatic consumption:
{
  "run_id": "a1b2c3d4-...",
  "scenarios": [
    {
      "name": "Cart → Checkout → Payment",
      "result": "fail",
      "confidence": 0.712,
      "bug_report": {
        "title": "Checkout loading spinner does not resolve",
        "severity": "high",
        "steps": ["Add item to cart", "Tap Checkout", "Wait for payment form"],
        "expected": "Payment form displayed within 3 seconds",
        "actual": "Loading spinner persisted after 5 second timeout",
        "screenshots": ["step-2.png", "step-3.png"]
      }
    }
  ]
}
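A script that consumes this output might look like the sketch below. It relies only on the fields shown in the example above; `actionable_failures` is a hypothetical helper, and applying a threshold on the consumer side is an assumption about your workflow rather than documented Aucert behavior.

```python
import json


def actionable_failures(results_json: str, threshold: float = 0.85):
    """Return (name, severity, confidence) for failed scenarios at or above `threshold`."""
    results = json.loads(results_json)
    found = []
    for scenario in results.get("scenarios", []):
        if scenario.get("result") != "fail":
            continue
        if scenario.get("confidence", 0.0) < threshold:
            continue
        # A scenario might not carry a bug report, so fall back to an empty one.
        report = scenario.get("bug_report") or {}
        found.append(
            (scenario["name"], report.get("severity", "unknown"), scenario["confidence"])
        )
    return found


# The sample mirrors the JSON example above, trimmed to the fields this sketch reads.
sample = """
{
  "run_id": "a1b2c3d4-...",
  "scenarios": [
    {
      "name": "Cart → Checkout → Payment",
      "result": "fail",
      "confidence": 0.712,
      "bug_report": {"title": "Checkout loading spinner does not resolve", "severity": "high"}
    }
  ]
}
"""

print(actionable_failures(sample))        # default threshold of 0.85
print(actionable_failures(sample, 0.70))  # lower threshold includes the 0.712 finding
```

With the default threshold of 0.85 the example finding (confidence 0.712) is filtered out; lowering the threshold to 0.70 brings it back, which is the trade-off described under Confidence scores.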
What's next
- CLI commands: Check results with aucert status
- Configure project: Adjust thresholds
- CI/CD integration: Automate testing