How to interpret results

Aucert produces structured test results with confidence scores, severity classifications, and visual evidence. Here's how to read them.

Confidence scores

Every finding includes a confidence score between 0.0 and 1.0 that represents how certain the AI model is about its observation. Text bug reports show the same value as a percentage (for example, 0.712 appears as 71.2%).

Score range	Meaning	Typical action
0.95–1.0	Very high confidence	Almost certainly a real bug — fix it
0.85–0.95	High confidence	Likely a real bug — quick manual check recommended
0.70–0.85	Medium confidence	Possible issue — manual investigation needed
Below 0.70	Low confidence	May be a false positive — verify before acting
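If you post-process results yourself, the bands above can be expressed as a small lookup. This is a minimal sketch, not part of Aucert: the function name confidence_band is hypothetical, and it assumes that a score sitting exactly on a boundary (0.95, 0.85, 0.70) belongs to the higher band, which the table does not spell out.

```python
def confidence_band(score: float) -> str:
    """Map a confidence score in [0.0, 1.0] to a band name from the table.

    Assumption: boundary values (0.95, 0.85, 0.70) fall into the higher band.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"confidence must be between 0.0 and 1.0, got {score}")
    if score >= 0.95:
        return "very high"
    if score >= 0.85:
        return "high"
    if score >= 0.70:
        return "medium"
    return "low"


# Example scores; 0.712 is the value used in the sample bug report.
for score in (0.98, 0.90, 0.712, 0.40):
    print(score, confidence_band(score))
```

A score of 0.712 lands in the medium band, so it would call for manual investigation rather than an immediate fix.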

Tip: Set confidence_threshold in aucert.config.yaml to match your team's tolerance for false positives. Start at 0.85 (the default) and adjust based on your false positive rate. See Configure project for threshold tuning guidance.

What affects confidence?

Factor	Effect on confidence
Clear pass/fail signal	Higher: after login, the app clearly shows either the home screen or an error
Ambiguous UI state	Lower: a loading spinner that may or may not be transitioning
Animation in progress	Lower: the screenshot was captured mid-transition
Complex visual comparison	Lower: a subtle layout shift or color difference

Severity levels

Bug reports are classified into four severity levels:

Severity	Description	Examples
Critical	App crash, data loss, security vulnerability	ANR, uncaught exception, data corruption
High	Major feature broken, user flow blocked	Login fails, checkout stuck, navigation dead-end
Medium	UI issue, minor functional problem	Wrong text displayed, layout shift, slow transition
Low	Cosmetic issue, minor UX concern	Truncated label, off-brand color, minor alignment
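One way to use severity together with confidence is to order findings for triage. The sketch below is illustrative only: the severity names come from the table above, but the numeric ranks, the triage_order helper, and the sample findings are assumptions made for this example.

```python
# Severity names come from the table above; the numeric ranks are an
# illustrative assumption (lower rank = look at it first).
SEVERITY_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3}


def triage_order(findings):
    """Sort findings by severity first, then by descending confidence."""
    return sorted(
        findings,
        key=lambda f: (SEVERITY_RANK[f["severity"].lower()], -f["confidence"]),
    )


# Hypothetical findings using only "title", "severity" and "confidence".
findings = [
    {"title": "Truncated label", "severity": "low", "confidence": 0.97},
    {"title": "Checkout stuck", "severity": "high", "confidence": 0.71},
    {"title": "Uncaught exception", "severity": "critical", "confidence": 0.88},
    {"title": "Login fails", "severity": "high", "confidence": 0.93},
]

for finding in triage_order(findings):
    print(finding["severity"], finding["confidence"], finding["title"])
```

Sorting on severity before confidence keeps a critical finding at the top even when a cosmetic one has a higher score; teams that prefer to suppress low-confidence findings entirely can filter before sorting.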

Bug report structure

Each bug report follows this layout:

BUG-001: Checkout loading spinner does not resolve
──────────────────────────────────────────────────
Severity:     High
Confidence:   71.2%
Test scenario: Cart → Checkout → Payment

Reproduction steps:
  1. Add item to cart
  2. Tap "Checkout" button
  3. Wait for payment form

Expected: Payment form displayed within 3 seconds
Actual:   Loading spinner persisted after 5 second timeout

Screenshots:
  Step 2: [checkout-button-tap.png]
  Step 3: [loading-spinner-stuck.png]

Device context:
  Emulator: Pixel 7 API 34
  OS: Android 14
  Screen: 1080x2400 (420 dpi)

Key fields explained

Field	Purpose
Reproduction steps	Exact actions to trigger the issue — useful for developers
Expected vs actual	What the AI expected to see vs what it observed
Screenshots	Visual evidence at each step — the primary diagnostic tool
Device context	Emulator configuration so you can reproduce locally

JSON output format

When using --output json, results are structured for programmatic consumption:

{
  "run_id": "a1b2c3d4-...",
  "scenarios": [
    {
      "name": "Cart → Checkout → Payment",
      "result": "fail",
      "confidence": 0.712,
      "bug_report": {
        "title": "Checkout loading spinner does not resolve",
        "severity": "high",
        "steps": ["Add item to cart", "Tap Checkout", "Wait for payment form"],
        "expected": "Payment form displayed within 3 seconds",
        "actual": "Loading spinner persisted after 5 second timeout",
        "screenshots": ["step-2.png", "step-3.png"]
      }
    }
  ]
}
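As a rough illustration of consuming this output, the sketch below parses a report and keeps only failed scenarios at or above a chosen confidence. It relies solely on the fields shown in the sample above; the actionable_failures helper is hypothetical, and whether Aucert already applies confidence_threshold before writing the JSON is not covered here, so treat the filter as an assumption.

```python
import json

# Sample report copied from the JSON example above.
SAMPLE_REPORT = """
{
  "run_id": "a1b2c3d4-...",
  "scenarios": [
    {
      "name": "Cart → Checkout → Payment",
      "result": "fail",
      "confidence": 0.712,
      "bug_report": {
        "title": "Checkout loading spinner does not resolve",
        "severity": "high",
        "steps": ["Add item to cart", "Tap Checkout", "Wait for payment form"],
        "expected": "Payment form displayed within 3 seconds",
        "actual": "Loading spinner persisted after 5 second timeout",
        "screenshots": ["step-2.png", "step-3.png"]
      }
    }
  ]
}
"""


def actionable_failures(report_text, threshold=0.85):
    """Return failed scenarios whose confidence is at or above the threshold."""
    report = json.loads(report_text)
    failures = []
    for scenario in report.get("scenarios", []):
        if scenario.get("result") != "fail":
            continue  # skip scenarios that did not fail
        if scenario.get("confidence", 0.0) < threshold:
            continue  # below the chosen confidence cut-off
        failures.append(scenario)
    return failures


# With the default 0.85 cut-off the 0.712 finding is filtered out;
# lowering the cut-off to 0.70 surfaces it.
for scenario in actionable_failures(SAMPLE_REPORT, threshold=0.70):
    bug = scenario["bug_report"]
    print(f'{bug["severity"]}: {bug["title"]} ({scenario["confidence"]:.1%})')
```

In a real pipeline you would read the text from wherever the --output json result is captured instead of the embedded sample.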

What's next

CLI commands — Check results with aucert status
Configure project — Adjust thresholds
CI/CD integration — Automate testing
