Architecture overview
Aucert runs mobile tests through a five-layer AI pipeline. Each layer handles a distinct phase, from understanding your app to reporting bugs. The pipeline is context-driven: a Knowledge Graph feeds rich app understanding into every stage, and a Device Twin (planned for Phase 2) bridges the gap between emulator and real-device behavior.
How a test run flows through the pipeline
All inter-layer communication uses a structured MCP message envelope carrying task IDs, payloads, confidence scores, and trace IDs. Because every layer exchanges the same envelope, layers can run in-process (monolith) or as separate services (microservices), and a single config flag switches between the two modes.
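To make the envelope concrete, here is a minimal sketch of a message carrying the four fields named above. The class name `Envelope`, the field names, the 0.0 to 1.0 confidence range, and the JSON encoding are all assumptions for illustration; the actual wire format is not described here.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass
class Envelope:
    """Hypothetical inter-layer message; names and types are illustrative."""

    task_id: str
    payload: dict[str, Any]
    confidence: float  # assumed range: 0.0 to 1.0
    trace_id: str = field(default_factory=lambda: uuid.uuid4().hex)

    def to_json(self) -> str:
        # Serialize to a string so the same message can cross a process boundary.
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, raw: str) -> "Envelope":
        return cls(**json.loads(raw))


msg = Envelope(
    task_id="task-001",
    payload={"layer": "L1", "scenarios": 3},
    confidence=0.92,
)
restored = Envelope.from_json(msg.to_json())
```

Because the message round-trips through a plain string, the same object could be handed over in-process or sent across a network boundary, which is the property that would let one flag choose between the two deployment modes.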
The 5 layers
L1: Generation
The Generation layer designs test scenarios using context from the Knowledge Graph. Rather than writing test scripts manually, the AI analyzes your app's screens, navigation flows, API contracts, and historical bugs to generate comprehensive test plans.
What it produces: A set of test scenarios, each describing a sequence of actions (navigate to screen X, enter text Y, tap button Z) with expected outcomes.
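As a rough illustration, a scenario of this shape could be modeled as an ordered list of steps, each with an optional expected outcome. The `Step` and `Scenario` classes, the action names, and the example screen and element identifiers are hypothetical, not Aucert's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    action: str         # e.g. "navigate", "enter_text", "tap"
    target: str         # screen or element identifier
    value: str = ""     # text to enter, if the action needs it
    expected: str = ""  # expected outcome after the step, if any


@dataclass
class Scenario:
    name: str
    steps: list[Step] = field(default_factory=list)


def describe(scenario: Scenario) -> list[str]:
    """Render the steps as numbered, human-readable lines."""
    return [f"{i}. {s.action} {s.target}" for i, s in enumerate(scenario.steps, 1)]


login = Scenario(
    name="login_happy_path",
    steps=[
        Step("navigate", "LoginScreen", expected="login form visible"),
        Step("enter_text", "email_field", value="user@example.com"),
        Step("tap", "submit_button", expected="HomeScreen displayed"),
    ],
)
lines = describe(login)
```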
How it's smart: The Knowledge Graph tells L1 which screens exist, how they connect, what APIs they call, and where bugs have occurred before. This means generated tests focus on real user flows and known risk areas, not random exploration.
L2: Execution
The Execution layer runs generated tests on Android emulators. It navigates your app, performs UI interactions (taps, swipes, text input, scrolls), and captures screenshots at each step.
What it produces: A trace of every action taken, the resulting screenshots, timing data, and any errors encountered (crashes, ANRs, network failures).
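A trace like this might be represented as one record per step, plus a small helper that rolls the records up. The `StepTrace` fields, the error labels, and the `summarize` helper are assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StepTrace:
    step_index: int
    action: str
    screenshot_path: str
    duration_ms: int
    error: Optional[str] = None  # e.g. "crash", "anr", "network_failure"


def summarize(trace: list[StepTrace]) -> dict:
    """Count steps, total the timing data, and collect any errors."""
    return {
        "steps": len(trace),
        "total_ms": sum(t.duration_ms for t in trace),
        "errors": [t.error for t in trace if t.error is not None],
    }


trace = [
    StepTrace(0, "tap", "shots/000.png", 420),
    StepTrace(1, "swipe", "shots/001.png", 310),
    StepTrace(2, "enter_text", "shots/002.png", 5000, error="anr"),
]
summary = summarize(trace)
```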
Device Twin (Phase 2): A predictive model will overlay emulator results with real-device behavior predictions, adjusting for known emulator-to-device divergences.
L3: Analysis
The Analysis layer applies visual reasoning to screenshots and execution logs. Multimodal AI models compare expected behavior against actual results, detecting:
- UI regressions — Layout shifts, missing elements, color changes, broken fonts
- Functional failures — Wrong screen displayed, error states, missing data
- Unexpected states — Infinite loaders, empty states that should have data, truncated text
What it produces: Per-step analysis with confidence scores indicating how certain the model is about each observation.
L4: Decision
The Decision layer determines pass/fail status using a confidence-gated Verification Cascade. Tests scoring above the threshold pass automatically. Ambiguous results escalate through increasingly rigorous (and expensive) verification stages.
| Stage | Method | Approx. cost (USD) | Share of results resolved |
|---|---|---|---|
| Stage 1 | Self-confidence score | ~$0.001 | ~80% |
| Stage 2 | Self-consistency (3x re-evaluation) | ~$0.003 | ~15% |
| Stage 3 | Cross-model vote | ~$0.01 | ~4% |
| Stage 4 | Structured debate | ~$0.05 | ~1% |
Phase 1 uses Stage 1 only (confidence threshold). The full cascade is planned for Phase 2. Target: false positive rate below 5%.
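The escalation logic and its cost profile can be sketched as follows. This assumes each cost in the table is the total spent on a result resolved at that stage (not an increment on top of earlier stages), and that a stage resolves a result when its confidence meets the threshold. The function names, the stand-in verifiers, and the 0.85 threshold are hypothetical.

```python
from typing import Callable, Sequence

# (share of results resolved, approximate cost in USD), copied from the table above.
STAGES = [(0.80, 0.001), (0.15, 0.003), (0.04, 0.01), (0.01, 0.05)]


def expected_cost(stages: Sequence[tuple[float, float]] = STAGES) -> float:
    """Blended cost per result, weighting each stage's cost by its share."""
    return sum(share * cost for share, cost in stages)


def run_cascade(
    verifiers: Sequence[Callable[[], float]], threshold: float
) -> tuple[int, float]:
    """Run verifiers in order and stop at the first confident one.

    Returns (stage number, confidence). If no stage clears the threshold,
    the last stage's confidence is returned.
    """
    stage, confidence = 0, 0.0
    for stage, verify in enumerate(verifiers, start=1):
        confidence = verify()
        if confidence >= threshold:
            break
    return stage, confidence


# Stand-in verifiers that return fixed confidences, for illustration only.
stage, confidence = run_cascade(
    [lambda: 0.62, lambda: 0.91, lambda: 0.99], threshold=0.85
)
blended = expected_cost()
```

Under these assumptions the blended cost works out to about $0.002 per result, roughly twice the Stage 1 cost, because the expensive stages handle only a small share of results. In the usage example the cascade stops at stage 2 and the third verifier never runs.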
L5: Reporting
The Reporting layer generates structured output: bug reports with reproduction steps, severity classification, annotated screenshots, confidence scores, and dashboard data. Reports integrate with CI/CD pipelines and issue trackers.
What it produces: A test run summary with per-scenario results, a bug report for each failure, and metrics for the dashboard.
Cross-cutting components
Knowledge Graph
The Knowledge Graph is a structured representation of your application — screens, components, API endpoints, and their relationships. It feeds context into the Generation layer, enabling intelligent test creation rather than brute-force exploration.
The KG ingests five source types:
- Code ASTs — Screen definitions, navigation paths, state management
- API schemas — OpenAPI or Protobuf specs mapping endpoint relationships
- UI layouts — Screen hierarchies and interactive elements
- Historical data — Past test results and known bug patterns
- Product requirements — PRDs that define expected behavior
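One simple way to picture the graph is as typed nodes joined by labeled edges. The sketch below is an illustrative in-memory model only; the `KnowledgeGraph` class, the node kinds, and relation names such as `navigates_to` and `calls` are assumptions, not the actual storage format.

```python
class KnowledgeGraph:
    """Toy graph of typed nodes and labeled, directed edges."""

    def __init__(self) -> None:
        self.nodes: dict[str, str] = {}                     # node id -> kind
        self.edges: dict[str, list[tuple[str, str]]] = {}   # node id -> [(relation, target)]

    def add_node(self, node_id: str, kind: str) -> None:
        self.nodes[node_id] = kind

    def add_edge(self, source: str, relation: str, target: str) -> None:
        self.edges.setdefault(source, []).append((relation, target))

    def neighbors(self, source: str, relation: str) -> list[str]:
        """Targets reachable from `source` over edges with the given label."""
        return [t for r, t in self.edges.get(source, []) if r == relation]


kg = KnowledgeGraph()
kg.add_node("LoginScreen", "screen")
kg.add_node("HomeScreen", "screen")
kg.add_node("POST /login", "api_endpoint")
kg.add_edge("LoginScreen", "navigates_to", "HomeScreen")
kg.add_edge("LoginScreen", "calls", "POST /login")

apis_called = kg.neighbors("LoginScreen", "calls")
next_screens = kg.neighbors("LoginScreen", "navigates_to")
```

A test generator could walk `navigates_to` edges to enumerate user flows and follow `calls` edges to learn which endpoints a flow exercises.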
Learn more: Knowledge Graph
Device Twin
The Device Twin bridges the gap between emulator and real-device behavior. It learns from paired test runs (the same test on an emulator and on a real device) and applies predictive adjustments to emulator-only results.
The Device Twin is designed but not yet built. Phase 1 testing runs directly on emulators.
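Since the Device Twin does not exist yet, the following is only a toy illustration of the general idea of learning from paired runs: fit a single device-to-emulator ratio for one metric and apply it to an emulator-only measurement. The model choice (a mean ratio), the function names, and the sample numbers are all invented for this sketch and say nothing about the planned design.

```python
from statistics import mean


def fit_ratio(pairs: list[tuple[float, float]]) -> float:
    """Mean device/emulator ratio over paired (emulator, device) measurements."""
    return mean(device / emulator for emulator, device in pairs)


def predict_device(emulator_value: float, ratio: float) -> float:
    """Adjust an emulator-only measurement by the learned ratio."""
    return emulator_value * ratio


# Hypothetical paired step durations in milliseconds: (emulator, device).
paired_durations_ms = [(100, 150), (200, 300)]
ratio = fit_ratio(paired_durations_ms)
predicted_ms = predict_device(400, ratio)
```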
Learn more: Device Twin
What's next
- Knowledge Graph — How app context powers test generation
- Device Twin — Predictive device behavior modeling
- CLI commands — Run tests from the command line