Architecture overview

Aucert runs mobile tests through a 5-layer AI pipeline. Each layer is specialized for a distinct phase, from understanding your app to reporting bugs. The pipeline is context-driven: a Knowledge Graph feeds rich app understanding into every stage, and a Device Twin bridges the gap between emulator and real-device behavior.

The layers run in this order:

Generation → Execution → Analysis → Decision → Reporting

How a test run flows through the pipeline

All inter-layer communication uses a structured MCP message envelope carrying task IDs, payloads, confidence scores, and trace IDs. Because every layer exchanges the same envelope, layers can run in-process (monolith) or as separate services (microservices), and a single config flag switches between the two modes.
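As a rough illustration of such an envelope, the sketch below models the four fields named above as a Python dataclass. The class name `Envelope`, the field names, the JSON encoding, and the helper `next_envelope` are all assumptions made for this example; the actual wire format is not specified here.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from typing import Any, Dict


@dataclass
class Envelope:
    """Hypothetical inter-layer message; names are illustrative only."""
    task_id: str
    payload: Dict[str, Any]
    confidence: float  # assumed to lie in [0.0, 1.0]
    trace_id: str = field(default_factory=lambda: uuid.uuid4().hex)

    def __post_init__(self) -> None:
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0.0 and 1.0")

    def to_json(self) -> str:
        # Serialized form, e.g. for passing between separate services.
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, raw: str) -> "Envelope":
        return cls(**json.loads(raw))


def next_envelope(prev: Envelope, payload: Dict[str, Any], confidence: float) -> Envelope:
    """Build the message for the next layer, keeping the task and trace IDs."""
    return Envelope(prev.task_id, payload, confidence, prev.trace_id)
```

Because the same object can be handed over directly in-process or serialized with `to_json()` and sent to another service, a single message shape can serve both deployment modes.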

The 5 layers

L1: Generation

The Generation layer designs test scenarios using context from the Knowledge Graph. Rather than writing test scripts manually, the AI analyzes your app's screens, navigation flows, API contracts, and historical bugs to generate comprehensive test plans.

What it produces: A set of test scenarios, each describing a sequence of actions (navigate to screen X, enter text Y, tap button Z) with expected outcomes.

How it's smart: The Knowledge Graph tells L1 which screens exist, how they connect, what APIs they call, and where bugs have occurred before. This means generated tests focus on real user flows and known risk areas, not random exploration.
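A scenario of the kind described above could be represented as an ordered list of steps, each with an optional expected outcome. This is a minimal sketch: the `Step` and `Scenario` classes, the action names, and the screen and element identifiers are hypothetical and are not Aucert's actual schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Step:
    action: str         # illustrative action names: "navigate", "enter_text", "tap"
    target: str         # screen or UI element the action applies to
    value: str = ""     # text to enter, if any
    expected: str = ""  # expected outcome after the step, if any


@dataclass
class Scenario:
    name: str
    steps: List[Step] = field(default_factory=list)

    def describe(self) -> List[str]:
        """Render the scenario as numbered, human-readable lines."""
        lines = []
        for number, step in enumerate(self.steps, start=1):
            detail = f"{step.action} {step.target}"
            if step.value:
                detail += f" with '{step.value}'"
            if step.expected:
                detail += f" -> expect {step.expected}"
            lines.append(f"{number}. {detail}")
        return lines


# Hypothetical login scenario following the navigate / enter text / tap pattern.
login = Scenario("login_happy_path", [
    Step("navigate", "LoginScreen", expected="login form is visible"),
    Step("enter_text", "email_field", value="user@example.com"),
    Step("tap", "submit_button", expected="HomeScreen is displayed"),
])
```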

L2: Execution

The Execution layer runs generated tests on Android emulators. It navigates your app, performs UI interactions (taps, swipes, text input, scrolls), and captures screenshots at each step.

What it produces: A trace of every action taken, the resulting screenshots, timing data, and any errors encountered (crashes, ANRs, network failures).

Device Twin (Phase 2): A predictive model will overlay emulator results with real-device behavior predictions, adjusting for known emulator-to-device divergences.

L3: Analysis

The Analysis layer applies visual reasoning to screenshots and execution logs. Multimodal AI models compare expected behavior against actual results, detecting:

UI regressions — Layout shifts, missing elements, color changes, broken fonts
Functional failures — Wrong screen displayed, error states, missing data
Unexpected states — Infinite loaders, empty states that should have data, truncated text

What it produces: Per-step analysis with confidence scores indicating how certain the model is about each observation.

L4: Decision

The Decision layer determines pass/fail status using a confidence-gated Verification Cascade. Tests scoring above the threshold pass automatically. Ambiguous results escalate through increasingly rigorous (and expensive) verification stages.

Stage	Method	Cost	Share of results resolved
Stage 1	Self-confidence score	~$0.001	~80%
Stage 2	Self-consistency (3x re-evaluation)	~$0.003	~15%
Stage 3	Cross-model vote	~$0.01	~4%
Stage 4	Structured debate	~$0.05	~1%

Note: Phase 1 uses Stage 1 only (confidence threshold). The full cascade is planned for Phase 2. Target: false positive rate below 5%.
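The escalation logic of a confidence-gated cascade can be sketched as a loop over stages that stops at the first stage confident enough to resolve the result. Everything in this example is an assumption for illustration: the `Stage` type, the `(verdict, confidence)` return shape, the stub evaluators that read precomputed values from a dictionary, and the `0.9` threshold. Real stages would call models rather than read stored fields.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Stage:
    name: str
    evaluate: Callable[[dict], Tuple[str, float]]  # returns (verdict, confidence)


def run_cascade(result: dict, stages: List[Stage],
                threshold: float) -> Tuple[Optional[str], Optional[str], List[str]]:
    """Return (verdict, resolving stage, stages tried).

    The verdict is None when no stage reaches the threshold.
    """
    tried = []
    for stage in stages:
        tried.append(stage.name)
        verdict, confidence = stage.evaluate(result)
        if confidence >= threshold:
            return verdict, stage.name, tried
    return None, None, tried


def self_confidence(result: dict) -> Tuple[str, float]:
    # Stub for Stage 1: trust the model's own reported confidence.
    return result["verdict"], result["confidence"]


def self_consistency(result: dict) -> Tuple[str, float]:
    # Stub for Stage 2: majority vote over re-evaluations; the agreement
    # fraction stands in for confidence.
    votes = Counter(result["resamples"])
    verdict, count = votes.most_common(1)[0]
    return verdict, count / len(result["resamples"])


STAGES = [Stage("self_confidence", self_confidence),
          Stage("self_consistency", self_consistency)]
```

Passing only the first stage, as in `run_cascade(result, STAGES[:1], 0.9)`, models the Phase 1 behavior of a single confidence threshold; results that return `None` would need handling outside the cascade.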

L5: Reporting

The Reporting layer generates structured output: bug reports with reproduction steps, severity classification, annotated screenshots, confidence scores, and dashboard data. Reports integrate with CI/CD pipelines and issue trackers.

What it produces: A test run summary with per-scenario results, a bug report for each failure, and metrics for the dashboard.

Cross-cutting components

Knowledge Graph

The Knowledge Graph is a structured representation of your application — screens, components, API endpoints, and their relationships. It feeds context into the Generation layer, enabling intelligent test creation rather than brute-force exploration.

The Knowledge Graph ingests five source types:

Code ASTs — Screen definitions, navigation paths, state management
API schemas — OpenAPI or Protobuf specs mapping endpoint relationships
UI layouts — Screen hierarchies and interactive elements
Historical data — Past test results and known bug patterns
Product requirements — PRDs that define expected behavior
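To illustrate how graph context can steer test generation toward real flows and known risk areas, the sketch below stores screens and navigation edges as an adjacency map, enumerates the flows reachable from a start screen, and ranks them by historical bug count. The screen names, bug counts, and the functions `flows` and `rank_by_risk` are invented for this example and say nothing about how the Knowledge Graph is actually stored or queried.

```python
from typing import Dict, List

# Hypothetical graph: screen -> screens reachable in one navigation step.
NAVIGATION: Dict[str, List[str]] = {
    "Login": ["Home"],
    "Home": ["Cart", "Profile"],
    "Cart": ["Checkout"],
    "Profile": [],
    "Checkout": [],
}

# Hypothetical historical bug counts per screen.
PAST_BUGS: Dict[str, int] = {"Checkout": 3, "Profile": 1}


def flows(graph: Dict[str, List[str]], start: str) -> List[List[str]]:
    """Enumerate acyclic paths from start until no unvisited screen remains."""
    found: List[List[str]] = []

    def walk(path: List[str]) -> None:
        unvisited = [s for s in graph.get(path[-1], []) if s not in path]
        if not unvisited:
            found.append(path)
            return
        for screen in unvisited:
            walk(path + [screen])

    walk([start])
    return found


def rank_by_risk(paths: List[List[str]], bugs: Dict[str, int]) -> List[List[str]]:
    """Order flows so those touching historically buggy screens come first."""
    return sorted(paths, key=lambda p: sum(bugs.get(s, 0) for s in p), reverse=True)
```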

Learn more: Knowledge Graph

Device Twin

The Device Twin bridges the gap between emulator and real-device behavior. It learns from paired test runs (emulator vs device) and applies predictive adjustments to emulator-only results.

Note: The Device Twin is designed but not built in Phase 1. Current testing uses direct emulator execution.
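The idea of learning from paired runs and adjusting emulator-only results can be shown with a deliberately simple stand-in: a single average device-to-emulator ratio for one timing metric. This is not the Device Twin's model, which is not described here; the functions `fit_ratio` and `predict_device` and the sample timings are hypothetical.

```python
from statistics import mean
from typing import List, Tuple


def fit_ratio(pairs: List[Tuple[float, float]]) -> float:
    """Average device/emulator ratio over paired (emulator, device) measurements."""
    return mean(device / emulator for emulator, device in pairs)


def predict_device(emulator_value: float, ratio: float) -> float:
    """Scale an emulator-only measurement by the learned ratio."""
    return emulator_value * ratio


# Hypothetical paired screen-load times in milliseconds: (emulator, device).
pairs = [(100.0, 150.0), (200.0, 300.0), (400.0, 600.0)]
ratio = fit_ratio(pairs)
```

A real predictive model would likely condition on device type, OS version, and the kind of divergence involved; the ratio here only demonstrates the learn-then-adjust shape.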

Learn more: Device Twin

What's next

Knowledge Graph — How app context powers test generation
Device Twin — Predictive device behavior modeling
CLI commands — Run tests from the command line
