Technical specification

Real-time field safety risk detection and behavior analysis system.

GuardVision Edge moves safety intelligence from identity recognition to context awareness. The Android edge runtime detects observable behavior patterns such as loitering, restricted-zone entry, rapid approach, conflict posture, and dangerous object interaction without building a facial identity database.


SYSTEM_SPEC_INDEX

Architecture: 3 layers
Core modules: 12
Default data path: Local

01 System architecture
02 CameraX pipeline
03 LiteRT inference
04 Module responsibilities
05 Risk engine

Vision and positioning

From identity surveillance to privacy-preserving context awareness.

The system is intentionally behavior-oriented. It evaluates verifiable physical patterns and site context rather than biometric identity, reducing privacy exposure while keeping reviewable safety signals useful in low-light, masked-face, or cloud-disconnected environments.

Identity out

Anonymous behavior analysis

Risk assessment is based on temporary Track IDs, movement vectors, dwell time, pose sequences, object proximity, and zone rules.

Metadata first

Data minimization

Frames are processed on the edge device. The product keeps event metadata needed for review rather than raw identity data.

Reason required

Explainable security AI

Every alert must include a trigger reason, score band, evidence summary, and operator review context.
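
The four required alert parts can be modeled as an immutable value type that refuses to exist without them. The following is a minimal sketch in plain Java; the class name ExplainableAlert, the field names, and the use of plain strings are illustrative assumptions, since the specification names the required parts but not their types.

```java
import java.util.Objects;

/** Hypothetical alert payload; names and types are illustrative assumptions. */
final class ExplainableAlert {
    final int trackId;            // temporary, anonymous Track ID
    final String triggerReason;   // why the risk was raised
    final String scoreBand;       // e.g. "Medium (31 - 60)"
    final String evidenceSummary; // anonymous metadata only, no identity data
    final String reviewContext;   // what the operator needs for review

    ExplainableAlert(int trackId, String triggerReason, String scoreBand,
                     String evidenceSummary, String reviewContext) {
        this.trackId = trackId;
        this.triggerReason = requireText(triggerReason, "triggerReason");
        this.scoreBand = requireText(scoreBand, "scoreBand");
        this.evidenceSummary = requireText(evidenceSummary, "evidenceSummary");
        this.reviewContext = requireText(reviewContext, "reviewContext");
    }

    /** Rejects null or blank text so an alert can never be created without an explanation. */
    private static String requireText(String value, String field) {
        Objects.requireNonNull(value, field);
        if (value.trim().isEmpty()) {
            throw new IllegalArgumentException(field + " must not be blank");
        }
        return value;
    }

    public static void main(String[] args) {
        ExplainableAlert alert = new ExplainableAlert(7, "Loitering over 180 seconds",
                "Medium (31 - 60)", "dwell 195 s inside zone A", "Camera 2, lobby entrance");
        System.out.println(alert.triggerReason + " [" + alert.scoreBand + "]");
    }
}
```

Validating in the constructor means downstream code, such as an overlay or an event store, never has to handle an alert that lacks a reason.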

System architecture

A decoupled three-layer runtime for low-latency edge AI.

The layers isolate camera IO, heterogeneous inference, and stateful risk decisions so UI rendering, model execution, and behavior logic can be tuned independently.

Layer	Core component	Responsibility
Image Capture Layer	CameraX ImageAnalysis	Capture the image stream, correct rotation, convert YUV frames, and feed the analyzer without blocking UI rendering.
Edge Inference Layer	LiteRT, MediaPipe, ML Kit	Run person, object, and pose models on Android CPU, GPU, or NPU delegates for low-latency feature extraction.
Logic Decision Layer	RiskEngine, MultiObjectTracker	Convert detections into temporal behavior features, anonymous tracks, zone events, scores, and explainable alerts.
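
To illustrate how the Logic Decision Layer might turn detections into anonymous tracks, here is a minimal sketch of a nearest-centroid tracker in plain Java. The class names, the greedy matching rule, and the distance threshold are assumptions made for illustration; the specification does not describe the MultiObjectTracker algorithm, and a production tracker would more likely use IoU matching or a motion filter.

```java
import java.util.ArrayList;
import java.util.List;

/** One anonymous track: a temporary ID plus motion state, no identity data. */
final class AnonymousTrack {
    final int id;
    final long firstSeenMs;
    long lastSeenMs;
    double x, y;   // last centroid, normalized to [0, 1]
    double vx, vy; // velocity in normalized units per second

    AnonymousTrack(int id, double x, double y, long nowMs) {
        this.id = id;
        this.x = x;
        this.y = y;
        this.firstSeenMs = nowMs;
        this.lastSeenMs = nowMs;
    }

    /** Updates position and derives velocity from the displacement since the last update. */
    void update(double nx, double ny, long nowMs) {
        double dtSeconds = (nowMs - lastSeenMs) / 1000.0;
        if (dtSeconds > 0) {
            vx = (nx - x) / dtSeconds;
            vy = (ny - y) / dtSeconds;
        }
        x = nx;
        y = ny;
        lastSeenMs = nowMs;
    }

    double dwellSeconds() {
        return (lastSeenMs - firstSeenMs) / 1000.0;
    }
}

/** Hypothetical greedy nearest-centroid tracker, for illustration only. */
final class SimpleTracker {
    private final List<AnonymousTrack> tracks = new ArrayList<>();
    private final double maxMatchDistance;
    private int nextId = 1;

    SimpleTracker(double maxMatchDistance) {
        this.maxMatchDistance = maxMatchDistance;
    }

    /** Matches one detection centroid to the closest track in range, or starts a new track. */
    AnonymousTrack observe(double x, double y, long nowMs) {
        AnonymousTrack best = null;
        double bestDistance = maxMatchDistance;
        for (AnonymousTrack t : tracks) {
            double d = Math.hypot(t.x - x, t.y - y);
            if (d <= bestDistance) {
                best = t;
                bestDistance = d;
            }
        }
        if (best == null) {
            best = new AnonymousTrack(nextId++, x, y, nowMs);
            tracks.add(best);
        } else {
            best.update(x, y, nowMs);
        }
        return best;
    }

    public static void main(String[] args) {
        SimpleTracker tracker = new SimpleTracker(0.1);
        tracker.observe(0.50, 0.50, 0);
        AnonymousTrack t = tracker.observe(0.55, 0.50, 500);
        System.out.println("track " + t.id + " vx=" + t.vx + " dwell=" + t.dwellSeconds());
    }
}
```

The track carries only a counter-based ID, position, velocity, and timestamps, which is the kind of state the risk engine can score without any identity data.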

CameraX pipeline

Stable frame delivery is the foundation of real-time behavior analysis.

The analysis pipeline must keep the newest useful frame, close ImageProxy instances deterministically, and run outside the UI thread so inference latency does not freeze the app.

Item	Specification	Detail
Image format	YUV_420_888	Camera frames enter the analyzer in Android's native YUV format before conversion or tensor preparation.
Analysis size	640x640 / 224x224	Preview and analysis streams are separated; analysis frames are normalized to model-specific input dimensions.
Backpressure	KEEP_ONLY_LATEST	Old frames are dropped so the risk engine evaluates current scene state instead of stale buffers.
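
The keep-only-latest behavior can be modeled without any Android dependency. The sketch below is a plain-Java illustration of the idea, not the CameraX implementation: Frame, FakeFrame, and LatestFrameSlot are hypothetical names, and FakeFrame stands in for ImageProxy.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical frame handle; narrows close() so it throws no checked exception. */
interface Frame extends AutoCloseable {
    @Override
    void close();
}

/** Stand-in for ImageProxy that records whether it was closed. */
final class FakeFrame implements Frame {
    final int sequence;
    volatile boolean closed;

    FakeFrame(int sequence) {
        this.sequence = sequence;
    }

    @Override
    public void close() {
        closed = true;
    }
}

/** Single-slot buffer: a newer frame replaces, and closes, the one still waiting. */
final class LatestFrameSlot<T extends Frame> {
    private final AtomicReference<T> slot = new AtomicReference<>();

    /** Producer side: publish a new frame and release the stale one it replaces. */
    void offer(T frame) {
        T stale = slot.getAndSet(frame);
        if (stale != null) {
            stale.close();
        }
    }

    /** Consumer side: take ownership of the newest frame, or null if none is waiting. */
    T poll() {
        return slot.getAndSet(null);
    }

    public static void main(String[] args) {
        LatestFrameSlot<FakeFrame> slot = new LatestFrameSlot<>();
        FakeFrame older = new FakeFrame(1);
        slot.offer(older);
        slot.offer(new FakeFrame(2));
        System.out.println("older closed: " + older.closed + ", next: " + slot.poll().sequence);
    }
}
```

Whoever receives a frame from poll() owns it and remains responsible for closing it after analysis.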

Android runtime safeguards

Always call imageProxy.close() after each analyzer pass to avoid buffer saturation.
Run ImageAnalysis.Analyzer on a dedicated ExecutorService rather than the UI thread.
Normalize rotation, scaling, and coordinate mapping before detections enter tracking.
Package preprocessed pixels into TensorBuffer or TensorImage for model-safe inference.
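
The rotation and coordinate-mapping safeguard can be sketched as a pure function over normalized boxes. This example assumes the rotation value is the clockwise angle needed to make the frame upright, and that boxes are normalized to [0, 1] with the origin at the top-left corner; NormBox and RotationMapper are hypothetical names.

```java
/** Normalized box with origin at the top-left corner; hypothetical helper type. */
final class NormBox {
    final double left, top, right, bottom;

    NormBox(double left, double top, double right, double bottom) {
        this.left = left;
        this.top = top;
        this.right = right;
        this.bottom = bottom;
    }
}

final class RotationMapper {
    /** Maps a box from the unrotated frame into the frame rotated clockwise by the given angle. */
    static NormBox rotate(NormBox b, int rotationDegrees) {
        switch (rotationDegrees) {
            case 0:
                return b;
            case 90:  // point mapping: (x, y) -> (1 - y, x)
                return new NormBox(1 - b.bottom, b.left, 1 - b.top, b.right);
            case 180: // point mapping: (x, y) -> (1 - x, 1 - y)
                return new NormBox(1 - b.right, 1 - b.bottom, 1 - b.left, 1 - b.top);
            case 270: // point mapping: (x, y) -> (y, 1 - x)
                return new NormBox(b.top, 1 - b.right, b.bottom, 1 - b.left);
            default:
                throw new IllegalArgumentException("rotation must be 0, 90, 180, or 270");
        }
    }

    public static void main(String[] args) {
        NormBox upright = rotate(new NormBox(0.1, 0.2, 0.3, 0.6), 90);
        System.out.println(upright.left + ", " + upright.top + ", "
                + upright.right + ", " + upright.bottom);
    }
}
```

Doing this once, before tracking, keeps track positions, zone polygons, and overlay drawing in the same coordinate space.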

LiteRT inference

Heterogeneous Android inference with delegate-aware fallback.

LiteRT is the local inference brain. The runtime should prefer hardware acceleration when available, but remain dependable on CPU paths when device capability is limited.

Delegate selection order

01 Qualcomm Hexagon or NPU delegate when supported by the device and model.
02 GPU delegate for accelerated vision workloads when NPU is unavailable.
03 XNNPACK CPU path as the dependable fallback for broad Android compatibility.
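
The selection order above can be expressed as a simple preference walk. This sketch deliberately avoids real LiteRT calls: the set of usable backends is assumed to come from a separate capability probe (hypothetical here), and InferenceBackend and BackendSelector are illustrative names.

```java
import java.util.Arrays;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

enum InferenceBackend { NPU, GPU, CPU_XNNPACK }

final class BackendSelector {
    /** Preference order taken from the delegate selection list. */
    private static final List<InferenceBackend> PREFERENCE = Arrays.asList(
            InferenceBackend.NPU, InferenceBackend.GPU, InferenceBackend.CPU_XNNPACK);

    /** Returns the first preferred backend reported as usable; CPU is the fallback. */
    static InferenceBackend select(Set<InferenceBackend> usable) {
        for (InferenceBackend backend : PREFERENCE) {
            if (usable.contains(backend)) {
                return backend;
            }
        }
        return InferenceBackend.CPU_XNNPACK;
    }

    public static void main(String[] args) {
        // Assumed probe result: the device offers GPU and CPU but no NPU.
        Set<InferenceBackend> usable =
                EnumSet.of(InferenceBackend.GPU, InferenceBackend.CPU_XNNPACK);
        System.out.println(select(usable));
    }
}
```

In a real runtime the probe would also need to handle a delegate that reports support but fails during initialization, in which case the selector should be re-run without that backend.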

Item	Specification	Detail
Input tensor	224x224 RGB 4D array	Normalized to [0, 1] float tensors or model-specific quantized input.
Core APIs	ObjectDetector / Pose Landmarker	Supports person boxes, object labels, and 33 pose landmarks through MediaPipe-compatible flows.
Confidence threshold	0.5 - 0.7 dynamic	Thresholds should adapt to lighting, scene stability, and scenario sensitivity.
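
As a rough illustration of the input tensor and confidence threshold rows, the sketch below normalizes packed ARGB pixels into [0, 1] floats and derives a threshold inside the 0.5 - 0.7 range. The sceneQuality input and the linear policy are assumptions; the specification says thresholds should adapt but does not define how. TensorPrep is a hypothetical name.

```java
final class TensorPrep {
    /** Converts packed ARGB pixels to a flat RGB float array scaled to [0, 1]. */
    static float[] toNormalizedRgb(int[] argbPixels) {
        float[] out = new float[argbPixels.length * 3];
        for (int i = 0; i < argbPixels.length; i++) {
            int p = argbPixels[i];
            out[3 * i] = ((p >> 16) & 0xFF) / 255f;    // red
            out[3 * i + 1] = ((p >> 8) & 0xFF) / 255f; // green
            out[3 * i + 2] = (p & 0xFF) / 255f;        // blue
        }
        return out;
    }

    /**
     * Hypothetical policy: sceneQuality in [0, 1] (1 = stable, well lit) moves the
     * threshold linearly from 0.7 down to 0.5. Out-of-range inputs are clamped.
     */
    static float confidenceThreshold(float sceneQuality) {
        float q = Math.max(0f, Math.min(1f, sceneQuality));
        return 0.7f - 0.2f * q;
    }

    public static void main(String[] args) {
        float[] rgb = toNormalizedRgb(new int[] {0xFFFF0000});
        System.out.println(rgb[0] + ", " + rgb[1] + ", " + rgb[2]);
        System.out.println(confidenceThreshold(0.5f));
    }
}
```

A 224x224 frame would produce 224 * 224 * 3 floats in row-major RGB order; a quantized model would skip the division and pass integer values instead.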

Module responsibilities

Twelve core modules define the MVP engineering contract.

Each module communicates through explicit data structures so perception, tracking, scoring, storage, and overlay rendering can evolve independently.

No.	Module	Responsibility	Output
01	CameraModule	Configure CameraX lifecycle and supply ImageProxy frames.	Frame stream
02	FramePreprocessor	Convert ImageProxy data, scale frames, and prepare tensor input.	TensorImage / TensorBuffer
03	PersonDetector	Detect people with confidence scores and normalized boxes.	List<RectF>
04	ObjectRiskDetector	Detect dangerous objects and correlate them with people or hands.	Object risk labels
05	PoseAnalyzer	Extract pose landmarks and temporal action tensors.	33-point pose features
06	MultiObjectTracker	Manage anonymous Track IDs and velocity vectors.	Track state
07	ZoneManager	Apply polygon rules and point-in-polygon collision checks.	Zone events
08	BehaviorFeatureExtractor	Extract dwell, loitering, approach angle, and second-order behavior features.	Behavior features
09	RiskEngine	Evaluate cross-frame state with weighted behavior rules.	Risk score and level
10	AlertManager	Generate explainable metadata and enforce alert cooldowns.	Reviewable alerts
11	EventStore	Store encrypted anonymous event summaries with Room or SQLCipher.	Local event records
12	OverlayRenderer	Render boxes, tracks, zones, and risk labels on the preview surface.	Operator overlay
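
The point-in-polygon check named for ZoneManager can be sketched with the standard ray-casting test. This is an illustrative implementation in plain Java, not the project's code; ZonePolygon is a hypothetical name, and coordinates are assumed to be normalized to the same space as the tracked positions.

```java
/** Hypothetical zone shape defined by polygon vertices in order. */
final class ZonePolygon {
    private final double[] xs;
    private final double[] ys;

    ZonePolygon(double[] xs, double[] ys) {
        if (xs.length != ys.length || xs.length < 3) {
            throw new IllegalArgumentException("a polygon needs at least 3 matching vertices");
        }
        this.xs = xs.clone();
        this.ys = ys.clone();
    }

    /** Ray casting: toggles 'inside' for each edge crossed by a ray going right from the point. */
    boolean contains(double px, double py) {
        boolean inside = false;
        for (int i = 0, j = xs.length - 1; i < xs.length; j = i++) {
            // Only edges whose endpoints lie on opposite sides of the ray can be crossed.
            boolean straddles = (ys[i] > py) != (ys[j] > py);
            if (straddles) {
                double xCross = xs[i] + (py - ys[i]) * (xs[j] - xs[i]) / (ys[j] - ys[i]);
                if (px < xCross) {
                    inside = !inside;
                }
            }
        }
        return inside;
    }

    public static void main(String[] args) {
        ZonePolygon zone = new ZonePolygon(
                new double[] {0.2, 0.8, 0.8, 0.2},
                new double[] {0.2, 0.2, 0.8, 0.8});
        System.out.println(zone.contains(0.5, 0.5));
    }
}
```

A zone event would typically be raised when a track's reference point, for example the bottom-center of its box, changes from outside to inside between frames; that choice of reference point is an assumption rather than something the specification states.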

Risk engine

Multi-cue scoring prevents weak signals from becoming hard accusations.

The RiskEngine combines time, zone, motion, pose, object, and confidence signals. A single weak cue can raise attention, but high-risk alerts require corroborating evidence.

Weighted signal example

Signal	Weight	Note
Face masking or camera avoidance	+15	Weak signal; never sufficient by itself.
Sensitive zone entry	+30	Medium spatial cue tied to site configuration.
Loitering over 180 seconds	+40	Strong temporal cue; combined with the two cues above (15 + 30 + 40 = 85), it pushes the total past the Emergency threshold of 81.
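
The weighted example can be sketched as an additive score with a corroboration guard. The weights come from the example above; the rule that a single cue is capped at the top of the Medium band is an assumption added to illustrate "high-risk alerts require corroborating evidence", and RiskCue and WeightedRiskScorer are hypothetical names.

```java
import java.util.EnumSet;
import java.util.Set;

enum RiskCue {
    FACE_MASKING_OR_CAMERA_AVOIDANCE(15),
    SENSITIVE_ZONE_ENTRY(30),
    LOITERING_OVER_180_SECONDS(40);

    final int weight;

    RiskCue(int weight) {
        this.weight = weight;
    }
}

final class WeightedRiskScorer {
    /** Assumed cap: a lone cue can never exceed the top of the Medium band. */
    static final int MAX_SINGLE_CUE_SCORE = 60;

    /** Sums the weights of active cues; fewer than two cues cannot reach High or above. */
    static int score(Set<RiskCue> activeCues) {
        int total = 0;
        for (RiskCue cue : activeCues) {
            total += cue.weight;
        }
        if (activeCues.size() < 2) {
            total = Math.min(total, MAX_SINGLE_CUE_SCORE);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(score(EnumSet.allOf(RiskCue.class)));
        System.out.println(score(EnumSet.of(RiskCue.FACE_MASKING_OR_CAMERA_AVOIDANCE)));
    }
}
```

With the example weights no single cue exceeds 40, so the cap only matters if heavier cues, such as a dangerous object, are added later.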

Level	Score	Example	Automated response and explanation
Low	0 - 30	Normal passage or queueing	Background event log only.
Medium	31 - 60	Long stay or camera avoidance	Yellow overlay showing the reason, such as excessive dwell time.
High	61 - 80	Rapid approach or restricted-line crossing	Notify an operator with a rapid-approach or zone-crossing explanation.
Emergency	81+	Dangerous object or attack posture	High-priority alert and optional 15-second evidence clip under policy.
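
The score bands in the table map directly to a small lookup. A minimal sketch, assuming integer scores and the hypothetical names RiskLevel and RiskBands:

```java
enum RiskLevel { LOW, MEDIUM, HIGH, EMERGENCY }

final class RiskBands {
    /** Maps a non-negative score to its band: 0-30, 31-60, 61-80, 81 and above. */
    static RiskLevel levelFor(int score) {
        if (score < 0) {
            throw new IllegalArgumentException("score must be non-negative");
        }
        if (score <= 30) {
            return RiskLevel.LOW;
        }
        if (score <= 60) {
            return RiskLevel.MEDIUM;
        }
        if (score <= 80) {
            return RiskLevel.HIGH;
        }
        return RiskLevel.EMERGENCY;
    }

    public static void main(String[] args) {
        System.out.println(levelFor(85));
    }
}
```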

Every alert carries a Trigger Reason field so the operator sees why the system raised risk rather than receiving an opaque AI verdict.

Edge performance

Latency, battery, and thermal limits shape the runtime strategy.

The Android app must adapt inference load to scene risk. Lightweight detection runs first; expensive pose and object models activate only when context justifies them.

Dynamic frame throttling

Run around 5 FPS during calm monitoring and raise toward 15 FPS when people or high-risk zones are active.
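
One simple way to realize this throttling is a time gate in front of the analyzer. The sketch below switches between the two stated rates rather than ramping gradually, which is a simplification; FrameThrottle and the sceneActive flag are hypothetical.

```java
final class FrameThrottle {
    static final int CALM_FPS = 5;
    static final int ACTIVE_FPS = 15;

    private boolean hasAccepted = false;
    private long lastAcceptedMs = 0;

    /** Returns true when enough time has passed to analyze another frame at the current rate. */
    boolean shouldAnalyze(long nowMs, boolean sceneActive) {
        // Integer division: 200 ms between frames when calm, 66 ms when active.
        long minIntervalMs = 1000L / (sceneActive ? ACTIVE_FPS : CALM_FPS);
        if (hasAccepted && nowMs - lastAcceptedMs < minIntervalMs) {
            return false; // too soon: skip this frame
        }
        hasAccepted = true;
        lastAcceptedMs = nowMs;
        return true;
    }

    public static void main(String[] args) {
        FrameThrottle throttle = new FrameThrottle();
        int analyzed = 0;
        // Simulate one second of calm monitoring with frames arriving every 33 ms.
        for (long t = 0; t < 1000; t += 33) {
            if (throttle.shouldAnalyze(t, false)) {
                analyzed++;
            }
        }
        System.out.println("frames analyzed while calm: " + analyzed);
    }
}
```

Skipped frames still need to be closed by the caller so the camera buffer queue does not fill up.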

8-bit quantization

Prefer integer-quantized models to reduce latency, memory bandwidth, and thermal throttling risk.

Layered model scheduling

Start with person detection, then invoke object or pose models only after zone, dwell, or motion cues justify extra compute.
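
Layered scheduling can be sketched as a planning function that decides which models run on the next frame. The trigger values below are placeholders chosen for illustration, not values from the specification, and ModelStage and ModelScheduler are hypothetical names.

```java
import java.util.EnumSet;
import java.util.Set;

enum ModelStage { PERSON_DETECTION, OBJECT_DETECTION, POSE_ESTIMATION }

final class ModelScheduler {
    static final double DWELL_TRIGGER_SECONDS = 60.0; // assumed placeholder
    static final double SPEED_TRIGGER = 0.5;          // assumed, normalized units per second

    /** Person detection always runs; heavier models run only when context justifies them. */
    static Set<ModelStage> plan(boolean personPresent, boolean inRestrictedZone,
                                double dwellSeconds, double speed) {
        Set<ModelStage> stages = EnumSet.of(ModelStage.PERSON_DETECTION);
        if (!personPresent) {
            return stages; // nothing to analyze further
        }
        boolean contextJustifies = inRestrictedZone
                || dwellSeconds >= DWELL_TRIGGER_SECONDS
                || speed >= SPEED_TRIGGER;
        if (contextJustifies) {
            stages.add(ModelStage.OBJECT_DETECTION);
            stages.add(ModelStage.POSE_ESTIMATION);
        }
        return stages;
    }

    public static void main(String[] args) {
        System.out.println(plan(true, false, 10.0, 0.1));
        System.out.println(plan(true, true, 10.0, 0.1));
    }
}
```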

Security and compliance

Privacy-first engineering controls are part of the system contract.

The specification aligns product behavior with local data minimization, mobile security, and AI risk-management expectations without presenting the website as a legal certification.

Taiwan PDPA orientation

Design for data minimization and local processing: raw image handling stays transient unless a configured event evidence policy requires capture.

OWASP MASVS

Protect local storage, permissions, export paths, and device-resident event evidence.

NIST AI RMF

Document explainability, human oversight, false-positive review, and scenario coverage.

No biometric database

Do not store face embeddings, protected-attribute signals, or cross-day identity profiles.

MVP delivery

A phased roadmap keeps implementation measurable.

The final package must show source implementation, model specifications, runtime performance, and privacy self-check evidence.

01 Phase 1-2: CameraX pipeline stability and anonymous tracking.
02 Phase 3-4: Weighted RiskEngine rules plus pose and object model integration.
03 Phase 5-6: NPU performance testing and explainable alert UI completion.

Final deliverables

Android source code for the twelve-module architecture.
Optimized quantized .tflite model package and weight documentation.
Performance report covering FPS stability, end-to-end latency, and CPU/NPU load, plus a privacy self-checklist.
