Real-time field safety risk detection and behavior analysis system.
GuardVision Edge moves safety intelligence from identity recognition to context awareness. The Android edge runtime detects observable behavior patterns such as loitering, restricted-zone entry, rapid approach, conflict posture, and dangerous object interaction without building a facial identity database.
SYSTEM_SPEC_INDEX
Architecture: 3 layers
Core modules: 12
Default data path: Local
- 01 System architecture
- 02 CameraX pipeline
- 03 LiteRT inference
- 04 Module responsibilities
- 05 Risk engine
Vision and positioning
From identity surveillance to privacy-preserving context awareness.
The system is intentionally behavior-oriented. It evaluates verifiable physical patterns and site context rather than biometric identity, reducing privacy exposure while keeping reviewable safety signals useful in low-light, masked-face, or cloud-disconnected environments.
Identity out
Anonymous behavior analysis
Risk assessment is based on temporary Track IDs, movement vectors, dwell time, pose sequences, object proximity, and zone rules.
Metadata first
Data minimization
Frames are processed on the edge device. The product keeps event metadata needed for review rather than raw identity data.
Reason required
Explainable security AI
Every alert must include a trigger reason, score band, evidence summary, and operator review context.
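The "reason required" rule can be enforced at construction time. The following is a minimal sketch, assuming a hypothetical `ExplainableAlert` class whose name, field names, and string types are illustrative and not part of the specification; it rejects any alert that is missing one of the four required items.

```java
/** Hypothetical alert value object; the four fields mirror the required items listed above. */
final class ExplainableAlert {
    final String triggerReason;   // why the alert fired, e.g. "restricted-zone entry"
    final String scoreBand;       // e.g. "High"
    final String evidenceSummary; // short description of the supporting cues
    final String reviewContext;   // what the operator needs for review, e.g. zone and Track ID

    ExplainableAlert(String triggerReason, String scoreBand,
                     String evidenceSummary, String reviewContext) {
        this.triggerReason = requireText(triggerReason, "triggerReason");
        this.scoreBand = requireText(scoreBand, "scoreBand");
        this.evidenceSummary = requireText(evidenceSummary, "evidenceSummary");
        this.reviewContext = requireText(reviewContext, "reviewContext");
    }

    /** Rejects null, empty, or whitespace-only values so no alert ships without an explanation. */
    private static String requireText(String value, String field) {
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalArgumentException(field + " must not be blank");
        }
        return value;
    }
}
```

Failing fast in the constructor keeps unexplained alerts from ever reaching storage or the operator overlay.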
System architecture
A decoupled three-layer runtime for low-latency edge AI.
The layers isolate camera IO, heterogeneous inference, and stateful risk decisions so UI rendering, model execution, and behavior logic can be tuned independently.
| Layer | Core component | Responsibility |
|---|---|---|
| Image Capture Layer | CameraX ImageAnalysis | Capture the image stream, correct rotation, convert YUV frames, and feed the analyzer without blocking UI rendering. |
| Edge Inference Layer | LiteRT, MediaPipe, ML Kit | Run person, object, and pose models on Android CPU, GPU, or NPU delegates for low-latency feature extraction. |
| Logic Decision Layer | RiskEngine, MultiObjectTracker | Convert detections into temporal behavior features, anonymous tracks, zone events, scores, and explainable alerts. |
CameraX pipeline
Stable frame delivery is the foundation of real-time behavior analysis.
The analysis pipeline must keep the newest useful frame, close ImageProxy instances deterministically, and run outside the UI thread so inference latency does not freeze the app.
Image format
YUV_420_888
Camera frames enter the analyzer in Android's native YUV format before conversion or tensor preparation.
Analysis size
640x640 / 224x224
Preview and analysis streams are separated; analysis frames are normalized to model-specific input dimensions.
Backpressure
KEEP_ONLY_LATEST
Old frames are dropped so the risk engine evaluates current scene state instead of stale buffers.
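In CameraX this behavior is selected with `ImageAnalysis.STRATEGY_KEEP_ONLY_LATEST` on the `ImageAnalysis` builder. The sketch below is a plain-Java illustration of the same idea, not CameraX's implementation: a single-slot holder (the hypothetical `LatestFrameSlot`) in which a new frame displaces any frame the consumer has not yet taken.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReference;

/** Illustrative single-slot buffer with keep-only-latest semantics. */
final class LatestFrameSlot<T> {
    private final AtomicReference<T> slot = new AtomicReference<>();
    private final AtomicLong dropped = new AtomicLong();

    /** Producer side: store the newest frame and return the frame it displaced, or null. */
    T offer(T frame) {
        T stale = slot.getAndSet(frame);
        if (stale != null) {
            dropped.incrementAndGet(); // the consumer never saw this frame
        }
        return stale;
    }

    /** Consumer side: take the newest frame, or null if nothing is waiting. */
    T poll() {
        return slot.getAndSet(null);
    }

    long droppedCount() {
        return dropped.get();
    }
}
```

If three frames arrive while the analyzer is busy, the next `poll()` returns only the third, so the risk engine scores the current scene rather than a backlog.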
Android runtime safeguards
- Always call imageProxy.close() after each analyzer pass to avoid buffer saturation.
- Run ImageAnalysis.Analyzer on a dedicated ExecutorService rather than the UI thread.
- Normalize rotation, scaling, and coordinate mapping before detections enter tracking.
- Package preprocessed pixels into TensorBuffer or TensorImage for model-safe inference.
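The first two safeguards can be sketched without Android dependencies. In CameraX the executor is supplied through `ImageAnalysis.setAnalyzer(executor, analyzer)`; the example below assumes a hypothetical `ClosableFrame` stand-in for `ImageProxy` and a hypothetical `AnalyzerWorker`, and shows only the pattern: analysis on a dedicated background thread, with the frame released in a `finally` block.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

/** Hypothetical stand-in for a buffer-backed camera frame (not androidx ImageProxy). */
interface ClosableFrame extends AutoCloseable {
    @Override
    void close(); // narrowed so callers need no checked-exception handling
}

/** Runs analysis off the caller's thread and always releases the frame afterwards. */
final class AnalyzerWorker {
    // One dedicated background thread, so analysis never runs on the UI thread.
    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    Future<?> submit(ClosableFrame frame, Consumer<ClosableFrame> analysis) {
        Runnable task = () -> {
            try {
                analysis.accept(frame);
            } finally {
                frame.close(); // release the buffer even when analysis throws
            }
        };
        return executor.submit(task);
    }

    void shutdown() {
        executor.shutdown();
    }
}
```

Closing in `finally` matters because a model failure would otherwise leave the buffer held, and the camera would stop delivering frames once its buffers are exhausted.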
LiteRT inference
Heterogeneous Android inference with delegate-aware fallback.
LiteRT is the local inference brain. The runtime should prefer hardware acceleration when available, but remain dependable on CPU paths when device capability is limited.
Delegate selection order
- 01 Qualcomm Hexagon or NPU delegate when supported by the device and model.
- 02 GPU delegate for accelerated vision workloads when NPU is unavailable.
- 03 XNNPACK CPU path as the dependable fallback for broad Android compatibility.
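The selection order above reduces to a first-match search over a preference list. This is a minimal sketch, assuming a hypothetical `DelegateKind` enum and a `supported` set that some device-probing step has already filled in; how that probing is done depends on the device and the LiteRT build and is not shown.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;

/** Hypothetical labels for the three execution paths named in the list above. */
enum DelegateKind { NPU, GPU, CPU_XNNPACK }

final class DelegateSelector {
    // Preference order from the specification: NPU first, then GPU, then CPU.
    private static final List<DelegateKind> PREFERENCE =
            Arrays.asList(DelegateKind.NPU, DelegateKind.GPU, DelegateKind.CPU_XNNPACK);

    /** Returns the first preferred delegate the device reports as usable; CPU is always allowed. */
    static DelegateKind select(Set<DelegateKind> supported) {
        for (DelegateKind kind : PREFERENCE) {
            if (kind == DelegateKind.CPU_XNNPACK || supported.contains(kind)) {
                return kind;
            }
        }
        return DelegateKind.CPU_XNNPACK; // unreachable with the list above; kept as a safe default
    }
}
```

Treating the CPU path as unconditionally available is what makes the fallback dependable: the selector can never return "no delegate".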
| Item | Specification | Detail |
|---|---|---|
| Input tensor | 224x224 RGB 4D array | Normalized to [0, 1] float tensors or model-specific quantized input. |
| Core APIs | ObjectDetector / Pose Landmarker | Supports person boxes, object labels, and 33 pose landmarks through MediaPipe-compatible flows. |
| Confidence threshold | 0.5 - 0.7 dynamic | Thresholds should adapt to lighting, scene stability, and scenario sensitivity. |
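For the float input path, the [0, 1] normalization in the table amounts to dividing each 8-bit channel by 255. The sketch below assumes packed ARGB pixels in row-major order and an interleaved RGB output layout; the `TensorPrep` class is hypothetical, and a real model may expect a different layout or quantized input.

```java
/** Illustrative pixel-to-tensor conversion for a float RGB input. */
final class TensorPrep {
    /**
     * Converts packed 0xAARRGGBB pixels (row-major, width * height entries) into a flat
     * float array of length height * width * 3, interleaved R, G, B, scaled to [0, 1].
     */
    static float[] toNormalizedRgb(int[] argbPixels, int width, int height) {
        if (argbPixels.length != width * height) {
            throw new IllegalArgumentException("pixel count does not match width * height");
        }
        float[] out = new float[width * height * 3];
        for (int i = 0; i < argbPixels.length; i++) {
            int p = argbPixels[i];
            out[3 * i] = ((p >> 16) & 0xFF) / 255f;    // red
            out[3 * i + 1] = ((p >> 8) & 0xFF) / 255f; // green
            out[3 * i + 2] = (p & 0xFF) / 255f;        // blue
        }
        return out;
    }
}
```

For a 224x224 frame the result holds 224 * 224 * 3 floats, which corresponds to one image of a 4D input once the batch dimension of 1 is added.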
Module responsibilities
Twelve core modules define the MVP engineering contract.
Each module communicates through explicit data structures so perception, tracking, scoring, storage, and overlay rendering can evolve independently.
| No. | Module | Responsibility | Output |
|---|---|---|---|
| 01 | CameraModule | Configure CameraX lifecycle and supply ImageProxy frames. | Frame stream |
| 02 | FramePreprocessor | Convert ImageProxy data, scale frames, and prepare tensor input. | TensorImage / TensorBuffer |
| 03 | PersonDetector | Detect people with confidence scores and normalized boxes. | List<RectF> |
| 04 | ObjectRiskDetector | Detect dangerous objects and correlate them with people or hands. | Object risk labels |
| 05 | PoseAnalyzer | Extract pose landmarks and temporal action tensors. | 33-point pose features |
| 06 | MultiObjectTracker | Manage anonymous Track IDs and velocity vectors. | Track state |
| 07 | ZoneManager | Apply polygon rules and point-in-polygon collision checks. | Zone events |
| 08 | BehaviorFeatureExtractor | Extract dwell, loitering, approach angle, and second-order behavior features. | Behavior features |
| 09 | RiskEngine | Evaluate cross-frame state with weighted behavior rules. | Risk score and level |
| 10 | AlertManager | Generate explainable metadata and enforce alert cooldowns. | Reviewable alerts |
| 11 | EventStore | Store encrypted anonymous event summaries with Room or SQLCipher. | Local event records |
| 12 | OverlayRenderer | Render boxes, tracks, zones, and risk labels on the preview surface. | Operator overlay |
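The point-in-polygon check used by ZoneManager can be sketched with the standard ray-casting test. The `ZoneGeometry` class and the parallel-array vertex layout are assumptions for illustration; which point of a person box is tested (for example the bottom-center "foot" point) is a design choice the specification leaves open.

```java
/** Illustrative zone geometry helper. */
final class ZoneGeometry {
    /**
     * Ray-casting test: returns true if (px, py) lies inside the polygon whose vertices
     * are given in order by the parallel arrays xs and ys. Points exactly on an edge
     * may be classified either way.
     */
    static boolean contains(float[] xs, float[] ys, float px, float py) {
        boolean inside = false;
        int n = xs.length;
        for (int i = 0, j = n - 1; i < n; j = i++) {
            // Does the edge (j -> i) straddle the horizontal line through py?
            boolean straddles = (ys[i] > py) != (ys[j] > py);
            if (straddles) {
                // x coordinate where the edge crosses that horizontal line
                float xAtY = (xs[j] - xs[i]) * (py - ys[i]) / (ys[j] - ys[i]) + xs[i];
                if (px < xAtY) {
                    inside = !inside; // each crossing to the right flips the state
                }
            }
        }
        return inside;
    }
}
```

With normalized coordinates, a unit-square zone contains (0.5, 0.5) and excludes (1.5, 0.5), and the same code handles concave site polygons.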
Risk engine
Multi-cue scoring prevents weak signals from becoming hard accusations.
The RiskEngine combines time, zone, motion, pose, object, and confidence signals. A single weak cue can raise attention, but high-risk alerts require corroborating evidence.
Weighted signal example
Face masking or camera avoidance
+15 Weak signal; never sufficient by itself.
Sensitive zone entry
+30 Medium spatial cue tied to site configuration.
Loitering over 180 seconds
+40 Strong temporal cue; combined with the two cues above (15 + 30 + 40 = 85), it pushes the total score past the Emergency threshold of 81.
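The three example weights can be combined as a simple sum. The sketch below is one possible reading, not the specified RiskEngine: the `Cue` enum, the `weak` flag, and the `hasCorroboration` rule (at least one non-weak cue must be present) are assumptions introduced to illustrate "never sufficient by itself".

```java
import java.util.Set;

/** The three example cues and weights listed above; the weak flag is an assumption. */
enum Cue {
    FACE_MASKING_OR_CAMERA_AVOIDANCE(15, true),
    SENSITIVE_ZONE_ENTRY(30, false),
    LOITERING_OVER_180_SECONDS(40, false);

    final int weight;
    final boolean weak;

    Cue(int weight, boolean weak) {
        this.weight = weight;
        this.weak = weak;
    }
}

final class WeightedScore {
    /** Sums the weights of all currently active cues. */
    static int total(Set<Cue> active) {
        int sum = 0;
        for (Cue cue : active) {
            sum += cue.weight;
        }
        return sum;
    }

    /** True when at least one non-weak cue is active, so a weak cue alone cannot escalate. */
    static boolean hasCorroboration(Set<Cue> active) {
        for (Cue cue : active) {
            if (!cue.weak) {
                return true;
            }
        }
        return false;
    }
}
```

With all three cues active the total is 15 + 30 + 40 = 85; with face masking alone it is 15 and `hasCorroboration` is false.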
| Level | Score | Example | Automated response and explanation |
|---|---|---|---|
| Low | 0 - 30 | Normal passage or queueing | Background event log only. |
| Medium | 31 - 60 | Long stay or camera avoidance | Yellow overlay with a reason, such as excessive dwell time. |
| High | 61 - 80 | Rapid approach or restricted-line crossing | Notify an operator with a rapid-approach or zone-crossing explanation. |
| Emergency | 81+ | Dangerous object or attack posture | High-priority alert and optional 15-second evidence clip under policy. |
Every alert carries a Trigger Reason field so the operator sees why the system raised risk rather than receiving an opaque AI verdict.
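The score bands in the table map directly to a small lookup. This is a minimal sketch using hypothetical `RiskLevel` and `RiskBands` names; the thresholds are taken from the table, and rejecting negative scores is an added assumption.

```java
/** The four levels from the table above. */
enum RiskLevel { LOW, MEDIUM, HIGH, EMERGENCY }

final class RiskBands {
    /** Maps a non-negative score to its band: 0-30, 31-60, 61-80, 81 and above. */
    static RiskLevel levelFor(int score) {
        if (score < 0) {
            throw new IllegalArgumentException("score must be non-negative");
        }
        if (score <= 30) {
            return RiskLevel.LOW;
        }
        if (score <= 60) {
            return RiskLevel.MEDIUM;
        }
        if (score <= 80) {
            return RiskLevel.HIGH;
        }
        return RiskLevel.EMERGENCY;
    }
}
```

A total of 85 therefore lands in the Emergency band, while a lone 15-point cue stays in Low and is only logged.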
Edge performance
Latency, battery, and thermal limits shape the runtime strategy.
The Android app must adapt inference load to scene risk. Lightweight detection runs first; expensive pose and object models activate only when context justifies them.
Dynamic frame throttling
Run around 5 FPS during calm monitoring and raise toward 15 FPS when people or high-risk zones are active.
8-bit quantization
Prefer integer-quantized models to reduce latency, memory bandwidth, and thermal throttling risk.
Layered model scheduling
Start with person detection, then invoke object or pose models only after zone, dwell, or motion cues justify extra compute.
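Dynamic frame throttling can be approximated by skipping frames that arrive sooner than the target interval. The sketch below assumes a hypothetical `FrameThrottle` class and simplifies "raise toward 15 FPS" to a two-step switch between 5 and 15 FPS; a production runtime might ramp gradually and also account for thermal state.

```java
/** Illustrative frame gate that enforces a minimum interval between analyzed frames. */
final class FrameThrottle {
    static final int CALM_FPS = 5;    // calm monitoring rate from the specification
    static final int ACTIVE_FPS = 15; // upper rate when people or high-risk zones are active

    private boolean hasAccepted = false;
    private long lastAcceptedMs = 0L;

    /** Picks the target rate from scene context (two-step simplification). */
    static int targetFps(boolean personVisible, boolean highRiskZoneActive) {
        return (personVisible || highRiskZoneActive) ? ACTIVE_FPS : CALM_FPS;
    }

    /** Returns true if the frame at timestampMs should be analyzed at the given target rate. */
    boolean shouldAnalyze(long timestampMs, int targetFps) {
        long minIntervalMs = 1000L / targetFps; // 200 ms at 5 FPS, 66 ms at 15 FPS
        if (hasAccepted && timestampMs - lastAcceptedMs < minIntervalMs) {
            return false; // too soon after the last analyzed frame; skip it
        }
        hasAccepted = true;
        lastAcceptedMs = timestampMs;
        return true;
    }
}
```

Skipped frames should still be closed promptly so the camera keeps delivering; only the inference step is gated.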
Security and compliance
Privacy-first engineering controls are part of the system contract.
The specification aligns product behavior with local data minimization, mobile security, and AI risk-management expectations without presenting the website as a legal certification.
Taiwan PDPA orientation
Design for data minimization and local processing: raw image handling stays transient unless a configured event evidence policy requires capture.
OWASP MASVS
Protect local storage, permissions, export paths, and device-resident event evidence.
NIST AI RMF
Document explainability, human oversight, false-positive review, and scenario coverage.
No biometric database
Do not store face embeddings, protected-attribute signals, or cross-day identity profiles.
MVP delivery
A phased roadmap keeps implementation measurable.
The final package must show source implementation, model specifications, runtime performance, and privacy self-check evidence.
- 01 Phase 1-2: CameraX pipeline stability and anonymous tracking.
- 02 Phase 3-4: Weighted RiskEngine rules plus pose and object model integration.
- 03 Phase 5-6: NPU performance testing and explainable alert UI completion.
Final deliverables
- Android source code for the twelve-module architecture.
- Optimized quantized .tflite model package and weight documentation.
- Performance report with FPS stability, end-to-end latency, CPU/NPU load, and privacy self-checklist.