technical · architecture · tensorflow

How SafeOS Guardian Works: A Technical Deep Dive

January 10, 2026 · Frame.dev Team · 8 min read

SafeOS Guardian runs entirely in your browser, performing real-time AI analysis without sending data to any server. In this post, we'll explore the technology stack and architecture that makes this possible.

The Local-First Architecture

Traditional monitoring apps follow a client-server model: your device captures video, sends it to a server, the server processes it, and sends results back. This approach has several drawbacks: latency, privacy concerns, and infrastructure costs.

SafeOS Guardian flips this model. Everything happens on your device:

[Your Device]
Camera/Microphone → MediaDevices API
Video Frames → Canvas Processing
TensorFlow.js → AI Detection
Alert System → Browser Notifications
IndexedDB → Local Storage

Motion Detection: Frame Differencing

Motion detection uses a technique called frame differencing. We capture video frames at regular intervals and compare consecutive frames pixel by pixel. When enough pixels change beyond a threshold, we detect motion.

The algorithm works like this:

  1. Capture a frame from the video stream
  2. Convert it to grayscale (faster to compare)
  3. Compare each pixel to the previous frame
  4. Count pixels that changed beyond the sensitivity threshold
  5. If the count exceeds the motion threshold, trigger an alert

For sleep monitoring, we use "pixel detection" mode with ultra-low thresholds (3-10 pixels). This catches subtle movements like a sleeping baby stirring or chest movements from breathing.
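The steps above can be sketched as a few pure functions over grayscale buffers. This is an illustrative sketch, not the actual SafeOS Guardian code; names like `countChangedPixels` and the default parameter values are assumptions:

```typescript
// Convert an RGBA frame (as produced by CanvasRenderingContext2D.getImageData)
// to one grayscale byte per pixel using the standard luma weights.
function toGrayscale(rgba: Uint8ClampedArray): Uint8ClampedArray {
  const gray = new Uint8ClampedArray(rgba.length / 4);
  for (let i = 0; i < gray.length; i++) {
    const o = i * 4;
    gray[i] = 0.299 * rgba[o] + 0.587 * rgba[o + 1] + 0.114 * rgba[o + 2];
  }
  return gray;
}

// Count pixels whose brightness changed by more than `sensitivity`
// between two consecutive grayscale frames.
function countChangedPixels(
  prev: Uint8ClampedArray,
  curr: Uint8ClampedArray,
  sensitivity: number,
): number {
  let changed = 0;
  for (let i = 0; i < curr.length; i++) {
    if (Math.abs(curr[i] - prev[i]) > sensitivity) changed++;
  }
  return changed;
}

// Motion is detected when enough pixels moved. In a sleep-monitoring
// "pixel detection" mode, motionThreshold would be as low as 3-10 pixels.
function detectMotion(
  prev: Uint8ClampedArray,
  curr: Uint8ClampedArray,
  sensitivity = 25,
  motionThreshold = 10,
): boolean {
  return countChangedPixels(prev, curr, sensitivity) >= motionThreshold;
}
```

In a real loop, `prev` and `curr` would come from drawing successive video frames to a canvas and reading back the pixel data.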

TensorFlow.js: AI in the Browser

TensorFlow.js is a JavaScript library for machine learning that runs in the browser. It can use WebGL for GPU acceleration, making it fast enough for real-time video analysis.

We use TensorFlow.js for:

  • Motion Analysis: Beyond simple frame differencing, we use ML models to classify motion patterns and reduce false positives from lighting changes.
  • Audio Classification: We analyze audio frequency bands to detect crying, distress sounds, and unusual noise patterns.
  • Adaptive Thresholds: Models learn from your environment to automatically adjust sensitivity over time.
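The post doesn't detail how the adaptive thresholds work internally, but the general idea can be illustrated with an exponential moving average over recent motion scores: the detector learns the ambient level and only flags readings that stand out from it. A sketch under assumed parameters, not the actual implementation:

```typescript
// Sketch of an adaptive sensitivity threshold: maintain a running baseline of
// ambient motion scores and flag only readings well above that baseline.
class AdaptiveThreshold {
  private baseline: number;

  constructor(
    initial = 0,
    private alpha = 0.05, // smoothing factor: how quickly the baseline adapts
    private margin = 3.0, // how far above baseline counts as "real" motion
  ) {
    this.baseline = initial;
  }

  // Feed a new motion score; returns true if it stands out from ambient noise.
  update(score: number): boolean {
    const isMotion = score > this.baseline * this.margin + 1;
    // Only fold quiet readings into the baseline, so genuine events
    // (a flickering lamp excepted) don't inflate it over time.
    if (!isMotion) {
      this.baseline = (1 - this.alpha) * this.baseline + this.alpha * score;
    }
    return isMotion;
  }
}
```

The same structure works for audio: a room with a humming fan settles into a higher baseline, so the fan never triggers alerts but a cry still does.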

Audio Analysis: Cry Detection

Audio analysis uses the Web Audio API to capture microphone input and analyze it in real time. We use several techniques:

  • Volume Threshold: Basic detection of sounds above a certain decibel level.
  • Frequency Analysis: Baby cries have distinctive frequency patterns (typically 300-600 Hz fundamental with harmonics). We use FFT (Fast Fourier Transform) to analyze frequency content.
  • Pattern Matching: Cries have characteristic duration and repetition patterns. We track these over time to distinguish crying from other sounds.
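The frequency-analysis step can be sketched as follows: given the magnitude bins an FFT produces (e.g. from `AnalyserNode.getByteFrequencyData`), measure how much of the spectrum's energy falls in the 300-600 Hz band. This is an illustrative sketch with assumed function names, not the actual detector:

```typescript
// Map a frequency in Hz to its FFT bin index. With the Web Audio API,
// an AnalyserNode with fftSize N yields N/2 bins spanning 0..sampleRate/2.
function hzToBin(hz: number, sampleRate: number, binCount: number): number {
  return Math.round((hz / (sampleRate / 2)) * binCount);
}

// Fraction of total spectral energy inside [lowHz, highHz]. A sustained,
// high ratio in the 300-600 Hz band is one signal of a baby cry.
function bandEnergyRatio(
  bins: Uint8Array, // magnitude per bin, 0-255
  sampleRate: number,
  lowHz: number,
  highHz: number,
): number {
  const lo = hzToBin(lowHz, sampleRate, bins.length);
  const hi = hzToBin(highHz, sampleRate, bins.length);
  let band = 0;
  let total = 0;
  for (let i = 0; i < bins.length; i++) {
    total += bins[i];
    if (i >= lo && i <= hi) band += bins[i];
  }
  return total === 0 ? 0 : band / total;
}
```

A single high reading isn't enough; the pattern-matching layer would require this ratio to stay elevated across repeated bursts before classifying the sound as crying.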

State Management: Zustand

We use Zustand for state management. It's lightweight (roughly 1 KB minified and gzipped) and works seamlessly with React. Our stores manage:

  • Camera and microphone state
  • Detection settings and sensitivity
  • Alert history and acknowledgments
  • User preferences and profiles

Zustand's persist middleware, pointed at a custom IndexedDB storage adapter (it writes to localStorage by default), automatically saves state, ensuring your settings survive page refreshes and browser restarts.

Local Storage: IndexedDB

IndexedDB is a low-level browser API for storing structured data. Unlike localStorage, it can handle large amounts of data and supports indexes, transactions, and binary values. We store:

  • User settings and preferences
  • Alert history with timestamps
  • Monitoring profiles for different scenarios
  • Sync queue for optional cloud backup

All data stays on your device. If you clear your browser data, it's gone—we have no backup because we never had access in the first place.

The Tech Stack

Here's our complete technology stack:

  • Next.js 14: React framework with App Router
  • React 18: UI components with hooks
  • TypeScript: Type safety throughout
  • TensorFlow.js: Browser-based machine learning
  • Zustand: Lightweight state management
  • Tailwind CSS: Utility-first styling
  • IndexedDB: Local data persistence
  • Capacitor: Native mobile app packaging

Performance Considerations

Running AI in the browser presents performance challenges. Here's how we handle them:

  • Frame Sampling: We don't analyze every frame. Depending on the mode, we sample 2-10 frames per second.
  • Resolution Scaling: We analyze scaled-down frames (typically 320x240) while displaying full resolution.
  • Web Workers: Heavy computations run in background threads to keep the UI responsive.
  • Efficient Models: We use quantized, optimized models designed for edge devices.
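The first two optimizations, frame sampling and resolution scaling, come down to a couple of small helpers. Illustrative names and defaults, not the actual code:

```typescript
// Decide whether to analyze the current frame, given a target analysis rate.
// At targetFps = 5, for example, only one frame per 200 ms gets processed,
// no matter how fast the camera delivers frames.
function shouldAnalyze(
  nowMs: number,
  lastAnalyzedMs: number,
  targetFps: number,
): boolean {
  return nowMs - lastAnalyzedMs >= 1000 / targetFps;
}

// Compute the analysis resolution: shrink the frame so its width is at most
// maxWidth (e.g. 320), preserving aspect ratio. The full-resolution stream
// is still what the user sees; only the analysis copy is downscaled.
function analysisSize(
  width: number,
  height: number,
  maxWidth = 320,
): { width: number; height: number } {
  if (width <= maxWidth) return { width, height };
  const scale = maxWidth / width;
  return { width: maxWidth, height: Math.round(height * scale) };
}
```

Downscaling a 1280x720 frame to 320x180 cuts the per-frame pixel count by 16x, which compounds with frame sampling to keep CPU usage low enough for all-night monitoring.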

What's Next

We're continuously improving SafeOS Guardian. Upcoming features include:

  • Behavior classification (sleeping, fussy, distressed)
  • Multi-device sync with end-to-end encryption
  • Smart home integration (HomeKit, Matter)
  • Improved cry detection with custom training

Want to contribute? Check out our GitHub repository or reach out to us at team@frame.dev.

— The Frame.dev Team