A Raspberry Pi 4B edge application that uses PIR, light (LDR), and ultrasonic sensors to throttle YOLOv8 inference resolution and frame rate — reducing idle power draw and CPU/SoC thermals, without a GPU and without sacrificing detection quality.
Running a continuous, fixed-resolution computer-vision loop on an ARM SoC like the Raspberry Pi 4B is thermally and energetically expensive — it pushes the Broadcom BCM2711 toward its 80 °C throttling boundary within minutes, even though most of the time the scene is empty or static. The usual fixes (a heatsink, an AI-accelerator hat, or a weaker model) either add hardware cost or sacrifice accuracy.
SAGE-Vision makes compute proportional to scene demand: cheap sensors gate when and how hard the YOLO model runs, so the node idles near-free when nothing is happening and scales inference resolution to subject distance when it is. The target is a measurable reduction in average power draw and core temperature versus an always-on baseline, with no meaningful loss in detection quality — at zero additional hardware cost beyond sensors already on the bench.
The node runs fully offline on the Pi alone. Sensors wire directly to the 40-pin GPIO header (read by the pigpio background process — there is no microcontroller in the live path); a USB camera supplies frames; a two-thread core (sensor harvester + adaptive vision/FSM) does the work, with optional background threads for power/telemetry. Inference runs on tflite-runtime with INT8 YOLOv8-nano models.
| Component | Part | Role |
|---|---|---|
| Compute | Raspberry Pi 4B | edge inference node |
| Camera | USB UVC webcam | video frames |
| Motion | HC-SR501 PIR | wake / presence signal |
| Light | LM393 comparator module | dark/bright gate for CLAHE |
| Distance | HC-SR04 ultrasonic | subject distance → model selection |
| Power measurement | INA219 | whole-Pi power for the energy figure |
| Sensor | Signal | Pi pin | Notes |
|---|---|---|---|
| HC-SR501 PIR | OUT | GPIO 17 (pin 11) | 5V supply; output already 3.3V-safe |
| LM393 light | DO | GPIO 27 (pin 13) | 3.3V supply; active-low (LOW = dark); hardware hysteresis via onboard pot |
| HC-SR04 | TRIG | GPIO 23 (pin 16) | direct connection |
| HC-SR04 | ECHO | GPIO 24 (pin 18) | via 1kΩ/2kΩ voltage divider (steps the 5V echo down to 3.3V) |
| INA219 | SDA / SCL | GPIO 2 / GPIO 3 (pins 3 / 5) | I²C; address 0x40 |
| Rail | Powers | Pi pins |
|---|---|---|
| 5V | HC-SR501, HC-SR04 | 2, 4 |
| 3.3V | LM393, INA219 (logic) | 1, 17 |
| GND | all sensor grounds + divider leg | 6, 9, 14, 20, 25, 30, 34, 39 |
The INA219 sits high-side, inline on the Pi's 5V USB-C feed, broken out with a female + male USB-C breakout pair so no cable is cut.
| Connection | From → To |
|---|---|
| Shunt in | female USB-C VBUS → INA219 VIN+ |
| Shunt out | INA219 VIN- → male USB-C V+ |
| Ground | female GND → male G (common) |
| CC pull-downs | 5.1kΩ from CC1→GND and CC2→GND on the female board (so the charger enables VBUS) |
| Logic | VCC→3.3V, GND→GND, SDA→GPIO 2, SCL→GPIO 3 |
Full wiring, soldering, and bring-up steps are in
docs/HARDWARE_CONNECTIONS.md.
Each sensor has a distinct role, and the readings are filtered before use:
- Ultrasonic (primary): each reading is spike-rejected (a jump beyond
MAX_PLAUSIBLE_JUMP_CMis dropped unless it persists for several samples) then median-filtered over 5 samples, to absorb multipath bounce off walls/furniture. - PIR: a motion / wake trigger only.
- LDR (LM393): the CLAHE low-light gate (digital dark/bright).
"Presence" is a fusion of three signals (logical OR): PIR motion, a close ultrasonic reading (distance < PRESENCE_DISTANCE_CM, ~350 cm), or a YOLO person-detection (the vision vote). Any one signal refreshes presence; absence is declared only when all three have been quiet for PRESENCE_TIMEOUT_S (~12 s).
The vision vote is essential because PIR senses motion, not presence — a motionless person reads identical to an empty room, so without the vision vote the node would wrongly drop to SLEEP on a still occupant.
Inference is controlled through a 5-state finite state machine.
- SLEEP — no inference; the loop polls at ~0.5 s watching for a wake signal. Exits to STANDBY when the PIR fires or the ultrasonic detects an object within
WAKE_DISTANCE_CM(~300 cm), held long enough to pass the debounce gate. - STANDBY — a transitional state entered on waking; no inference. Polls fast (~0.05 s) to clear a brief warm-up (
STANDBY_WARMUP_S, ~200 ms), then picks the active state by distance. - ACTIVE-LO — object present and close (
distance < HILO_CLOSE_CM, ~120 cm). Runs the 320×320 model: a close subject is large in frame, so low resolution suffices and is cheap (low power, low latency). Loop polls at ~0.01 s for responsiveness. This is also the wind-down state — ACTIVE-HI drops here when presence has been quiet for over ~2 s. Exits to ACTIVE-HI if the target moves far, or to SLEEP once presence is absent (> ~12 s). - ACTIVE-HI — object present but far (
distance ≥ HILO_FAR_CM, ~160 cm). Runs the 640×640 model: a distant/small subject needs the higher resolution, but it is expensive (~1 s/frame on the Pi), so the loop is paced slow (~0.40 s). Exits to ACTIVE-LO when the target comes close or presence goes quiet. - WATCHDOG — entered from any state immediately on sensor-health failure: a
pigpiofault, no valid ultrasonic echo for > 2 s, or ≥ 3 consecutive dropped echo readings. Captured frames are CLAHE-preprocessed and inferred with the 640×640 model (worst-case assumption: far and dark). Exits to STANDBY only after the sensors stay healthy for ~1.5 s, so a marginal/flaky sensor cannot flap WATCHDOG ↔ ACTIVE.
Hysteresis note: the close (
HILO_CLOSE_CM, ~120 cm) and far (HILO_FAR_CM, ~160 cm) thresholds differ on purpose — the 120–160 cm gap is a hysteresis band that, together with the transition gate below, stops the model flickering HI↔LO at the boundary.
State decisions are debounced by a single timed TransitionGate: an edge commits only when its condition has held continuously for hold_s and the current state has been occupied for at least dwell_s. Timing the streak (rather than counting loop ticks) makes the debounce behave identically regardless of how fast the loop runs in each state.
The gate guards every flicker-prone edge:
- HI ↔ LO — condition held
HILO_HOLD_S(~0.75 s) and ≥HILO_DWELL_S(~1.5 s) in-state. - SLEEP wake — hysteretic (wake at
WAKE_DISTANCE_CM, stay awake out to the widerPRESENCE_DISTANCE_CM) and confirmed (WAKE_HOLD_S), so a stray PIR pulse can't wake the node. - WATCHDOG recovery — sticky: leave only after the sonar stays healthy for
WATCHDOG_RECOVER_HOLD_Sand the minimum dwell elapses, so a marginal sensor can't flap the failsafe.
Inference runs on tflite-runtime (the lightweight CPU interpreter — not ultralytics/torch, which are too heavy for the Pi), using full-integer INT8 YOLOv8-nano models. INT8 is chosen for the Pi 4B's ARM Neon SIMD unit, which does INT8 multiply-accumulate faster than FP32 and at lower memory bandwidth.
Because a full-integer INT8 model bakes its input resolution in at export time, the adaptive 320/640 switch is achieved by loading two models (one per resolution) and selecting the matching interpreter per FSM state. The pre/post-processing that ultralytics would do internally is reimplemented by hand in NumPy/OpenCV: letterbox → quantize → invoke → dequantize → decode the YOLOv8 head → NMS → map boxes back to the original frame's pixels. Model export to .tflite is done off-device.
Orthogonal to the 5 states, the captured frame is enhanced when the LDR reports darkness (LOW output). The BGR frame is converted to YUV and CLAHE is applied to the Y (brightness) channel only, leaving chroma (U, V) untouched so colour is not distorted — then converted back.
tileGridSize = (8, 8)— splits the frame into an 8×8 grid (64 tiles, ~80×60 px each on a 640×480 frame), equalising each tile against its own local histogram (bilinearly interpolated across tiles to avoid blocky seams).clipLimit = 2.0— the contrast cap. The histogram has 256 bins (8-bit Y); OpenCV clips each bin atclipLimit × (tile_pixels / 256)≈ 2× the average bin height (~4800 px/tile ÷ 256 ≈ 19 px → clip at ~38 px), then redistributes the excess. This bounds the slope of the equalisation curve, which stops near-flat dark regions from amplifying sensor noise.2.0is a deliberately mild value.
WATCHDOG forces CLAHE on regardless (worst-case low-light assumption).
To manage the Pi's limited compute and prevent jitter in frame capture and inference, work is split across threads and pinned to specific cores with os.sched_setaffinity:
| Core | Responsibility |
|---|---|
| 0 | OS tasks + (optional) snapshot writing and cloud telemetry upload |
| 1 | Sensor I/O (GPIO harvester) |
| 2 & 3 | Camera capture, CLAHE, FSM-based inference (tflite-runtime), and GUI — run sequentially |
The implementation uses up to 6 threads — 4 run by default, plus 2 that start only with their opt-in flags:
- Main — launches the worker threads and parks until a shutdown event or
Ctrl+C. - GPIO harvester — continuously samples PIR, LM393 (LDR), and HC-SR04 (ultrasonic).
- Vision engine — camera capture, CLAHE, YOLOv8 inference, and GUI.
- pigpio echo callback — runs in pigpio's own real-time thread (the ISR equivalent), timestamping ultrasonic echo edges.
- Cloud uploader (opt-in,
--cloud) — uploads telemetry to ThingSpeak. - Snapshot writer (opt-in,
--snapshots) — writes detection frames to the SD card.
Cross-thread communication:
| Channel | Type | Between |
|---|---|---|
echo_lock |
mutex | pigpio echo callback ⇄ GPIO harvester |
| shared sensor state | mutex | GPIO harvester ⇄ vision engine |
_latest_record_lock |
mutex | vision engine ⇄ cloud uploader |
_snapshot_queue |
bounded queue (drop-oldest) | vision engine ⇄ snapshot writer |
The hot threads (sensors, vision) never block on slow I/O — they hand work to the background threads via a latest-value snapshot (sensor state, telemetry record) or a bounded queue (snapshots). This scales across cores despite Python's GIL because the heavy sections (TFLite invoke(), OpenCV, socket I/O) release the GIL and run as true native parallel work.
Every loop the vision thread assembles one telemetry record (a dict) and fans it out to non-blocking sinks (so a sink can never stall the FSM). The record:
| Field | Description |
|---|---|
state |
FSM state (SLEEP / STANDBY / ACTIVE-LO / ACTIVE-HI / WATCHDOG) |
model_res |
inference resolution / model used (320, 640, or --- when idle) |
latency_ms |
per-frame inference duration |
cpu_pct, cpu_temp_c |
CPU utilisation % and core temperature °C |
power_w |
whole-Pi power draw (INA219 over I²C; -- W until the sensor is wired) |
distance_cm |
median ultrasonic distance |
detections |
list of (label, confidence%) |
The default sink is the terminal sink, which prints one fixed-format line per loop to the Pi's own console — so the system runs fully offline:
[14:22:07] ACTIVE-LO | model 320 | lat 28.4ms | cpu 47.0% | temp 58.1C | pwr -- W | dist 95.3cm | dets: Student/Person(94.2%)
--cloud— a background thread POSTs power / latency / CPU-temp / distance to a ThingSpeak channel every 20 s (the free-tier rate limit).emit_telemetry()only stashes the latest record, so the network call never runs in the inference loop. The write key is read from a git-ignored.env. Seedocs/SETUP.md, Phase 4.--snapshots— on a person detection, a worker thread writes a JPEG of the frame to./snapshots/(ring-buffered to a file cap, filename carries timestamp/class/conf/state).
Both are opt-in and add Wi-Fi/disk activity, so keep them off during measured benchmark runs (they would bias the INA219's whole-Pi reading).
For demos the node opens a local window on the Pi's HDMI monitor (default; pass --headless to disable it for remote deployment). It shows the live feed with blue detection boxes and a HUD header above the unobstructed video:
SAGE-Vision ACTIVE-LO ← state name colour-coded (green active / grey idle / red watchdog)
Model 320 | Objects: 3 | 28 ms | 31 FPS
cpu 47% | 58C | -- W | dist 95 cm
During SLEEP/STANDBY the video area shows a "SYSTEM IDLE" placeholder. Keys: q quits cleanly, f toggles fullscreen. The GUI runs inside the vision thread (Cores 2 & 3) and adds negligible cost.
The headline metric — energy draw — is measured with an INA219 sitting high-side, inline on the Pi's incoming 5V USB-C feed (see Hardware & Wiring), so it reads the whole-Pi power draw over I²C. The reading fills the power_w telemetry column (watts), and reads -- W until the sensor is wired. The node runs fine without it — power instrumentation is only for benchmarking.
- The 640 INT8 model currently saturates (an INT8 calibration issue), weakening the ACTIVE-HI vision vote; it needs re-export with a representative calibration set.
- PIR and LDR are not health-monitorable — a dead pin reads as a quiet/lit room and is invisible to the WATCHDOG (only the ultrasonic sensor is observable). An out-of-range echo (an empty room) still returns a valid pulse and counts as healthy.
is_darkhas no software debounce, so CLAHE can toggle frame-to-frame at the light threshold (it relies on the LM393's hardware hysteresis).- The median distance filter adds ~0.12 s of lag (accepted for stability).
- The energy benefit is not yet proven — it requires the INA219 rig built and a controlled measurement run against the baseline.
SAGE-Vision/
├── firmware/
│ └── esp32_sensor_node/
│ └── esp32_sensor_node.ino # LEGACY — original ESP32 sensor transmitter (unused)
├── rpi_edge/
│ ├── pi_edge_node.py # Main adaptive inference node (reads GPIO sensors)
│ ├── yolo_tflite.py # Lightweight tflite-runtime YOLOv8 detector
│ ├── requirements.txt # Pi Python dependencies
│ ├── .env.example # Template for the ThingSpeak key (copy to .env)
│ ├── yolov8n_320_int8.tflite # INT8 TFLite model — 320×320 (ACTIVE-LO)
│ └── yolov8n_640_int8.tflite # INT8 TFLite model — 640×640 (ACTIVE-HI / WATCHDOG)
├── test/
│ ├── test_baseline_edge.py # Unoptimised control benchmark (terminal + GUI)
│ └── analyze_log.py # Parse a captured telemetry log into per-run metrics
├── docs/
│ ├── SETUP.md # Installation and execution guide
│ ├── TESTING.md # Benchmarking and validation procedure
│ ├── HARDWARE_CONNECTIONS.md # Wiring tables for all sensors + INA219
│ └── ENGINEERING_LOG.md # Problems faced, solutions, tradeoffs, limitations
├── snapshots/ # Detection JPEGs (created at runtime; git-ignored)
├── .env.example # Template for the ThingSpeak key (copy to .env)
├── .gitignore
└── README.md # This file — project overview & architecture
- docs/SETUP.md — installation and execution (HDMI or VNC display), plus the optional
--cloudsetup. - docs/HARDWARE_CONNECTIONS.md — full wiring for every sensor and the INA219 power rig.
- docs/TESTING.md — benchmarking the adaptive node against the baseline.
| Decision | Rationale |
|---|---|
| On-Pi terminal telemetry + local GUI, offline by default | No network in the live path, no frame-encoding overhead, no second machine |
| Telemetry record + non-blocking sinks | Terminal always; cloud/snapshot sinks drop in without touching the FSM |
Direct GPIO sensors via pigpio |
Removes the ESP32 and serial link; hardware-timestamped echo edges keep distance accurate |
| Two pinned threads + core affinity | Isolates GPIO sensor I/O jitter from the inference loop's timing |
| Sensor-fused presence (PIR ∨ proximity ∨ vision vote) | A still person is invisible to PIR alone; fusion prevents false SLEEP |
| Timed TransitionGate on every edge | Debounce behaves identically regardless of per-state loop pace; kills flicker |
| Two fixed-resolution INT8 models | Full-integer export bakes input size; switching models is the adaptive resolution |
| INT8 TFLite over FP32 PyTorch | 2–4× lower inference time on ARM Neon; no GPU required |
| CLAHE on luma over global brightness | Preserves local contrast for detection; global boost washes out fine edges |
This project's source code is released under the MIT License — you're free to use, modify, and distribute it with attribution.
Note on the model weights: the bundled
yolov8n_*_int8.tflitefiles are derived from Ultralytics YOLOv8, which is licensed AGPL-3.0. The MIT license above covers this project's own code; redistributing or deploying the YOLOv8-derived weights may carry AGPL-3.0 obligations. For non-AGPL use, see Ultralytics' commercial licensing.