A multimodal note-taking system you drive hands-free: speak your notes, and a head-nod commits what you just said. Speech runs through Whisper, head pose runs through a MediaPipe Tasks pipeline, and committed text flows through a plug-in chain over a ZeroMQ bus.
What Makes It Different
Most note-taking tools optimize for keyboard input. GAINS is designed for situations where your hands are busy or a keyboard isn’t practical — standing meetings, lab work, cooking, workshops. Voice captures content; a simple nod commits it, so you never have to touch the machine.
Voice + Nod Capture
Speech is transcribed with Whisper. Head pose — the nod-to-commit signal — is tracked with MediaPipe Tasks, running as a Python vision service. A standard webcam is all that’s required.
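As a rough illustration of how a nod-to-commit signal could be derived from head pose, the sketch below runs a small state machine over a stream of pitch angles. It is not the project's detector: the class name `NodDetector`, the thresholds, the sign convention (positive pitch means head down), and the choice to report the peak dip as `pitch_deg` are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NodDetector:
    """Hypothetical threshold-based nod detector over pitch samples (degrees)."""

    down_deg: float = 12.0       # assumed: dip from baseline that starts a nod
    return_deg: float = 4.0      # assumed: offset that counts as "back up"
    max_duration_s: float = 1.0  # assumed: dip and return must fit in this window
    baseline: float = 0.0        # assumed neutral pitch; a real service would calibrate
    _state: str = "idle"         # "idle", "dipping", or "blocked"
    _started: float = 0.0
    _peak: float = 0.0

    def update(self, ts: float, pitch_deg: float) -> Optional[dict]:
        """Feed one sample; return a gesture.nod-shaped payload when a nod completes."""
        offset = pitch_deg - self.baseline
        if self._state == "idle":
            if offset >= self.down_deg:
                self._state, self._started, self._peak = "dipping", ts, offset
            return None
        if self._state == "blocked":
            # A slow look-down is ignored until the head returns to neutral.
            if offset <= self.return_deg:
                self._state = "idle"
            return None
        # state == "dipping"
        self._peak = max(self._peak, offset)
        if ts - self._started > self.max_duration_s:
            self._state = "blocked"
            return None
        if offset <= self.return_deg:
            self._state = "idle"
            return {"ts": ts, "pitch_deg": self._peak}
        return None
```

Feeding it a quick dip and return yields one payload; holding the head down for longer than the window yields none, which keeps reading a document on the desk from committing text by accident.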
Capture → Commit
Speech is transcribed verbatim with Whisper; a nod commits the current utterance into the note. Committed text flows through a plug-in chain — including an optional LLM grammar-fixer that preserves the original wording — before it’s written out. The note is captured text, faithfully transcribed; parsing speech into structured sections / action items is a planned direction, not yet built.
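The capture-then-commit flow can be sketched as a small buffer that remembers the latest transcript and emits it when a nod arrives. This is only a Python illustration of the logic; in GAINS the desktop shell produces `text.committed`, and the class name `CommitBuffer` and the assumption that every message carries its topic in an `event` field are mine.

```python
from typing import Optional


class CommitBuffer:
    """Holds the latest transcript and commits it when a nod arrives (sketch)."""

    def __init__(self) -> None:
        self._pending = ""

    def handle(self, message: dict) -> Optional[dict]:
        """Consume one bus message; return a text.committed event or None."""
        event = message.get("event")
        if event == "asr.partial":
            # Assumed: each partial replaces the previous one as the current utterance.
            self._pending = message.get("text", "").strip()
            return None
        if event == "gesture.nod" and self._pending:
            committed = {
                "event": "text.committed",
                "text": self._pending,
                "ts": message["ts"],
            }
            self._pending = ""  # a second nod must not commit the same text twice
            return committed
        return None
```

Clearing the buffer after a commit is the design choice worth noting: a nod with nothing pending is simply ignored rather than writing an empty or duplicate line.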
Service Architecture
A ZMQ bridge decouples the desktop shell from the Python inference services, so speech processing and head-pose detection each run as independent workers communicating over a typed message bus (gesture.nod, text.committed, …). Tests covering the ZMQ bridge and service integration live next to the services.
Technical Stack
- Shell: Tauri 2 + Leptos 0.8 desktop app (Rust/WASM UI, cross-platform)
- Services: Python microservices for speech (ASR/TTS) and head pose
- Transport: ZeroMQ plug-in bus between shell and services
- Head pose: MediaPipe Tasks (nod detection)
- Voice: faster-whisper (CTranslate2) ASR + Piper TTS; an optional LLM grammar-fixer (verbatim-preserving) in the commit chain
- Layout: services/ (Python workers) · tauri-app/ (desktop shell)
Current State
Modernized in May 2026 after an earlier pause, and still actively maintained: the Phase 1 rebuild (repo consolidation, Tauri 2 + Leptos 0.8 upgrade, MediaPipe Tasks migration, PR-CI) landed mid-May, with dependency bumps merged to main and CI passing through early June 2026. Still pre-release, though: the v0.3.0–v0.6.0 tags predate the modernization and their release-build workflow failed, so there is no published GitHub Release and no signed downloadable build a stranger can install — the voice-and-nod pipeline runs from source. Read the public surface as an active work-in-progress, not a shipping product.
Repository README
GAINS — Gesture-Assisted Intelligent Note Scribe
GAINS is a multimodal note-taking app: you speak, and nod to commit what you just said. Speech goes through Whisper, head pose through a MediaPipe Tasks pipeline, and committed text flows through a plug-in chain (e.g. grammar fixing) before landing in your notes. Everything runs locally.
Architecture
┌──────────────────────┐    ┌───────────────────────────────────────┐
│ Tauri 2 + Leptos 0.8 │    │ ZeroMQ bus (XSUB/XPUB proxy)          │
│ (Rust desktop UI)    │───▶│ publishers connect → tcp://*:5556     │
│ listens on 5555      │◀───│ subscribers connect → tcp://*:5555    │
└──────────────────────┘    └───────────────────────────────────────┘
                                          ▲  ▲  ▲  ▲  ▲
                   ┌──────────────────────┘  │  │  │  └──────────────────────┐
                   │             ┌───────────┘  │  └───────────┐             │
                   ▼             ▼              ▼              ▼             ▼
              ┌──────────┐ ┌────────────┐ ┌───────────┐   ┌─────────┐   ┌─────────┐
              │ services │ │ services/  │ │ services/ │   │ plugins/│   │ services│
              │ /asr     │ │ vision     │ │ tts       │   │ /grammar│   │ /notes  │
              │ (faster- │ │ (MediaPipe │ │ (Piper)   │   │ _guard  │   │ writer  │
              │ whisper) │ │ Tasks)     │ │           │   │ /sample │   │         │
              └──────────┘ └────────────┘ └───────────┘   └─────────┘   └─────────┘
Events flowing on the bus:
| Topic | Payload | Producer |
|---|---|---|
| heartbeat | {ts} | bus |
| asr.partial | {text, ts, confidence, start, end, words[]} | asr |
| gesture.nod | {ts, pitch_deg} | vision |
| text.committed | {text, ts} | Tauri shell |
| plugin.rewrite | {text, orig_ts, plugin, ts} | any plug-in |
| tts.play | {text, ts} | asr (silence) |
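To make the payload shapes concrete, here is a minimal sketch of a helper that builds and checks a bus event before it is serialized. The field sets are copied from the table above; the function name `make_event`, the `event` envelope key (borrowed from the plug-in message shape), and the use of `time.time()` as a default timestamp are assumptions, not the project's actual helper.

```python
import json
import time

# Required payload fields per topic, copied from the events table.
REQUIRED_FIELDS = {
    "heartbeat": {"ts"},
    "asr.partial": {"text", "ts", "confidence", "start", "end", "words"},
    "gesture.nod": {"ts", "pitch_deg"},
    "text.committed": {"text", "ts"},
    "plugin.rewrite": {"text", "orig_ts", "plugin", "ts"},
    "tts.play": {"text", "ts"},
}


def make_event(topic: str, **payload) -> str:
    """Return a JSON string for `topic`, or raise ValueError if fields are missing."""
    if topic not in REQUIRED_FIELDS:
        raise ValueError(f"unknown topic: {topic}")
    payload.setdefault("ts", time.time())  # assumed: ts is seconds since the epoch
    missing = REQUIRED_FIELDS[topic] - set(payload)
    if missing:
        raise ValueError(f"{topic} is missing fields: {sorted(missing)}")
    return json.dumps({"event": topic, **payload})
```

For example, `make_event("gesture.nod", pitch_deg=14.0)` fills in `ts` and serializes the event, while `make_event("plugin.rewrite", text="Hi.", plugin="demo")` raises because `orig_ts` is absent.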
Quick start
# Python (3.11+) — pick the dep groups you need
pip install -e ".[asr,vision,tts,plugins,dev]"
# Run the services (each in its own terminal)
gains-bus # XSUB/XPUB proxy on 5555 / 5556
gains-asr # streaming whisper transcription
gains-vision # MediaPipe Tasks head pose
gains-tts # piper TTS with platform fallback
gains-notes # txt/md/json export
gains-plugins # plug-in runner
# Desktop shell (Tauri 2 + Leptos 0.8)
cd tauri-app
cargo install --locked trunk
cargo install tauri-cli --version "^2" --locked
cargo tauri dev
Configuration
services/asr/config/settings.yaml:
asr_language: en # ISO-639-1 code
asr_model: small # tiny / base / small / medium / large-v3 / large-v3-turbo / distil-large-v3
For GPU acceleration set DEVICE=gpu (uses CTranslate2 + CUDA float16).
For the grammar-guard plugin, set OPENAI_API_KEY and optionally
GRAMMAR_GUARD_MODEL (default gpt-4o-mini).
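One plausible way these settings could map onto faster-whisper is sketched below. The `WhisperModel(model_size_or_path, device=..., compute_type=...)` constructor and the `language` argument of `transcribe()` are part of faster-whisper; the function name `resolve_asr_options`, the fallback defaults, and the `int8` compute type on CPU are assumptions rather than the service's actual behavior.

```python
import os
from typing import Mapping, Optional, Tuple

ALLOWED_MODELS = {
    "tiny", "base", "small", "medium",
    "large-v3", "large-v3-turbo", "distil-large-v3",
}


def resolve_asr_options(
    settings: Mapping[str, str],
    env: Optional[Mapping[str, str]] = None,
) -> Tuple[dict, dict]:
    """Return (WhisperModel kwargs, transcribe kwargs) for the given settings."""
    env = os.environ if env is None else env
    model = settings.get("asr_model", "small")  # assumed default
    if model not in ALLOWED_MODELS:
        raise ValueError(f"unsupported asr_model: {model}")
    use_gpu = env.get("DEVICE", "cpu").lower() == "gpu"
    model_kwargs = {
        "model_size_or_path": model,
        "device": "cuda" if use_gpu else "cpu",
        # float16 on CUDA is stated above; int8 on CPU is an assumption.
        "compute_type": "float16" if use_gpu else "int8",
    }
    # The language is passed to transcribe(), not to the model constructor.
    transcribe_kwargs = {"language": settings.get("asr_language", "en")}
    return model_kwargs, transcribe_kwargs
```

With faster-whisper installed, the result would be used as `WhisperModel(**model_kwargs).transcribe(audio, **transcribe_kwargs)`.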
Plug-ins
Drop a Python module at plugins/<name>/plugin.py that:
- Subscribes to tcp://localhost:5555 (bus XPUB side).
- Reads JSON messages; acts on text.committed.
- Publishes back on tcp://localhost:5556 (XSUB side) with an event of shape {"event": "plugin.rewrite", "text": ..., "plugin": "<name>", ...}.
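The three steps above can be sketched as follows. This is not the bundled sample plug-in: the plug-in name `capitalizer`, the toy rewrite rule, and the assumption that each message is a single JSON frame with no topic prefix are all illustrative. The pyzmq calls used (`Context.instance`, `socket`, `connect`, `setsockopt_string`, `recv_string`, `send_string`) are standard.

```python
import json
import time
from typing import Optional

SUB_ENDPOINT = "tcp://localhost:5555"  # bus XPUB side
PUB_ENDPOINT = "tcp://localhost:5556"  # bus XSUB side
PLUGIN_NAME = "capitalizer"            # hypothetical plug-in name


def rewrite(message: dict) -> Optional[dict]:
    """Turn a text.committed message into a plugin.rewrite event, else None."""
    if message.get("event") != "text.committed":
        return None
    text = message.get("text", "").strip()
    if not text:
        return None
    fixed = text[0].upper() + text[1:]
    if fixed[-1] not in ".!?":
        fixed += "."
    return {
        "event": "plugin.rewrite",
        "text": fixed,
        "orig_ts": message["ts"],  # assumed: points back at the committed text
        "plugin": PLUGIN_NAME,
        "ts": time.time(),
    }


def main() -> None:
    import zmq  # pyzmq; imported here so rewrite() can be exercised without it

    ctx = zmq.Context.instance()
    sub = ctx.socket(zmq.SUB)
    sub.connect(SUB_ENDPOINT)
    sub.setsockopt_string(zmq.SUBSCRIBE, "")  # assumed: no topic-prefix framing
    pub = ctx.socket(zmq.PUB)
    pub.connect(PUB_ENDPOINT)
    while True:
        try:
            message = json.loads(sub.recv_string())
        except ValueError:
            continue  # skip frames that are not valid JSON text
        if not isinstance(message, dict):
            continue
        reply = rewrite(message)
        if reply is not None:
            pub.send_string(json.dumps(reply))
```

Keeping the transformation in a pure `rewrite()` function means it can be unit-tested without a running bus; calling `main()` starts the blocking receive loop.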
The note exporter splices plugin.rewrite payloads into the session in
place of the original ASR text. See plugins/sample_rewriter/plugin.py
for a 30-line template, and docs/plugins.md for the full reference.
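A minimal sketch of that splicing step, assuming a rewrite's `orig_ts` equals the `ts` of the `text.committed` event it replaces (the source does not spell out the matching rule). The function name `splice_rewrites` is hypothetical and this is not the exporter's actual code.

```python
def splice_rewrites(events: list) -> list:
    """Return note lines, with each committed text replaced by its latest rewrite."""
    lines = {}  # keyed by the ts of text.committed; dicts keep insertion order
    for ev in events:
        kind = ev.get("event")
        if kind == "text.committed":
            lines[ev["ts"]] = ev["text"]
        elif kind == "plugin.rewrite" and ev.get("orig_ts") in lines:
            # Overwriting an existing key keeps the line at its original position.
            lines[ev["orig_ts"]] = ev["text"]
    return list(lines.values())
```

Rewrites that point at an unknown `orig_ts` are dropped, so a misbehaving plug-in cannot add lines that were never committed.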
Development
# Python lint + tests
ruff check .
pytest
# Rust backend
cd tauri-app/src-tauri
cargo fmt --check
cargo clippy --all-targets -- -D warnings
cargo check
# Leptos WASM frontend
cd tauri-app
cargo check -p gains-ui --target wasm32-unknown-unknown
The ci.yml workflow runs all of the above on every PR.
Modernization status
See docs/modernization-assessment.md for the full architectural audit and
roadmap. As of branch claude/assess-modernization-61ThF:
- Phase 1 (this branch): repository consolidation, Python service bug fixes, openai-python v1 migration, MediaPipe Tasks migration, Tauri 2 / Leptos 0.8 upgrade, updater plugin wired up, PR-CI introduced.
- Phase 2 (future): replace Python services with native Rust crates (whisper-rs / sherpa-onnx / ort + MediaPipe Tasks .task model) so the runtime ships with zero Python dependency.
- Phase 3 (future): replace per-plug-in subprocess model with WASM plug-ins via Extism + Wasmtime.