SUM EQUITIES

GAINS — Gesture-Assisted Intelligent Note Scribe

Hands-free desktop note-taking: speak to capture, nod to commit. Tauri 2 + Leptos shell over Python services (faster-whisper ASR, MediaPipe Tasks head pose, Piper TTS) on a ZeroMQ plug-in bus.

In development · last commit 2 days ago · 20 commits / 30d · Verified Jun 6, 2026

GAINS — hands-free voice-and-nod note-taking

A multimodal note-taking system you drive hands-free: speak your notes, and a head-nod commits what you just said. Speech runs through Whisper, head pose runs through a MediaPipe Tasks pipeline, and committed text flows through a plug-in chain over a ZeroMQ bus.

What Makes It Different

Most note-taking tools optimize for keyboard input. GAINS is designed for situations where your hands are busy or a keyboard isn’t practical — standing meetings, lab work, cooking, workshops. Voice captures content; a simple nod commits it, so you never have to touch the machine.

Voice + Nod Capture

Speech is transcribed with Whisper. Head pose — the nod-to-commit signal — is tracked with MediaPipe Tasks, running as a Python vision service. A standard webcam is all that’s required.
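
As a rough illustration of how a nod might be derived from head-pose output, the sketch below scans a series of pitch angles for a quick down-then-up swing. The function name, the 12-degree threshold, the 0.8-second window, and the sign convention (positive pitch means the head is tilted down) are assumptions made for this example, not values taken from the GAINS vision service.

```python
def detect_nods(samples, threshold_deg=12.0, max_duration_s=0.8):
    """Return gesture.nod-style dicts for each completed nod.

    samples: iterable of (ts, pitch_deg) pairs sorted by ts.
    Assumed convention: pitch is calibrated so 0 is neutral and
    positive values mean the head is tilted down.
    """
    nods = []
    state = "idle"
    start_ts = 0.0
    peak = 0.0
    for ts, pitch in samples:
        if state == "idle":
            if pitch >= threshold_deg:
                # Head dipped past the threshold: a nod may be starting.
                state, start_ts, peak = "down", ts, pitch
        elif state == "down":
            peak = max(peak, pitch)
            if ts - start_ts > max_duration_s:
                # Head stayed down too long (e.g. reading): not a nod.
                state = "held"
            elif pitch <= threshold_deg / 2:
                # Head came back up in time: report the nod.
                nods.append({"ts": ts, "pitch_deg": peak})
                state = "idle"
        elif pitch <= threshold_deg / 2:
            # state == "held": wait for the head to return to neutral.
            state = "idle"
    return nods
```

The half-threshold return level adds hysteresis, so small jitter around the threshold does not register as repeated nods.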

Capture → Commit

Whisper transcribes speech verbatim, and a nod commits the current utterance into the note. Committed text then passes through a plug-in chain, including an optional LLM grammar-fixer that preserves the original wording, before it is written out. For now the note is the transcribed text as spoken; parsing speech into structured sections or action items is planned but not yet built.
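
The capture-then-commit flow can be pictured as a small buffer that remembers the latest transcript and releases it when a nod arrives. The sketch below is only illustrative: the real commit step lives in the Tauri shell (Rust), and the class name `CommitBuffer`, its `handle` method, and the choice to stamp the committed text with the nod's timestamp are assumptions.

```python
class CommitBuffer:
    """Holds the most recent utterance until a nod commits it (hypothetical helper)."""

    def __init__(self):
        self._pending = ""

    def handle(self, topic, payload):
        """Feed one bus event; return a text.committed payload or None."""
        if topic == "asr.partial":
            # Each partial replaces the previous one for the current utterance.
            self._pending = payload.get("text", "").strip()
            return None
        if topic == "gesture.nod" and self._pending:
            # Assumption: the committed event carries the nod's timestamp.
            committed = {"text": self._pending, "ts": payload["ts"]}
            self._pending = ""
            return committed
        # Nods with nothing pending, and all other topics, are ignored.
        return None
```

A nod with nothing pending is dropped, so an accidental nod during silence does not write an empty entry.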

Service Architecture

A ZMQ bridge decouples the desktop shell from the Python inference services, so speech processing and head-pose detection each run as independent workers communicating over a typed message bus (gesture.nod, text.committed, …). Tests covering the ZMQ bridge and service integration live next to the services.

Technical Stack

  • Shell: Tauri 2 + Leptos 0.8 desktop app (Rust/WASM UI, cross-platform)
  • Services: Python microservices for speech (ASR/TTS) and head pose
  • Transport: ZeroMQ plug-in bus between shell and services
  • Head pose: MediaPipe Tasks (nod detection)
  • Voice: faster-whisper (CTranslate2) ASR + Piper TTS; an optional LLM grammar-fixer (verbatim-preserving) in the commit chain
  • Layout: services/ Python workers · tauri-app/ desktop shell

Current State

Modernized in May 2026 after an earlier pause and still actively maintained: the Phase 1 rebuild (repo consolidation, Tauri 2 + Leptos 0.8 upgrade, MediaPipe Tasks migration, PR-CI) landed mid-May, with dependency bumps merged to main and CI passing through early June 2026. It is still pre-release, though: the v0.3.0–v0.6.0 tags predate the modernization and their release-build workflow failed, so there is no published GitHub Release and no signed downloadable build a stranger can install; the voice-and-nod pipeline runs from source. Read the public surface as an active work-in-progress, not a shipping product.

Repository README

GAINS — Gesture-Assisted Intelligent Note Scribe

GAINS is a multi-modal note-taking app: you speak, and nod to commit what you just said. Speech goes through Whisper, head pose through a MediaPipe Tasks pipeline, and committed text flows through a plug-in chain (e.g. grammar fixing) before landing in your notes. Everything runs locally.

Architecture

┌──────────────────────┐    ┌───────────────────────────────────────┐
│ Tauri 2 + Leptos 0.8 │    │   ZeroMQ bus (XSUB/XPUB proxy)       │
│  (Rust desktop UI)   │───▶│   publishers connect → tcp://*:5556  │
│   listens on 5555    │◀───│   subscribers connect → tcp://*:5555 │
└──────────────────────┘    └───────────────────────────────────────┘
                                       ▲ ▲ ▲ ▲ ▲
                ┌──────────────────────┘ │ │ │ └──────────────────────┐
                │            ┌───────────┘ │ └───────────┐            │
                ▼            ▼             ▼             ▼            ▼
        ┌──────────┐  ┌────────────┐ ┌───────────┐  ┌─────────┐ ┌─────────┐
        │ services │  │ services/  │ │ services/ │  │ plugins/│ │ services│
        │   /asr   │  │   vision   │ │    tts    │  │ /grammar│ │ /notes  │
        │ (faster- │  │ (MediaPipe │ │  (Piper)  │  │  _guard │ │ writer  │
        │ whisper) │  │  Tasks)    │ │           │  │  /sample│ │         │
        └──────────┘  └────────────┘ └───────────┘  └─────────┘ └─────────┘
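
For readers unfamiliar with the XSUB/XPUB pattern in the diagram, a minimal proxy could look like the sketch below. It uses pyzmq's `zmq.proxy`, with the port assignment read off the diagram (publishers connect to 5556, subscribers to 5555). The names `BUS_ENDPOINTS` and `run_proxy` are hypothetical, and the actual `gains-bus` entry point may be structured differently (for example, it also emits heartbeats, which this sketch omits).

```python
# Bind addresses inferred from the diagram: the XSUB side faces
# publishers on 5556, the XPUB side faces subscribers on 5555.
BUS_ENDPOINTS = {
    "xsub_bind": "tcp://*:5556",
    "xpub_bind": "tcp://*:5555",
}


def run_proxy(endpoints=BUS_ENDPOINTS):
    """Forward every published message to all subscribers (blocks forever)."""
    import zmq  # pyzmq, imported lazily so this module loads without it

    ctx = zmq.Context.instance()
    xsub = ctx.socket(zmq.XSUB)
    xsub.bind(endpoints["xsub_bind"])
    xpub = ctx.socket(zmq.XPUB)
    xpub.bind(endpoints["xpub_bind"])
    try:
        # zmq.proxy shuttles messages and subscriptions between the two sockets.
        zmq.proxy(xsub, xpub)
    finally:
        xsub.close()
        xpub.close()
```

Because both sides of the proxy bind, every service only ever connects, which lets workers start and stop in any order.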

Events flowing on the bus:

Topic           Payload                                      Producer
heartbeat       {ts}                                         bus
asr.partial     {text, ts, confidence, start, end, words[]}  asr
gesture.nod     {ts, pitch_deg}                              vision
text.committed  {text, ts}                                   Tauri shell
plugin.rewrite  {text, orig_ts, plugin, ts}                  any plug-in
tts.play        {text, ts}                                   asr (silence)
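
One way to keep these payloads honest is to check required fields when encoding and decoding. The sketch below assumes each message is a single JSON object carrying its topic in an "event" key, as the plug-in section suggests; the helper names `encode_event` and `decode_event` and the `REQUIRED_FIELDS` table are illustrative, not part of the GAINS codebase.

```python
import json

# Required payload fields per topic, copied from the event table.
REQUIRED_FIELDS = {
    "heartbeat": {"ts"},
    "asr.partial": {"text", "ts", "confidence", "start", "end", "words"},
    "gesture.nod": {"ts", "pitch_deg"},
    "text.committed": {"text", "ts"},
    "plugin.rewrite": {"text", "orig_ts", "plugin", "ts"},
    "tts.play": {"text", "ts"},
}


def _check(event, fields):
    if event not in REQUIRED_FIELDS:
        raise ValueError(f"unknown event: {event!r}")
    missing = REQUIRED_FIELDS[event] - set(fields)
    if missing:
        raise ValueError(f"{event} is missing fields: {sorted(missing)}")


def encode_event(event, **fields):
    """Serialize one bus event to UTF-8 JSON bytes."""
    _check(event, fields)
    return json.dumps({"event": event, **fields}).encode("utf-8")


def decode_event(raw):
    """Parse and validate one bus event; returns the decoded dict."""
    message = json.loads(raw)
    _check(message.get("event"), message)
    return message
```

For example, `decode_event(encode_event("gesture.nod", ts=1.5, pitch_deg=14.0))` round-trips to a dict with the same fields plus the "event" key.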

Quick start

# Python (3.11+) — pick the dep groups you need
pip install -e ".[asr,vision,tts,plugins,dev]"

# Run the services (each in its own terminal)
gains-bus            # XSUB/XPUB proxy on 5555 / 5556
gains-asr            # streaming whisper transcription
gains-vision         # MediaPipe Tasks head pose
gains-tts            # piper TTS with platform fallback
gains-notes          # txt/md/json export
gains-plugins        # plug-in runner

# Desktop shell (Tauri 2 + Leptos 0.8)
cd tauri-app
cargo install --locked trunk
cargo install tauri-cli --version "^2" --locked
cargo tauri dev

Configuration

services/asr/config/settings.yaml:

asr_language: en          # ISO-639-1 code
asr_model: small          # tiny / base / small / medium / large-v3 / large-v3-turbo / distil-large-v3

For GPU acceleration set DEVICE=gpu (uses CTranslate2 + CUDA float16).

For the grammar-guard plugin, set OPENAI_API_KEY and optionally GRAMMAR_GUARD_MODEL (default gpt-4o-mini).
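
The settings above might be combined with the DEVICE variable roughly as follows. This is a sketch under stated assumptions: the function name `resolve_asr_options`, the fallback defaults ("small", "en"), and the CPU compute type "int8" are illustrative choices; only the model list and the GPU mapping to CUDA float16 come from this README. The returned `device` and `compute_type` values match keyword arguments accepted by faster-whisper's `WhisperModel`.

```python
import os

# Model names listed in the settings.yaml comment.
KNOWN_MODELS = {
    "tiny", "base", "small", "medium",
    "large-v3", "large-v3-turbo", "distil-large-v3",
}


def resolve_asr_options(settings, env=None):
    """Turn parsed settings plus environment into ASR options (hypothetical helper)."""
    env = os.environ if env is None else env
    model = settings.get("asr_model", "small")  # default is an assumption
    if model not in KNOWN_MODELS:
        raise ValueError(f"unsupported asr_model: {model!r}")
    use_gpu = env.get("DEVICE", "").lower() == "gpu"
    return {
        "model": model,
        "language": settings.get("asr_language", "en"),  # default is an assumption
        "device": "cuda" if use_gpu else "cpu",
        # float16 on GPU per the README; int8 on CPU is an assumption.
        "compute_type": "float16" if use_gpu else "int8",
    }
```

Rejecting unknown model names up front surfaces a typo in settings.yaml before any model download is attempted.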

Plug-ins

Drop a Python module at plugins/<name>/plugin.py that:

  1. Subscribes to tcp://localhost:5555 (bus XPUB side).
  2. Reads JSON messages; acts on text.committed.
  3. Publishes back on tcp://localhost:5556 (XSUB side) with an event of shape {"event": "plugin.rewrite", "text": ..., "plugin": "<name>", ...}.
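
The three steps above can be sketched as a minimal plug-in. This is not the bundled sample_rewriter: the plug-in name "shout", the upper-casing rewrite, the single-frame JSON framing, and the reading of `orig_ts` as the timestamp of the committed message are all assumptions made for illustration.

```python
import json
import time

PLUGIN_NAME = "shout"  # hypothetical plug-in name


def rewrite(message):
    """Map a text.committed event to a plugin.rewrite event, else None."""
    if message.get("event") != "text.committed":
        return None
    return {
        "event": "plugin.rewrite",
        "text": message["text"].upper(),  # stand-in for a real rewrite
        "orig_ts": message["ts"],  # assumed to echo the committed ts
        "plugin": PLUGIN_NAME,
        "ts": time.time(),
    }


def main():
    """Connect to the bus and rewrite committed text until interrupted."""
    import zmq  # pyzmq, imported lazily so rewrite() is testable without it

    ctx = zmq.Context.instance()
    sub = ctx.socket(zmq.SUB)
    sub.connect("tcp://localhost:5555")  # bus XPUB side
    sub.setsockopt_string(zmq.SUBSCRIBE, "")  # receive every topic
    pub = ctx.socket(zmq.PUB)
    pub.connect("tcp://localhost:5556")  # bus XSUB side
    while True:
        reply = rewrite(json.loads(sub.recv()))
        if reply is not None:
            pub.send(json.dumps(reply).encode("utf-8"))
```

Keeping the transformation in a pure function (`rewrite`) separate from the socket loop (`main`) makes the plug-in testable without a running bus.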

The note exporter splices plugin.rewrite payloads into the session in place of the original ASR text. See plugins/sample_rewriter/plugin.py for a 30-line template, and docs/plugins.md for the full reference.
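
The splicing step could work along these lines. The sketch assumes rewrites are matched to committed entries by comparing `orig_ts` with the entry's `ts`, and that the most recent rewrite wins when several plug-ins respond; the function name `apply_rewrites` is hypothetical, and the actual exporter may match differently.

```python
def apply_rewrites(committed, rewrites):
    """Return note lines, replacing committed text where a rewrite exists.

    committed: list of text.committed payloads ({text, ts}).
    rewrites:  list of plugin.rewrite payloads, in arrival order.
    """
    latest = {}
    for rw in rewrites:
        # Later rewrites for the same utterance override earlier ones.
        latest[rw["orig_ts"]] = rw["text"]
    # Entries with no matching rewrite keep their original ASR text.
    return [latest.get(entry["ts"], entry["text"]) for entry in committed]
```

Entries without a rewrite pass through unchanged, so the note stays complete even when no plug-in is running.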

Development

# Python lint + tests
ruff check .
pytest

# Rust backend
cd tauri-app/src-tauri
cargo fmt --check
cargo clippy --all-targets -- -D warnings
cargo check

# Leptos WASM frontend
cd tauri-app
cargo check -p gains-ui --target wasm32-unknown-unknown

The ci.yml workflow runs all of the above on every PR.

Modernization status

See docs/modernization-assessment.md for the full architectural audit and roadmap. As of branch claude/assess-modernization-61ThF:

  • Phase 1 (this branch) — repository consolidation, Python service bug fixes, openai-python v1 migration, MediaPipe Tasks migration, Tauri 2 / Leptos 0.8 upgrade, updater plugin wired up, PR-CI introduced.
  • Phase 2 (future) — replace Python services with native Rust crates (whisper-rs / sherpa-onnx / ort + MediaPipe Tasks .task model) so the runtime ships with zero Python dependency.
  • Phase 3 (future) — replace per-plug-in subprocess model with WASM plug-ins via Extism + Wasmtime.

Related work