A multimodal note-taking system that lets you control document creation through hand gestures, voice, and AI. Point to scroll, pinch to select, swipe to navigate — while speaking your notes naturally.
What Makes It Different
Most note-taking tools optimize for keyboard input. GAINS is designed for situations where your hands are busy or a keyboard isn’t practical — standing meetings, lab work, cooking, workshops. Gestures handle navigation and structure while voice captures content.
Gesture Recognition
Hand tracking uses MediaPipe Hands, with the recognizer running as a Python service. A standard webcam is all that’s required.
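As a rough illustration of how a gesture such as "pinch to select" could be derived from hand landmarks, the sketch below checks the distance between the thumb tip and index fingertip. The landmark indices follow the 21-point MediaPipe Hands model. The function name `is_pinch`, the normalization by hand size, and the threshold value are assumptions made for this example, not details taken from GAINS.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]

# Landmark indices from the 21-point MediaPipe Hands model.
WRIST = 0
THUMB_TIP = 4
INDEX_TIP = 8
MIDDLE_MCP = 9


def is_pinch(landmarks: Sequence[Point], threshold: float = 0.25) -> bool:
    """Return True when the thumb tip and index fingertip are close together.

    The tip distance is divided by the wrist-to-middle-knuckle length so the
    check does not depend on how far the hand is from the camera.
    `threshold` is an assumed value chosen for illustration.
    """
    hand_size = math.dist(landmarks[WRIST], landmarks[MIDDLE_MCP])
    if hand_size == 0:
        # Degenerate input (e.g. no real detection); treat as "no pinch".
        return False
    tip_gap = math.dist(landmarks[THUMB_TIP], landmarks[INDEX_TIP])
    return tip_gap / hand_size < threshold
```

In a real recognizer the `landmarks` sequence would come from the hand tracker's per-frame output, and a check like this would likely be smoothed over several frames to avoid flicker.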
Voice-to-Structure
Speech isn’t just transcribed — it’s parsed into structured sections, action items, and key points. The gesture layer controls where and how content is placed.
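One way to picture the structured output is a note made of sections, each holding key points and action items, with the gesture layer supplying the target section. The sketch below is illustrative only: the `Note` and `Section` types, the `add_utterance` function, and the cue phrases are hypothetical, and a simple keyword check stands in for the LLM-based parsing that the project describes.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Section:
    title: str
    key_points: List[str] = field(default_factory=list)
    action_items: List[str] = field(default_factory=list)


@dataclass
class Note:
    sections: Dict[str, Section] = field(default_factory=dict)


# Hypothetical cue phrases; this keyword check is a stand-in for the LLM step.
ACTION_CUES = ("todo", "action item", "follow up", "we need to")


def add_utterance(note: Note, section_title: str, text: str) -> None:
    """File one transcribed utterance under the section chosen by gesture."""
    section = note.sections.setdefault(section_title, Section(section_title))
    cleaned = text.strip()
    if any(cue in cleaned.lower() for cue in ACTION_CUES):
        section.action_items.append(cleaned)
    else:
        section.key_points.append(cleaned)
```

Here `section_title` plays the role of the gesture layer's decision about where content goes, while the text classification decides how it is stored.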
Service Architecture
A ZMQ bridge decouples the desktop shell from the Python inference services, so gesture recognition, speech processing, and summarization each run as independent workers. Tests covering the ZMQ bridge and sprint-level service integration live next to the services.
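To make the decoupling concrete, the sketch below shows one possible message envelope for such a bridge: a two-frame message carrying the target service name and a JSON body. The frame layout and the `encode`/`decode` helpers are assumptions for illustration; the actual GAINS wire format is not described here.

```python
import json
from typing import Any, Dict, List, Tuple


def encode(service: str, payload: Dict[str, Any]) -> List[bytes]:
    """Build a two-frame message: service name, then a JSON-encoded body."""
    return [service.encode("utf-8"), json.dumps(payload).encode("utf-8")]


def decode(frames: List[bytes]) -> Tuple[str, Dict[str, Any]]:
    """Split a two-frame message back into (service name, payload)."""
    if len(frames) != 2:
        raise ValueError(f"expected 2 frames, got {len(frames)}")
    service = frames[0].decode("utf-8")
    payload = json.loads(frames[1].decode("utf-8"))
    return service, payload
```

With pyzmq, frames like these could be passed to `socket.send_multipart(frames)` on one side and read with `socket.recv_multipart()` on the other. Keeping the envelope independent of the socket code also makes it easy to cover in bridge tests without starting any workers.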
Technical Stack
- Shell: Tauri desktop app (cross-platform)
- Services: Python microservices for gesture, speech, and summarization
- Transport: ZeroMQ bridge between shell and services
- Gesture: MediaPipe Hands
- Voice: ASR + structured summarization via LLM
- Layout: GAINS/ (app core), services/ (Python workers), tauri-app/ (desktop shell)
Repository README
GAINS — Gesture-Assisted Intelligent Note Scribe
GAINS is a note-taking application that uses gesture recognition to drive user interaction. AI models interpret the user's gestures, and the application runs on multiple platforms.