Multi-Modal Robot Arm Control

Python · OpenCV · C++ · Random Forest · MediaPipe · Whisper API · Embedded Systems

Advanced control system for a robotic arm combining multiple human input channels into a unified actuation pipeline.

Control Modalities

1. EMG Gesture Control

  • BITalino EMG signal acquisition at a high sampling rate
  • Notch (50/60 Hz) + bandpass (20–400 Hz) filtering
  • Random Forest gesture classification with confidence thresholds
  • Real-time oscilloscope for electrode placement verification
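The notch + bandpass stage above can be sketched with SciPy. The sampling rate (1000 Hz) and the notch Q factor are assumptions, not values taken from the project:

```python
import numpy as np
from scipy.signal import iirnotch, butter, filtfilt

FS = 1000  # assumed sampling rate in Hz

def preprocess_emg(signal, fs=FS, mains_hz=50.0):
    """Suppress mains hum, then restrict to the useful EMG band (20-400 Hz)."""
    # Narrow notch at the mains frequency (50 or 60 Hz depending on region)
    b_n, a_n = iirnotch(mains_hz, Q=30.0, fs=fs)
    cleaned = filtfilt(b_n, a_n, signal)
    # 4th-order Butterworth bandpass over the EMG band
    b_bp, a_bp = butter(4, [20.0, 400.0], btype="bandpass", fs=fs)
    return filtfilt(b_bp, a_bp, cleaned)
```

`filtfilt` applies each filter forward and backward, giving zero phase distortion, which matters when extracting time-domain features for the classifier.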

2. Computer Vision Hand Tracking

  • MediaPipe Hands landmark inference
  • Per-finger OPEN/CLOSED heuristic detection
  • Angle mapping (0°/180°) serialized to microcontroller
  • OpenCV integration for real-time processing
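The per-finger OPEN/CLOSED heuristic and the 0°/180° angle mapping can be sketched as a pure function over MediaPipe's 21 hand landmarks (normalized `(x, y)` pairs, image y growing downward). The tip/PIP index pairs follow MediaPipe's documented landmark numbering; note a production version would test the thumb on the x axis rather than y, which is simplified here:

```python
# MediaPipe hand landmark indices: (tip, lower joint) per finger,
# in the same order as the 5-channel protocol
FINGERS = {"thumb": (4, 2), "index": (8, 6), "middle": (12, 10),
           "ring": (16, 14), "pinky": (20, 18)}

def fingers_to_angles(landmarks):
    """Map each finger's OPEN/CLOSED state to a servo angle.

    A finger counts as OPEN when its tip lies above its lower joint
    (smaller y in image coordinates). OPEN -> 0 degrees, CLOSED -> 180.
    """
    angles = []
    for name, (tip, joint) in FINGERS.items():
        is_open = landmarks[tip][1] < landmarks[joint][1]
        angles.append(0 if is_open else 180)
    return angles
```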

3. Speech / STT Control

  • Whisper API transcription from audio windows
  • Fuzzy intent matching to gesture commands
  • Extensible command mapping system
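Fuzzy intent matching over Whisper transcripts can be done with the standard library alone. The command vocabulary below is illustrative, not the project's actual mapping:

```python
import difflib

# Hypothetical command vocabulary: phrase -> 5-channel angle pattern
COMMANDS = {
    "open hand": [0, 0, 0, 0, 0],
    "close hand": [180, 180, 180, 180, 180],
    "point": [180, 0, 180, 180, 180],
}

def match_intent(transcript, cutoff=0.6):
    """Return the closest known command phrase, or None if nothing is close.

    New commands extend the system simply by adding entries to COMMANDS.
    """
    hits = difflib.get_close_matches(transcript.lower().strip(),
                                     COMMANDS, n=1, cutoff=cutoff)
    return hits[0] if hits else None
```

`difflib.get_close_matches` tolerates filler words and minor transcription errors, so "close the hand" still resolves to `close hand`, while unrelated speech falls below the cutoff and is ignored.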

Architecture

┌─────────────────┐        ┌──────────────────┐
│ emg_control.py  │──CSV──▶│                  │
├─────────────────┤        │  Microcontroller │
│ hand_control.py │──CSV──▶│    (main.cpp)    │──▶ Servos
├─────────────────┤        │                  │
│ stt_control.py  │──CSV──▶│                  │
└─────────────────┘        └──────────────────┘
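All three input scripts speak the same CSV wire format shown in the diagram. A minimal sketch of that protocol, with a Python mirror of the firmware-side parser (clamping behavior is an assumption):

```python
def encode_angles(angles):
    """Serialize five servo angles as one CSV line, e.g. b'0,180,90,0,180\n'."""
    assert len(angles) == 5
    return (",".join(str(int(a)) for a in angles) + "\n").encode("ascii")

def parse_angles(line):
    """Python mirror of the firmware CSV parser: line -> clamped angle list.

    Returns None on malformed input so a corrupted serial line is
    dropped instead of moving the arm.
    """
    parts = line.strip().split(b",")
    if len(parts) != 5:
        return None
    try:
        return [max(0, min(180, int(p))) for p in parts]
    except ValueError:
        return None
```

Keeping one serializer shared by `emg_control.py`, `hand_control.py`, and `stt_control.py` means the firmware needs exactly one parser, regardless of which modality is active.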

Key Features

  • Multi-input fusion: EMG, computer vision, and speech inputs
  • Consistent protocol: 5-channel angle communication (thumb, index, middle, ring, pinky)
  • Model integrity: Paired scaler + model safeguards
  • Real-time visualization: EMG oscilloscope for debugging
  • Safety controls: Emergency stop pathway and idle timeout
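The emergency-stop pathway and idle timeout could be combined into one host-side gate along these lines. The neutral pose, timeout value, and class shape are assumptions for illustration:

```python
import time

STOP_POSE = [90, 90, 90, 90, 90]  # assumed neutral "safe" pose

class SafetyGate:
    """Routes every command through safety checks before actuation.

    Latches an emergency stop permanently, and falls back to the safe
    pose when no command arrives within the idle timeout.
    """
    def __init__(self, timeout_s=2.0, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock          # injectable for testing
        self.last_cmd = clock()
        self.estopped = False

    def submit(self, angles):
        """Pass a command through the gate; returns the angles to actuate."""
        if self.estopped:
            return STOP_POSE
        self.last_cmd = self.clock()
        return angles

    def poll(self):
        """Call periodically; returns STOP_POSE once the arm has gone idle."""
        if self.estopped or self.clock() - self.last_cmd > self.timeout_s:
            return STOP_POSE
        return None

    def emergency_stop(self):
        self.estopped = True
```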

Technical Stack

  • Firmware: C++ (PlatformIO) with serial CSV parser
  • Signal Processing: NumPy, SciPy for filtering and feature extraction
  • ML: scikit-learn Random Forest with custom feature engineering
  • Vision: OpenCV + MediaPipe for hand pose estimation
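One way to get the "paired scaler + model" guarantee from the feature list, together with the confidence-thresholded prediction, is a scikit-learn `Pipeline`: the scaler and the Random Forest are fitted, saved, and loaded as a single object, so they cannot drift out of step. The hyperparameters and threshold below are illustrative:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier

def train_gesture_model(X, y, seed=0):
    """Bundle the scaler and classifier into one fit/save/load unit."""
    model = make_pipeline(
        StandardScaler(),
        RandomForestClassifier(n_estimators=100, random_state=seed),
    )
    model.fit(X, y)
    return model

def classify(model, features, threshold=0.7):
    """Return the predicted gesture only when the forest is confident enough;
    otherwise None, so low-confidence frames never move the arm."""
    proba = model.predict_proba([features])[0]
    best = int(np.argmax(proba))
    return model.classes_[best] if proba[best] >= threshold else None
```

Persisting the whole pipeline (e.g. with `joblib.dump`) is what enforces the integrity safeguard: there is no separate scaler file to mismatch.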

Source Code

GitHub Repository