A real-time audio transcription and diarization system that can capture both system audio and microphone input simultaneously.
## Features

- Real-time audio transcription using OpenAI's Whisper large-v3 model
- Speaker diarization using pyannote.audio 3.1
- Automatic summarization using Llama 3.2 (via Ollama)
- Supports multiple audio sources simultaneously (e.g., system audio + microphone)
- Real-time display of transcriptions with speaker identification
- Exports transcripts in markdown format with timestamps
- Progress tracking and status updates during processing
## Requirements

- Python 3.10 (required for compatibility with pyannote.audio)
- NVIDIA GPU with CUDA support (recommended)
- PipeWire audio system (for Linux)
- Node.js and npm (for frontend)
- Hugging Face account and API token (for diarization model)
- Ollama with Llama 3.2 model (for summarization)
## Installation

- Install Ollama and Llama 3.2:

```bash
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Pull the Llama 3.2 model
ollama pull llama3.2
```
- Create and activate a Python virtual environment:

```bash
python3.10 -m venv venv
source venv/bin/activate
```
- Install PyTorch and torchaudio (CUDA 11.8 builds; a quick verification sketch follows this step):

```bash
pip install --upgrade pip wheel setuptools
pip install torch --index-url https://download.pytorch.org/whl/cu118
pip install torchaudio --index-url https://download.pytorch.org/whl/cu118
```
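Optionally, confirm that the CUDA-enabled build is active before continuing. A minimal check using only standard PyTorch calls:

```python
import torch
import torchaudio

# Confirm the CUDA build of PyTorch is installed and a GPU is visible.
print("torch:", torch.__version__, "| torchaudio:", torchaudio.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```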
- Install the remaining Python dependencies:

```bash
pip install Cython
pip install "numpy>=1.22,<1.24"
pip install -r backend/requirements.txt
```
- Install frontend dependencies:

```bash
cd frontend
npm install
```
## Configuration

- Create a `.env` file in the root directory:

```env
HUGGINGFACE_TOKEN=your_token_here
OLLAMA_HOST=http://localhost:11434  # Ollama API endpoint
```
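For reference, a minimal sketch of how the backend might read these variables, assuming it uses `python-dotenv` (the variable names match the `.env` above; the loading code itself is illustrative):

```python
import os
from dotenv import load_dotenv

# Load HUGGINGFACE_TOKEN and OLLAMA_HOST from the .env file.
load_dotenv()
hf_token = os.environ["HUGGINGFACE_TOKEN"]
ollama_host = os.getenv("OLLAMA_HOST", "http://localhost:11434")
```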
- Hugging Face setup:
  - Go to https://huggingface.co/settings/tokens
  - Create a new token with read access
  - Copy the token and paste it into your `.env` file
- Accept the terms for these models:
  - https://huggingface.co/pyannote/speaker-diarization-3.1
  - https://huggingface.co/pyannote/segmentation-3.0
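Once access is granted, the diarization pipeline can be loaded with your token. A minimal sketch against the pyannote.audio 3.1 API (the token placeholder mirrors the `.env` example above):

```python
from pyannote.audio import Pipeline

# Load the gated diarization pipeline; requires the accepted terms above.
pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",
    use_auth_token="your_token_here",  # or read it from your .env file
)
```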
- Verify the Ollama setup:

```bash
# Check Ollama is running
curl http://localhost:11434/api/tags

# Verify Llama 3.2 is available
ollama list
```
- Start Ollama (if not running):

```bash
systemctl start ollama
```
## Usage

- Start the backend:

```bash
cd backend
PYTHONPATH=. uvicorn src.api.main:app --reload
```
- Start the frontend (in a new terminal):

```bash
cd frontend
npm start
```
- Open http://localhost:3000 in your browser
- Select audio sources:
  - Choose system audio source(s) to capture desktop audio
  - Choose microphone input to capture your voice
  - You can select multiple sources to capture simultaneously
- Click "Start Recording" to begin transcription
  - The system records audio from the selected sources
  - Audio is saved as a WAV file in the recordings directory
- Click "Stop Recording" when finished
  - The system processes the recording
  - Status updates show progress
  - The transcription appears with speaker identification
  - A summary is generated using Llama 3.2
  - Files are saved in the transcripts directory
## Output

The system generates several files:

- `recordings/recording_[timestamp].wav` - the recorded audio file
- `transcripts/transcript_[timestamp].md` - the transcript, containing:
  - Speaker identification
  - Timestamps
  - Full transcript
  - Generated summary
## Technical Details

### Audio Capture

- Multiple source recording
- Proper audio mixing (see the sketch below)
- 16-bit WAV format
- Automatic gain control
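Purely as an illustration of the mixing and gain-control idea (this is not the project's capture code; the NumPy arrays stand in for real captured audio):

```python
import numpy as np
import soundfile as sf

# Stand-in data: two mono float32 sources at 16 kHz.
system_audio = 0.1 * np.random.randn(16000).astype(np.float32)
microphone = 0.1 * np.random.randn(16000).astype(np.float32)

# Average the sources, then apply simple gain control to avoid clipping.
mix = (system_audio + microphone) / 2.0
peak = float(np.max(np.abs(mix)))
if peak > 1.0:
    mix /= peak

# Write the result as 16-bit PCM WAV.
sf.write("recordings/mixed.wav", mix, 16000, subtype="PCM_16")
```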
### Transcription

- Uses the Whisper large-v3 model
- Word-level timestamps
- High accuracy across multiple languages
- Optimized for GPU processing
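Since the acknowledgments credit Faster Whisper for optimized inference, a minimal transcription sketch with the `faster-whisper` API might look like this (the file path and settings are illustrative):

```python
from faster_whisper import WhisperModel

# Load Whisper large-v3 on the GPU with half-precision inference.
model = WhisperModel("large-v3", device="cuda", compute_type="float16")

# Transcribe with word-level timestamps enabled.
segments, info = model.transcribe("recordings/example.wav", word_timestamps=True)
for segment in segments:
    for word in segment.words:
        print(f"[{word.start:6.2f}s - {word.end:6.2f}s] {word.word}")
```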
### Speaker Diarization

- Uses pyannote.audio 3.1
- Advanced speaker separation
- Handles multiple speakers
- Optimized clustering parameters
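Continuing the pipeline-loading sketch from the Hugging Face setup above, speaker turns can be read out like this (the file path is illustrative):

```python
# Run diarization on a recording; `pipeline` is the object loaded earlier.
diarization = pipeline("recordings/example.wav")

# Iterate over speaker turns with their labels.
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"{turn.start:6.1f}s - {turn.end:6.1f}s: {speaker}")
```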
### Summarization

- Uses Llama 3.2 via Ollama
- Context-aware summaries
- Maintains speaker attribution
- Handles long conversations
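A minimal sketch of a summarization request against Ollama's REST API (the prompt wording and sample transcript are illustrative, not the project's actual prompt):

```python
import requests

transcript_text = "SPEAKER_00: Hello everyone.\nSPEAKER_01: Hi, let's get started."

# Ask the local Ollama server to summarize the transcript.
response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3.2",
        "prompt": "Summarize this conversation, keeping speaker attribution:\n\n"
        + transcript_text,
        "stream": False,
    },
    timeout=300,
)
print(response.json()["response"])
```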
### User Interface

- Real-time status updates
- Clear speaker identification
- Timestamp display
- Processing progress indicators
- Device selection interface
## Troubleshooting

- No system audio sources available:
  - Make sure PipeWire is running: `systemctl --user status pipewire`
  - Check available sources: `pw-cli list-objects | grep -A 3 "Monitor"`
- GPU memory errors:
  - Free up GPU memory by closing other applications
  - Monitor GPU usage with `nvidia-smi` (see the Python sketch below)
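For checking memory from inside the Python process itself, a small sketch using standard PyTorch calls:

```python
import torch

# Report free vs. total memory on the current GPU.
free, total = torch.cuda.mem_get_info()
print(f"GPU memory: {free / 1e9:.1f} GB free of {total / 1e9:.1f} GB")

# Release cached allocator blocks back to the driver.
torch.cuda.empty_cache()
```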
- Installation errors:
  - Make sure you're using Python 3.10
  - Install PyTorch before the other dependencies
  - Check CUDA compatibility with `nvidia-smi`
- Audio quality issues:
  - Check input device levels
  - Verify proper device selection
  - Monitor audio peaks during recording
- Summarization issues:
  - Verify Ollama is running: `systemctl status ollama`
  - Check that the Llama 3.2 model is installed: `ollama list`
  - Verify the API endpoint in the `.env` file
## Getting Help

If you encounter issues:

- Check the console output for error messages
- Look for similar issues in the project's issue tracker
- Include relevant error messages and system information when reporting issues
## License

[Insert License Information]
## Acknowledgments

This project uses:
- OpenAI Whisper for transcription
- Pyannote Audio for speaker diarization
- Faster Whisper for optimized inference
- CTC Forced Aligner for timestamp alignment
- Ollama for running Llama 3.2
- Llama 3.2 for summarization