Audio Analyzer and Splicer
January 2024 - Present
This project processes and analyzes audio and video data collected from wearable sensors. It generates transcripts from media files and extracts a video or audio clip for each word identified in those transcripts. The tools and scripts support research in wearable technology and human-computer interaction.
- Operating System: Windows 10, macOS, or Linux
- Conda: Required for managing the Python environment
- Python Version: Python 3.9 (managed via Conda)
- FFmpeg: Must be installed in the Conda environment (see the setup steps below)
Download and install Anaconda or Miniconda.
Open your terminal or command prompt and create a new Conda environment:
conda create -n whisper_env python=3.9
Activate the environment:
conda activate whisper_env
Install PyTorch in the Conda environment:
conda install pytorch torchvision torchaudio cpuonly -c pytorch
Install the whisper-timestamped package from GitHub:
pip install git+https://github.com/linto-ai/whisper-timestamped
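The package follows the standard openai-whisper interface but adds per-word timestamps. As a rough illustration (not one of this project's own scripts), transcription might look like the sketch below; `recording.mp4` and the `base` model size are placeholder choices, and FFmpeg (installed in the next step) must be available for decoding:

```python
# Minimal sketch: produce a word-level transcript with whisper-timestamped.
# "recording.mp4" and the "base" model size are placeholders; adjust for your data.
import json

import whisper_timestamped as whisper

audio = whisper.load_audio("recording.mp4")       # FFmpeg decodes the media file
model = whisper.load_model("base", device="cpu")  # matches the CPU-only PyTorch install
result = whisper.transcribe(model, audio)

# Each segment carries per-word timestamps under the "words" key.
for segment in result["segments"]:
    for word in segment["words"]:
        print(f'{word["text"]}\t{word["start"]:.2f}\t{word["end"]:.2f}')

# Save the full transcript for the splicing step.
with open("transcript.json", "w") as f:
    json.dump(result, f, indent=2, ensure_ascii=False)
```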
Install FFmpeg via Conda:
conda install -c conda-forge ffmpeg
Run the following commands to install additional packages:
pip install numpy
pip install ffmpeg-python
pip install moviepy
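Here numpy supports array handling, while ffmpeg-python and moviepy handle media splicing. As a hedged sketch of the word-clip extraction idea (again, not necessarily how the project's scripts are written), MoviePy 1.x (the `moviepy.editor` API) can cut one video clip per word from a transcript like the one produced above; all paths are placeholders:

```python
# Minimal sketch: cut one video clip per transcribed word with MoviePy 1.x.
# Assumes transcript.json from the transcription sketch above; paths are placeholders.
import json
import os

from moviepy.editor import VideoFileClip

with open("transcript.json") as f:
    result = json.load(f)

os.makedirs("word_clips", exist_ok=True)
source = VideoFileClip("recording.mp4")

for segment in result["segments"]:
    for i, word in enumerate(segment["words"]):
        clip = source.subclip(word["start"], word["end"])
        # Keep only alphanumeric characters so the word is safe in a filename.
        safe = "".join(c for c in word["text"] if c.isalnum())
        out = os.path.join("word_clips", f'{segment["id"]}_{i}_{safe}.mp4')
        clip.write_videofile(out, logger=None)  # logger=None silences the progress bar

source.close()
```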
After setting up the environment, activate the whisper_env environment before running your scripts:
conda activate whisper_env
You can then use the scripts to generate transcripts and extract word-level clips from your audio or video files.
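For audio-only recordings, the same per-word splice can be done with ffmpeg-python rather than MoviePy. This is another illustrative sketch under the same assumptions (placeholder paths and the `transcript.json` layout shown earlier):

```python
# Minimal sketch: extract one audio clip per word with ffmpeg-python.
# "recording.wav" and the word_clips/ layout are placeholder choices.
import json
import os

import ffmpeg

with open("transcript.json") as f:
    result = json.load(f)

os.makedirs("word_clips", exist_ok=True)

for segment in result["segments"]:
    for i, word in enumerate(segment["words"]):
        out = os.path.join("word_clips", f'{segment["id"]}_{i}.wav')
        (
            ffmpeg
            .input("recording.wav", ss=word["start"], t=word["end"] - word["start"])
            .output(out)
            .run(overwrite_output=True, quiet=True)
        )
```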
This project enables advanced analysis of wearable sensor data through speech transcription and media processing. Keep the whisper_env environment activated whenever you work with the related scripts so that all dependencies load correctly.