This repository contains an active learning pipeline for efficiently downloading, processing, and managing datasets for object detection tasks using YOLO. The pipeline automates tasks such as video downloading, video formatting, frame extraction, label uploading, and dataset organization.
- Overview
- Pipeline Workflow
- Setup Instructions
- Configuration
- Execution
- CVAT Integration
- Scheduling Periodic Executions
- Dependencies
- Troubleshooting
- License and Contribution
The active learning pipeline automates the following:
- Downloading raw video data from YouTube.
- Formatting videos for compatibility with further processing.
- Extracting frames from videos and generating labels using YOLO inference.
- Uploading the extracted data to labeling platforms (e.g., CVAT).
- Downloading completed labels from the labeling platform.
- Cleaning datasets to ensure consistency.
The pipeline consists of the following steps:
- Download Videos: Fetch relevant videos from YouTube based on predefined queries.
- Fix Video Format: Ensure downloaded videos meet the required format for further processing.
- Process Videos:
  - Extract frames from videos.
  - Run YOLO inference to generate labels for object detection.
  - Save processed frames and labels in an organized structure.
- Upload Labels: Send generated data to CVAT for manual refinement and validation.
- Download Labels: Retrieve the manually labeled datasets from CVAT.
- Clean Labels: Remove frames that lack corresponding labels to maintain dataset integrity.
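The ordering of the steps above can be sketched as a small runner. This is only an illustration, not the repository's actual `main.py`: the step names are taken from the `steps` section of the configuration, while `run_steps` and the placeholder callables are hypothetical.

```python
from typing import Callable, Dict, List

# Step names as they appear under `steps:` in the configuration,
# listed in the order the workflow describes.
STEP_ORDER = [
    "download_videos",
    "fix_video_format",
    "process_videos",
    "upload_labels",
    "download_labels",
    "clear_empty_labels",
]


def run_steps(
    steps: Dict[str, Callable[[], None]],
    enabled: Dict[str, bool],
) -> List[str]:
    """Run each enabled step in STEP_ORDER and return the names that ran."""
    executed = []
    for name in STEP_ORDER:
        if not enabled.get(name, False):
            continue  # step is switched off or not listed
        steps[name]()
        executed.append(name)
    return executed


if __name__ == "__main__":
    # Placeholder callables stand in for the real pipeline modules.
    demo_steps = {name: (lambda n=name: print(f"running {n}")) for name in STEP_ORDER}
    demo_enabled = {name: True for name in STEP_ORDER}
    demo_enabled["upload_labels"] = False  # e.g. skip the CVAT upload
    print(run_steps(demo_steps, demo_enabled))
```

Keeping the order in one list means a disabled step is skipped without changing the sequence of the remaining ones.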
- Python: Install Python 3.9 or later.
- Dependencies: Install the required libraries from requirements.txt:
  pip install -r requirements.txt
- Environment Variables:
  - Create a .env file in the root directory with the following:
    CVAT_USER=<Your CVAT Username>
    CVAT_PASSWORD=<Your CVAT Password>
    CVAT_PROJECT_ID=<Your CVAT Project ID>
    MODEL_PATH=<Path to YOLO Model>
Directory structure:
- data/: Contains configurations, logs, and datasets.
- modules/: Contains modular Python scripts for each step in the pipeline.
- pipeline.log: Stores execution logs.
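A quick way to catch a misconfigured `.env` early is to check the four variables listed above before any step runs. The sketch below assumes the file has already been loaded into the process environment (for example with python-dotenv's `load_dotenv()`); `read_settings` is a hypothetical helper, not part of the repository.

```python
import os
from typing import Dict, Mapping

# Variable names taken from the .env template above.
REQUIRED_VARS = ("CVAT_USER", "CVAT_PASSWORD", "CVAT_PROJECT_ID", "MODEL_PATH")


def read_settings(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return the required settings, or raise an error naming every missing one."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Reporting all missing names at once avoids fixing the file one variable at a time.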
The pipeline is configured via data/config.yaml. Update the configuration as needed:
paths:
  downloads: "downloads" # Directory to save downloaded videos
  output: "assets/frames" # Directory to save processed data
youtube:
  queries:
    - "Resumen la liga EA"
    - "Resumen Serie A"
    - "Resumen Bundesliga"
  num_videos: 3 # Number of videos to download per query
  resolution: "720" # Video resolution
video:
  framerate: 180 # Extract one frame every 3 minutes
cvat:
  task_name: "auto" # Task name ("auto" generates name based on date)
  annotations_format: "YOLOv8 Detection 1.0"
steps:
  download_videos:
    enabled: true
  fix_video_format:
    enabled: true
  process_videos:
    enabled: true
  upload_labels:
    enabled: true
  download_labels:
    enabled: true
  clear_empty_labels:
    enabled: true
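Two of the settings above benefit from a concrete reading. The sketch below assumes that `video.framerate` is a sampling interval in seconds (180 for "one frame every 3 minutes") and that `task_name: "auto"` expands to a name containing the current date. Both helper functions and the `task_<date>` pattern are hypothetical and may differ from what the pipeline actually does.

```python
from datetime import date
from typing import List, Optional


def frame_indices(total_frames: int, fps: float, interval_seconds: float) -> List[int]:
    """Indices of the frames to keep when sampling one frame per interval."""
    step = max(1, round(fps * interval_seconds))  # frames between two samples
    return list(range(0, total_frames, step))


def resolve_task_name(task_name: str, today: Optional[date] = None) -> str:
    """Replace "auto" with a date-based name; pass any other name through."""
    if task_name != "auto":
        return task_name
    today = today or date.today()
    return f"task_{today.isoformat()}"  # naming pattern is an assumption


if __name__ == "__main__":
    # A 9-minute clip at 25 fps has 13500 frames; sampling every 180 s keeps 3.
    print(frame_indices(total_frames=13500, fps=25, interval_seconds=180))
    print(resolve_task_name("auto", today=date(2024, 1, 2)))
```

Under this reading, a short highlight video yields only a handful of frames, so lowering the interval is the main lever for producing more training images per video.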
Run the pipeline using the main.py script:
python main.py
- Downloading Videos:
  - Downloads videos using YouTube queries from the configuration.
  - Tracks downloaded videos to avoid duplicates.
- Fixing Video Format:
  - Converts videos to the required pixel format (yuv420p).
- Processing Videos:
  - Extracts frames at regular intervals.
  - Uses YOLO for initial object detection and label generation.
  - Saves frames and labels in the output directory.
- Uploading Labels:
  - Uploads the generated frames and labels to CVAT.
- Downloading Labels:
  - Retrieves validated labels from CVAT for further processing.
- Cleaning Labels:
  - Ensures only frames with valid labels are retained.
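The cleaning step can be illustrated with a short sketch. It assumes a YOLO-style layout in which each frame has a `.txt` label file with the same stem in a separate labels directory, and it treats an empty label file the same as a missing one. The directory layout and the function name are assumptions, not the repository's actual code.

```python
from pathlib import Path
from typing import List

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def remove_unlabeled_frames(frames_dir: Path, labels_dir: Path) -> List[Path]:
    """Delete frames that have no non-empty label file; return the removed paths."""
    removed = []
    for frame in sorted(frames_dir.iterdir()):
        if frame.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # ignore anything that is not an image
        label = labels_dir / (frame.stem + ".txt")
        if label.is_file() and label.read_text().strip():
            continue  # frame has at least one annotation, keep it
        frame.unlink()
        removed.append(frame)
    return removed
```

Because the function deletes files, it is worth running it on a copy of the dataset first, or logging the returned paths.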
CVAT (Computer Vision Annotation Tool) is an open-source web-based tool for annotating videos and images for computer vision tasks. This pipeline integrates CVAT to streamline the labeling process by automating the following:
- Uploading extracted frames and initial labels for refinement.
- Downloading the manually labeled datasets for training models.
- Upload Labels:
  - The pipeline creates a CVAT task for the extracted frames and uploads the frames along with YOLO-generated initial labels.
  - Requirements:
    - Ensure CVAT credentials (CVAT_USER and CVAT_PASSWORD) are set in the .env file.
    - Define the CVAT_PROJECT_ID in .env corresponding to the CVAT project where tasks will be created.
- Manual Annotation:
  - Once frames are uploaded, log into the CVAT web interface.
  - Navigate to the created task under the specified project.
  - Refine the labels manually to ensure accuracy.
- Download Labels:
  - After completing annotations, the pipeline downloads the labeled data in the format specified in the annotations_format configuration (e.g., YOLOv8 Detection 1.0).
- CVAT_USER: Your CVAT username or email.
- CVAT_PASSWORD: Your CVAT password.
- CVAT_PROJECT_ID: The ID of the CVAT project where tasks will be created.
- Upload: The pipeline automatically uploads images and initial annotations to CVAT.
- Annotate: Use CVAT's web interface to refine labels manually.
- Download: The pipeline fetches completed labels, organizing them for further use.
To run the pipeline periodically, use a scheduler like cron or GitHub Actions.
- Open the cron editor:
crontab -e
- Add an entry to run the pipeline daily at midnight:
0 0 * * * /usr/bin/python3 /path/to/main.py >> /path/to/pipeline.log 2>&1
- Create .github/workflows/pipeline.yml:
  name: Active Learning Pipeline
  on:
    schedule:
      - cron: "0 0 * * *" # Run daily at midnight (UTC)
    workflow_dispatch: # Manual execution
  jobs:
    pipeline:
      runs-on: ubuntu-latest
      steps:
        - name: Checkout repository
          uses: actions/checkout@v4
        - name: Set up Python
          uses: actions/setup-python@v4
          with:
            python-version: '3.9'
        - name: Install dependencies
          run: |
            pip install -r requirements.txt
        - name: Run Pipeline
          env:
            CVAT_USER: ${{ secrets.CVAT_USER }}
            CVAT_PASSWORD: ${{ secrets.CVAT_PASSWORD }}
            CVAT_PROJECT_ID: ${{ secrets.CVAT_PROJECT_ID }}
            MODEL_PATH: ${{ secrets.MODEL_PATH }}
          run: |
            python main.py
- Add your secrets in the repository settings under Settings > Secrets and variables > Actions.
The pipeline uses the following key libraries:
- yt-dlp: For downloading videos from YouTube.
- ffmpeg: For video formatting and processing.
- ultralytics: For YOLO inference and label generation.
- CVAT SDK: For interacting with the CVAT API to upload/download annotations.
- dotenv: For managing environment variables.
- yaml/json: For configuration and logging.
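As an illustration of how ffmpeg might be called for the format-fixing step, the sketch below builds a command that re-encodes a video with the yuv420p pixel format. The function names and the choice to copy the audio stream are assumptions; the repository may invoke ffmpeg differently.

```python
import subprocess
from pathlib import Path
from typing import List


def build_ffmpeg_command(source: Path, target: Path) -> List[str]:
    """Build an ffmpeg call that re-encodes `source` with the yuv420p pixel format."""
    return [
        "ffmpeg",
        "-y",                   # overwrite the target if it already exists
        "-i", str(source),      # input file
        "-pix_fmt", "yuv420p",  # pixel format required by later steps
        "-c:a", "copy",         # keep the audio stream unchanged
        str(target),
    ]


def fix_video_format(source: Path, target: Path) -> None:
    """Run the conversion; raises CalledProcessError if ffmpeg exits with an error."""
    subprocess.run(build_ffmpeg_command(source, target), check=True)
```

Passing the command as a list rather than a shell string avoids quoting problems with video titles that contain spaces or special characters.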
- Missing Dependencies:
  - Ensure all dependencies are installed using pip install -r requirements.txt.
- Invalid Configuration:
  - Verify config.yaml and .env for correctness.
- Access Issues in CVAT:
  - Check if your CVAT credentials and project ID are valid.
- Task Not Found in CVAT:
  - Ensure that the project ID matches an existing CVAT project.
Refer to pipeline.log for detailed error messages and execution details.
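For reference, a log file like pipeline.log can be produced with Python's standard `logging` module. This is a minimal sketch under assumed names (`configure_logging`, the `pipeline` logger, and the message format are hypothetical), not the repository's actual logging setup.

```python
import logging
from pathlib import Path


def configure_logging(log_path: Path = Path("pipeline.log")) -> logging.Logger:
    """Return a logger that appends timestamped records to `log_path`."""
    logger = logging.getLogger("pipeline")
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid attaching a second handler on repeated calls
        handler = logging.FileHandler(log_path, encoding="utf-8")
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    return logger
```

With a format like this, searching the file for `ERROR` is usually the fastest way to find the step that failed.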
This project is licensed under the MIT License. See LICENSE for more details.
Contributions are welcome! Please follow these steps:
- Fork the repository.
- Create a new branch (git checkout -b feature-name).
- Commit your changes and push the branch.
- Open a pull request.