likhithayinala/Highly-Optimized-NN-using-PyCUDA

Optimizing MLP Kernels

This project implements several versions of a Multilayer Perceptron (MLP) kernel, each aimed at reducing execution time while maintaining accuracy.

Project Structure

  • resources/
  • scripts/
    • infer.sh
    • profile.sh
    • pytorch_train.sh
  • src/
    • application_training/
      • train.py
    • kernels/
      • __init__.py
      • improved_kernel.py
      • naive_kernel.py
      • tensor_kernel.py
    • model_weights/
      • model.pt
    • results/
      • batch_times_mem.png
      • batch_times_proc.png
      • report_1024.ncu-rep
      • report.ncu-rep
      • time_taken_hidden_proc.png
      • time_taken_hidden.png
    • analysis.py
    • config.yaml
    • inference.py
    • model.py
    • profiler.py
    • utils.py
  • .gitignore
  • ReadME.md

Description

The goal is to reduce the execution time of the MLP kernels without sacrificing accuracy. Three implementations are compared: a naive baseline, an improved kernel, and a kernel built on tensor operations.
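
Comparing execution times fairly usually means running each kernel several times and reporting a robust statistic. The following is a minimal sketch of such a timing helper, not the project's own profiler: `time_callable` and `dummy_forward` are hypothetical names introduced here, and the warm-up and repeat counts are arbitrary assumptions.

```python
import statistics
import time


def time_callable(fn, *args, warmup=3, repeats=20):
    """Return the median wall-clock time of fn(*args) in milliseconds.

    Hypothetical helper: warmup runs are discarded so that one-time costs
    (compilation, caching) do not skew the measurement.
    """
    for _ in range(warmup):
        fn(*args)
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        samples.append((time.perf_counter() - start) * 1e3)
    return statistics.median(samples)


def dummy_forward(n):
    # Stand-in workload (assumption); a real run would call one of the kernels.
    return sum(i * i for i in range(n))


median_ms = time_callable(dummy_forward, 10_000)
print(f"median: {median_ms:.3f} ms")
```

GPU kernel launches are asynchronous, so a wall-clock helper like this only gives meaningful numbers if the timed function waits for the device to finish before returning. The median is used rather than the mean because it is less sensitive to occasional slow runs.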

Directories and Files

  • scripts/: Contains shell scripts for training, inference, and profiling.
    • pytorch_train.sh: Trains the PyTorch MLP model used to test performance.
    • infer.sh: Script to perform inference using the trained model.
    • profile.sh: Script to profile the MLP kernels.
  • src/: Source code and related files.
    • application_training/: Training application scripts.
      • train.py: Training script for the PyTorch MLP model used to test performance.
    • kernels/: Different MLP kernel implementations.
      • naive_kernel.py: Basic MLP kernel implementation.
      • improved_kernel.py: Optimized MLP kernel.
      • tensor_kernel.py: MLP kernel using tensor operations.
    • model_weights/: Contains saved model weights.
      • model.pt: Trained model file.
    • results/: Profiling reports and result images.
      • batch_times_mem.png, batch_times_proc.png: Batch time analysis plots.
      • report.ncu-rep, report_1024.ncu-rep: Profiling reports.
      • time_taken_hidden.png, time_taken_hidden_proc.png: Hidden layer time analysis plots.
    • analysis.py: Script for analyzing results and plotting graphs.
    • config.yaml: Configuration file for the project.
    • inference.py: Script to run inference.
    • model.py: Defines the MLP model architecture.
    • profiler.py: Profiling utilities.
    • utils.py: Helper functions.
  • tensor_kernel_correctness_check: Verifies that the output of tensor_kernel matches a reference NumPy implementation.
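
A correctness check of this kind can be sketched as a NumPy reference forward pass plus a tolerance-based comparison. This is an illustrative sketch only: the function names, the ReLU activation, the layer shapes, and the tolerances are all assumptions made here, not details taken from the repository.

```python
import numpy as np


def mlp_forward_reference(x, weights, biases):
    """NumPy reference forward pass (assumed: ReLU on hidden layers, linear output)."""
    h = x
    for i, (w, b) in enumerate(zip(weights, biases)):
        h = h @ w + b                    # affine layer: (batch, in) @ (in, out) + (out,)
        if i < len(weights) - 1:
            h = np.maximum(h, 0.0)       # ReLU on every layer except the last
    return h


def outputs_match(kernel_out, reference_out, rtol=1e-3, atol=1e-5):
    """True if the kernel output agrees with the reference within tolerance."""
    return bool(np.allclose(kernel_out, reference_out, rtol=rtol, atol=atol))


# Small hypothetical network: 8 inputs, 16 hidden units, 3 outputs.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8)).astype(np.float32)
weights = [rng.standard_normal((8, 16)).astype(np.float32),
           rng.standard_normal((16, 3)).astype(np.float32)]
biases = [np.zeros(16, dtype=np.float32), np.zeros(3, dtype=np.float32)]

reference = mlp_forward_reference(x, weights, biases)

# Stand-in for a GPU result (assumption): the reference plus a tiny perturbation.
fake_kernel_out = reference + 1e-7
print(outputs_match(fake_kernel_out, reference))
```

A tolerance-based comparison is used instead of exact equality because GPU kernels may accumulate floating-point sums in a different order than NumPy, which changes the low-order bits of the result. Kernels that use reduced precision would likely need looser tolerances than the ones assumed here.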

Getting Started

Prerequisites

  • Python 3.x
  • PyTorch
  • CUDA Toolkit (if using GPU acceleration)

Installation

Clone the repository and navigate to the project directory:

git clone https://github.com/yourusername/e4750-2024fall-project-nerf.git
cd e4750-2024fall-project-nerf

Install the required Python packages:

pip install -r requirements.txt

Running the Project

  • Training:

    bash scripts/pytorch_train.sh
  • Inference:

    bash scripts/infer.sh
  • Profiling:

    bash scripts/profile.sh

Results

Results and analysis can be found in the src/results/ directory.

License

This project is part of the EECSE4750 course.
