
Confirm dependencies for latency performance #13

Closed

escorciav opened this issue Jan 8, 2024 · 4 comments


escorciav commented Jan 8, 2024

Hi,

The results on mobile are quite appealing.

  1. Could you kindly confirm the dependencies?
  2. Have you noticed performance improvement/degradation with a more recent computational stack?
    • PyTorch 1.11.0 seems a bit old. Did you use the stable release?
    • Actually, I'm more curious whether you have tried torch.compile (not sure if it plays nicely with CoreML); see the sketch after this list.
    • What about new hardware?
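
For reference, this is the kind of comparison I have in mind — a minimal sketch, assuming PyTorch >= 2.0; the Sequential below is just a placeholder standing in for a SwiftFormer variant from this repo:

```python
import time
import torch

# Placeholder model: swap in a SwiftFormer variant from this repo.
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 16, 3, padding=1),
    torch.nn.ReLU(),
    torch.nn.AdaptiveAvgPool2d(1),
    torch.nn.Flatten(),
    torch.nn.Linear(16, 1000),
).eval()
x = torch.randn(1, 3, 224, 224)

compiled = torch.compile(model)  # available since PyTorch 2.0

with torch.no_grad():
    model(x)     # eager warm-up
    compiled(x)  # first call triggers compilation

    t0 = time.perf_counter()
    model(x)
    t1 = time.perf_counter()
    compiled(x)
    t2 = time.perf_counter()

print(f"eager:    {(t1 - t0) * 1e3:.2f} ms")
print(f"compiled: {(t2 - t1) * 1e3:.2f} ms")
```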

FYI, I just created a fork to port the model onto Qualcomm QNN/SNPE via ONNX; a rough export sketch is below. Has anyone done that before?
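
The export path I'm trying is roughly the following — a sketch only; the placeholder model, file name, and opset are my choices, not anything from this repo:

```python
import torch

# Placeholder model: swap in a SwiftFormer variant from this repo.
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 16, 3, padding=1),
    torch.nn.ReLU(),
    torch.nn.AdaptiveAvgPool2d(1),
    torch.nn.Flatten(),
    torch.nn.Linear(16, 1000),
).eval()
dummy = torch.randn(1, 3, 224, 224)

torch.onnx.export(
    model, dummy, "swiftformer.onnx",
    opset_version=13,
    input_names=["input"],
    output_names=["logits"],
)
# The resulting .onnx file can then go through Qualcomm's converters,
# e.g. snpe-onnx-to-dlc (SNPE) or qnn-onnx-converter (QNN).
```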

Cheers,
Victor


escorciav commented Jan 8, 2024

Relevant issue #3

escorciav commented Jan 8, 2024

BTW, requirements.txt does not mention einops.
I fixed it via pip install einops, which installed einops-0.7.0.

I didn't check whether einops is mentioned in the README :)
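
For anyone hitting the same ImportError, a quick sanity check after installing — the rearrange pattern here is just an illustration, not a call from this repo:

```python
import einops
import torch
from einops import rearrange

print(einops.__version__)  # 0.7.0 in my case

# Typical transformer-style reshape that einops enables:
x = torch.randn(1, 8, 4, 4)  # (batch, channels, h, w)
tokens = rearrange(x, "b c h w -> b (h w) c")
print(tokens.shape)          # torch.Size([1, 16, 8])
```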


Amshaker commented Jan 8, 2024

Hi @escorciav,

Thanks for your interest in our work!

  • The dependencies are listed in requirements.txt, except einops, if I remember correctly.

  • Yes, I prepared a demo at ICCV'23 with PyTorch 2.0 and the latest CoreML version (coremltools==6.3.0). The latency increased by only ~0.13 ms compared to the latency reported in the paper, for most of the models (not only SwiftFormer), on my device (iPhone 14 Pro Max, iOS 17.1). A minimal conversion sketch follows this list.

  • I am not sure whether torch.compile will improve the latency. What we tried was running with and without .contiguous(); the gain in latency without .contiguous() is less than 0.1 ms. It would be interesting to check torch.compile, though.

  • We measured the latency on iOS and the throughput on NVIDIA A100 GPUs; we have not tried other platforms.

  • I am planning to optimize the model further using mobile deep-learning frameworks (PyTorch Mobile). I am sure that with this kind of optimization and quantization, SwiftFormer can run even faster. In that case, SwiftFormer could serve not only for classification but also as a backbone for mobile detection and segmentation tasks with the help of lightweight frameworks. Running SwiftFormer on different hardware devices also interests me now; I will work on that very soon and keep you updated with our findings!
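
For completeness, the PyTorch → CoreML path mentioned above looks roughly like this — a minimal sketch, assuming coremltools >= 6 and a traceable model; the Sequential below is a placeholder, not SwiftFormer itself:

```python
import torch
import coremltools as ct

# Placeholder: substitute a SwiftFormer variant in eval() mode.
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 16, 3, padding=1),
    torch.nn.ReLU(),
    torch.nn.AdaptiveAvgPool2d(1),
    torch.nn.Flatten(),
    torch.nn.Linear(16, 1000),
).eval()
example = torch.randn(1, 3, 224, 224)

traced = torch.jit.trace(model, example)
mlmodel = ct.convert(
    traced,
    inputs=[ct.TensorType(name="input", shape=example.shape)],
    convert_to="mlprogram",
)
mlmodel.save("SwiftFormer.mlpackage")
# On-device latency can then be profiled with Xcode's
# Core ML performance reports.
```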

Best regards,
Abdelrahman.


escorciav commented Jan 9, 2024

Thanks a lot for the detailed & open reply. Very pleased!

Heads up: PyTorch Mobile might get deprecated; its announced successor is called ExecuTorch.
