Spatiotemporal Dynamic Duo

Building a Multi-modal Spatiotemporal Expert for Zero-shot Action Recognition with CLIP

The code will be published soon.