"Science is an error-correcting process."
-- Charles S. Peirce
I'm a PhD student specializing in Machine Learning and Signal Processing, with a particular focus on Audio-Language Learning and Audio Information Retrieval. My research involves contrastive learning, zero-shot learning, multimodal learning, language-based audio retrieval, and audio classification.
- Audio-Language Learning focuses on building systems that connect audio signals with natural language, enabling seamless interpretation and interaction across these modalities. It involves techniques that align audio features with textual representations, allowing for tasks such as audio captioning, language-based audio retrieval, and audio question answering.
- Audio Information Retrieval is the process of analyzing and retrieving information from audio content, such as music, speech, or environmental sounds. It encompasses tasks like sound classification, music recommendation, and similarity-based retrieval, and supports organizing, searching, and using audio data in industries ranging from entertainment to security.
- Machine Learning / Deep Learning (PyTorch, MLflow, Ray Tune, scikit-learn, etc.)
- Data Analysis (NumPy, SciPy, pandas, etc.)
- Audio & Text Analysis (librosa, NLTK, etc.)
- Visualization (matplotlib, etc.)
- Software Development (Django, Spring, Hibernate, etc.)
- Programming (Python, Java, JavaScript, SQL, etc.)
- H. Xie, K. Khorrami, O. Räsänen, and T. Virtanen, "Text-Based Audio Retrieval by Learning From Similarities Between Audio Captions," in IEEE Signal Processing Letters, vol. 32, pp. 221-225, 2025, doi: 10.1109/LSP.2024.3511414. 🔥🔥🔥
- H. Xie, K. Khorrami, O. Räsänen, and T. Virtanen, "Integrating Continuous and Binary Relevances in Audio-Text Relevance Learning," in Proc. Detect. Classif. Acoust. Scenes Events Work. (DCASE), 2024, pp. 201-205. arXiv
- H. Xie, K. Khorrami, O. Räsänen, and T. Virtanen, "Crowdsourcing and Evaluating Text-Based Audio Retrieval Relevances," in Proc. Detect. Classif. Acoust. Scenes Events Work. (DCASE), 2023, pp. 226-230. arXiv
- H. Xie, O. Räsänen, and T. Virtanen, "On Negative Sampling for Contrastive Audio-Text Retrieval," in Proc. Int. Conf. Acoustic., Speech and Signal Process. (ICASSP), 2023, pp. 1-5. arXiv
- H. Xie, S. Lipping, and T. Virtanen, "Language-based Audio Retrieval Task in DCASE 2022 Challenge," in Proc. Detect. Classif. Acoust. Scenes Events Work. (DCASE), 2022, pp. 216-220. arXiv
- H. Xie, O. Räsänen, K. Drossos, and T. Virtanen, "Unsupervised Audio-Caption Aligning Learns Correspondences Between Individual Sound Events and Textual Phrases," in Proc. Int. Conf. Acoustic., Speech and Signal Process. (ICASSP), 2022, pp. 8867-8871. arXiv
- H. Xie, O. Räsänen, and T. Virtanen, "Zero-Shot Audio Classification with Factored Linear and Nonlinear Acoustic-Semantic Projections," in Proc. Int. Conf. Acoustic., Speech and Signal Process. (ICASSP), 2021, pp. 326-330. arXiv
- H. Xie and T. Virtanen, "Zero-Shot Audio Classification via Semantic Embeddings," in IEEE/ACM Trans. Audio Speech Lang. Process., vol. 29, pp. 1233-1242, 2021. arXiv
- H. Xie and T. Virtanen, "Zero-Shot Audio Classification Based on Class Label Embeddings," in Proc. Work. Appl. Signal Process. Audio and Acoustic. (WASPAA), 2019, pp. 264-267. arXiv
- 🧑‍💻 Task coordinator for Language-based Audio Retrieval in DCASE Challenge 2024 (Task 8).
- 🧑‍💻 Task coordinator for Automated Audio Captioning and Language-based Audio Retrieval in DCASE Challenge 2023 (Task 6).
- 🧑‍💻 Task coordinator for Automated Audio Captioning and Language-based Audio Retrieval in DCASE Challenge 2022 (Task 6).
- 📧 Drop me an email at [email protected]