The request to "Download 736 740 zip" most likely refers to downloading the Clotho dataset, a prominent audio captioning collection whose original paper is often cited by its page range, 736–740.

🎧 The Clotho Dataset
Clotho is an audio dataset used for intermodal translation (audio-to-text) tasks. It is widely used in the DCASE (Detection and Classification of Acoustic Scenes and Events) challenges.

📂 Key Data Components
Thousands of sound samples ranging from 15 to 30 seconds.
Five unique human-annotated descriptions for every audio clip.
The data is categorized into development, validation, and evaluation sets for training and testing machine learning models.

📥 How to Download
Specific evaluation (1.2 GB) and analysis (14.4 GB) subsets can also be downloaded on their own.

🛠️ Producing a Write-up
Reference the original paper: Drossos, K., Lipping, S., & Virtanen, T. (2020). "Clotho: an Audio Captioning Dataset." Proc. IEEE ICASSP, pp. 736-740.
Mention the diversity of the audio (natural sounds, urban environments, etc.) and the linguistic variety of the captions.
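
Because every clip carries five captions, a captions file is naturally read as a mapping from clip name to a list of five strings. The sketch below assumes a CSV with a `file_name` column and columns `caption_1` through `caption_5`; that layout, the function name `load_captions`, and the sample row are assumptions for illustration, so check the header of the file you actually download before relying on it.

```python
import csv
import io
from typing import Dict, List, TextIO

# Assumed column names; adjust if the downloaded file uses a different header.
CAPTION_COLUMNS = [f"caption_{i}" for i in range(1, 6)]


def load_captions(handle: TextIO) -> Dict[str, List[str]]:
    """Map each clip file name to its five captions.

    Raises ValueError if a row has an empty or missing caption, so that
    incomplete annotations are caught early.
    """
    captions: Dict[str, List[str]] = {}
    for row in csv.DictReader(handle):
        texts = [(row.get(col) or "").strip() for col in CAPTION_COLUMNS]
        if not all(texts):
            raise ValueError(f"incomplete captions for {row.get('file_name')!r}")
        captions[row["file_name"]] = texts
    return captions


if __name__ == "__main__":
    # Made-up sample row, used only to show the call.
    sample = io.StringIO(
        "file_name,caption_1,caption_2,caption_3,caption_4,caption_5\n"
        "rain.wav,Rain falls.,It is raining.,Water drips.,A storm.,Heavy rain.\n"
    )
    for name, texts in load_captions(sample).items():
        print(name, len(texts))
```

Keeping the five captions together per clip makes it easy to sample one caption per epoch during training or to score a prediction against all five references during evaluation.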