The request to "Download 736 740 zip" most likely refers to downloading the Clotho dataset, a prominent audio captioning collection often cited in research papers by its page range, 736–740.

🎧 The Clotho Dataset

Clotho is an audio dataset used for intermodal translation (audio-to-text) tasks. It is widely used in the DCASE (Detection and Classification of Acoustic Scenes and Events) challenges.

📂 Key Data Components

Audio clips: thousands of sound samples, each 15 to 30 seconds long.

Captions: five unique human-annotated descriptions for every audio clip.

Splits: development, validation, and evaluation sets for training and testing machine learning models.

📥 How to Download

You can also download the specific evaluation (1.2 GB) or analysis (14.4 GB) subsets.

🛠️ Producing a Write-up

Reference the original paper: Drossos, K., Lipping, S., & Virtanen, T. (2020). "Clotho: an Audio Captioning Dataset." Proc. IEEE ICASSP, pp. 736-740.

Mention the diversity of the audio (natural sounds, urban environments, etc.) and the linguistic variety of the captions.
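
To make the five-captions-per-clip structure concrete, here is a minimal Python sketch that flags clips with fewer than five captions. The column names (`file_name`, `caption_1` to `caption_5`), the sample rows, and the helper `find_incomplete_clips` are assumptions for illustration, not taken from the dataset's documentation; adjust them to match the files you actually download.

```python
import csv
import io

EXPECTED_CAPTIONS = 5  # five descriptions per clip, as stated in the text

# Hypothetical sample that mimics a captions CSV; the header names are assumed.
SAMPLE_CSV = """file_name,caption_1,caption_2,caption_3,caption_4,caption_5
rain.wav,Rain falls on a roof,Water drips steadily,A storm passes by,Rain hits a window,Heavy rain pours down
street.wav,Cars pass on a road,Traffic hums nearby,A horn sounds once,,People talk on a street
"""


def find_incomplete_clips(csv_text, expected=EXPECTED_CAPTIONS):
    """Return file names whose rows hold fewer than `expected` non-empty captions."""
    incomplete = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        # Look up caption_1 .. caption_N; missing or blank cells do not count.
        captions = [row.get(f"caption_{i}") for i in range(1, expected + 1)]
        filled = [c for c in captions if c and c.strip()]
        if len(filled) < expected:
            incomplete.append(row["file_name"])
    return incomplete


if __name__ == "__main__":
    # The second sample row has an empty caption_4, so it is reported.
    print(find_incomplete_clips(SAMPLE_CSV))
```

For a real file, the same function could be given the text read from the downloaded captions CSV instead of the in-memory sample.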



