AI Watchdog: YT-Temporal-180M
Explore original journalism about this data set through AI Watchdog, The Atlantic’s ongoing investigation into the generative-AI industry.
YT-Temporal-180M is a collection of 5,494,771 YouTube videos. The data set was compiled by a team of researchers at the University of Washington and the Allen Institute for AI to train a multimodal model called Merlot, and it was released in 2021. For each video, YT-Temporal-180M specifies the time stamp of every word spoken. A script for downloading the data set is hosted by Hugging Face, an AI-development hub, and the data set itself is hosted by Google. The data set has been downloaded more than 1,200 times.