AI Watchdog: CelebV-Text and CelebV-HQ

Explore original journalism about this data set through AI Watchdog, The Atlantic’s ongoing investigation into the generative-AI industry.


CelebV-Text and CelebV-HQ are data sets of YouTube videos, containing 13,712 and 13,844 videos, respectively (with no videos in common, according to our analysis). The videos were apparently chosen for their clear depictions of celebrity faces and are intended for use in training video-generation and video face-editing models. The data sets were released in 2022 by SenseTime, a Chinese company that makes facial-recognition technology. The company has been sanctioned by the United States.