We're excited to announce a major milestone from the SEACrowd team: the launch of SEA-VL, the largest open-source vision-language (VL) dataset specifically designed to represent the cultural diversity of Southeast Asia 🇧🇳🇰🇭🇹🇱🇮🇩🇱🇦🇲🇾🇲🇲🇵🇭🇸🇬🇹🇭🇻🇳.
Read the Paper
We've published our full methodology and findings on arXiv:
Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia
SEA-VL Dataset on Hugging Face: Explore SEA-VL
Why SEA-VL?
Most vision-language datasets today reflect Western-centric imagery and language, leaving Southeast Asian cultures underrepresented and misinterpreted.
SEA-VL is our open-source initiative to change that, designed to better represent the languages, traditions, and everyday realities of Southeast Asian communities.
Highlights
- 1.3 million culturally relevant image-text pairs
- Covers all 11 Southeast Asian countries
- 50× larger than any previous SEA-focused VL dataset
- Hosted on Hugging Face: Explore SEA-VL
How We Built SEA-VL
We combined several approaches to balance scale with cultural fidelity:
- Crowdsourcing: high cultural accuracy, but slow and resource-intensive
- Image Crawling: roughly 85% cultural relevance and highly scalable
- Image Generation: still fails to reflect SEA cultures authentically and poses licensing challenges
Why This Matters
- AI trained on SEA-VL can better understand local contexts, languages, and traditions
- Community contributions help prevent cultural misrepresentation or erasure
- We empower Southeast Asian communities to shape how AI sees the region
Help Us Spread the Word
We've announced SEA-VL on our social channels. Please reshare and help us grow!
Twitter/X | LinkedIn | Facebook | Bluesky
What's Next?
We extend our deepest thanks to the contributors across Southeast Asia who made this possible.
This is only the beginning. Phase 2 is on the horizon, and we invite researchers, practitioners, and community members to collaborate with us. Stay tuned on our Discord!
- SEACrowd on Hugging Face
- Join our Discord
- GitHub
- SEA-VL Launch Page
Together, let's build AI that reflects the full spectrum of human culture in Southeast Asia.