Our current efforts are focused on:
- Running data collection and model building efforts to advance AI in SEA
- Building a community of SEA researchers through the ACL special interest group SIGSEA
- Supporting early-career AI enthusiasts in the region through the SEACrowd Apprentice Program
One of our key contributions has been to build and maintain a comprehensive catalogue of publicly available SEA datasets, with standardized dataloaders that make them easy to use for model training and benchmarking. To date, this catalogue has the broadest coverage of Southeast Asian languages.
See which indigenous and non-indigenous languages our work covers.
🤔 What’s Next?
🖼️ SEA-VL: Developing Culturally Relevant Vision-Language Models for Southeast Asia
Following the success of our SEACrowd project, we’re excited to announce SEA-VL, a new open-source initiative to build a SEA-specific vision-language model.
Phase 2 has begun! Check out our project page to contribute. Stay tuned on our Discord for updates!
SEA-VL is a large and ambitious initiative, so we have split it into two phases. In Phase 1, we collected self-taken, culturally relevant images, each with a description written in the contributor's local language. These were compiled into an open-access SEA-relevant image dataset, the largest of its kind to date. This dataset serves as the foundation for Phase 2, in which we'll develop instruction-tuning VL datasets and use them to build a SEA-specific vision-language model (VLM).
- Phase 1 has wrapped up. [Paper] [Announcement] [Project Page]
- Phase 2 runs from May 2025 to Feb 2026! [Project Page]
🌏 Special Interest Group in Southeast Asian NLP (SIGSEA)
The SEACrowd community has launched SIGSEA, which aims to promote research, collaboration, and the sharing of updates on Southeast Asian NLP. In the future, we can also hold our own SEA workshops and events at ACL conferences! 💪
We’re collecting expressions of interest for membership. As a member, you’ll receive regular updates on research, events, and opportunities in the region.
Everyone can join (no need for ACL membership). Sign up today to join SIGSEA via this form! 🫶
Update (25/11/2024): SIGSEA has been approved by ACL! Visit our website.
🧑‍🎓 SEACrowd Apprentice (Pilot) Program
Ongoing since 08/2024.
This program targets early-career AI enthusiasts from underserved Southeast Asian communities who are looking to gain their first substantial research experience. Many face challenges such as limited access to research tools, mentorship, and the latest AI developments.
Our program addresses these gaps by giving participants research problems to solve in small teams, guided by experienced mentors. Through hands-on projects and learning key concepts, participants work toward submitting a paper to a top AI conference such as ACL. The program also emphasizes critical thinking, collaboration, and academic writing to prepare participants for success in AI research.
✔️ Past Projects
- 10/2024 to 03/2025. Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia. Under review. [Announcement] [Project Page]
- 11/2023 to 06/2024. SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages. Accepted at EMNLP 2024. [Announcement]