Bringing the open source AI community together
On July 11, 2025, the Linux Foundation AI & Data Day Singapore convened at TikTok's One Raffles Quay office. Co‑hosted by TikTok, the event attracted 252 registrants and around 60 in‑person participants, including developers, researchers, tech companies and open source contributors. Under the theme "Agent for SWE" (Software Engineering), sessions explored how intelligent agents are reshaping software‑development workflows and shared practical experiences of building and deploying AI‑powered systems.
SWE‑Perf: Benchmarking code optimization
TikTok researcher Qian Liu introduced SWE‑Perf, a benchmark designed to evaluate how well large language models optimise code performance. While many models excel at code generation and bug fixing, real‑world performance optimisation remains poorly understood. SWE‑Perf curates 140 instances derived from performance‑improving pull requests across popular GitHub repositories. Each instance includes relevant codebases, target functions, performance tests, expert patches and reproducible environments. Evaluations using representative methods (such as Agentless and OpenHands) reveal a sizable gap between existing models and expert‑level optimisation, signalling significant research opportunities.
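To make the shape of such a benchmark instance concrete, here is a minimal sketch in Python. The field names and both metrics are assumptions chosen for illustration; they are not SWE‑Perf's actual schema or scoring method, which are described in the paper.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class PerfInstance:
    """Hypothetical record for one instance (field names are assumptions)."""
    repo: str
    target_functions: list[str]
    performance_tests: list[str]
    expert_patch: str


def relative_gain(before_s: list[float], after_s: list[float]) -> float:
    """Fraction of mean runtime removed by a patch; positive means faster."""
    baseline = mean(before_s)
    if baseline <= 0:
        raise ValueError("baseline runtime must be positive")
    return (baseline - mean(after_s)) / baseline


def gap_to_expert(model_gain: float, expert_gain: float) -> float:
    """Illustrative gap: expert improvement not matched by the model patch."""
    return expert_gain - model_gain
```

Under these assumptions, a performance test that drops from 2.0 s to 1.0 s after a patch has a relative gain of 0.5, and comparing that figure for a model patch against the expert patch gives one simple way to express the gap the talk described.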
GenAI and production-ready agent infrastructure
The program showcased the community's shift from prototype‐level generative AI hacks toward robust agent‑orchestration frameworks suitable for production use. Adrian Cole of Tetrate summarised key considerations for taking generative AI systems into production, emphasising observability, reliability and integration with existing infrastructure. The following talks offered complementary perspectives:
- Tatsuya Kurosaka (Hitachi) demonstrated OAuth 2.1 and token‑introspection mechanisms for securing the Model Context Protocol (MCP), highlighting the importance of robust authentication and access control.
- Sam Chen (OpenCSG) discussed the lifecycle management of large language model (LLM) assets: while models may come and go, well‑governed, high‑quality data remains the true intellectual property.
- Junping Du (Datastrato) presented Apache Gravitino, showing how unified metadata layers simplify big‑data and data‑lake architectures.
- Vishnu N. C. (TheAlpha.dev) illustrated that reinforcement‑learning loops can be accessible and cost‑effective: a single‑GPU training run costing US$30 pushed an agentic retrieval‑augmented generation (RAG) system beyond the baseline.
- Florian Fan and Yaya Xia (Ant Group) provided a comprehensive overview of the 2025 open source LLM landscape, noting that, from an AI perspective, the Envoy AI Gateway now sits closer to the application tier than to core model‑serving infrastructure.
These presentations underscored that building reliable AI agents requires equal attention to infrastructure, data governance and practical engineering, not just model development.
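As an illustration of the token‑introspection step mentioned in the list above, the sketch below shows how a server might interpret an introspection response before serving a request. It relies only on the standard response fields `active`, `exp` and `scope` from RFC 7662; the function name and the scope string `tools:call` are hypothetical, and the network call to the authorization server is deliberately left out. It is not taken from the talk.

```python
import time
from typing import Any, Mapping, Optional


def token_permits(
    introspection: Mapping[str, Any],
    required_scope: str,
    now: Optional[float] = None,
) -> bool:
    """Decide whether an introspection response authorises a request."""
    # "active" is the only required field; anything but True means reject.
    if introspection.get("active") is not True:
        return False
    # "exp" is optional; when present it is seconds since the Unix epoch.
    now = time.time() if now is None else now
    exp = introspection.get("exp")
    if exp is not None and now >= float(exp):
        return False
    # "scope" is a single space-separated string of granted scopes.
    granted = str(introspection.get("scope", "")).split()
    return required_scope in granted
```

For example, `token_permits({"active": True, "scope": "tools:read tools:call", "exp": 2000}, "tools:call", now=1000)` evaluates to `True`, while an inactive, expired or under‑scoped token is rejected.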
Web3 ecosystem
The agenda also broadened beyond narrow software engineering concerns to address data sovereignty in the emerging Web3 ecosystem. Henry Wang (Singapore Internet Governance Forum) argued that today's focus on cryptocurrency speculation obscures the deeper promise of Web3: empowering users to control their data. He highlighted SOLID, a protocol created by Tim Berners‑Lee that stores personal information in user‑owned "pods" and grants applications selective access. This approach emphasises privacy and creates economic incentives that preserve diverse voices.
Una Wang's LingoAI exemplified this vision. Their edge‑first translation platform processes more than 500 languages directly on users' devices, keeping sensitive data private. Contributors license their linguistic data for AI training and receive compensation, illustrating a sustainable data‑sharing model. These sessions challenged attendees to consider ethical and equitable frameworks for future AI agents.
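The idea of user‑owned pods with selective access can be pictured with a toy in‑memory model. This is only a sketch of the concept: the `Pod` class, its methods and the paths are invented for illustration and do not reflect the SOLID specification or any real library's API.

```python
from dataclasses import dataclass, field


@dataclass
class Pod:
    """Toy user-owned data store (hypothetical; not the SOLID API)."""
    owner: str
    _resources: dict[str, str] = field(default_factory=dict)
    _grants: dict[str, set[str]] = field(default_factory=dict)

    def put(self, path: str, value: str) -> None:
        """Owner stores a value under a path."""
        self._resources[path] = value

    def grant(self, app: str, path: str) -> None:
        """Owner allows one application to read one path."""
        self._grants.setdefault(app, set()).add(path)

    def revoke(self, app: str, path: str) -> None:
        """Owner withdraws a previously granted permission."""
        self._grants.get(app, set()).discard(path)

    def read(self, app: str, path: str) -> str:
        """Application reads a path only if the owner granted access."""
        if path not in self._grants.get(app, set()):
            raise PermissionError(f"{app} has no access to {path}")
        return self._resources[path]
```

In this model the data stays with the owner, an application sees only the paths it was explicitly granted, and the owner can withdraw access at any time.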
Community feedback and outlook
Several open source projects, such as Gravitino, RustCoder and LingoAI, gained visibility and likely new contributors. By aligning the open source community's efforts with real‑world software engineering challenges, the day catalysed collaboration and advanced the ecosystem.
Conclusion and next steps
LF AI & Data Day Singapore 2025 demonstrated the power of community‑driven innovation. Thanks to the organizing team (including Andreas Spanner, Cruise Chen, Junping Du, Lin Yan, Tina Tsou, Jianbo Wu and Yongli Choi), speakers, sponsors and volunteers, the event delivered insights across research, infrastructure, ethics and community building. Attendees and readers are encouraged to:
- Explore the SWE‑Perf benchmark and contribute to ongoing research on AI‑driven code optimisation. https://arxiv.org/abs/2507.12415
- Engage with LF AI & Data Slack channels to continue conversations and stay informed about future events. https://join.slack.com/t/lfaifoundation/shared_invite/zt-3a72d2avo-jkz2r3mgwXXKYcjuFfHNMQ
- Contribute to open source projects highlighted during the event, whether by code contributions, documentation or testing. https://lfaidata.foundation/
- Watch session recordings (some of which are pending release) to revisit technical talks and share knowledge with colleagues. https://lfaidata.foundation/
We're only at the beginning of the "Agent for SWE" journey. By continuing to share knowledge, improve infrastructure, and focus on ethical data practices, the open source AI community can develop innovative, inclusive solutions powered by agents that will shape the future of software engineering.
