IMPORTANT: Building C, not B. On floor 3. Follow the signs to the right of Goody cafe!

AI agents are powerful, but they still fail at real work. They don't follow deterministic workflows and lack tacit knowledge. Agent Skills tackle this problem by injecting transferable procedural knowledge into the agent's context, letting agents take on hard problems in coding and beyond.

We're the creators of SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks, the largest benchmark for agent skills, and Sundial, the largest registry and toolbox to build and improve skills. Over the last month, BenchFlow collected 86 tasks across 11 professional domains from 180 contributors. 80% of our task contributors are PhDs or senior professionals. Sundial gathered more than 50,000 community skills and helped dozens of teams turn their workflows into skills.

Building on this momentum, we introduce Skillathon Episode #1. It's the first Agent Skills Hackathon in 2026 and will be hosted at Founders, Inc. The Skillathon is part of the growing research community around skills, and its goal is to bring builders together to craft quality skills and tasks to evaluate them. This will help us understand the best practices for making effective skills, and extend the SkillsBench benchmark with new tasks and domains contributed by practitioners who know their fields best.

Speakers:
- Bence Nagy from Anthropic
- Xiangyi Li on behalf of Nous Research
- Xiangyi Li on SkillsBench
- Belinda Mo on behalf of Sundial
- Furqan Rydhan, co-founder of Founders, Inc., thirdweb, Nebula

Tracks:

Data track: your goal is to come up with a realistic task scenario that is complex enough that even the most advanced frontier models and agents fail at it, or need a lot of effort and a long horizon to solve it. As part of setting up the task, you will need to come up with an agent skill for tasks in the domain you are developing.
For example, if your task involves pivot tables in Excel, don't make an atomic skill such as "how to make pivot tables"; instead, modify Anthropic's default xlsx skill or create another complete skill set.

Below is the list of tracks we are hosting for this hackathon. The taxonomy is grouped by a mix of skill sets and roles in the economy.
- Computer Science: software engineering, machine learning, cybersecurity.
- Physical world: robotics, manufacturing, energy, infra.
- Professional: healthcare, finance, office suite, insurance.
- Natural Science: physics, mathematics, chemistry, biology, etc.
- OpenClaw: design orchestrator skills that coordinate multiple modules into coherent build pipelines for game development: asset generation, character design, world building, dialogue systems, modding tools, testing harnesses, or live game operations. This track treats gaming not as a training environment, but as a rich, real-world domain for composable AI tooling.

Examples are available on SkillsBench. The task / skill format we use is optional for the purpose of hacking. Create whatever skill you like and install it on whatever agent you like. We recommend checking out this tweet from Anthropic on updates to Skill creator: https://x.com/RLanceMartin/status/2028901056818930171

Continual learning track: there are many ways to improve models or prompts, such as Recursive Language Model or GEPA on the in-context learning layer, or RL on the model layer. One example is available (smolclaw.com): https://x.com/xdotli/status/2030219765630071022?s=20

Prizes:
- $1k for 1st place
- A brand new PS5 for 2nd place
- We will record demos, and post and feature everyone!

Organized by:
- Xiangyi Li: Founder of BenchFlow, author of SkillsBench, Harbor, Terminal-Bench etc.
- Roey Ben Chaim: Staff Engineer at Zenity (ex-Microsoft), organizer of AI Tinkerers Tel Aviv.
- Belinda Mo: Founder of Sundial, previously founded Viva Translate. BS and MS Stanford.
- Florent Tavernier: Founder of Sundial. Previously founded Self Protocol and Atelier Missor.
- Grace Zhang: Founder of World Intelligence, multimodal data infrastructure for physical AI. Host of Physical AI Hack.

Skillathon - The First Agent Skills Hackathon
