SKILL.md

(issue)

I bet this gets us pretty far, maybe surprisingly far. It also seems to have a pretty clear capability ceiling vs. approaches that touch the weights: at the least, gains are bounded by how strong in-context learning (ICL) is. To be fair, we may be surprised. Do we really know how strong ICL is?

Skills are a convention to make users do better context engineering. Imagine a very diligent user driving Claude and adding appropriate context. Claude’s retrieval over Skills might be worse than this. But consider a much lower baseline: a not-very-diligent user driving Claude (and we may note, most users are not so diligent).

Here Skills will be a strict improvement. Skills are a way of pulling context from users. (E.g. example scripts implicitly convey stuff about the env, best practices, etc., that a user wouldn't otherwise put in a prompt.)

Skills make good centaurs. Helpfully, they offer some guarantees about what kind of behavior to expect from Claude (if Claude follows instructions well enough). They also give you a standard format, which is nice for sharing. Though neatly attributing improvements to a given Skill – and so iterating on Skills – seems hard.

Where will gains come from? You could use Skills a couple of ways. (1) Package you-specific info for Claude. Skills are, after all, a special case of Anthropic’s telegraphed near-term approach to “memory”: crudely, ICL with a dynamically managed context window, acting over a persisted filesystem, with end-to-end RL from which skillful file creation/retrieval emerges. Presumably there are other efforts – but whenever he is asked, Dario comes off suspiciously ICL-maximalist. (2) Try to “teach” Claude stuff, i.e. extend the capability frontier. I imagine this would work mostly as a way to elicit stuff which Claude can already do, but so far only unreliably. As with any scaffolding, there’s the risk of overspecifying/constraining the model. Will we give newer Claudes the same Skills?

Skills are more convention than hard affordance. But because Claude is wedded to its native harness, it would be a mistake to just consider the content in Skill files. We must also account for Claude’s natural tendencies (or: “habits”) when using Skills. Presumably Claude is trained to respect Skill files (a special case of steerability), and to be good at navigating filesystems which follow the Skill convention. It will be interesting to see how much of this you can get “for free” from other models that aren’t trained to use Skills, but are good at navigating filesystems and following instructions. OAI adopted Skills, if quietly, within a month, Willison reports.
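
To make the "convention, not affordance" point a bit more concrete, here is a minimal sketch of what a harness with no Skill-specific training might do: walk a directory, find every SKILL.md, and surface each one's name and description for a model to choose from. This is an illustration under assumptions, not a description of how Claude's harness works. It assumes each Skill lives in its own directory with a SKILL.md that opens with a `---`-delimited block of `name:` and `description:` lines. The function names (`read_front_matter`, `index_skills`, `build_demo_index`) and the demo Skills are hypothetical, and the parser is deliberately naive rather than real YAML.

```python
from pathlib import Path
import tempfile


def read_front_matter(text: str) -> dict[str, str]:
    """Parse a leading '---'-delimited block of simple 'key: value' lines.

    Deliberately naive (an assumption for this sketch): not a YAML parser.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return meta
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return {}  # no closing delimiter: treat the file as having no front matter


def index_skills(root: Path) -> list[dict[str, str]]:
    """Collect name, description, and path for every SKILL.md under root."""
    index = []
    for skill_file in sorted(root.rglob("SKILL.md")):
        meta = read_front_matter(skill_file.read_text(encoding="utf-8"))
        index.append({
            # Fall back to the directory name when no name is declared.
            "name": meta.get("name", skill_file.parent.name),
            "description": meta.get("description", ""),
            "path": str(skill_file),
        })
    return index


def build_demo_index() -> list[dict[str, str]]:
    """Build two hypothetical Skills in a temp directory and index them."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "pdf-forms").mkdir()
        (root / "pdf-forms" / "SKILL.md").write_text(
            "---\nname: pdf-forms\ndescription: Fill in PDF forms.\n---\n\nSteps go here.\n",
            encoding="utf-8",
        )
        (root / "untitled").mkdir()
        (root / "untitled" / "SKILL.md").write_text("No front matter.\n", encoding="utf-8")
        return index_skills(root)


if __name__ == "__main__":
    for entry in build_demo_index():
        print(f"{entry['name']}: {entry['description']}")
```

Everything here is plain filesystem walking plus instruction-following on whatever the index turns up, which is roughly the part one might hope to get "for free" from a model not trained on Skills. What the sketch cannot show is the trained habit of knowing when to reach for a Skill in the first place.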

I wonder if Claude can be trained for superhuman navigation over Skills. Then you could give Claude a really huge corpus of Skill content. If Claude is very, very good at finding and combining relevant Skills, does this describe something which feels different-in-kind from ideal context engineering?