In practice, exact matches are rare, since many requests include dynamic data like names, dates, or other variables. Butter’s cache supports templated message content, meaning a single cache entry may serve several requests so long as they are structurally compatible.
For example, "Say hello to Erik" and "Say hello to Joe" could be represented with a single template: "Say hello to {{name}}" -> "Hello {{name}}". Then, in the subsequent request "Say hello to Steve", Butter detects the variable binding name=Steve. By applying this binding to the template, Butter can build the corresponding response: "Hello Steve".
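To make the matching step concrete, here is a minimal Python sketch of the idea. It is not Butter's implementation: the helper names (`compile_template`, `detect_bindings`, `render`) are hypothetical, and the regex-based approach is only an assumption chosen for illustration.

```python
import re

PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")

def compile_template(template):
    """Turn 'Say hello to {{name}}' into a regex with one named group per placeholder.

    Illustrative assumption: each placeholder name appears at most once.
    """
    # With one capture group, re.split alternates literal text and placeholder names.
    parts = PLACEHOLDER.split(template)
    pattern = ""
    for index, part in enumerate(parts):
        if index % 2 == 0:
            pattern += re.escape(part)        # literal text must match exactly
        else:
            pattern += f"(?P<{part}>.+?)"     # placeholder captures dynamic data
    return re.compile(pattern)

def detect_bindings(template, message):
    """Return the variable bindings if the message fits the template, else None."""
    match = compile_template(template).fullmatch(message)
    return match.groupdict() if match else None

def render(template, bindings):
    """Substitute each {{placeholder}} with its bound value."""
    return PLACEHOLDER.sub(lambda m: bindings[m.group(1)], template)

bindings = detect_bindings("Say hello to {{name}}", "Say hello to Steve")
response = render("Hello {{name}}", bindings)
```

Here `bindings` holds the detected binding for `name`, and `response` is the cached response template with that binding applied.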
We’re working on detecting variables and templated structure automatically, but you can force the creation of a template by adding the Butter-Bindings header to your request. This replaces all instances of dynamic data with placeholder tokens: {{name}}, in this case.
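As a rough illustration of that replacement, the Python sketch below swaps each bound value in a message for its placeholder token. The function name `templatize` and the plain string replacement are assumptions made for this example; how Butter performs the substitution internally is not described here.

```python
def templatize(message, bindings):
    """Replace every occurrence of each bound value with its {{key}} placeholder.

    Hypothetical helper for illustration only.
    """
    # Replace longer values first so a short value cannot split a longer one.
    ordered = sorted(bindings.items(), key=lambda item: len(item[1]), reverse=True)
    for key, value in ordered:
        if value:  # skip empty strings, which would match at every position
            message = message.replace(value, "{{" + key + "}}")
    return message

template = templatize("Say hello to Erik", {"name": "Erik"})
```

With the bindings from the example above, `template` becomes the shared form that later requests can be matched against.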
The Butter-Bindings header expects a string-encoded, unnested JSON object whose keys and values are all strings.
# First request: the Butter-Bindings header forces creation of the template
curl -X POST "$BASE_URL/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Butter-Auth: Bearer $BUTTER_API_KEY" \
  -H "Butter-Bindings: {\"name\": \"Erik\"}" \
  -d '{"messages":[{"content":"Say hello to Erik","role":"user"}],"model":"gpt-4o"}'

# Second request: no Butter-Bindings header; it matches the template (cache hit)
curl -X POST "$BASE_URL/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Butter-Auth: Bearer $BUTTER_API_KEY" \
  -d '{"messages":[{"content":"Say hello to Joe","role":"user"}],"model":"gpt-4o"}'
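When the request is built in code rather than with curl, hand-escaping the header value is easy to get wrong. The Python sketch below serializes the bindings with the standard `json` module and rejects nested or non-string values before sending. The helper name `encode_bindings` is hypothetical, and the validation reflects only the format described above, not any check Butter itself is known to perform.

```python
import json

def encode_bindings(bindings):
    """Serialize a flat dict of string pairs for use as the Butter-Bindings header value."""
    for key, value in bindings.items():
        if not isinstance(key, str) or not isinstance(value, str):
            raise TypeError(
                f"binding {key!r} must map a string to a string, got {type(value).__name__}"
            )
    # json.dumps escapes non-ASCII characters by default, keeping the header ASCII-safe.
    return json.dumps(bindings)

headers = {
    "Content-Type": "application/json",
    "Butter-Bindings": encode_bindings({"name": "Erik"}),
}
```

The resulting `headers` dictionary can be merged with the authorization headers shown in the curl commands and passed to whichever HTTP client you use.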