Butter’s design as an LLM proxy relies on the fact that the context passed to most agents respects the chronological order of message histories. Messages, each containing text content and a corresponding role, form the building blocks of LLM context. Because many inference agents follow an append-only history across multiple turns, the agent’s context window always consists of all prior messages.

Given this, it is natural to think of the trajectory an agent follows while completing a given task as one path in the tree of all possible trajectories. Agents may start out following a common path to solve a shared task, but may diverge once faced with unique circumstances.

Butter takes advantage of this natural tree structure when serving responses to queries. Starting from the root node, Butter matches each message in the request against a branching path down the cache tree. If the whole query matches and a child node is available, that child is served as a cached response. Otherwise, the request has diverged and is forwarded, unmodified, to the LLM. The unmatched messages and the newly generated response are then appended to the cache from the point of deepest match.
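The lookup-and-graft procedure above can be sketched as a message trie. This is a minimal illustration, not Butter’s implementation: the node structure, the `(role, content)` key, and the policy of serving the first available child on a full match are all assumptions made for the sketch.

```python
from dataclasses import dataclass, field

@dataclass
class CacheNode:
    # Children keyed by the (role, content) of the next message
    # along a cached trajectory.
    children: dict = field(default_factory=dict)

def lookup(root, messages):
    """Walk the cache tree along the request's message history.

    Returns (node, depth): the deepest matching node and how many
    request messages matched from the root.
    """
    node, depth = root, 0
    for msg in messages:
        child = node.children.get((msg["role"], msg["content"]))
        if child is None:
            break
        node, depth = child, depth + 1
    return node, depth

def serve(root, messages, call_llm):
    """Serve a cached child on a full match; otherwise forward and graft."""
    node, depth = lookup(root, messages)
    if depth == len(messages) and node.children:
        # Whole query matched and a continuation exists: serve it.
        # (Assumption: pick the first child if several branches exist.)
        role, content = next(iter(node.children))
        return {"role": role, "content": content}
    # Diverged: forward the request unmodified, then append the
    # unmatched messages and the new response from the deepest match.
    response = call_llm(messages)
    for msg in messages[depth:] + [response]:
        key = (msg["role"], msg["content"])
        node = node.children.setdefault(key, CacheNode())
    return response
```

On the first request a given path misses, falls through to `call_llm`, and is grafted onto the tree; an identical follow-up request matches the full path and is answered from the cache without touching the LLM.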