Models
We support two families of models: `lite` and `main`. The `lite` family has fewer parameters and is highly accurate for shorter requests (less than 16k tokens). The `main` family is designed specifically to improve accuracy for long-context tasks (>16k tokens).
Model | Speed | Max Input | Use Case |
---|---|---|---|
auto | varies | 128k tok | Auto-route based on input size |
relace-apply-2.5-lite | ~10k tok/s | 16k tok | Fast & accurate on short context |
relace-apply-3 | ~7.5k tok/s | 128k tok | Highly accurate on long context |
We recommend the `auto` option for best performance.
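To illustrate what the `auto` option does, here is a rough sketch of routing by input size. The 4-characters-per-token ratio is only a crude heuristic of ours, not Relace's actual tokenizer, and the routing logic itself is an illustrative assumption:

```python
# Illustrative sketch of the routing the `auto` option performs.
# The ~4 chars/token estimate is a rough heuristic, not Relace's tokenizer.
LITE_LIMIT_TOKENS = 16_000   # relace-apply-2.5-lite max input (table above)
MAIN_LIMIT_TOKENS = 128_000  # relace-apply-3 max input

def estimate_tokens(text: str) -> int:
    return len(text) // 4  # crude heuristic; real routing may differ

def choose_model(initial_code: str, edit_snippet: str) -> str:
    tokens = estimate_tokens(initial_code) + estimate_tokens(edit_snippet)
    if tokens <= LITE_LIMIT_TOKENS:
        return "relace-apply-2.5-lite"  # fast & accurate on short context
    if tokens <= MAIN_LIMIT_TOKENS:
        return "relace-apply-3"         # highly accurate on long context
    raise ValueError("input exceeds the 128k-token limit; expect a 400 error")
```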
OpenAI Compatible Endpoint
If the Relace REST API is inconvenient, we also support an OpenAI-compatible endpoint for our apply models. The user message must include `<code>` and `<update>` tags following the format above. The `<instruction>`
tag is optional.

Fallbacks
We recommend using GPT-4.1-mini with predictive edits as a fallback. This option is 10-20x slower than Relace's apply models, but it's useful for redundancy. Relace apply models also return a 400 error code when input exceeds context limits (see table above). For these cases, GPT-4.1-mini's 1M token context window makes it a reliable fallback option.
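Putting the tag format and the fallback advice together, here is a hedged, stdlib-only sketch. The base URL is a placeholder (use the endpoint from your Relace account), and the function names are ours:

```python
# Sketch: wrap code/edits in the tag format above and POST to the
# OpenAI-compatible endpoint. The base URL below is a placeholder --
# substitute the endpoint Relace provides.
import json
import urllib.error
import urllib.request

def build_apply_message(initial_code: str, edit_snippet: str,
                        instruction: str = "") -> str:
    """Wrap inputs in the <code>/<update> tags the endpoint expects;
    <instruction> is optional and disambiguates the edit snippet."""
    parts = []
    if instruction:
        parts.append(f"<instruction>{instruction}</instruction>")
    parts.append(f"<code>{initial_code}</code>")
    parts.append(f"<update>{edit_snippet}</update>")
    return "\n".join(parts)

def apply_edit(api_key: str, initial_code: str, edit_snippet: str,
               base_url: str = "https://api.example.com/v1") -> str:
    payload = {
        "model": "auto",
        "messages": [{"role": "user",
                      "content": build_apply_message(initial_code, edit_snippet)}],
    }
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)["choices"][0]["message"]["content"]
    except urllib.error.HTTPError as err:
        if err.code == 400:
            # Input likely exceeded the context limit; this is where a
            # fallback to GPT-4.1-mini with predictive edits would go.
            raise
        raise
```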
However, even frontier LLMs struggle with long context. We recommend proactive refactoring of files >32k tokens to improve output quality.

Authorizations
Relace API key, sent in the Authorization header using the Bearer scheme.
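For example, the header takes this shape (the key below is a dummy placeholder):

```shell
# Authorization header format: Bearer scheme with your Relace API key.
RELACE_API_KEY="rlc-example-key"   # dummy placeholder, not a real key
AUTH_HEADER="Authorization: Bearer $RELACE_API_KEY"
echo "$AUTH_HEADER"
```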
Body
application/json
Initial code and edits to apply
- The original code that needs to be modified
- The code changes to be applied to the initial code
- Choice of apply model to use. Available options: `auto`, `relace-apply-2.5-lite`, `relace-apply-3`
- Optional single-line instruction to disambiguate the edit snippet, e.g. "Remove the captcha from the login page"
- Whether to stream the response back
- Optional metadata for logging and tracking purposes