Given a user request for how to change a codebase, you want to retrieve only the files relevant to implementing that request.
This is important for two reasons: irrelevant files waste context tokens (driving up cost and latency), and missing relevant files can break the generation entirely.
We trained our reranker on hundreds of thousands of pairs of user queries and code to make it best in class for AI codegen applications.
We evaluated our model on two retrieval benchmarks: an in-house dataset consisting of query/codebase pairs for prompt-to-app tasks, and a more general dataset consisting of open source GitHub PRs.
Recall@k measures, for an example with k relevant files, what fraction of those files were ranked in the top k results.
For codegen, high recall is essential: if a relevant file never makes it into context, the generation breaks entirely.
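As a concrete sketch of the metric defined above (the file names here are hypothetical):

```python
def recall_at_k(ranked_files: list[str], relevant_files: set[str]) -> float:
    """Fraction of the k relevant files that appear in the top k ranked results."""
    k = len(relevant_files)
    top_k = set(ranked_files[:k])
    return len(top_k & relevant_files) / k

# Hypothetical example: 2 of the 3 relevant files landed in the top 3.
ranked = ["src/auth.py", "src/routes.py", "README.md", "src/session.py"]
relevant = {"src/auth.py", "src/session.py", "src/routes.py"}
print(recall_at_k(ranked, relevant))  # 0.666...
```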
Many people start doing retrieval by just passing the query/codebase pair into a model with a huge context window, like Gemini Flash-Lite, and prompting it to score the relevance of each file.
We beat the accuracy of Gemini 2.0 Flash-Lite at 2x the speed and 2/3 the cost.
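For reference, that naive baseline looks roughly like the sketch below. It uses the google-generativeai Python SDK; the prompt wording and the files dict are illustrative, not a recommended setup.

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-2.0-flash-lite")

# Illustrative codebase snapshot: path -> file contents.
files = {"src/auth.py": "...", "src/routes.py": "...", "README.md": "..."}

query = "Add rate limiting to the login endpoint"
prompt = (
    f"User request: {query}\n\n"
    "For each file below, output `<path>: <relevance 0-1>` on its own line.\n\n"
    + "\n\n".join(f"=== {path} ===\n{body}" for path, body in files.items())
)

response = model.generate_content(prompt)
print(response.text)  # parse per-file scores out of the model's text output
```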
Our code reranker returns a list of objects with filename and relevance score, like this:
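The field names below are illustrative; see the API reference for the exact schema.

```python
# One entry per file, sorted by descending relevance.
results = [
    {"filename": "src/auth.py",    "relevance_score": 0.94},
    {"filename": "src/session.py", "relevance_score": 0.71},
    {"filename": "README.md",      "relevance_score": 0.03},
]
```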
You’ll need to choose an appropriate threshold based on the regime you are in: (1) the codebase is too large to fit in your generation model’s context window, or (2) it fits, but you want to drop low-relevance files to save tokens.
For case 1, only the ordering matters: you simply feed the top-ranked files into context up to the model’s token limit.
We have a `token_limit` parameter to make this easy. Just set it to the context length of your generation model.
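Here is a minimal sketch of how that might look over HTTP. The endpoint URL and payload field names are assumptions for illustration; check the API reference for the real schema.

```python
import requests

# Hypothetical endpoint and payload shape.
files = {"src/auth.py": "...", "src/session.py": "...", "README.md": "..."}

resp = requests.post(
    "https://api.example.com/v1/rerank",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "query": "Add rate limiting to the login endpoint",
        "files": files,
        # Set token_limit to your generation model's context length; the
        # reranker then returns the top-ranked files that fit within it.
        "token_limit": 128_000,
    },
)
ranked = resp.json()  # list of {filename, relevance_score} objects
```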
For case 2, you set a relevance threshold based on the recall you want to hit; a filtering sketch follows the table below.
| Strategy | Recall | Threshold | Token Savings |
|---|---|---|---|
| Conservative | 99%+ | ≤ 0.02 | ~25% |
| Balanced | 95%+ | ≤ 0.08 | ~45% |
| Aggressive | 80-90% | 0.15-0.25 | up to 70% |
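A minimal sketch of case 2, filtering the scores returned earlier; the 0.08 cutoff corresponds to the Balanced row above, and the results list is the illustrative one from before.

```python
# Keep every file scoring above the chosen threshold; drop the rest.
THRESHOLD = 0.08  # "Balanced" row in the table above

results = [
    {"filename": "src/auth.py",    "relevance_score": 0.94},
    {"filename": "src/session.py", "relevance_score": 0.71},
    {"filename": "README.md",      "relevance_score": 0.03},
]

context_files = [r["filename"] for r in results if r["relevance_score"] > THRESHOLD]
print(context_files)  # ['src/auth.py', 'src/session.py']
```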
For more code examples and endpoint info, see the API reference.