token_limit
parameter to make this easy. Just set it to the context length of your generation model.
For case 2, you set a relevance threshold based on the recall you want to hit.
Strategy | Recall | Threshold | Token Savings |
---|---|---|---|
Conservative | 80%+ | ≤ 0.02 | ~25% |
Balanced | 95%+ | ≤ 0.08 | ~45% |
Aggressive | 80-90% | 0.15-0.25 | up to 70% |