Vector search finds the right chunk but ranks it badly, and it waves through near-garbage that's only vaguely on-topic. Reranking fixes that with a second model: a cross-encoder that reads the query and each chunk together instead of comparing two frozen vectors. The whole trick is that you can afford an expensive judge by running it only on the cheap stage's survivors.
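To make the two-stage shape concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: `embed` is a toy bag-of-words stand-in for a real embedding model, and `toy_pair_score` is a hypothetical placeholder for a cross-encoder (it only mimics the one property that matters here, which is that it looks at the query and the chunk together). The names `retrieve`, `rerank`, `CHUNKS`, and `QUERY` are made up too.

```python
import math
from collections import Counter
from typing import Callable, List


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, chunk_vecs: List[Counter], k: int) -> List[int]:
    """Cheap stage: compare the query vector against precomputed chunk vectors."""
    q = embed(query)
    order = sorted(
        range(len(chunk_vecs)), key=lambda i: cosine(q, chunk_vecs[i]), reverse=True
    )
    return order[:k]


def rerank(
    query: str,
    chunks: List[str],
    candidates: List[int],
    score_pair: Callable[[str, str], float],
    top_n: int,
) -> List[int]:
    """Expensive stage: score each (query, chunk) pair, but only for the survivors."""
    scored = [(score_pair(query, chunks[i]), i) for i in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scored[:top_n]]


def toy_pair_score(query: str, chunk: str) -> float:
    """Hypothetical placeholder for a cross-encoder, NOT a real model.

    It reads both texts at once and rewards query word pairs that appear
    adjacent and in order in the chunk, which a frozen count vector cannot see.
    """
    q, c = query.lower().split(), chunk.lower().split()
    chunk_bigrams = set(zip(c, c[1:]))
    bigram_hits = sum(1 for bg in zip(q, q[1:]) if bg in chunk_bigrams)
    return 2.0 * bigram_hits + len(set(q) & set(c))


CHUNKS = [
    "reset your password from the account settings page",
    "password password password policy reset reset",  # keyword-stuffed near-garbage
    "the office is closed on public holidays",
    "shipping takes three to five days",
]
QUERY = "reset your password"

# Chunk vectors are computed once, ahead of time ("frozen").
CHUNK_VECS = [embed(c) for c in CHUNKS]

survivors = retrieve(QUERY, CHUNK_VECS, k=2)
best = rerank(QUERY, CHUNKS, survivors, toy_pair_score, top_n=1)
```

With this toy data, the cheap stage puts the keyword-stuffed chunk first, and the pair scorer moves the chunk that actually answers the query to the top. The pair scorer runs twice (once per survivor), not once per chunk in the corpus, which is the cost argument in miniature.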