Must the embedding dimension grow linearly with the number of relevant documents?
| n (relevant docs) | Lower bound d* | Ratio d*/n | Constraints (n·m pairs) |
|---|---|---|---|
| 2 | 2 | 1.0 | 996 |
| 5 | 5 | 1.0 | 2,475 |
| 10 | 10 | 1.0 | 4,900 |
| 20 | 20 | 1.0 | 9,600 |
| 30 | 30 | 1.0 | 14,100 |
| 50 | 50 | 1.0 | 22,500 |
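
The constraint counts in the table are consistent with one pairwise constraint per (relevant, irrelevant) document pair, drawn from a fixed pool of 500 documents. The pool size is not stated in the text; it is an assumption inferred from the numbers. A small sketch that checks this reading against the table rows (the names `rows`, `ASSUMED_POOL_SIZE`, and `pairwise_constraints` are illustrative):

```python
# Rows copied from the table: (n, lower bound d*, reported constraint count).
rows = [
    (2, 2, 996),
    (5, 5, 2475),
    (10, 10, 4900),
    (20, 20, 9600),
    (30, 30, 14100),
    (50, 50, 22500),
]

# ASSUMPTION: a fixed pool of 500 documents. This value is inferred from the
# reported counts and is not stated in the text.
ASSUMED_POOL_SIZE = 500


def pairwise_constraints(n, pool_size=ASSUMED_POOL_SIZE):
    """Count (relevant, irrelevant) pairs when n of pool_size documents are relevant."""
    m = pool_size - n  # number of irrelevant documents
    return n * m


for n, d_star, reported in rows:
    # Each row should match n * (pool_size - n) under the assumed pool size.
    assert pairwise_constraints(n) == reported
    print(n, d_star / n, pairwise_constraints(n))
```

Under this assumption, every row satisfies n·m = n·(500 − n), and the ratio d*/n is 1.0 throughout.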
A dual encoder maps queries and documents to d-dimensional embeddings. Retrieval separation requires that each of the n relevant documents has strictly higher inner-product similarity with the query q than each of the m irrelevant documents:
$$\langle q, r_i \rangle > \langle q, z_j \rangle \quad \text{for all } i \in [n],\ j \in [m]$$
The question: must d grow linearly with n for this to be achievable?
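
To make the separation condition concrete, here is a minimal sketch that tests it for one query and explicit embeddings. It relies on the fact that the all-pairs condition holds exactly when the lowest relevant score exceeds the highest irrelevant score. The function names and the toy vectors are illustrative assumptions, not taken from the source, and both document lists are assumed to be non-empty.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def separates(q, relevant, irrelevant):
    """Return True if every relevant document outscores every irrelevant one.

    Checking all n * m pairs is equivalent to comparing the minimum relevant
    score against the maximum irrelevant score.
    """
    lowest_relevant = min(dot(q, r) for r in relevant)
    highest_irrelevant = max(dot(q, z) for z in irrelevant)
    return lowest_relevant > highest_irrelevant


# Toy data in d = 2 (illustrative only).
q = [1.0, 0.0]
relevant = [[0.9, 0.1], [0.8, -0.3]]
irrelevant = [[0.2, 0.9], [-0.5, 0.4]]

print(separates(q, relevant, irrelevant))
```

This only verifies separation for given embeddings; it does not search for embeddings or establish the smallest d for which separating embeddings exist.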