This recent Google paper describes how to build better information retrieval models with synthetic data, using a new method called pairwise query generation.
They give an LLM a document and ask it to generate two queries: one that is relevant to the document and one that is not. Doing this encourages the model to learn the difference between a relevant and an irrelevant match.
In other words, contrasting a relevant query with an irrelevant one for the same document teaches the model what actually makes a document relevant to a query.
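To make the idea concrete, here is a minimal Python sketch of what pairwise query generation could look like. It is only an illustration, not the paper's actual prompt or pipeline: the prompt wording, the `RELEVANT:` / `IRRELEVANT:` reply format, the `QueryPair` class, and the `fake_llm` stand-in are all assumptions I made up for this example. In practice you would swap `fake_llm` for a call to a real model.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical prompt wording; the paper's real prompt is not reproduced here.
PROMPT_TEMPLATE = (
    "Read the document below, then write two search queries.\n"
    "The first should be a query this document answers well.\n"
    "The second should be a query this document does not answer.\n\n"
    "Document:\n{document}\n\n"
    "Reply in exactly this format:\n"
    "RELEVANT: <query>\n"
    "IRRELEVANT: <query>\n"
)


@dataclass
class QueryPair:
    """One synthetic training example: a document plus a query of each kind."""
    document: str
    relevant_query: str
    irrelevant_query: str


def build_prompt(document: str) -> str:
    """Fill the document into the prompt template."""
    return PROMPT_TEMPLATE.format(document=document.strip())


def parse_reply(document: str, reply: str) -> QueryPair:
    """Pull the two labelled queries out of the model's reply."""
    fields = {}
    for line in reply.splitlines():
        label, sep, text = line.partition(":")
        if sep:
            fields[label.strip().upper()] = text.strip()
    if not fields.get("RELEVANT") or not fields.get("IRRELEVANT"):
        raise ValueError("reply is missing a RELEVANT or IRRELEVANT line")
    return QueryPair(document, fields["RELEVANT"], fields["IRRELEVANT"])


def generate_pair(document: str, llm: Callable[[str], str]) -> QueryPair:
    """Ask the model for both queries in one call, then parse them."""
    return parse_reply(document, llm(build_prompt(document)))


def fake_llm(prompt: str) -> str:
    """Stand-in for a real LLM call (assumption, returns a canned reply)."""
    return (
        "RELEVANT: how to prune tomato plants\n"
        "IRRELEVANT: best fertilizer for roses"
    )


if __name__ == "__main__":
    doc = "Pinch off tomato suckers weekly to direct energy into the fruit."
    print(generate_pair(doc, fake_llm))
```

The point of generating both queries in a single call is that the model has to think about them side by side. Each resulting `QueryPair` can then serve as one training example for a retrieval model: same document, one query that should match and one that should not.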
What’s the point? This method of training models to better understand relevance could significantly improve the quality of Google search. It’s expected to be especially useful for new topics or search queries Google has not seen before.
What does this mean for SEO? This technique should help Google’s systems get even better at understanding user intent and which content meets that intent. It will become even more important for us to deeply understand and meet the needs of our audience.
This started as an entry in Marie's Notes. It was interesting enough for me to publish it as a full article!