Saturday, June 15, 2024

Microsoft Research: Ranking Answers Using Generative Retrieval

A recent research paper from Microsoft Research describes a new approach to conversational question answering.

Microsoft has unveiled a new conversational question-answering model that outperforms competing approaches, providing fast, precise responses while consuming noticeably fewer resources.

The researchers propose a new method for ranking content passages, a system they call Generative Retrieval For Conversational Question Answering, or GCoQA.

The next step, according to the researchers, is to investigate how to use it for general web search.

Generative Retrieval For Conversational Question Answering

An autoregressive language model predicts the next word or phrase in a sequence.
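
As a rough illustration of next-word prediction, the sketch below repeatedly picks the most likely next word until an end marker is reached. The `BIGRAMS` table and its probabilities are invented for this example, and conditioning only on the previous word is a simplification: a real autoregressive model conditions on the entire sequence generated so far.

```python
# Hypothetical table: probability of the next word given the previous word.
# Real models condition on the whole prefix; this is a deliberate simplification.
BIGRAMS = {
    "<s>": {"generative": 0.7, "dense": 0.3},
    "generative": {"retrieval": 0.9, "model": 0.1},
    "retrieval": {"</s>": 1.0},
}


def generate(max_words=10):
    """Greedily emit the most likely next word until the end marker appears."""
    word, output = "<s>", []
    for _ in range(max_words):
        # Words missing from the table can only be followed by the end marker.
        choices = BIGRAMS.get(word, {"</s>": 1.0})
        next_word = max(choices, key=choices.get)
        if next_word == "</s>":
            break
        output.append(next_word)
        word = next_word
    return output


print(" ".join(generate()))
```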

In this model, the autoregressive model generates “identifier strings,” which, to put it simply, are representations of passages in a document.

In this instance, the researchers use page titles and section titles as identifiers that indicate the subject matter of each page and text passage.

The experiment used Wikipedia data because its page titles and section titles can be trusted to be descriptive.

They are used to specify both the overall subject of a document and the subject of any individual sections that make up a portion of the text.

In real-world terms, this resembles using the title element to learn what a webpage is about and the headings to learn what each of its sections is about.

The “identifiers” are a mechanism to express all of that knowledge in an encoded form that is mapped to the titles and passages on the webpage.
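
A minimal sketch of that mapping, assuming an identifier is simply the page title joined to the section title. The `Passage` class, the `make_identifier` and `build_index` helpers, the `" > "` separator, and the sample passages are all hypothetical choices for illustration, not details taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    page_title: str
    section_title: str
    text: str


def make_identifier(page_title, section_title, sep=" > "):
    """Join a page title and a section title into one identifier string."""
    return f"{page_title.strip()}{sep}{section_title.strip()}"


def build_index(passages):
    """Map each identifier string to the passages it names."""
    index = {}
    for passage in passages:
        key = make_identifier(passage.page_title, passage.section_title)
        index.setdefault(key, []).append(passage)
    return index


# Invented sample data standing in for Wikipedia-style pages.
passages = [
    Passage("Beam search", "Overview", "Beam search keeps a fixed number of candidates."),
    Passage("Beam search", "Uses", "It is common in machine translation."),
    Passage("Wikipedia", "History", "Wikipedia launched in 2001."),
]
index = build_index(passages)
print(sorted(index))
```

Looking up a generated identifier in `index` then returns the passage text it stands for.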

To produce the answers to the questions, the retrieved passages are eventually fed into another autoregressive model.

Generative Retrieval

The research paper states that, for the retrieval component, the model uses a technique known as “beam search” to generate multiple identifiers (representations of webpage sections), which are then ranked by the likelihood that they point to the answer.

Researchers report:

“…instead of generating a single identifier, we construct many using beam search, a widely utilized technique.

We can create a ranking list of created IDs based on these scores because each generated identifier is given a language model score.

The ranking identifiers might readily match a list of passage rankings.”

The research paper goes on to say that the process can be viewed as a “hierarchical search.”
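
The ranking step can be sketched with a toy beam search. Everything here is an assumption made for illustration: `TOY_MODEL` is an invented probability table standing in for the language model, each "token" is a whole title rather than a subword, and the page-title-then-section-title order is one plausible reading of the hierarchical search. The score of an identifier is the sum of the log-probabilities of its parts.

```python
import math

# Hypothetical next-token probabilities, keyed by the prefix generated so far.
TOY_MODEL = {
    (): {"Beam search": 0.6, "Wikipedia": 0.4},
    ("Beam search",): {"Overview": 0.7, "Uses": 0.3},
    ("Wikipedia",): {"History": 0.9, "Overview": 0.1},
}


def beam_search(beam_size):
    """Return completed identifiers as (parts, log_prob), best first."""
    beams = [((), 0.0)]
    finished = []
    while beams:
        candidates = []
        for prefix, score in beams:
            continuations = TOY_MODEL.get(prefix)
            if continuations is None:
                # No continuation in the table: the identifier is complete.
                finished.append((prefix, score))
                continue
            for token, prob in continuations.items():
                candidates.append((prefix + (token,), score + math.log(prob)))
        # Keep only the highest-scoring partial identifiers.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_size]
    finished.sort(key=lambda c: c[1], reverse=True)
    return finished


ranked = beam_search(beam_size=2)
for parts, log_prob in ranked:
    print(" > ".join(parts), round(math.exp(log_prob), 2))
```

Because each identifier names a passage, the ranked list of identifiers doubles as a ranked list of passages.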

Once those passages are retrieved, another autoregressive model generates the answer based on the retrieved passages.

Comparison With Other Methods

The researchers found that GCoQA outperformed many other widely used approaches.

It also helped overcome limitations (bottlenecks) of previous approaches.

The way we answer questions in conversation is expected to change significantly as a result of this new approach.

For example, it is 10 times more memory-efficient than existing models and also 10 times faster.

Researchers report:

“…the use of our method in practice is more practical and effective.”

The Microsoft researchers later conclude:

“Benefiting from fine-grained cross-interactions in the decoder module, GCoQA could attend to the conversation context more effectively.

Additionally, GCoQA has lower memory consumption and higher inference efficiency in practice.”

Limitations Of GCoQA

Before this model can be used, there are a few issues that must be resolved.

The researchers found that GCoQA was constrained by the “beam search” technique, which limited its ability to recall “large-scale passages.”

Increasing the beam size didn’t help either because it made the model run more slowly.
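
The trade-off can be seen in a small toy example, which is not the researchers' experiment. The probability table is invented so that the best identifier starts with the less likely page title. A beam of size 1 discards that title early and never reaches it, while a beam of size 2 finds it at the cost of more model lookups.

```python
import math

# Hypothetical next-token probabilities, keyed by the prefix generated so far.
TOY_MODEL = {
    (): {"Page A": 0.55, "Page B": 0.45},
    ("Page A",): {"Section 1": 0.5, "Section 2": 0.5},
    ("Page B",): {"Section 3": 1.0},
}


def search(beam_size):
    """Return (ranked identifiers, number of simulated model lookups)."""
    beams, finished, lookups = [((), 0.0)], [], 0
    while beams:
        candidates = []
        for prefix, score in beams:
            lookups += 1  # one simulated model call per expanded prefix
            continuations = TOY_MODEL.get(prefix)
            if continuations is None:
                # No continuation in the table: the identifier is complete.
                finished.append((prefix, score))
                continue
            for token, prob in continuations.items():
                candidates.append((prefix + (token,), score + math.log(prob)))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_size]
    finished.sort(key=lambda c: c[1], reverse=True)
    return [" > ".join(parts) for parts, _ in finished], lookups


for size in (1, 2):
    identifiers, lookups = search(size)
    print(size, identifiers, lookups)
```

In this toy table "Page B > Section 3" has the highest overall probability (0.45), yet only the larger beam retrieves it.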

Another limitation is the model’s reliance on headings that are used in a meaningful way, which is something Wikipedia can be trusted to do.

Applying it to websites other than Wikipedia, however, could cause problems for the model.

Section headings on many webpages don’t accurately describe the content that follows, even though that is what publishers and SEOs are supposed to do.

GCoQA Is A Promising New Technology

Ultimately, the researchers stated that the performance gains are a strong win, while the limitations still need to be worked through.

The research paper concludes that there are two promising areas to continue studying:

“(1) investigating the use of generative retrieval in more general Web search scenarios where identifiers are not directly available from titles; and (2) examining the integration of passage retrieval and answer prediction within a single, generative model in order to better understand their internal relationships.”
