Sunday, February 25, 2024

Research Paper Generating AI Using the Nougat Model

Introduction

Large language models (LLMs) like GPT-4 have recently made great strides in text generation, demonstrating an impressive ability to produce coherent, meaningful text. However, parsing and comprehending research papers effectively remains a very difficult problem for AI. Research papers combine complex formatting, mathematical formulas, tables, graphics, and subject-specific terminology. The information is densely packed, and much of its meaning is carried by the formatting itself.

In this article, I’ll show how Nougat, a new model from Meta, can help parse research papers accurately. I then add an LLM pipeline that extracts and summarizes all of the paper’s tables.
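The table-extraction step can be sketched as follows. This is a minimal illustration, not the article's actual pipeline. It assumes that tables in Nougat's output appear as LaTeX `tabular` environments with simple column specs. The helper names `extract_tables` and `table_to_prompt` are hypothetical. The LLM call itself is left out, because it depends on which model you use.

```python
import re

# Assumption: tables in Nougat's output appear as LaTeX tabular environments,
# e.g. \begin{tabular}{ll} ... \end{tabular}. Column specs containing nested
# braces (such as p{3cm}) are not handled by this simple pattern.
TABULAR_RE = re.compile(r"\\begin\{tabular\}\{[^}]*\}(.*?)\\end\{tabular\}", re.DOTALL)


def extract_tables(markup: str) -> list[list[list[str]]]:
    """Hypothetical helper: return each tabular block as rows of cell strings."""
    tables = []
    for match in TABULAR_RE.finditer(markup):
        # Drop horizontal rules such as \hline, which carry no cell content.
        body = match.group(1).replace("\\hline", "")
        rows = []
        # LaTeX rows end with \\ and cells are separated by &.
        for raw_row in body.split("\\\\"):
            raw_row = raw_row.strip()
            if raw_row:
                rows.append([cell.strip() for cell in raw_row.split("&")])
        tables.append(rows)
    return tables


def table_to_prompt(rows: list[list[str]]) -> str:
    """Hypothetical helper: format one table as plain text for an LLM to summarize."""
    lines = [" | ".join(row) for row in rows]
    return "Summarize the following table:\n" + "\n".join(lines)
```

In a full pipeline, each prompt returned by `table_to_prompt` would be sent to the LLM of your choice. Converting the table to pipe-separated text keeps the prompt short while still preserving the row and column structure.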

The potential here is large. Research papers and other publications contain a great deal of data that has never been properly processed, and accurate parsing makes it usable in many applications, such as retraining LLMs.

Nougat Model

Researchers at Meta AI created Nougat, a visual transformer model that turns images of document pages into structured text [1]. Given a rasterized image of a page, it outputs the page’s text in a lightweight markup language.
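Because the output is structured markup rather than raw text, downstream code can recover the paper's layout from it. The sketch below groups the text under each section heading. It assumes that headings are marked with Markdown `#` prefixes (for example `## 1 Introduction`). The function `split_sections` is a hypothetical helper written for illustration only.

```python
import re

# Assumption: section headings in the markup use Markdown '#' prefixes.
HEADING_RE = re.compile(r"^(#{1,6})\s+(.*)$")


def split_sections(markup: str) -> list[tuple[int, str, str]]:
    """Return (level, title, body) tuples; text before the first heading gets level 0."""
    sections = []
    level, title, body_lines = 0, "", []

    def flush() -> None:
        # Record the current section unless it is an empty preamble.
        if title or any(ln.strip() for ln in body_lines):
            sections.append((level, title, "\n".join(body_lines).strip()))

    for line in markup.splitlines():
        match = HEADING_RE.match(line)
        if match:
            flush()
            level, title, body_lines = len(match.group(1)), match.group(2).strip(), []
        else:
            body_lines.append(line)
    flush()
    return sections
```

The heading depth (`#` versus `##`) is kept as a level number, so callers can rebuild the section hierarchy if they need it.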

Nougat’s main advantage is that it relies only on the page image and does not need OCR text. This lets it correctly recover semantic structure, such as mathematical equations. To learn the conventions of research paper formatting and language, it is trained on millions of academic publications from arXiv and PubMed.

The figure below from [1] shows how math equations in a PDF are reproduced in LaTeX and rendered correctly.
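Since equations come back as LaTeX source, they can be pulled out of the output directly. The sketch below assumes the LaTeX-style delimiters `\[ ... \]` for display math and `\( ... \)` for inline math. The helper name `extract_math` is hypothetical.

```python
import re

# Assumption: display math is delimited by \[ ... \] and inline math by \( ... \).
DISPLAY_RE = re.compile(r"\\\[(.+?)\\\]", re.DOTALL)
INLINE_RE = re.compile(r"\\\((.+?)\\\)", re.DOTALL)


def extract_math(markup: str) -> dict[str, list[str]]:
    """Hypothetical helper: collect display and inline LaTeX equations in document order."""
    return {
        "display": [m.strip() for m in DISPLAY_RE.findall(markup)],
        "inline": [m.strip() for m in INLINE_RE.findall(markup)],
    }
```

The non-greedy `.+?` stops at the first closing delimiter. This matters when a paragraph contains several equations, because a greedy pattern would merge them into one match.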

Nougat uses a visual transformer encoder-decoder architecture. The encoder, a Swin Transformer, encodes the document image into latent embeddings, processing the image hierarchically with shifted windows. The decoder then generates the output text tokens autoregressively, using self-attention over the tokens generated so far and cross-attention over the encoder outputs.
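The shifted-window idea can be illustrated on a small grid of patch indices. This toy sketch shows only which patches get grouped into the same attention window. It leaves out the attention computation and the attention mask that a real Swin layer applies to wrapped-around patches. The grid size and window size are arbitrary choices made for illustration.

```python
def cyclic_shift(grid: list[list[int]], shift: int) -> list[list[int]]:
    """Roll the grid up and left by `shift`, like torch.roll(x, (-shift, -shift), (0, 1))."""
    h, w = len(grid), len(grid[0])
    return [[grid[(r + shift) % h][(c + shift) % w] for c in range(w)] for r in range(h)]


def partition_windows(grid: list[list[int]], window: int) -> list[list[int]]:
    """Split the grid into non-overlapping window x window blocks, in row-major order."""
    h, w = len(grid), len(grid[0])
    assert h % window == 0 and w % window == 0, "grid must be divisible by the window size"
    windows = []
    for r0 in range(0, h, window):
        for c0 in range(0, w, window):
            windows.append([grid[r][c] for r in range(r0, r0 + window) for c in range(c0, c0 + window)])
    return windows


# A 4x4 grid of patch indices 0..15.
grid = [[4 * r + c for c in range(4)] for r in range(4)]
regular = partition_windows(grid, 2)                   # windows of one layer
shifted = partition_windows(cyclic_shift(grid, 1), 2)  # windows of the next layer
```

Here `regular[0]` is `[0, 1, 4, 5]`, while `shifted[0]` is `[5, 6, 9, 10]`, which draws one patch from each of the four original windows. By alternating regular and shifted windows, Swin lets information cross window boundaries while keeping each attention computation local.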
