Nvidia Ingest Aims To Make It Easier To Extract Structured Information From Documents

Last updated: 2025/01/22 at 7:43 AM

News Room Published 22 January 2025

Nvidia Ingest is a new microservice for processing document content and extracting metadata into a well-defined JSON schema. Ingest can process PDF, Word, and PowerPoint documents and extract structured information from tables, charts, images, and text using optical character recognition.

To use Nvidia Ingest, you provide it with a JSON job description of the payload to ingest. You can then retrieve the results as a JSON dictionary with metadata for all extracted objects, processing annotations, and timing/trace information.

Nvidia has not provided figures about Ingest performance but says it is scalable and can use multiple processing methods to improve accuracy or increase throughput. For PDF documents, Ingest can use pdfium, Unstructured.io, or Adobe Content Extraction Services.

For example, using nv-ingest-cli, the command line tool used to interact with Nvidia Ingest, you specify how to process a document using the --task argument, which includes an extract_method option:


nv-ingest-cli 
... 
  --task='extract:{"document_type": "pdf", "extract_method": "pdfium", "extract_text": true, "extract_images": true, "extract_tables": true, "extract_tables_method": "yolox"}' 
...

Nvidia explicitly states that you cannot use Ingest to create a pipeline that carries a sequence of operations through the documents in the payload. You can, however, run various pre- or post-processing transformations, including text splitting and chunking, filtering, embedding generation, and image offloading. In practice, this means you can pass multiple --task arguments to the same nv-ingest-cli execution. For example, you can add a dedup (de-duplication) step by using:


nv-ingest-cli 
... 
  --task='extract:{...}' 
  --task='dedup:{"content_type": "image", "filter": true}' 
...

The tool can be used on a single document specified with the --doc argument or on a set of documents simultaneously by providing a JSON-formatted dictionary describing the batch payload.
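
As a rough illustration of how the --doc and --task arguments fit together, the following Python sketch assembles an argument list for nv-ingest-cli. The helper names task_arg and build_command are hypothetical, the file name report.pdf is a placeholder, and the other arguments elided with "..." in the examples above are not filled in, so the resulting command is likely incomplete as it stands.

```python
import json
import shlex


def task_arg(name, options):
    # Format one task as name:{json}, the shape used by --task in the examples above.
    return f"--task={name}:{json.dumps(options)}"


def build_command(doc_path, tasks):
    # tasks is an ordered list of (name, options) pairs; one --task flag is emitted per pair.
    # Arguments elided in the article's examples are deliberately left out here.
    argv = ["nv-ingest-cli", "--doc", doc_path]
    argv.extend(task_arg(name, options) for name, options in tasks)
    return argv


if __name__ == "__main__":
    cmd = build_command(
        "report.pdf",  # placeholder document path
        [
            ("extract", {
                "document_type": "pdf",
                "extract_method": "pdfium",
                "extract_text": True,
                "extract_images": True,
                "extract_tables": True,
                "extract_tables_method": "yolox",
            }),
            ("dedup", {"content_type": "image", "filter": True}),
        ],
    )
    # shlex.join quotes each argument so the line can be pasted into a shell.
    print(shlex.join(cmd))
```

Building the JSON with json.dumps rather than by hand avoids mismatched quotes and braces, and passing the list directly to a process launcher would sidestep shell quoting altogether.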

All extracted data are stored in an output directory containing a subdirectory for each document type, e.g., image, text, structured, etc. Each ingested document generates a JSON metadata file with the extracted content; source metadata including source name, location, type, etc.; and content metadata. Content metadata includes both general and type-specific content metadata. For example, for images, you get the image type, any caption, the location, size, and so on; for text, you get a summary, a list of keywords, the language, etc.; for tables, you get the format, location, the content as text, any caption or title, etc.
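
A minimal sketch of how such an output directory might be read back is shown below. It assumes that the metadata files end in .json and sit somewhere under one subdirectory per document type; the exact file naming and the keys inside each file are not described here, so the function load_results (a hypothetical name) only parses the files and groups them without interpreting their contents.

```python
import json
from collections import defaultdict
from pathlib import Path


def load_results(output_dir):
    """Group parsed JSON metadata files by the subdirectory they were found in."""
    results = defaultdict(list)
    for subdir in sorted(Path(output_dir).iterdir()):
        if not subdir.is_dir():
            continue  # only the per-type subdirectories are of interest
        for path in sorted(subdir.rglob("*.json")):
            with path.open(encoding="utf-8") as handle:
                results[subdir.name].append(json.load(handle))
    return dict(results)


if __name__ == "__main__":
    # "./processed_docs" is a placeholder path, not a documented default.
    out = Path("./processed_docs")
    if out.is_dir():
        for kind, items in load_results(out).items():
            print(f"{kind}: {len(items)} metadata file(s)")
```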

Nvidia Ingest requires a number of supporting services, from both Nvidia and open-source projects, including redis, yolox, otel-collector for OpenTelemetry, prometheus, grafana, and more. They are packaged as a Docker Compose application to make deployment easier. It also requires support for CUDA and the Nvidia Container Toolkit, and a minimum of two H100 or A100 GPUs with at least 80 GB of memory.
