Ingesting PDFs using OpenAI GPT-4 Vision (2024)

Ingesting PDFs using OpenAI GPT-4 Vision (1)

  • Report this article

Matthew Groff Ingesting PDFs using OpenAI GPT-4 Vision (2)

Matthew Groff

Principal AI Engineer @ Umbrage, Part of Bain & Company | AI Capability Lead

Published Feb 18, 2024

+ Follow

PDFs are everywhere, but getting information out of them can be tough, especially when they're packed with charts and tables. That's why I started using OpenAI's GPT-4 Vision to make things easier by converting PDFs into Markdown, a format that's much simpler for computers to read.

Traditional tools for pulling text from PDFs are hit or miss. They might miss important details, especially if the PDF has lots of visuals. This inconsistency is a big problem when you're trying to understand or use the information in those PDFs.

Markdown is great for this because it's straightforward and structured, making it easy for AI to understand. OpenAI even uses Markdown to talk to ChatGPT, which shows how useful it is.

Recommended by LinkedIn

OpenAI Dev Day: What got announced, and what it means Simon Smith 7 months ago
OpenAI's GPT Store - The Latest and Greatest GPTs Sarah Huard 4 months ago
OpenAI's First Developer Conference Unleashes… Bharath Gopinath 6 months ago

Here's what I did: First, I turned each page of the PDF into an image. This way, I didn't lose anything, like charts or images, that I might miss if I just tried to pull out the text. Then, I used GPT-4 Vision to read those images and turn them into Markdown text. GPT-4 Vision is smart enough to handle complex layouts and visuals, so I ended up with Markdown that kept the original PDF's content and structure.

I wrapped all this up into a few Python scripts to automate the process. There's one script to turn the PDF into images, another to convert those images to Markdown with GPT-4 Vision, and a third to clean up the Markdown and get rid of anything we don't need, like placeholder images or page numbers. There's even an optional script that puts all the cleaned-up Markdown into one document.

This method isn't perfect, but it's a big step forward in making PDFs more accessible and easier to work with. Manually converting PDFs to Markdown by hand isn't realistic on a large scale, and just pulling out the text and chopping it up into chunks isn't enough, especially if you're missing out on important visual information.

Check out the GitHub repo for the scripts I mentioned. I hope this method helps you see the potential of AI in making it easier to work with PDFs and other documents. Feel free to reach out to me on LinkedIn if you have questions or want to chat about it.

Help improve contributions

Mark contributions as unhelpful if you find them irrelevant or not valuable to the article. This feedback is private to you and won’t be shared publicly.

Contribution hidden for you

This feedback is never shared publicly, we’ll use it to show better contributions to everyone.

Taitan Nguyen

Know Your Data | Discover Opportunities | Deliver Value

1mo

  • Report this comment

Thanks! You mentioned the conversion is not perfect but I am curious if you have a measure of how well the resulting Markdown data compared to the original PDFs?

Like Reply

1Reaction

Matthew Groff

Principal AI Engineer @ Umbrage, Part of Bain & Company | AI Capability Lead

3mo

  • Report this comment
Like Reply

1Reaction 2Reactions

See more comments

To view or add a comment, sign in

Sign in

Stay updated on your professional world

Sign in

By clicking Continue to join or sign in, you agree to LinkedIn’s User Agreement, Privacy Policy, and Cookie Policy.

New to LinkedIn? Join now

Insights from the community

  • Call Center Administration What are the best ways to handle customer questions about machine learning?
  • Software Engineering How can AI software systems be designed to resist adversarial examples?
  • Artificial Intelligence How can you choose the right debugging tool for an AI app?
  • Software Engineering How can software engineers improve transparency in AI systems?
  • IT Audit How can IT auditors communicate and report on AI and ML audit findings and recommendations?
  • Machine Learning What are the best ways to ensure transferability in machine learning?
  • Data Governance How can you label data that does not fit into predefined categories?
  • Algorithms You want to build a recommendation engine in Julia. What are the best tools to use?
  • Machine Learning What are some impressive ML projects for your portfolio?
  • Technological Innovation How do you select the right AI and ML techniques for your data?

Others also viewed

  • OpenAI's First Developer Conference Unleashes Game-Changing Updates Bharath Gopinath 6mo
  • OpenAI Playground Aris Ihwan 9mo
  • OpenAI Doubles Down on Agent Behavior and Hosts First Devday David Norris 6mo
  • Machine Learning Orientation for Motivated Non-Coders: A Half-Day of Reading Larry O'Brien 1y
  • What is Auto-GPT and why does it matter? Ana L. 1y
  • Fine-tuning GPT-3.5 Turbo: A short intro for software engineers artiqode 9mo
  • "Strategic Moves Catapulted This GPT To The Top Of OpenAI's Charts!" Orren Prunckun 4mo
  • OpenAI DevDay 2023 Highlights Ganapathy Shankar 7mo
  • How to use OpenAIHelp with an Excel formula, a step-by-step approach. Vincent Healy 10mo
  • Two-Minute Recap of OpenAI DevDay + Insights Andrei Puni 7mo

Explore topics

  • Sales
  • Marketing
  • Business Administration
  • HR Management
  • Content Management
  • Engineering
  • Soft Skills
  • See All
Ingesting PDFs using OpenAI GPT-4 Vision (2024)

FAQs

Can GPT-4 Vision read PDF? ›

No requirement to train a custom model: GPT-4 Vision is a pre-trained model that can be used to extract structured data from PDF documents without the need to train a custom model for your specific document types.

Can ChatGPT 4 read PDF files? ›

Can GPT-4 read a PDF? Yes, GPT-4 can read a PDF file. However, you need to pay USD20 per month to upgrade to ChatGPT Plus.

Can GPT-4 summarize a PDF? ›

Yes, ChatGPT can summarize PDF files using its PDF summarization feature, which is available in ChatGPT Plus. Can I give ChatGPT a PDF? Yes, you can provide ChatGPT with a PDF document for summarization.

Can ChatGPT pull data from a PDF? ›

To use ChatGPT for PDF data extraction, you first need to convert your PDF files into a text-based format. Once your data is in text form, you can use an automation platform like Zapier to integrate with ChatGPT and forward the converted text.

Can I upload a PDF to GPT? ›

You can upload PDF files as attachments to your custom Copilot GPT. Other file types (such as images, Word documents, etc.) are not currently supported.

Can you ask ChatGPT to summarize a PDF? ›

You can create an account if you don't have one. Step 3 Once logged in to your ChatGPT account, navigate to the chat field at the bottom of the page and enter the command "TLDR." Paste your PDF's URL in the same chat field, then press the "Send" button. ChatGPT will automatically begin to summarize your PDF file.

What is the limit of GPT-4 PDF? ›

All files uploaded to a GPT or a ChatGPT conversation have a hard limit of 512MB per file.

How many PDF pages can GPT read? ›

GPTs can take VERY long PDFs - over 900 pages! (Tested in the Playground) : r/ChatGPTPro.

Can ChatGPT-4 read PDF reddit? ›

Ever since the fiasco that happened a few days ago Chat GPT 4 has lost all abilities to read documents. No matter what format (PDF, . docx, txt, ppt , ect). It will simply return "Error reading documents".

Is paying for ChatGPT 4 worth it? ›

The free tier of ChatGPT is good, but GPT-4, at $20 per month via ChatGPT Plus, can be a good deal smarter and more accurate. GPT-4, OpenAI's most powerful artificial intelligence large language model (LLM), is available through a subscription to ChatGPT Plus, which costs $20 a month.

How to use ChatGPT effectively in PDF? ›

To chat with your PDF on ChatGPT, you need to use a ChatGPT plugin like the AskYourPDF plugin. You will first install the AskYourPDF plugin for ChatGPT, then upload your PDF to ChatGPT and start asking questions about your PDF by using the document ID to identify your PDF.

Can GPT-4 read handwriting? ›

GPT-4V did the best job of transcribing my notes, making the fewest mistakes interpretting my handwriting and almost perfectly following my instructions regarding formatting.

Can GPT-4 convert PDF to Excel? ›

By understanding queries or instructions, GPT interprets document content to extract data accurately, catering to specific user needs. GPT enhances PDF to Excel conversion in key areas: - Fact Extraction: It identifies and extracts precise facts from text by analyzing document context and structure.

Can I feed a PDF to ChatGPT? ›

Of course, you probably already know this. However, with its latest update, ChatGPT now allows users to upload documents, PDFs, and spreadsheets directly into the platform for analysis.

Can ChatGPT 4 scan documents? ›

Applications of ChatGPT-Powered Document Chatbots

Some of the application areas include: Customer support: You can scan an entire warranty document, and will never have to train your chatbot on the claims process.

What GPT-4 Cannot do? ›

GPT4 cannot really hear, and it cannot really talk. Voice input is transcribed into text by a separate model, 'Whisper,' and then fed to GPT4. The output is read by another model.

Can you upload documents to GPT-4? ›

ChatGPT-4 can analyze any file you upload, whether it's a PowerPoint presentation, an Excel spreadsheet, a research paper or a photo. If you upload a spreadsheet with financial data, for example, you can ask ChatGPT to create a visual graph of the numbers.

What AI can read PDFs? ›

Best AI PDF Tools: Feature Comparison
AI PDF ToolPrice
🥇Adobe Acrobat$12.99/Month
🥈PDFelement$6.66/Month
🥉Unriddle$16/Month
4Myreader$6/Month
4 more rows
Mar 31, 2024

Can chatgpt4 read word documents? ›

Yes, ChatGPT summarizes Word documents to help users go through lengthy files within seconds.

Top Articles
Latest Posts
Article information

Author: Horacio Brakus JD

Last Updated:

Views: 5661

Rating: 4 / 5 (51 voted)

Reviews: 90% of readers found this page helpful

Author information

Name: Horacio Brakus JD

Birthday: 1999-08-21

Address: Apt. 524 43384 Minnie Prairie, South Edda, MA 62804

Phone: +5931039998219

Job: Sales Strategist

Hobby: Sculling, Kitesurfing, Orienteering, Painting, Computer programming, Creative writing, Scuba diving

Introduction: My name is Horacio Brakus JD, I am a lively, splendid, jolly, vivacious, vast, cheerful, agreeable person who loves writing and wants to share my knowledge and understanding with you.