- Report this article
Matthew Groff
Matthew Groff
Principal AI Engineer @ Umbrage, Part of Bain & Company | AI Capability Lead
Published Feb 18, 2024
+ Follow
PDFs are everywhere, but getting information out of them can be tough, especially when they're packed with charts and tables. That's why I started using OpenAI's GPT-4 Vision to make things easier by converting PDFs into Markdown, a format that's much simpler for computers to read.
Traditional tools for pulling text from PDFs are hit or miss. They might miss important details, especially if the PDF has lots of visuals. This inconsistency is a big problem when you're trying to understand or use the information in those PDFs.
Markdown is great for this because it's straightforward and structured, making it easy for AI to understand. OpenAI even uses Markdown to talk to ChatGPT, which shows how useful it is.
Recommended by LinkedIn
Here's what I did: First, I turned each page of the PDF into an image. This way, I didn't lose anything, like charts or images, that I might miss if I just tried to pull out the text. Then, I used GPT-4 Vision to read those images and turn them into Markdown text. GPT-4 Vision is smart enough to handle complex layouts and visuals, so I ended up with Markdown that kept the original PDF's content and structure.
I wrapped all this up into a few Python scripts to automate the process. There's one script to turn the PDF into images, another to convert those images to Markdown with GPT-4 Vision, and a third to clean up the Markdown and get rid of anything we don't need, like placeholder images or page numbers. There's even an optional script that puts all the cleaned-up Markdown into one document.
This method isn't perfect, but it's a big step forward in making PDFs more accessible and easier to work with. Manually converting PDFs to Markdown by hand isn't realistic on a large scale, and just pulling out the text and chopping it up into chunks isn't enough, especially if you're missing out on important visual information.
Check out the GitHub repo for the scripts I mentioned. I hope this method helps you see the potential of AI in making it easier to work with PDFs and other documents. Feel free to reach out to me on LinkedIn if you have questions or want to chat about it.
Help improve contributions
Mark contributions as unhelpful if you find them irrelevant or not valuable to the article. This feedback is private to you and won’t be shared publicly.
Contribution hidden for you
This feedback is never shared publicly, we’ll use it to show better contributions to everyone.
Like
Celebrate
Support
Love
Insightful
Funny
Taitan Nguyen
Know Your Data | Discover Opportunities | Deliver Value
1mo
- Report this comment
Thanks! You mentioned the conversion is not perfect but I am curious if you have a measure of how well the resulting Markdown data compared to the original PDFs?
1Reaction
Matthew Groff
Principal AI Engineer @ Umbrage, Part of Bain & Company | AI Capability Lead
3mo
- Report this comment
More details on my blog site https://groff.dev/blog/ingesting-pdfs-with-gpt-vision
1Reaction 2Reactions
See more comments
To view or add a comment, sign in
Sign in
Stay updated on your professional world
Sign in
By clicking Continue to join or sign in, you agree to LinkedIn’s User Agreement, Privacy Policy, and Cookie Policy.
New to LinkedIn? Join now
Insights from the community
- Call Center Administration What are the best ways to handle customer questions about machine learning?
- Software Engineering How can AI software systems be designed to resist adversarial examples?
- Artificial Intelligence How can you choose the right debugging tool for an AI app?
- Software Engineering How can software engineers improve transparency in AI systems?
- IT Audit How can IT auditors communicate and report on AI and ML audit findings and recommendations?
- Machine Learning What are the best ways to ensure transferability in machine learning?
- Data Governance How can you label data that does not fit into predefined categories?
- Algorithms You want to build a recommendation engine in Julia. What are the best tools to use?
- Machine Learning What are some impressive ML projects for your portfolio?
- Technological Innovation How do you select the right AI and ML techniques for your data?
Others also viewed
- OpenAI's First Developer Conference Unleashes Game-Changing Updates Bharath Gopinath 6mo
- OpenAI Playground Aris Ihwan 9mo
- OpenAI Doubles Down on Agent Behavior and Hosts First Devday David Norris 6mo
- Machine Learning Orientation for Motivated Non-Coders: A Half-Day of Reading Larry O'Brien 1y
- What is Auto-GPT and why does it matter? Ana L. 1y
- Fine-tuning GPT-3.5 Turbo: A short intro for software engineers artiqode 9mo
- "Strategic Moves Catapulted This GPT To The Top Of OpenAI's Charts!" Orren Prunckun 4mo
- OpenAI DevDay 2023 Highlights Ganapathy Shankar 7mo
- How to use OpenAIHelp with an Excel formula, a step-by-step approach. Vincent Healy 10mo
- Two-Minute Recap of OpenAI DevDay + Insights Andrei Puni 7mo