How to extract data from a PDF using Python

Question

I want to know how to extract data from a PDF using Python in PyCharm. I tried importing from PyPDF2, but my code does not show any results.

Can you show what code you have so far? – kpie, Feb 22 '22 at 05:11

score 1 · Answer 1 · answered Feb 22 '22 at 05:23

PyPDF2, PyPDF3, and PyPDF4 are all unmaintained. I would recommend taking a look at this question and trying one of the many different methods discussed.

According to the PyPDF2 documentation, the extractText() method "works well for some PDF files, but poorly for others, depending on the generator used". Without seeing your code it is hard to say for sure, but incompatibility with the PDF file itself may be a large part of why you are not getting results.
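One rough way to tell whether the file is the problem is to check which pages come back with no text at all. The snippet below is only a sketch: `find_empty_pages` is a hypothetical helper name, and the sample strings are made up to stand in for whatever your extraction call returns for each page.

```python
def find_empty_pages(page_texts):
    """Return indices of pages whose extracted text is missing or only whitespace."""
    return [i for i, text in enumerate(page_texts) if not (text or "").strip()]


# Made-up strings standing in for real per-page extraction output.
sample = ["Invoice 2022", "", "   \n", "Total: 10"]
print(find_empty_pages(sample))  # [1, 2]
```

If every page ends up in that list, the PDF may consist of scanned images with no text layer, in which case a text extractor has nothing to read and OCR would likely be needed instead.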

score 0 · Answer 2 · answered Feb 22 '22 at 05:36

Use this code:

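from PyPDF2 import PdfFileReader  # legacy API, available in PyPDF2 versions before 3.0

filename = "example.pdf"  # placeholder: replace with the path to your PDF

reader = PdfFileReader(filename)
num_pages = reader.getNumPages()
for page_number in range(num_pages):
    page = reader.getPage(page_number)
    page_data = page.extractText()
    print(page_data)  # without this, the extracted text is never displayed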
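The camelCase names above (PdfFileReader, getNumPages, getPage, extractText) were removed in PyPDF2 3.0, so on a newer install the same loop might look roughly like this. It is a sketch, not a drop-in fix: it assumes the `pypdf` package (the successor to PyPDF2) is installed, and `collect_page_text` and `extract_pdf_text` are hypothetical helper names.

```python
def collect_page_text(pages):
    """Join the text of each page object, skipping pages that yield nothing."""
    chunks = []
    for page in pages:
        text = page.extract_text() or ""
        if text.strip():
            chunks.append(text)
    return "\n".join(chunks)


def extract_pdf_text(path):
    """Read a PDF from `path` and return the text of all its pages."""
    # Imported here so the helper above can be used without pypdf installed.
    # Assumption: `pip install pypdf` has been run.
    from pypdf import PdfReader

    reader = PdfReader(path)
    return collect_page_text(reader.pages)
```

Calling `print(extract_pdf_text("example.pdf"))` with the path to your own file should then display whatever text the library can recover.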

Shubham Korade
