This repository has been archived by the owner on Jan 6, 2025. It is now read-only.

can't get read_pdf to work on Windows #140

Closed
topper-123 opened this issue Oct 12, 2018 · 10 comments

Comments

@topper-123

topper-123 commented Oct 12, 2018

Presumably others can get read_pdf to work, so this is just a bug report for one specific failure case.

After downloading foo.pdf into the working directory, calling camelot.read_pdf('foo.pdf') gives:

>>> camelot.read_pdf('foo.pdf')
c:\users\tp\miniconda3\envs\py36\lib\site-packages\camelot\io.py in read_pdf(filepath, pages, flavor, **kwargs)
     89     p = PDFHandler(filepath, pages)
     90     kwargs = remove_extra(kwargs, flavor=flavor)
---> 91     tables = p.parse(flavor=flavor, **kwargs)
     92     return tables

c:\users\tp\miniconda3\envs\py36\lib\site-packages\camelot\handlers.py in parse(self, flavor, **kwargs)
    144             parser = Lattice(**kwargs) if flavor == 'lattice' else Stream(**kwargs)
    145             for p in pages:
--> 146                 t = parser.extract_tables(p)
    147                 tables.extend(t)
    148         return TableList(tables)

c:\users\tp\miniconda3\envs\py36\lib\site-packages\camelot\parsers\lattice.py in extract_tables(self, filename)
    336             return []
    337
--> 338         self._generate_image()
    339         self._generate_table_bbox()
    340

c:\users\tp\miniconda3\envs\py36\lib\site-packages\camelot\parsers\lattice.py in _generate_image(self)
    205         subprocess.call(
    206             gs_call, stdout=open(os.devnull, 'w'),
--> 207             stderr=subprocess.STDOUT)
    208
    209     def _generate_table_bbox(self):

c:\users\tp\miniconda3\envs\py36\lib\subprocess.py in call(timeout, *popenargs, **kwargs)
    265     retcode = call(["ls", "-l"])
    266     """
--> 267     with Popen(*popenargs, **kwargs) as p:
    268         try:
    269             return p.wait(timeout=timeout)

c:\users\tp\miniconda3\envs\py36\lib\subprocess.py in __init__(self, args, bufsize, executable, stdin, stdout, stderr, preexec_fn, close_fds, shell, cwd, env, universal_newlines, startupinfo, creationflags, restore_signals, start_new_session, pass_fds, encoding, errors)
    707                                 c2pread, c2pwrite,
    708                                 errread, errwrite,
--> 709                                 restore_signals, start_new_session)
    710         except:
    711             # Cleanup if the child failed starting.

c:\users\tp\miniconda3\envs\py36\lib\subprocess.py in _execute_child(self, args, executable, preexec_fn, close_fds, pass_fds, cwd, env, startupinfo, creationflags, shell, p2cread, p2cwrite, c2pread, c2pwrite, errread, errwrite, unused_restore_signals, unused_start_new_session)
    995                                          env,
    996                                          os.fspath(cwd) if cwd is not None else None,
--> 997                                          startupinfo)
    998             finally:
    999                 # Child is launched. Close the parent's copy of those pipe

FileNotFoundError: [WinError 2] The system cannot find the file specified

Digging in with the debugger, I see that the temp image file referenced by the Lattice object does not exist. So when camelot.parsers.lattice.Lattice._generate_image is called, the error occurs, because the file named in imagename doesn't exist:

>>> self  # just to show where we are in the debugger
<camelot.parsers.lattice.Lattice object at 0x09D60030>
>>> self.imagename
'C:\\Users\\TP\\AppData\\Local\\Temp\\tmpe7b1ts_i\\page-1.png'
>>> import pathlib
>>> pathlib.Path(self.imagename).exists()
False

So apparently the file in question hasn't been created.

I could diagnose the issue some more if I could get some guidance.

@vinayak-mehta
Contributor

Thanks for the report @topper-123! Did you install Camelot using pip or conda? Did you install the dependencies before that? (specifically ghostscript, which is used to convert a PDF into an image) If yes, what is the name of your ghostscript executable and is it in the Windows PATH variable?

You can do this check to see if ghostscript was installed correctly.

I suspect that the subprocess call to ghostscript is failing silently.
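One way to test that suspicion is to run a command without discarding its output. The following is a minimal sketch, assuming a hypothetical helper `run_visible` (not part of Camelot) that reports a missing executable and any non-zero exit code explicitly. The Python interpreter stands in for Ghostscript so the sketch runs on any machine:

```python
import subprocess
import sys


def run_visible(cmd):
    """Run cmd and return (returncode, combined output) instead of hiding them.

    Hypothetical helper for illustration; not part of Camelot.
    """
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr into stdout
            universal_newlines=True,   # decode the output as text
        )
    except FileNotFoundError:
        # Raised when the executable itself cannot be located.
        raise RuntimeError("executable not found: {}".format(cmd[0]))
    return proc.returncode, proc.stdout


if __name__ == "__main__":
    # Stand-in command; replace with the Ghostscript call being debugged.
    code, output = run_visible([sys.executable, "-c", "print('ok')"])
    print(code, output.strip())
```

With the real Ghostscript command in place of the stand-in, a missing executable would surface as a clear message rather than as an error deep inside subprocess.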

@topper-123
Author

I've got ghostscript installed, but gswin32c.exe -version finds nothing. So it's most probably a PATH issue. I'll look into getting the path set up correctly.

Maybe Camelot should, on the first call to ghostscript, do an existence check for ghostscript (à la gswin32c.exe -version), so that an error message could be given for that instead of for a missing file?

@vinayak-mehta
Contributor

Do you have 32-bit Windows? Otherwise, you should look for gswin64c.exe.

@aakashdusane

I'm facing the same issue. My gswin64c.exe seems to be working, although when I try to open it through the Command Prompt I get an "App can't run on your PC" dialog and "access denied" on the command line, which is strange considering the .exe file works fine when launched on its own.

@mattmurray

> I'm facing the same issue. My gswin64c.exe seems to be working, although when I try to open it through the Command Prompt I get an "App can't run on your PC" dialog and "access denied" on the command line, which is strange considering the .exe file works fine when launched on its own.

I had the same issue. I eventually got it working by adding "C:\Program Files\gs\gs9.25\bin" to my Path.

@vinayak-mehta
Contributor

vinayak-mehta commented Oct 19, 2018

@aakashdusane @mattmurray Did you check out this part of the install documentation? https://camelot-py.readthedocs.io/en/master/user/install.html#for-windows It says that ghostscript needs to be on the PATH.

I guess I should fix and merge #133 soon, which changes the ghostscript subprocess call to an API call.

@aakashdusane

aakashdusane commented Oct 19, 2018 via email

@topper-123
Author

topper-123 commented Oct 20, 2018

Alright, I've figured out what it was. There were two issues:

1:
The path issue that is being discussed above. I can fix this myself easily enough, though IMO the Camelot error message should be made more specific to the issue of Ghostscript not being found.

2:
My system is 64-bit, but I downloaded the 32-bit version of GhostScript. My version of GhostScript isn't picked up, because the relevant code in Camelot looks like this:

if info['system'] == 'windows':
    bit = info['machine'][-2:]  # '64' on my machine
    gs_call.insert(0, 'gswin{}c.exe'.format(bit))

This means Camelot doesn't pick up my GhostScript, because the file is gswin32c.exe, not gswin64c.exe.
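To illustrate the mismatch: the executable name is derived from the machine architecture, not from what is actually installed. This is a minimal sketch with a hypothetical helper name. It assumes info['machine'] holds a string such as the 'AMD64' that platform.machine() typically reports on 64-bit Windows, since this thread does not show how Camelot fills info:

```python
def gs_name_from_machine(machine):
    """Mirror the quoted logic: use the last two characters of the machine string."""
    bit = machine[-2:]
    return 'gswin{}c.exe'.format(bit)


# On a 64-bit machine the 64-bit name is chosen even if only
# gswin32c.exe is installed.
print(gs_name_from_machine('AMD64'))  # gswin64c.exe
```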

You support Python 2.7, but FYI, Python 3's shutil module has a which function that makes existence checks easy on Windows, so the above could be replaced with:

if info['system'] == 'windows':
    gswin64, gswin32 = 'gswin64c.exe', 'gswin32c.exe'
    if shutil.which(gswin64):
        gs_call.insert(0, gswin64)
    elif shutil.which(gswin32):
        gs_call.insert(0, gswin32)
    else:
        raise OSError("GhostScript not found. Check that it is installed and the path is set correctly")

I don't know how to do that in Python2.7, but presumably there is a way.
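One possible way to do this without shutil.which is to scan the directories in PATH by hand. The sketch below uses a hypothetical helper name, find_executable_on_path, and relies only on os calls that exist in both Python 2.7 and Python 3. It ignores details such as PATHEXT handling, so the full file name (including .exe) must be passed:

```python
import os


def find_executable_on_path(name, path=None):
    """Return the full path of name if found in a PATH directory, else None.

    Hypothetical helper for illustration; not part of Camelot.
    """
    if path is None:
        path = os.environ.get('PATH', '')
    for directory in path.split(os.pathsep):
        if not directory:
            continue
        # PATH entries on Windows are sometimes wrapped in double quotes.
        candidate = os.path.join(directory.strip('"'), name)
        # Must be a regular file that the current user may execute.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None
```

Python 2.7's standard library also ships distutils.spawn.find_executable, which may be simpler where distutils is available.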

@vinayak-mehta
Contributor

Thanks for the suggestion @topper-123, I've created a PR!

@baridhi

baridhi commented Dec 13, 2019

Hey, I ran into the same issue. I'm not very familiar with OS-related aspects. As I understand it, Python is in one directory (say path1), whereas gswin64 is in another (path2). How do I make Python see that different folder on the C drive? I tried sys.path.append("C:\Program Files\gs"), but to no avail. Thanks in advance.
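A note on why that attempt has no effect: sys.path only controls where Python looks for modules to import, whereas executables launched through subprocess are looked up via the PATH environment variable. The following is a minimal sketch of extending PATH for the current process. The helper name is hypothetical, and the directory is the one reported earlier in this thread, which will differ per Ghostscript version:

```python
import os


def prepend_to_path(directory):
    """Put directory at the front of PATH for this process and its children."""
    current = os.environ.get('PATH', '')
    parts = [p for p in current.split(os.pathsep) if p]
    if directory not in parts:
        parts.insert(0, directory)
    os.environ['PATH'] = os.pathsep.join(parts)
    return os.environ['PATH']


if __name__ == "__main__":
    # Example directory taken from an earlier comment; adjust to the installed version.
    prepend_to_path(r'C:\Program Files\gs\gs9.25\bin')
```

This change lasts only for the running Python process. Adding the directory to the Windows Path setting, as described in the earlier comments, makes it permanent.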
