Hello. First, thanks for this terrific library. I get much better results than with Tabula, and this will be quite decisive for implementing the open source Brazilian localization of the Odoo ERP (in the OCA foundation).
extracting pages 200 to 210...
Traceback (most recent call last):
  File "./extract_csv.py", line 25, in <module>
    extract_csv('efd_icms_ipi', 262)
  File "./extract_csv.py", line 18, in extract_csv
    pages='%s-%s' % (i, limit), line_size_scaling=80)
  File "/home/rvalyi/.local/lib/python3.6/site-packages/camelot/io.py", line 101, in read_pdf
    tables = p.parse(flavor=flavor, **kwargs)
  File "/home/rvalyi/.local/lib/python3.6/site-packages/camelot/handlers.py", line 154, in parse
    t = parser.extract_tables(p)
  File "/home/rvalyi/.local/lib/python3.6/site-packages/camelot/parsers/lattice.py", line 364, in extract_tables
    table = self._generate_table(table_idx, cols, rows, v_s=v_s, h_s=h_s)
  File "/home/rvalyi/.local/lib/python3.6/site-packages/camelot/parsers/lattice.py", line 304, in _generate_table
    table = table.set_edges(v_s, h_s, joint_close_tol=self.joint_close_tol)
  File "/home/rvalyi/.local/lib/python3.6/site-packages/camelot/core.py", line 460, in set_edges
    self.cells[L][J].bottom = True
IndexError: list index out of range
As a naive and brutal countermeasure, I changed line 459 to:
    while J < K and L <= len(self.cells) and J <= len(self.cells[L]):
        self.cells[L][J].bottom = True
So far, it looks like this lets it extract the data properly...
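One caveat about the guard quoted above: with `<=`, an index equal to the list length still passes the check, so `self.cells[L]` can raise the same IndexError. A strict `<` is what keeps an index valid. The snippet below is only a minimal sketch of that bounds check on a toy grid; the `Cell` class and `mark_bottom_edges` function are hypothetical stand-ins, not Camelot's actual code, and the check only hides the symptom rather than explaining why the indices overshoot.

```python
class Cell:
    """Hypothetical stand-in for a table cell with a bottom-edge flag."""

    def __init__(self):
        self.bottom = False


def mark_bottom_edges(cells, row, start_col, end_col):
    """Set bottom=True on cells[row][start_col:end_col], ignoring indices
    that fall outside the grid. Returns the number of cells marked."""
    # Strict comparison: row == len(cells) is already out of range.
    if not 0 <= row < len(cells):
        return 0
    marked = 0
    col = max(start_col, 0)
    # Strict "<" on the column as well, so cells[row][col] is always valid.
    while col < end_col and col < len(cells[row]):
        cells[row][col].bottom = True
        marked += 1
        col += 1
    return marked


grid = [[Cell() for _ in range(3)] for _ in range(2)]
print(mark_bottom_edges(grid, 1, 1, 5))  # span runs past the last column -> 2
print(mark_bottom_edges(grid, 2, 0, 3))  # row index equal to len(grid) -> 0
```

Both calls return instead of raising, which is the behaviour the workaround was aiming for.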
@rvalyi Can you also share the code that produced this error? I tried camelot --format csv --output legendado.csv -p 200-210 lattice legendado.pdf and it worked without any errors.
EDIT: Nvm, I found the advanced settings you used in the traceback. It happens with -scale 80.
Indeed, I used the line_size_scaling=80 option in the Python API. Thanks for the fix; I will test again soon.
I also have other tables where Camelot produces no such stack trace but introduces line breaks and blank cells. I can live with that by implementing heuristics later in my code, but are you interested in such new bug reports?
Yes, please report other issues that you've experienced. I'm fixing an issue that introduces line breaks for v0.5.0, along with some other text-to-cell assignment behaviors. You can check https://github.com/socialcopsdev/camelot/milestone/3 for the complete list. Before filing, please check that your bug isn't already in that list or in the issue tracker.
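For anyone trying to reproduce the traceback from the Python API, a call along the following lines should be close to what was described in this thread. It is a sketch under assumptions: the original extract_csv.py was not posted, so the helper names `build_read_pdf_args` and `reproduce` are made up here, and `legendado.pdf` is simply the file name from the CLI command above. The `line_size_scaling` keyword is the one named in the traceback for the Camelot revision discussed here; other releases may use a different name for it.

```python
def build_read_pdf_args(first_page, last_page, scale=80):
    """Build keyword arguments for camelot.read_pdf (hypothetical helper).

    The page string uses the same '%s-%s' format seen in the traceback.
    """
    return {
        "pages": "%s-%s" % (first_page, last_page),
        "flavor": "lattice",
        # Keyword name as used in this thread; assumed valid only for
        # the Camelot version under discussion.
        "line_size_scaling": scale,
    }


def reproduce(pdf_path="legendado.pdf", first_page=200, last_page=210):
    """Run the extraction that is reported to fail with IndexError."""
    # Imported lazily so the argument helper works without Camelot installed.
    import camelot

    return camelot.read_pdf(pdf_path, **build_read_pdf_args(first_page, last_page))


print(build_read_pdf_args(200, 210)["pages"])  # prints 200-210
```

Calling `reproduce()` with the PDF in the working directory is what would actually exercise the lattice parser; the helper is split out only so the arguments can be inspected on their own.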
The PDF in question (Brazilian fiscal stuff) is http://sped.rfb.gov.br/arquivo/download/2322; I extracted pages 200 to 210 using the latest GitHub version (rev 7bdd9a3), which produced the stack trace above.