Support for streamed parsing of very large excelx sheets #69
…only print if it is already loaded.
…are small enough and it was easier to keep styles in.
Conflicts: lib/roo/excelx.rb
@rlburkes I tried using your fork but I got an exception:
@christopherchiu I updated with a few changes. If you continue to see the issue, can you provide more details and open an issue?
drop present?
@christopherchiu I fixed this in my latest merge referenced above. @rlburkes thanks for the important work here. @Empact We'd greatly appreciate this fork merging back into master. I'll contribute to maintaining it with @rlburkes if needed. Our company (@christopherchiu is on my team) will also contribute. Long live roo :)
@fareesh thanks, fixed. This shouldn't have happened, since options defaults to an empty hash, but it doesn't hurt to be consistent and err on the side of caution.
I have some code like so:
spreadsheet = Roo::Excelx.new(file,{},:ignore)
(3..spreadsheet.last_row).each do |row_index,index|
  # make [[:col1,:col2,:col3],["a","b","c"]] become {:col1 => "a", :col2 => "b", :col3 => "c"}
  row = Hash[[spreadsheet.row(1),spreadsheet.row(row_index)].transpose]
  ....
end
I want to do the same thing using the
How would I go about adding a parameter that makes
@fareesh The current behavior of padding will pad from column 1 (A) up to the last detected column. If you wish to pad after the last detected column (to a specified width), you can do so by padding the yielded row within your application:
(desired_size-row.size).times { row << nil } # or whatever you prefer to pad end of array
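The padding suggestion above can be wrapped in a small helper. This is only a sketch: `pad_row` is a hypothetical name and not part of roo, and it assumes the yielded row behaves like a plain Array.

```ruby
# Hypothetical helper (not part of roo): pad a yielded row out to a fixed
# width without mutating the original array.
def pad_row(row, desired_size, filler = nil)
  missing = desired_size - row.size
  # Rows that are already wide enough are returned as an unchanged copy.
  missing > 0 ? row + Array.new(missing, filler) : row.dup
end

padded = pad_row(["a", "b"], 4)
wide   = pad_row(["a", "b", "c"], 2)
```

Returning a new array instead of appending in place keeps the caller's row untouched, which may or may not matter in your application.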
@@ -63,16 +63,21 @@ def to_type(format)
module_function :to_type
end

class ExceedsMaxError < Exception; end
This should not descend from Exception, which is for low-level exceptions. Better to descend from StandardError or below. http://blog.nicksieger.com/articles/2006/09/06/rubys-exception-hierarchy/
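To illustrate the review point above: a bare `rescue` only catches `StandardError` and its descendants, so an error class that descends directly from `Exception` slips past it. The class and method names below are hypothetical stand-ins, not the PR's code.

```ruby
# Hypothetical stand-ins for the two possible parent classes.
class MaxErrorFromException < Exception; end
class MaxErrorFromStandardError < StandardError; end

# Returns true if a bare `rescue` catches the error class, false otherwise.
def caught_by_bare_rescue?(error_class)
  begin
    raise error_class, "too many rows"
  rescue # a bare rescue is equivalent to `rescue StandardError`
    true
  end
rescue Exception
  # Only reached when the bare rescue above did not catch the error.
  false
end

from_exception      = caught_by_bare_rescue?(MaxErrorFromException)
from_standard_error = caught_by_bare_rescue?(MaxErrorFromStandardError)
```

Callers who write `rescue => e` around a parse would therefore miss the `Exception`-based variant entirely.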
…works). rspec breaks on typhoeus gem -- comment out in spec_helper to get excelx spec to run
Doc and defaults
Please can someone shed light on how to use the new each_row_streaming call to parse large Excel files?
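A rough sketch of the kind of usage being asked about, written against a stand-in object so it runs without the gem. It assumes `each_row_streaming` yields one Array of cell objects per row and that each cell responds to `value`; the fork's exact signature may differ, so check its readme. `each_row_hash`, `FakeCell`, and `FakeSheet` are hypothetical names.

```ruby
# Hypothetical helper: stream rows and yield each one as a Hash keyed by the
# first (header) row. Assumes `sheet.each_row_streaming` yields an Array of
# cells per row and that each cell responds to `value`.
def each_row_hash(sheet)
  header = nil
  sheet.each_row_streaming do |row|
    values = row.map { |cell| cell && cell.value }
    if header.nil?
      header = values.map { |v| v.to_s.to_sym } # first row becomes the keys
    else
      yield Hash[header.zip(values)]
    end
  end
end

# Stand-ins so the sketch runs without roo; with the gem you would pass
# something like Roo::Excelx.new(path) instead (assumption, see lead-in).
FakeCell = Struct.new(:value)
class FakeSheet
  def initialize(rows)
    @rows = rows
  end

  def each_row_streaming
    @rows.each { |r| yield r.map { |v| FakeCell.new(v) } }
  end
end

hashes = []
each_row_hash(FakeSheet.new([%w[col1 col2], %w[a b], %w[c d]])) { |h| hashes << h }
```

Only one row is held in memory at a time, aside from whatever the block itself chooses to keep.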
You should be able to see sample code in the readme. On Thursday, November 28, 2013, Mohith Thimmaiah wrote:
Done. Opened. On Thu, Dec 5, 2013 at 5:37 AM, Ben Woosley [email protected]:
@ephekt, I'm using version 1.13.2 of 'roo' but I can't find any
@extazystas You will need to use the fork this pull request originated from. @Empact any update on getting this merged?
@rlburkes, thanks - now I can use these methods. But could you please provide an example of how to get only the document header? With the original gem I did it this way:
@extazystas, make sure to check out the readme; @ephekt provided some nice documentation. This should help get you started.
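For the header-only case, one possible approach is to stop streaming after the first row. This is a sketch under the assumption that `each_row_streaming` yields an Array of cells responding to `value`; `header_row`, `StubCell`, and `StubSheet` are hypothetical names used so the example runs without the gem.

```ruby
# Hypothetical helper: return the values of the first streamed row and stop
# reading. Returns nil when the sheet yields no rows.
def header_row(sheet)
  sheet.each_row_streaming do |row|
    return row.map { |cell| cell && cell.value } # leave after the first row
  end
  nil
end

# Stand-in sheet that counts how many rows were actually yielded.
StubCell = Struct.new(:value)
class StubSheet
  attr_reader :yielded

  def initialize(rows)
    @rows = rows
    @yielded = 0
  end

  def each_row_streaming
    @rows.each do |r|
      @yielded += 1
      yield r.map { |v| StubCell.new(v) }
    end
  end
end

sheet  = StubSheet.new([%w[id name], %w[1 Ann], %w[2 Bob]])
header = header_row(sheet)
```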
@rlburkes great, thank you!
@Empact any chance we can work to get this merged? We would love to get off our fork of Roo.
@rlburkes Could you rebase the PR to start? Or point out the relevant PR if it's not this one?
Hey @rlburkes I just pushed a major refactor of Roo::Excelx that makes all the file access lazy-loaded, e.g. sheets, comments, relationships files are only read as they are accessed. There's still a good amount in your PR I would like to get in (e.g. the each_row_streaming could still be useful). Would you mind porting your changes over to the current codebase? The minimal_load option should no longer be necessary.
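The lazy-loading idea described above can be sketched with simple memoization. This is not the actual Roo::Excelx implementation; `LazyParts` and its loader callable are hypothetical, and the loader stands in for reading a file out of the workbook archive.

```ruby
# Hypothetical illustration of lazy loading: a part (e.g. "comments") is read
# only the first time it is requested, then served from a cache.
class LazyParts
  attr_reader :reads

  def initialize(loader)
    @loader = loader # callable: part name -> contents
    @cache  = {}
    @reads  = []     # records which parts were actually loaded
  end

  def part(name)
    @cache.fetch(name) do
      @reads << name
      @cache[name] = @loader.call(name)
    end
  end
end

parts = LazyParts.new(->(name) { "<#{name}/>" })
untouched_reads = parts.reads.dup
first  = parts.part("comments")
second = parts.part("comments")
```

With this shape, an explicit option to skip loading parts up front is unnecessary, because nothing is read until a caller asks for it.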
@Empact I went ahead and merged master. Thanks for all your work refactoring; it is definitely headed in a good direction. No longer need
Support for streamed parsing of very large excelx sheets
@rlburkes Thank you!
Changes should address the following issues:
Add interface to support streamed reading of an arbitrarily large xlsx document:
- Roo::Utils has a generic interface for yielding elements (streamed) of a given name to a caller.
- Roo::Excelx::Cell has a coordinate that can provide row, column information to the consumer.
Added a bunch of basic test coverage for the Roo::Excelx and Roo::Utils classes to make sure I didn't break anything while doing minimal refactoring.
Hopefully you don't mind, I cleaned up a couple of inconsistencies with parens, formatting, etc.
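As an illustration of what a cell coordinate carrying row and column information might look like, here is a minimal sketch that derives both from an A1-style reference. `Coordinate` and `coordinate_from_ref` are hypothetical names and do not reflect the PR's actual implementation.

```ruby
# Hypothetical value object: 1-based row and column of a cell.
Coordinate = Struct.new(:row, :column)

# Convert an A1-style reference such as "AB12" into a Coordinate.
# Column letters are treated as a base-26 number where A = 1.
def coordinate_from_ref(ref)
  match = /\A([A-Z]+)([1-9]\d*)\z/.match(ref.to_s.upcase)
  raise ArgumentError, "invalid cell reference: #{ref.inspect}" unless match

  column = match[1].each_char.reduce(0) do |sum, ch|
    sum * 26 + (ch.ord - "A".ord + 1)
  end
  Coordinate.new(match[2].to_i, column)
end

first_cell = coordinate_from_ref("A1")
wide_cell  = coordinate_from_ref("AB12")
```

A consumer of streamed cells could use such a coordinate to detect gaps between cells and pad rows accordingly.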