
Fix pandas FutureWarning and enhance wranglers performance #1590

Merged
merged 7 commits into from
Apr 13, 2024


@rsmb7z (Collaborator) commented Apr 12, 2024

Pull Request

  • Fixed pandas FutureWarning for TradeTickDataWrangler.process_bar_data
  • Implemented prepare_event_and_init_timestamps, which builds ts_event and ts_init directly from the DatetimeIndex, whose values are already in nanoseconds. This has improved performance significantly, cutting processing time by more than half.
  • Added an optional sort_data flag to process_bar_data, defaulting to True
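The second bullet describes deriving ts_event and ts_init from the DatetimeIndex in one step rather than converting timestamps row by row. Below is a minimal sketch of that idea, not the actual Nautilus implementation: the name prepare_timestamps_sketch and its ts_init_delta parameter are hypothetical, and naive timestamps are assumed to be UTC.

```python
import numpy as np
import pandas as pd


def prepare_timestamps_sketch(
    index: pd.DatetimeIndex,
    ts_init_delta: int = 0,
) -> tuple[np.ndarray, np.ndarray]:
    """Hypothetical helper: build ts_event/ts_init as int64 UNIX nanoseconds
    from a DatetimeIndex in one vectorized step (no per-row conversion)."""
    if index.tz is not None:
        # tz_convert(None) converts to UTC and drops the timezone,
        # so .values below is a plain datetime64 array.
        index = index.tz_convert(None)
    # Assumption: naive timestamps are already UTC. Cast to nanosecond
    # resolution, then reinterpret the datetime64 buffer as int64.
    ts_event = index.values.astype("datetime64[ns]").view(np.int64)
    # ts_init is offset by a fixed (hypothetical) delta in nanoseconds.
    ts_init = ts_event + ts_init_delta
    return ts_event, ts_init
```

Because the whole index is converted as one NumPy array, the cost is a single cast and view instead of creating a pandas Timestamp per row, which is the kind of overhead the bullet above reports removing.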

Here are the results on my dataset:

  • TradeTickDataWrangler 1970563 rows, processed in 7s (previously 32s)
  • QuoteTickDataWrangler 6822199 rows, processed in 13s (previously 112s)

@rsmb7z rsmb7z marked this pull request as draft April 12, 2024 15:18
@cjdsellers cjdsellers changed the title Fix pandas FutureWarning and Enhance Wranglers performance Fix pandas FutureWarning and enhance wranglers performance Apr 12, 2024
@cjdsellers (Member)

Hey @rsmb7z

The changes look good - I see the PR is in draft mode, did you still intend to add more changes?

@rsmb7z (Collaborator, Author) commented Apr 12, 2024

Hi @cjdsellers
Yes, I've found further performance improvements. I'll mark it 'Ready for review' once they're pushed.

@rsmb7z (Collaborator, Author) commented Apr 13, 2024

Hi @cjdsellers
Performance has improved further, as follows:

  • 1970563 bars loaded as TradeTick in 7.00s (from 32s)
  • 1970563 bars loaded as QuoteTick in 12.89s (from 112s)

I have also added an optional sort_data flag, defaulting to True to preserve the current behavior. Setting it to False improves performance further, since the DataEngine sorts the data eventually anyway.
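To illustrate the trade-off described above, here is a minimal sketch of an optional sort_data flag. The names prepare_bar_frame_sketch and downstream_sort_sketch are hypothetical; the latter is only a stand-in for the later sorting attributed to the DataEngine, not its actual code.

```python
import numpy as np
import pandas as pd


def prepare_bar_frame_sketch(data: pd.DataFrame, sort_data: bool = True) -> pd.DataFrame:
    """Hypothetical pre-processing step showing an optional sort_data flag."""
    if sort_data:
        # Default: keep the existing behavior by sorting rows by timestamp.
        return data.sort_index()
    # Skip the O(n log n) sort here; ordering is left to a later stage.
    return data


def downstream_sort_sketch(ts_init: np.ndarray) -> np.ndarray:
    """Stand-in for a later sort by ts_init (assumption, not engine code).
    Returns the indices that put ts_init in ascending order."""
    return np.argsort(ts_init, kind="stable")
```

If a later stage sorts by timestamp anyway, sorting up front only repeats work, so passing sort_data=False skips a redundant sort while the final order stays the same.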

@rsmb7z rsmb7z marked this pull request as ready for review April 13, 2024 14:14
@cjdsellers cjdsellers merged commit ce38cc3 into nautechsystems:develop Apr 13, 2024
9 checks passed
@rsmb7z rsmb7z deleted the pr_240412 branch April 14, 2024 17:19