Slow speed #560

Closed
IshanDindorkar opened this issue Jun 11, 2019 · 8 comments

Comments

@IshanDindorkar

Hello @argenisleon / @FavioVazquez -

I am running a couple of operations like the ones below:
df.rows.select(fbdt("id", "integer")).table()
df.rows.select(fbdt("id", "float")).table()

I am running these operations in a notebook. According to the console, the job is in progress, but it is slow to return results. The dataset has close to 90 million records. Is there any configuration I can use to speed up the Spark job?
FYI, I am using an Ubuntu machine with 4 cores and 32 GB of RAM.
Could you please advise?

Thank you for your support.

Best-
Ishan
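
For readers with the same question: in local mode, the settings that usually matter are the number of worker threads, the driver memory, and the number of shuffle partitions. The sketch below builds such a settings map for a 4-core, 32 GB machine. The helper name `local_spark_settings` and the chosen values (8 GB left for the OS, two shuffle partitions per core) are assumptions for illustration, not tuned recommendations and not something the Optimus maintainers suggested.

```python
def local_spark_settings(cores, ram_gb, reserve_gb=8, partitions_per_core=2):
    """Return Spark properties for a single-machine (local mode) session.

    Hypothetical helper: the memory reserve and the partition multiplier
    are illustrative defaults, not measured optima.
    """
    # Leave some memory for the OS and the notebook process itself.
    driver_gb = max(1, ram_gb - reserve_gb)
    return {
        "spark.master": f"local[{cores}]",
        "spark.driver.memory": f"{driver_gb}g",
        "spark.sql.shuffle.partitions": str(cores * partitions_per_core),
    }


settings = local_spark_settings(cores=4, ram_gb=32)
```

Each key is a standard Spark property and could be passed to `SparkSession.builder.config(key, value)` before the session is created. Note that `spark.driver.memory` has to be set before the driver JVM starts, so it has no effect on a session that is already running.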

@issue-label-bot

Issue-Label Bot is automatically applying the label question to this issue, with a confidence of 0.69. Please mark this comment with 👍 or 👎 to give our bot feedback!

Links: app homepage, dashboard and code for this bot.

@argenisleon
Collaborator

Hi,

This is definitely a slow operation. How long is it taking? Maybe we could explore how to speed it up.

@IshanDindorkar
Author

It took 2-3 hours approximately for an individual operation.

@argenisleon
Collaborator

argenisleon commented Jun 11, 2019

Looking at the function https://github.com/ironmussa/Optimus/blob/master/optimus/functions.py#L398, we use the fastnumbers library to speed things up when handling ints and floats. The function is very simple, so I am not sure we have room for improvement here.

@FavioVazquez, can you take a look at this?
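
To make the cost concrete, the check applied to each value is roughly of the following shape. This is a plain-Python stand-in, not the Optimus code: the names `is_int` and `is_float` are hypothetical, and the standard library is used in place of fastnumbers so the sketch runs without extra dependencies.

```python
def is_float(value):
    """Return True if value parses as a float (hypothetical stand-in)."""
    try:
        float(value)
        return True
    except (TypeError, ValueError):
        return False


def is_int(value):
    """Return True if value parses as an integer (hypothetical stand-in)."""
    try:
        int(value)
        return True
    except (TypeError, ValueError):
        return False


# A tiny sample standing in for the "id" column.
sample = ["42", "3.14", "abc", None]
int_rows = [v for v in sample if is_int(v)]
float_rows = [v for v in sample if is_float(v)]
```

Even when each call is cheap, running a Python function over 90 million rows means every value is passed between the JVM and a Python worker, which is typically where the time goes.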

@IshanDindorkar
Author

@argenisleon @FavioVazquez - did you make any changes to address this issue in the new release? Just curious!

@argenisleon
Collaborator

argenisleon commented Jun 12, 2019

Not at the moment. We are thinking about other approaches. Any thoughts on this?

@argenisleon
Collaborator

We could improve performance if we implement #322.

@argenisleon
Collaborator

Hi @IshanDindorkar,

We rebuilt the profiler to improve parallelism. In local mode, we cut the time by 1/3.
Hope this helps.
