Slow speed #560

IshanDindorkar · 2019-06-11T06:00:04Z

I am running a couple of operations like the ones below:
df.rows.select(fbdt("id", "integer")).table()
df.rows.select(fbdt("id", "float")).table()

I am running these operations from a notebook. According to the console, the operation is in progress, but it is slow to return results. The dataset has close to 90 million records. Is there any configuration I can use to speed up the Spark job?
FYI, I am using an Ubuntu machine with 4 cores and 32 GB RAM.
Could you please advise?

Thank you for your support.

Best-
Ishan

issue-label-bot · 2019-06-11T06:00:08Z

Issue-Label Bot is automatically applying the label question to this issue, with a confidence of 0.69. Please mark this comment with 👍 or 👎 to give our bot feedback!

Links: app homepage, dashboard and code for this bot.

argenisleon · 2019-06-11T06:33:03Z

Hi,

This is definitely a slow operation. How much time is it taking? Maybe we could explore how to speed this up.

IshanDindorkar · 2019-06-11T08:00:26Z

It took 2-3 hours approximately for an individual operation.
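
For scale, here is a back-of-the-envelope estimate of the throughput implied by the numbers reported in this thread (about 90 million records in 2 to 3 hours). The helper name `rows_per_second` is hypothetical and only used for illustration:

```python
def rows_per_second(rows: int, hours: float) -> float:
    """Average throughput for a job that processed `rows` rows in `hours` hours."""
    return rows / (hours * 3600)


ROWS = 90_000_000  # "close to 90 million records", from the original report

fast = rows_per_second(ROWS, 2)  # best case: 2 hours
slow = rows_per_second(ROWS, 3)  # worst case: 3 hours

print(f"{slow:,.0f} to {fast:,.0f} rows/s")  # prints "8,333 to 12,500 rows/s"
```

If all 4 cores were busy the whole time (an assumption, not something measured here), that works out to roughly 2,100 to 3,100 rows per second per core.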

argenisleon · 2019-06-11T14:30:12Z

Looking at the function https://github.com/ironmussa/Optimus/blob/master/optimus/functions.py#L398, we use the fastnumbers library to speed things up when handling ints and floats. The function is very simple, so I am not sure we have room for improvement here.

@FavioVazquez can you take a look at this?
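
To give a rough feel for the kind of per-value check discussed above, here is a minimal sketch. It is not the Optimus implementation and does not use fastnumbers: `looks_like_int` and `looks_like_float` are hypothetical pure-Python stand-ins, and the timing only covers the check itself on one core, not any Spark overhead.

```python
import timeit


def looks_like_int(value) -> bool:
    """Return True if `value` parses as an integer (stand-in, not the Optimus code)."""
    try:
        int(str(value))
        return True
    except ValueError:
        return False


def looks_like_float(value) -> bool:
    """Return True if `value` parses as a float (stand-in, not the Optimus code)."""
    try:
        float(str(value))
        return True
    except ValueError:
        return False


# Time the integer check on a small synthetic sample and extrapolate to 90M values.
sample = ["42", "4.2", "abc", "7"] * 25_000  # 100,000 values
elapsed = timeit.timeit(lambda: [looks_like_int(v) for v in sample], number=1)
per_value = elapsed / len(sample)
estimate_min = per_value * 90_000_000 / 60
print(f"check alone: about {estimate_min:.1f} min for 90M values on one core")
```

If the extrapolated time is far below the 2 to 3 hours reported, the check itself is probably not the bottleneck. When a Python function is applied per row in Spark, as the use of fastnumbers suggests, each value also has to be serialized between the JVM and the Python workers, which is a likely place for the remaining time to go.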

IshanDindorkar · 2019-06-12T05:32:13Z

@argenisleon @FavioVazquez - did you guys make any changes to address this issue in the new release? Just curious!

argenisleon · 2019-06-12T05:49:14Z

Not at the moment. We are thinking about other approaches. Any thoughts on this?

argenisleon · 2019-07-24T21:07:44Z

We could improve performance if we implement #322.

Close #602 #599 #593 #583 #560 #544 #380

argenisleon · 2019-08-20T02:15:29Z

Hi @IshanDindorkar,

We rebuilt the profiler to improve parallelism. In local mode, we cut the time by 1/3.
Hope this helps.

issue-label-bot bot added the question label Jun 11, 2019

argenisleon added the help wanted label Jun 11, 2019

argenisleon added a commit that referenced this issue Aug 20, 2019

Merge pull request #609 from ironmussa/develop

18e0bed

Close #602 #599 #593 #583 #560 #544 #380

argenisleon closed this as completed Aug 20, 2019
