Profiler fails when datetime or numeric types are detected #641

Closed
arpit1997 opened this issue Aug 25, 2019 · 6 comments · Fixed by #646

@arpit1997
Contributor

When running the profiler with infer=True on a DataFrame whose columns actually hold datetime and integer values, it fails with the traceback below.

I am attaching a file which I experimented with.
crimes-in-boston.zip

from optimus import Optimus
op = Optimus()
df2 = op.read.csv('crime.csv')
op.profiler.run(df2, '*', infer=True)

Traceback:

TypeError                                 Traceback (most recent call last)
<ipython-input-5-e4a0041fc71c> in <module>
----> 1 op.profiler.run(df2, '*', infer=True)

~/Projects/Optimus/optimus/helpers/decorators.py in timed(*args, **kw)
     26     def timed(*args, **kw):
     27         start_time = timeit.default_timer()
---> 28         f = method(*args, **kw)
     29         _time = round(timeit.default_timer() - start_time, 2)
     30         logger.print("{name}() executed in {time} sec".format(name=method.__name__, time=_time))

~/Projects/Optimus/optimus/profiler/profiler.py in run(self, df, columns, buckets, infer, relative_error, approx_count)
    199 
    200                 if col["column_dtype"] == "date":
--> 201                     hist_year = plot_hist({col_name: hist_dict["years"]}, "base64", "years")
    202                     hist_month = plot_hist({col_name: hist_dict["months"]}, "base64", "months")
    203                     hist_weekday = plot_hist({col_name: hist_dict["weekdays"]}, "base64", "weekdays")

TypeError: list indices must be integers or slices, not str
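
The final TypeError is what Python raises whenever a list is indexed with a string key. A minimal sketch of that failure mode, assuming (the traceback does not confirm this) that hist_dict holds a plain list of buckets for a column treated as a string, while the date branch expects a dict keyed by "years", "months" and "weekdays". The shapes and the helper years_of are hypothetical, not Optimus code:

```python
# Hypothetical shapes; the real Optimus structures may differ.
hist_for_string_column = [("LARCENY", 10), ("FRAUD", 3)]
hist_for_date_column = {"years": [(2018, 13)], "months": [], "weekdays": []}


def years_of(hist_dict):
    """Return the per-year buckets, as the date branch of the profiler tries to do."""
    return hist_dict["years"]


try:
    years_of(hist_for_string_column)  # a list indexed with a str key
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

With the dict-shaped input the same lookup succeeds, which suggests the column was labelled "date" while its histogram was built for another type.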

I believe the error is at https://github.com/ironmussa/Optimus/blob/master/optimus/helpers/columns_expression.py#L93: the code always takes the string branch of the if statement. The function is_column_a takes df, column_name and dtypes as arguments, but every column in df has type string, so the check only ever matches string.
@argenisleon Correct me if I am wrong.
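
To illustrate the point: when every column is declared as string, a check against the declared dtype can never report a date or an integer, so the type has to be derived from the values. A minimal sketch in plain Python, not the Optimus implementation. The names parse_as and infer_column_type and the timestamp format are assumptions made for this example:

```python
from datetime import datetime

ASSUMED_DATE_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumption, adjust to the data


def parse_as(value, kind):
    """Return True if the string value parses as the given kind."""
    try:
        if kind == "int":
            int(value)
        elif kind == "decimal":
            float(value)
        elif kind == "date":
            datetime.strptime(value, ASSUMED_DATE_FORMAT)
        return True
    except ValueError:
        return False


def infer_column_type(values):
    """Pick the first kind that every sampled value parses as, else 'string'."""
    for kind in ("int", "decimal", "date"):
        if values and all(parse_as(v, kind) for v in values):
            return kind
    return "string"


print(infer_column_type(["1", "2"]))
print(infer_column_type(["2018-09-02 13:00:00"]))
```

The order of the kinds matters: "int" is tried before "decimal" because every integer string also parses as a float.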

@issue-label-bot

Issue-Label Bot is automatically applying the label bug to this issue, with a confidence of 0.92. Please mark this comment with 👍 or 👎 to give our bot feedback!

Links: app homepage, dashboard and code for this bot.

issue-label-bot added the bug label Aug 25, 2019
@argenisleon
Collaborator

@arpit1997 you are right. The data type inference runs after the general stats are calculated, but the histogram needs to know the data type in order to be calculated correctly.

I will push a fix tomorrow.

Thanks for the report.
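
The ordering problem described above can be sketched as follows: infer the type first, then build the histogram that matches it. This is an illustration under stated assumptions, not the fix that was merged. The functions looks_like_dates and histogram, the timestamp format, and the output shapes are all hypothetical:

```python
from collections import Counter
from datetime import datetime

ASSUMED_DATE_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumption, adjust to the data


def looks_like_dates(values):
    """Return True if the list is non-empty and every value parses as a timestamp."""
    try:
        for value in values:
            datetime.strptime(value, ASSUMED_DATE_FORMAT)
    except ValueError:
        return False
    return bool(values)


def histogram(values):
    """Infer the type first, then build the histogram shape that type needs."""
    if looks_like_dates(values):
        parsed = [datetime.strptime(v, ASSUMED_DATE_FORMAT) for v in values]
        return {
            "years": dict(Counter(d.year for d in parsed)),
            "months": dict(Counter(d.month for d in parsed)),
            "weekdays": dict(Counter(d.weekday() for d in parsed)),
        }
    # Non-date columns get a flat list of (value, count) pairs.
    return Counter(values).most_common()


date_hist = histogram(["2018-09-02 13:00:00", "2019-01-05 08:30:00"])
text_hist = histogram(["a", "b", "a"])
print(date_hist["years"], text_hist)
```

Because the branch is chosen from the inferred type, a date column always yields the dict with "years", "months" and "weekdays" that the plotting step in the traceback expects.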

@arpit1997
Contributor Author

@argenisleon Any update on this?

@argenisleon
Collaborator

argenisleon commented Aug 28, 2019

The bug is fixed in this branch: https://github.com/ironmussa/Optimus/tree/feature/profiler_improvements

I have not properly tested it yet. I will merge it to master tomorrow.

@argenisleon
Collaborator

@arpit1997 This is now in master and available in a new release.

@arpit1997
Contributor Author

arpit1997 commented Aug 29, 2019 via email
