IndexSelector: use dynamic options #219

achimgaedke · 2022-05-11T14:08:01Z

addresses loading timeouts for large datasets as described in #205

This is rough and ready, keen to get some feedback.

oegedijk · 2022-05-11T18:42:38Z

Cool, really nice approach! I actually noticed that dynamic dropdown thing in the documentation the other day as well. Very nifty!

I wonder whether there are any performance tradeoffs, though, such that for small datasets (or rather, depending on the memory usage of self.idxs) we would use the old-fashioned dropdown and for larger ones the dynamic one.

But if it is performant enough, then we can just default to the dynamic one...

oegedijk · 2022-05-11T19:10:26Z

This seems quite a bit slower, but has the benefit of being case-insensitive:

oegedijk · 2022-05-11T19:15:28Z

Probably makes sense to wrap this into an IndexDropdownComponent so that it is easy to replicate across components.

oegedijk · 2022-05-11T20:09:25Z

so this seems to work (check the feature input component index selector on the whatif tab)...

oegedijk · 2022-05-11T20:28:47Z

hmm, tests seem to fail on ubuntu. May check tomorrow...

achimgaedke · 2022-05-12T12:32:20Z

Glad you like it.

Why not use the IndexSelector everywhere?

Maybe there's something I'm missing in the code. It's likely that I actually haven't found all index selection locations.

The limit on the number of options returned (1000) should probably be made configurable... a bit like plot_sample.

achimgaedke · 2022-05-12T12:35:43Z

Btw, very pleased with the load time of the application utilising 1M data points.

oegedijk · 2022-05-12T12:43:29Z

Good question, and the answer is that I forgot I had already written IndexSelector, but am apparently good at reinventing the wheel :)

oegedijk · 2022-05-12T12:56:50Z

I was looking at the FeatureInputComponent which didn't use the IndexSelector, my bad. Anyway, will have a look later. Also at this weird stochastic ubuntu bug.

oegedijk · 2022-05-14T20:10:31Z

Okay, I have now extended the IndexSelector to all existing ExplainerComponents, and also added functionality so that when the index is set directly, the options simply get set to [index]. Before, if the index was set from e.g. a random index selector and the chosen index was not already in the dropdown, it would not show.
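
A minimal sketch of that second behaviour, assuming a hypothetical helper `dropdown_options` (this is an illustration, not the actual IndexSelector code). When the index is set from outside, the options collapse to that single value so the dropdown can always display it:

```python
from typing import List, Optional, Sequence


def dropdown_options(
    index_list: Sequence[str],
    search_value: Optional[str] = None,
    set_index: Optional[str] = None,
    max_idxs_in_dropdown: int = 1000,
) -> List[str]:
    """Hypothetical helper: decide which options the dropdown should show."""
    if set_index is not None:
        # Index chosen elsewhere (e.g. a random index selector):
        # offer exactly that one option so it is always displayable.
        return [set_index]
    if not search_value:
        # Nothing typed yet: send no options at all.
        return []
    # Plain substring match, truncated to the configured limit.
    matches = [idx for idx in index_list if search_value in idx]
    return matches[:max_idxs_in_dropdown]
```

In a Dash app this function would sit inside the callback that updates the dropdown's options; that wiring is left out here.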

Since you have ready access to a million-row index dataset, could you test the performance of self.explainer.get_index_list()[self.explainer.get_index_list().str.contains(search_value, case=False)][:self.max_idxs_in_dropdown].tolist() vs the islice approach you had first?

I would like search to be case-insensitive, so if islice is much faster on big datasets, then we could add an .index_list_lower() and compare search_value.lower() to it...
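
A rough sketch of that idea, using only the standard library. The function name `search_index` and the precomputed `index_list_lower` argument are illustrative assumptions, not existing explainerdashboard API:

```python
from itertools import islice
from typing import List, Sequence


def search_index(
    index_list: Sequence[str],
    index_list_lower: Sequence[str],
    search_value: str,
    max_idxs_in_dropdown: int = 1000,
) -> List[str]:
    """Case-insensitive substring search that stops after the first N hits."""
    needle = search_value.lower()
    # Generator: nothing is scanned until islice pulls items from it.
    hits = (
        original
        for original, lowered in zip(index_list, index_list_lower)
        if needle in lowered
    )
    # islice stops consuming the generator once the limit is reached,
    # so a common search term never walks the whole index.
    return list(islice(hits, max_idxs_in_dropdown))


index_list = ["Passenger_1", "passenger_2", "Crew_1", "PASSENGER_3"]
index_list_lower = [idx.lower() for idx in index_list]  # computed once, up front
print(search_index(index_list, index_list_lower, "passenger", max_idxs_in_dropdown=2))
```

The lowercased copy roughly doubles the memory held for the index strings, so whether this is worth it depends on how the two approaches compare on a large index.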

oegedijk · 2022-05-23T18:46:39Z

bumped sklearn to 1.1 and python to 3.8 and voila tests are passing :)

achimgaedke · 2022-05-24T11:50:06Z

Thanks for keeping up the momentum on this and #217. I'm keen on trying it out, but realistically I might only get a chance to look at it mid next week. The next bottlenecks for hundreds of thousands of data points are the ROC and PR AUC plots, especially the latter. I haven't had a closer look at the reasons yet. Not having the index repeatedly (or at all) in the JSON layout file is a great step forward. Cheers!

achimgaedke · 2022-06-12T02:04:49Z

I have reviewed the changes.

I propose some changes to the IndexSelector class from a software-engineering perspective, mainly for consistency and to fix a bug in the conditions shared between the layout and the callback creation.

There's one difference in the use of max_idxs_in_dropdown: The original fix proposes to send dropdown options always as a callback result when requested. This version goes the middle way: allowing the dropdown values to be included in the app layout (multiple times).

Is a purely callback-based option too slow in some cases? If not, I'd suggest keeping only that one.

All in all, I'm worried about startup performance (and hitting a timeout with JSON becoming too slow to load) rather than runtime performance. Do we want two different thresholds?
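
To make the question concrete, here is a minimal sketch of the two variants. The helper `initial_options` and the `callback_only` flag are hypothetical and not part of this PR; the assumption is that the middle way embeds options in the layout only when the index is no longer than max_idxs_in_dropdown:

```python
from typing import List, Sequence


def initial_options(
    index_list: Sequence[str],
    max_idxs_in_dropdown: int = 1000,
    callback_only: bool = False,
) -> List[str]:
    """Options to embed in the layout at startup (hypothetical helper)."""
    if callback_only or len(index_list) > max_idxs_in_dropdown:
        # Large index (or pure callback mode): keep the layout JSON small
        # and let the search callback supply options on demand.
        return []
    # Small index: embed every option, as a static dropdown would.
    return list(index_list)
```

With `callback_only=True` the layout size no longer depends on the index length at all, which is the startup concern raised above; a second threshold would only be needed if the embedded variant is kept.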

I am not too concerned with the CPU performance difference of islice vs pandas.Series.str.contains for now. One would also need to consider memory consumption.

oegedijk · 2022-06-14T18:43:47Z

I think some subtle bug may have been introduced with the cleanup, but will have a closer look tomorrow..

thanks again for the work!

oegedijk · 2022-06-15T08:59:21Z

Alright, looks good to me, let's merge it!

Thanks again for this awesome initiative!

Will give you a shout-out on LinkedIn once I release this version...

oegedijk · 2022-06-15T12:40:51Z

just released it as version 0.4.0

Achim Gaedke added 2 commits May 12, 2022 02:05

IndexSelector: use dynamic options

9e4cf51

converted more index dropdowns to dynamic

228bbc8

adds InputDropdownComponent

06e82c4

updates IndexSelector for dynamic search

e454605

Oege Dijk added 2 commits May 23, 2022 17:04

bump sklearn to v1.1 and adjusts pipeline tests

4267f92

Merge branch 'master' into dynamic-index-selector

d1cc311

Achim Gaedke and others added 5 commits June 12, 2022 18:46

IndexSelector: use dynamic options

75bdabc

converted more index dropdowns to dynamic

692607f

adds InputDropdownComponent

3a344f7

updates IndexSelector for dynamic search

4a30dd9

software eng fixup and cleanup

c2c9187

Oege Dijk added 4 commits June 15, 2022 10:21

Merge branch 'dynamic-index-selector' of https://github.com/achimgaed…

212ccfc

…ke/explainerdashboard into dynamic-index-selector

casts index_list to str by default

ea046d0

removes itertools import

e2a4ced

bump to 0.4.0

3f37629

oegedijk merged commit c247f47 into oegedijk:master Jun 15, 2022

achimgaedke deleted the dynamic-index-selector branch November 24, 2022 03:35
