Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEA] Promote FITSNE from experimental #3805

Closed
Tracked by #4139
divyegala opened this issue Apr 29, 2021 · 15 comments · Fixed by #4361
Closed
Tracked by #4139

[FEA] Promote FITSNE from experimental #3805

divyegala opened this issue Apr 29, 2021 · 15 comments · Fixed by #4361
Labels
feature request New feature or request inactive-30d

Comments

@divyegala
Copy link
Member

I'll attach embeddings of some popular datasets using cuML t-sne (FFT), cuML t-sne (BH), and sklearn t-sne (BH)

@divyegala divyegala added feature request New feature or request ? - Needs Triage Need team to review and classify labels Apr 29, 2021
@divyegala
Copy link
Member Author

Boston Dataset:

  1. cuML FFT
    boston_90_30_fft

  2. cuML BH
    boston_90_30_barneshut

  3. sklearn BH
    boston_90_30_sklearn

@divyegala
Copy link
Member Author

Breast Cancer Dataset:

  1. cuML FFT
    breast_cancer_90_30_fft

  2. cuML BH
    breast_cancer_90_30_barneshut

  3. sklearn BH
    breast_cancer_90_30_sklearn

@divyegala
Copy link
Member Author

Diabetes Dataset:

  1. cuML FFT
    diabetes_90_30_fft

  2. cuML BH
    diabetes_90_30_barneshut

  3. sklearn BH
    diabetes_90_30_sklearn

@divyegala
Copy link
Member Author

Digits Dataset:

  1. cuML FFT
    digits_90_30_fft

  2. cuML BH
    digits_90_30_barneshut

  3. sklearn BH
    digits_90_30_sklearn

@divyegala
Copy link
Member Author

Iris Dataset:

  1. cuML FFT
    iris_90_30_fft

  2. cuML BH
    iris_90_30_barneshut

  3. sklearn BH
    iris_90_30_sklearn

@dantegd dantegd changed the title [FEA] Promote FITSNE from experimenatl [FEA] Promote FITSNE from experimental Apr 29, 2021
@zbjornson
Copy link
Contributor

It looks like a lot of those differences can be reduced by adjusting late_exaggeration.

It also looks like there are flyaways in the Iris dataset in cuML B-H.

@github-actions
Copy link

github-actions bot commented Jun 2, 2021

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

@cjnolet
Copy link
Member

cjnolet commented Sep 22, 2021

Just want to provide an update here for some tasks that should be completed before we officially promote the FFT TSNE from experimental (and potentially make it the default option). We've shown that the FFT algorithm works well on toy datasets but there has been evidence to suggest that 1) it may not always be providing better results over Barnes-hut, and 2) It may not be faster. To my knowledge, we have not done any formal benchmarks or scale analysis of the FFT algorithm, outside of the initial results that @zbjornson has been gracious enough to perform.

At the very minimum, we should want the results to be at least as correct and performant as Barnes-hut in order to make the FFT variant the default option. If more stable results but not quite as performance, we could probalby still move FFT out of experimental but not make it the default.

From my perspective, there are reall two tasks that remain:

  1. Benchmark performance and scale on both real-world datasets and toy datasets
  2. Evaluate the stability / correctness on real-world datasets and at scale

One very easy place to try this is https://github.com/clara-parabricks/rapids-single-cell-examples/blob/master/notebooks/hlca_lung_gpu_analysis.ipynb

@lowener
Copy link
Contributor

lowener commented Nov 12, 2021

I ran TSNE on a few real-world datasets to observe the embeddings and the performance.

  • From what I found, FFT has correct results. Not always better or worse than B-H.
  • FFT is most of the time slower than Barnes-Hut when the number of iterations is lower than 1500. After that treshold, Barnes-Hut is slower than FFT.

I'll add here the results, starting with the dataset that you linked. (shape = (65462, 20))

Screenshot 2021-11-12 at 17-08-18 hlca_lung_gpu_analysis2 - Jupyter Notebook
Screenshot 2021-11-12 at 17-07-54 hlca_lung_gpu_analysis2 - Jupyter Notebook

@lowener
Copy link
Contributor

lowener commented Nov 12, 2021

On a Spotify dataset from Kaggle (link) with shape (2017, 8) and 500 iterations

Screenshot 2021-11-12 at 17-01-12 tsne - Jupyter Notebook

@lowener
Copy link
Contributor

lowener commented Nov 12, 2021

On a Credit card fraud detection from Kaggle too (link) with multiple iterations choices. Shape=(3492, 30)

Screenshot 2021-11-11 at 14-42-47 autoencoders-tsne - Jupyter Notebook

Screenshot 2021-11-11 at 14-54-46 autoencoders-tsne - Jupyter Notebook

Screenshot 2021-11-11 at 17-22-06 autoencoders-tsne - Jupyter Notebook

Screenshot 2021-11-11 at 17-23-13 autoencoders-tsne - Jupyter Notebook

@cjnolet
Copy link
Member

cjnolet commented Nov 15, 2021

@lowener, these plots and benchmarks look great and it's nice to see the performance gap isn't prohibitively large between FFT and BH on these datasets. So long as they are converging in a reasonable number of iterations (which look to be similar-ish across BH and FFT), this could be evidence for supporting FFT as the default.

Ideally, before we officially promote FFT to a default, we should also address this concern (since these datasets don't seem to be showing such a signficant slowdown across differing numbers of iterations): #3865 (comment)

@zbjornson
Copy link
Contributor

The runtimes for these examples are only a few seconds. When I benchmarked larger datasets that take between 30 seconds an 10 minutes a V100, FFT was between 1.3 and 2.2x faster than B-H.

Now that K-L divergence can be calculated, it would be cool to plot that over each iteration. The plot in #3058 is partly wrong, but exact, FFT and B-H all seemed to converge at different rates.

@cjnolet
Copy link
Member

cjnolet commented Nov 15, 2021

@zbjornson,

I fully agree we should be comparing larger datasets. It also looks like the early stopping was removed from barnes-hut at some point, which invalidates the benchmarks above. Out of curiosity, what datasets did you use for benchmarking?

rapids-bot bot pushed a commit that referenced this issue Nov 18, 2021
@zbjornson
Copy link
Contributor

zbjornson commented Nov 18, 2021

@cjnolet I was using synthetic N-dimensional Gaussian blobs with between 4 and 50 dimensions, 100k and ~20M rows. I was using a fixed number of iterations without early stopping. (I have some concerns about the min grad norm for early stopping but haven't had time to dig into it yet.)

edit

Your comment here #4361 (comment)

While the FFT implementation is slower iteration for iteration

is the opposite of my observations.

vimarsh6739 pushed a commit to vimarsh6739/cuml that referenced this issue Oct 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request New feature or request inactive-30d
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants