[SYSTEMDS-3670] TSNE PCA preprocessing
This commit adds a comment and an example script for tSNE with PCA preprocessing.
According to scikit-learn, PCA preprocessing reduces the number of dimensions
tSNE has to work with and therefore improves performance.

LDE Project Part 1 WS 2023/2024

Closes #1991
Baunsgaard committed Jan 31, 2024
1 parent 2cd782f commit 610222c
Showing 2 changed files with 48 additions and 0 deletions.
10 changes: 10 additions & 0 deletions scripts/builtin/tSNE.dml
@@ -22,6 +22,16 @@
# This function performs dimensionality reduction using the tSNE algorithm based on
# the paper: Visualizing Data using t-SNE, Maaten et al.
#
# There exists a variant of t-SNE, recommended in the sklearn documentation, that first
# reduces the dimensionality of the data using PCA to reduce noise and then applies t-SNE
# for further dimensionality reduction. An example script can be found in the tutorials
# folder: scripts/tutorials/tsne/pca-tsne.dml
#
# For direct reference and tips on choosing the dimension for the PCA pre-processing,
# you can visit:
# https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/manifold/_t_sne.py
# https://lvdmaaten.github.io/tsne/
#
# INPUT:
# -------------------------------------------------------------------------------------------
# X Data Matrix of shape
38 changes: 38 additions & 0 deletions scripts/tutorials/tsne/pca-tsne.dml
@@ -0,0 +1,38 @@
#-------------------------------------------------------------
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
#-------------------------------------------------------------

#
# tSNE dimensionality reduction technique with PCA pre-processing,
# inspired by the sklearn implementation of tSNE:
# https://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html


# Load data
data = read($X)

# Pre-process data with PCA
[PCA, components, centering, scalefactor] = pca(X=data, K=$k)

# Do tSNE with PCA output
Y = tSNE(X=PCA)

# Save reduced dimensions
write(Y, $Y)
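
The tutorial script above relies on the defaults of both builtins. As a rough sketch only, the same pipeline could be written with the main parameters spelled out. The parameter names `center`, `scale`, `reduced_dims`, `perplexity`, and `seed` are assumptions based on the builtin signatures in scripts/builtin/pca.dml and scripts/builtin/tSNE.dml and should be checked against the SystemDS version in use. The values shown are illustrative, not recommendations from this commit.

```dml
# Hypothetical variant of pca-tsne.dml with explicit parameters.
# $X, $Y, and $k are named script arguments, as in the committed script.
data = read($X)

# Step 1: PCA pre-processing. Keep the first $k principal components.
# center/scale are assumed optional arguments of the pca builtin.
[reduced, components, centering, scalefactor] = pca(X=data, K=$k, center=TRUE, scale=TRUE)

# Step 2: tSNE on the PCA output instead of the raw data.
# reduced_dims, perplexity, and seed are assumed optional arguments of tSNE;
# a fixed seed is used here only to make repeated runs comparable.
Y = tSNE(X=reduced, reduced_dims=2, perplexity=30, seed=42)

# Step 3: Save the low-dimensional embedding.
write(Y, $Y)
```

Assuming a standard SystemDS installation, a script like this would typically be started with named arguments, for example `systemds pca-tsne.dml -nvargs X=data.csv k=50 Y=embedding.csv`, where the file names are placeholders. The value `k=50` follows the scikit-learn documentation, which suggests reducing very high-dimensional data to around 50 dimensions before running t-SNE.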
