Historically, we have relied heavily on 2-dimensional (2D) displays to both obtain and communicate large amounts of data in concise, understandable plots. As the volume and complexity of data grow, traditional 2D charts and graphs are becoming inadequate for conveying the multi-faceted nature of many datasets. In this respect, Virtual Reality (VR) holds potential as a new, more effective approach to data visualization.
To this end, the goal of this program is to create a tool for visualizing 3-dimensional Principal Component Analysis (PCA) in virtual reality. Such a tool would be useful for gaining a better understanding of a dataset and for forming new hypotheses. Overall, the relatively successful analysis and graphical rendering of large datasets demonstrates the potential of VR in data visualization and its increasing importance in the future.
The primary interface is the HTC Vive.
Basic Information for Developers and Users
This project is completed and is no longer maintained.
Individual developers who wish to independently expand the project should follow the subsequent steps:
- Download Unity. Ideally the version would be Unity 2017.3.1f1, but other versions should work as well.
- Download the Unity Package file. The name is Project_Package.unitypackage.
- Import the Unity Package into Unity and begin developing.
Individual users should follow these separate steps:
- Download the Unity Build Application. Navigate to the application in the ProjectBuild folder. The executable is named BioSNTR_Plot_Points.exe.
- Set up the HTC Vive, start the application, and enjoy.
This project was built using Unity3D, and the scripting was all done in C#.
The PCA implementations are from Accord 3.8 and target .NET Framework v3.5. I chose not to target more modern frameworks (v4.5+) because Unity 2017.3 supports only .NET Framework v3.5.
// Namespace that provides PrincipalComponentAnalysis (Accord.Statistics assembly)
using Accord.Statistics.Analysis;

PrincipalComponentAnalysis pca = new PrincipalComponentAnalysis();
// Compute N principal components,
// where N is the number of data points/entries
pca.Learn(inputMatrix);
// Keep only the first three components so that the data is projected
// into three dimensions along the principal component axes
pca.NumberOfOutputs = 3;
double[][] result = pca.Transform(inputMatrix);
// Return the transformed data: the coordinates of the data points after projection
return result;
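The same kind of projection can be reproduced outside Unity, which is useful when the coordinates have to be computed ahead of time for the Coor_Data option described below. The following is a minimal NumPy sketch, not the application's code: the function name `project_to_3d` is hypothetical, and it assumes the data is mean-centered (but not rescaled) before the decomposition, which may differ from the Accord settings used in the application.

```python
import numpy as np


def project_to_3d(data):
    """Project the rows of `data` onto the first three principal components.

    Assumption: columns are mean-centered but not rescaled before the
    decomposition. Rows are data points, columns are dimensions.
    """
    x = np.asarray(data, dtype=float)
    centered = x - x.mean(axis=0)
    # The rows of vt are the principal axes, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    # Keep the first three axes and express every point in that basis.
    return centered @ vt[:3].T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sample = rng.normal(size=(20, 5))  # made-up data: 20 points, 5 dimensions
    coords = project_to_3d(sample)
    print(coords.shape)  # (20, 3)
```

If the input has fewer than three rows or columns, the result has correspondingly fewer output columns. The signs of the axes are arbitrary, so a plot produced this way may be mirrored relative to one produced by another PCA implementation.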
When the application first starts, the user should see a main menu that looks something like the following. This menu is only viewable outside the headset because, in my experience, entering information via keyboard and mouse is easier than via the Vive.
Main Menu Screen Through the HeadSet
Main Menu Screen Through Computer Screen
In these instructions, I will go over the meaning of each of the possible inputs on the menu and how to properly format said inputs.
- Scale - Scale is how large or small one wants the subsequent plot to be. Larger scales result in more space between data points, while smaller scales result in less.
- Directory - This is where one will input the absolute directory path that leads to the folder that contains the data file.
- File Name - The name of the data file that contains the relevant data. IMPORTANT: the file name must include the file type (.txt, .csv, etc.)!
- Flip Data - The data from the file is read in the following manner: the horizontal rows are the individual data points while the vertical columns are the various dimensions of the data. As a consequence, occasionally, the data in the file will be the transpose of what the user actually wants to plot. In such situations, click the flip data button so that it shows 'true'.
- Coor_Data - The application is not very powerful at computing the PCA and projecting the data into three dimensions. For small inputs it will be fine, but when the input file is too large, or for some other reason, the application will likely crash. In such situations, it is better to first compute the PCA projection and coordinate data in other software or another language, such as Python or R, and then import the result as the data file. In that case, click the coor_data button so that it shows 'true'.
- More Options - Clicking this button will open a second menu with more inputs, including the button that will actually plot the PCA projection. This second menu should look like this:
- Exclude Columns - Occasionally, there will be several columns of data that the user does not want to include in the graphical rendering or PCA computation. For example, for data files in the form of .csv or .txt, the user would not want to include a first column that contains the numbering of the rows of data. In these situations, list the columns that should be excluded, using commas and spaces to separate the numbers (e.g. 0, 1, 2, 3, 4). The first column in the data is column 0, the second is 1, etc. IMPORTANT: If the data is flipped by the previous 'flip data' option, the columns refer to the columns in the transpose of the data (i.e. the rows in the original data). Column headers are expected and are inherently excluded from the PCA projection / graphical rendering.
- Know Cat.? - If there is a column that contains categories / bins that individual data points can be placed into, and the user wants to show these categories, then click this box so that it shows 'true'.
- Cat. Column - If the previous value is true, then input the column where the categories are. If the previous value is false, this option will be grayed out. IMPORTANT: Do not enter the category column both in the exclude columns option and here.
- Calculate PCA - This graphically renders the data from the input file according to the previous inputs. Clicking on this button will move the user to a new scene where the previous options will not be available. To go back to this input scene, see User Interaction - Menu.
- Back - Return to the previous menu.
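To make the interaction between Flip Data, Exclude Columns, and Cat. Column concrete, here is a small Python sketch of how a file could be interpreted under these options. It is not the application's C# reader: the helper names `parse_excluded` and `load_table` are hypothetical, and the order of operations (transpose first, then drop the header row, then drop columns) is an assumption based on the descriptions above.

```python
import csv
import io


def parse_excluded(text):
    """Turn a string such as '0, 1, 2' into a set of column indices."""
    return {int(part) for part in text.split(",") if part.strip()}


def load_table(lines, flip=False, exclude="", category_column=None):
    """Return (numeric rows, categories) following the menu options.

    Assumed order: transpose if requested, treat the first row as column
    headers, then drop the excluded columns and the category column.
    """
    rows = [row for row in csv.reader(lines) if row]
    if flip:
        # Transpose so that the former columns become the rows.
        rows = [list(col) for col in zip(*rows)]
    body = rows[1:]  # the first row holds the column headers
    skipped = parse_excluded(exclude)
    categories = None
    if category_column is not None:
        categories = [row[category_column] for row in body]
        skipped.add(category_column)
    data = [
        [float(value) for i, value in enumerate(row) if i not in skipped]
        for row in body
    ]
    return data, categories


if __name__ == "__main__":
    # Made-up rows with a row-number column first and a category column last.
    sample = io.StringIO(
        "id,sepal_length,sepal_width,petal_length,petal_width,species\n"
        "1,5.1,3.5,1.4,0.2,setosa\n"
        "2,7.0,3.2,4.7,1.4,versicolor\n"
    )
    data, categories = load_table(sample, exclude="0", category_column=5)
    print(data)
    print(categories)
```

Note that the category column is given only once, through `category_column`, mirroring the rule that it should not also appear in the exclude list. With `flip=True`, the column indices refer to the transposed table, as described for Exclude Columns.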
VR adaptation for the Vive is only present in the graph scene. The following instructions apply for User Interaction:
- There should be a constant, active laser coming out of the front of the HTC Vive controller.
- Menu - Hitting the application menu button will reveal a main menu where the user can perform extra actions and view the graph legend (if there is one). The extra actions include 'close menu', 'change input', and 'quit'. The first closes the menu, the second moves back to the main menu scene for a new input, and the third quits the application. The user can select options using the laser and the hair trigger (index finger). The menu will follow the user's vision.
Legend Menu (Graph Menu Screen 2)
- Movement - The user can use the touchpad to move in the horizontal plane. To move vertically, the user should press the grip buttons. The right grip button will move the user upwards while the left grip button will move the user downwards. Movement is in short bursts in order to mitigate motion sickness.
- Selecting points to view - If the laser points to a data point, then the name of the data point will automatically show up in a visible text element. The name of the data point is the category the data point falls in; if there are no categories then the name defaults to the data number from the input file.
When directed to a point, the laser prints out the datapoint's name
These examples demonstrate what to enter into the various input fields and what the end result should look like. All example data files mentioned are located on my computer in the absolute directory path \Users\vrab\Desktop\BioSNTR.
In this example we will be plotting data on irises from a data file named iris.csv.
I fill out the various inputs of the main menu accordingly:
- Scale - 10. Through experimentation, I have determined that a scale of 10 makes the graph look nice.
- Directory - \Users\vrab\Desktop\BioSNTR
- File Name - iris.csv
- Flip Data - False. I do not want the transpose of the data to be plotted.
- Coor_Data - False. iris.csv does not contain coordinate data.
Iris First Input Screen
On the second page of the main menu (numbered according to instructions above):
- Exclude Columns - I don't want to include the first column, which simply numbers the rows of data.
- Know Cat.? - True. I have a column that categorizes my various points.
- Cat. Column - The column that contains the category names is column 5. Notice that I didn't enter the category column in the exclude columns list as well.
Iris Second Input Screen
After entering the values, I hit the Calculate PCA button, and the application calculates the PCA and projects the data onto the first 3 principal components, like so:
In this example we will be plotting coordinate data regarding mouse embryo development from coord_data.csv. The original data file was too large, so I used numpy to calculate these points. Source: Deng, Qiaolin, et al. “Single-Cell RNA-Seq Reveals Dynamic, Random Monoallelic Gene Expression in Mammalian Cells.” Science, vol. 343, no. 6167, 10 Jan. 2014, pp. 193–196, doi:10.1126/science.1245316.
- Scale - 30
- Directory - \Users\vrab\Desktop\BioSNTR
- File Name - coord_data.csv
- Flip Data - False
- Coor_Data - True. coord_data.csv does contain coordinate data.
Coord_Data First Input Screen
- Exclude Columns - I don't want to include the first column, which simply numbers the rows of data.
- Know Cat.? - True
- Cat. Column - 4
Coord_Data Second Input Screen
Final plot:
In this example we will be plotting the transpose of the data in Processed_Data.csv. Source: Tsai MH, Chen X, Chandramouli GV, Chen Y, Yan H, Zhao S, Keng P, Liber HL, Coleman CN, Mitchell JB, Chuang EY: Transcriptional responses to ionizing radiation reveal that p53R2 protects against radiation-induced mutagenesis in human lymphoblastoid cells. Oncogene 2006, 25:622-632.
- Scale - 5
- Directory - \Users\vrab\Desktop\BioSNTR
- File Name - Processed_Data.csv
- Flip Data - True
- Coor_Data - False
Processed_Data First Input Screen
- Exclude Columns - 0
- Know Cat.? - False
- Cat. Column - N/A
Processed_Data Second Input Screen
Final plot:
The following are the third parties whose code I have used or referenced in the development of this application:
- CSV-Reader from PrinzEugn
- The basic data plotting functionality from Big Data Social Science Fellows @ Penn State.
- Accord, Accord.Math, Accord.Statistics Assembly for PCA
- The NuGet configuration method for VisualStudio located here.
- Other. This space is for all sources that I failed to mention.
DataViz Library Copyright (C) 2018 Eric Feng
This library is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 2.1 of the License, or any later version.
This library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.
You should have received a copy of the GNU Lesser General Public License along with this library; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
Special recognition to the following:
- Professor Xijin Ge from SDSU who acted as my advisor for the duration of the project
- Virtual Reality @ Berkeley for the use of their equipment in the development and testing of the VR components of the application
This research was supported by National Science Foundation/EPSCoR Award No. IIA-1355423, by the state of South Dakota’s Governor’s Office of Economic Development as a South Dakota Research Innovation Center (SDRIC), and with financial/match commitment from all participating institutions.
Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.