2. Training Graph-Transformer #21

Closed
03-134202-096 opened this issue Jul 5, 2024 · 1 comment

Comments

@03-134202-096

Could you please explain the text files mentioned in the "Training Graph-Transformer" section and their format? Where do I get these text files?
Split training, validation, and testing dataset and store them in text files as:

sample1 \t label1
sample2 \t label2
LUAD/C3N-00293-23 \t luad
...

@GSWS
Collaborator

GSWS commented Dec 10, 2024

The text files mentioned are used to store the data samples and their corresponding labels in a tab-separated format. Each line in the file represents a single sample, where:

  • sample1: Refers to the identifier or path to a sample (e.g., file name, image ID, or dataset ID).
  • label1: Indicates the label assigned to that sample (e.g., cancer, normal, or specific classes like luad for lung adenocarcinoma).

The format is as follows:
sample1 \t label1
sample2 \t label2
LUAD/C3N-00293-23 \t luad
...
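
To make the format concrete, here is a minimal sketch of reading one of these files back into (sample, label) pairs. The function name `read_split_file` is hypothetical and not part of this repository. The sketch assumes exactly one tab per line and strips stray spaces around each field.

```python
from pathlib import Path


def read_split_file(path):
    """Return a list of (sample_id, label) pairs from a tab-separated split file.

    Hypothetical helper for illustration; not taken from the repository.
    """
    pairs = []
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    for line_no, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split("\t")
        if len(fields) != 2:
            raise ValueError(
                f"{path}:{line_no}: expected 2 tab-separated fields, got {len(fields)}"
            )
        # Strip surrounding spaces so "sample1 \t label1" and "sample1\tlabel1" agree.
        sample_id, label = (field.strip() for field in fields)
        pairs.append((sample_id, label))
    return pairs
```

A line such as `LUAD/C3N-00293-23 \t luad` (with a real tab character) would come back as the pair `("LUAD/C3N-00293-23", "luad")`.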

How to Create These Text Files:

  1. Source the Data: Labels such as cancer or normal can often be obtained from public datasets like the GDC Data Portal or other datasets appropriate for your task.

  2. Split the Dataset:

    • Training Set: Used to train the model.
    • Validation Set: Used to monitor model performance during training and prevent overfitting.
    • Test Set: Used to evaluate the final model's performance on unseen data.

    Ensure that these splits are mutually exclusive to maintain the integrity of the evaluation.

  3. Prepare the Files:
    For each split (training, validation, and testing), create a text file where each line contains a sample and its label, separated by a tab (\t).
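
The requirement that the splits be mutually exclusive can be checked mechanically. The following is a small sketch, assuming the sample identifiers of each split are already loaded into Python collections. `find_overlaps` is a hypothetical name used only for illustration.

```python
def find_overlaps(splits):
    """Return the sample IDs shared between any two splits.

    `splits` maps a split name (e.g. "train") to an iterable of sample IDs.
    The result maps (name_a, name_b) to the set of IDs present in both.
    An empty result means the splits are mutually exclusive.
    Hypothetical helper for illustration; not taken from the repository.
    """
    names = list(splits)
    overlaps = {}
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            shared = set(splits[first]) & set(splits[second])
            if shared:
                overlaps[(first, second)] = shared
    return overlaps
```

An empty dictionary indicates that no sample appears in more than one split; any entries name the pairs of splits that leak into each other.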

Example Workflow:

  • Download the dataset from a public source (e.g., GDC).
  • Parse the dataset to extract the sample identifiers and labels.
  • Randomly split the dataset into training, validation, and testing sets.
  • Write the splits into separate text files (e.g., train.txt, val.txt, test.txt) in the format specified.
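
The last two workflow steps could look roughly like the sketch below. It assumes the sample identifiers and labels have already been extracted into a list of pairs. The function names, the 10% validation and test fractions, and the fixed seed are illustrative assumptions, not values prescribed by this repository.

```python
import random
from pathlib import Path


def split_samples(pairs, val_fraction=0.1, test_fraction=0.1, seed=0):
    """Randomly partition (sample_id, label) pairs into train/val/test.

    Fractions and seed are illustrative defaults, not project requirements.
    """
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)  # seeded for a reproducible split
    n_val = int(len(shuffled) * val_fraction)
    n_test = int(len(shuffled) * test_fraction)
    return {
        "val": shuffled[:n_val],
        "test": shuffled[n_val:n_val + n_test],
        "train": shuffled[n_val + n_test:],
    }


def write_split_files(splits, out_dir):
    """Write each split to <out_dir>/<name>.txt, one 'sample<TAB>label' per line."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for name, pairs in splits.items():
        text = "".join(f"{sample_id}\t{label}\n" for sample_id, label in pairs)
        (out_dir / f"{name}.txt").write_text(text, encoding="utf-8")
```

Because the three slices are taken from one shuffled list, each sample lands in exactly one split, provided the input list has no duplicate identifiers.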

Example:
For a dataset with samples categorized into cancer and normal, the files might look like:

train.txt:
sample1 \t cancer
sample2 \t normal
sample3 \t cancer

val.txt:
sample4 \t normal
sample5 \t cancer

test.txt:
sample6 \t normal
sample7 \t cancer

These files can then be fed into a Graph-Transformer training pipeline, where the model uses the training set to learn, the validation set to tune hyperparameters, and the test set to evaluate its performance.

@GSWS GSWS closed this as completed Dec 10, 2024