
How can I apply the TimeGrad model to my own CSV dataset? #181

Open
youandyourself opened this issue Feb 9, 2025 · 2 comments

youandyourself commented Feb 9, 2025

After loading the dataset with pandas and converting it to the ListDataset format, I applied the MultivariateGrouper, but I am not sure how to specify target_dim and input_size for the TimeGrad model.
My dataset has only 5 variables and 1 timestamp column, yet the model requires input_size to be 36.
Here is my code:

```python
import numpy as np
import pandas as pd

from gluonts.dataset.common import ListDataset
from gluonts.dataset.multivariate_grouper import MultivariateGrouper
from pts import Trainer
from pts.model.time_grad import TimeGradEstimator


def load_data_from_csv(file_path, prediction_length=24):
    data = pd.read_csv(file_path)

    # Use the first column as the timestamp if a known time column name is present
    if any(col in data.columns for col in ('datetime', 'date', 'Time(s)', 'dt', '0')):
        time_column = data.columns[0]
        data[time_column] = pd.to_datetime(data[time_column], errors='coerce')
        data = data.set_index(time_column)
    else:
        # Otherwise assume a fixed start time (e.g., 2022-01-01 00:00:00) with hourly observations
        start_time = pd.to_datetime('2022-01-01 00:00:00')
        time_index = pd.date_range(start=start_time, periods=len(data), freq='h')
        data['timestamp'] = time_index
        data = data.set_index('timestamp')

    # Infer the data frequency, defaulting to hourly
    freq = pd.infer_freq(data.index)
    if freq is None:
        freq = 'H'

    numeric_columns = data.select_dtypes(include=[np.number]).columns
    feature_size = len(numeric_columns)

    # Split data into training and testing sets (the test set keeps the full series)
    train_length = int(len(data) * 0.8)
    train_df = data.iloc[:train_length]
    test_df = data
    print(data.shape[1])
    print(feature_size)

    # One univariate entry per numeric column; MultivariateGrouper stacks them afterwards
    dataset_train = ListDataset(
        [
            {"start": train_df.index[0], "target": train_df[feature].values}
            for feature in numeric_columns
        ],
        freq=freq,
    )

    # The test targets cover the full series, so they also start at the first timestamp
    dataset_test = ListDataset(
        [
            {"start": test_df.index[0], "target": test_df[feature].values}
            for feature in numeric_columns
        ],
        freq=freq,
    )

    grouper = MultivariateGrouper(max_target_dim=min(2000, int(feature_size)))
    dataset_train = grouper(dataset_train)
    dataset_test = grouper(dataset_test)
    return dataset_train, dataset_test, feature_size, freq


# target_dim, freq, and device are defined elsewhere (e.g., from load_data_from_csv and torch)
estimator = TimeGradEstimator(
    target_dim=target_dim,
    prediction_length=24,
    context_length=24,
    cell_type='GRU',
    input_size=5 * target_dim + 1,
    freq=freq,
    loss_type='l2',
    scaling=True,
    diff_steps=100,
    beta_end=0.1,
    beta_schedule="linear",
    trainer=Trainer(
        device=device,
        epochs=1,
        learning_rate=1e-3,
        num_batches_per_epoch=100,
        batch_size=32,
    ),
)
```
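Not from the original post, but since part of the question is what to pass as target_dim: after MultivariateGrouper runs, each entry's "target" should be a 2-D array of shape (target_dim, num_timesteps), so inspecting the first grouped entry is a quick way to confirm the dimension the estimator should be configured with. A minimal sketch, assuming the load_data_from_csv function above ("my_data.csv" is only a placeholder path):

```python
# Sanity check of the grouped dataset; "my_data.csv" is a placeholder path
dataset_train, dataset_test, feature_size, freq = load_data_from_csv("my_data.csv")

# After MultivariateGrouper, each entry's "target" should be a 2-D array of
# shape (target_dim, num_timesteps); the first axis is the value to pass as
# target_dim to TimeGradEstimator (it should match feature_size here).
first_entry = next(iter(dataset_train))
print("grouped target shape:", first_entry["target"].shape)
print("target_dim =", first_entry["target"].shape[0], "| freq =", freq)
```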

@kashif
Collaborator

kashif commented Feb 10, 2025 via email


Could you kindly have a look at using the 0.7.0 branch and see this
notebook for a running example:
https://github.com/kashif/time_match/blob/main/Time-Grad-Solar.ipynb

@youandyourself
Author

Thank you very much, but I ran into an error. Which version of GluonTS should I install?
ImportError: cannot import name 'TestData' from 'gluonts.dataset.split'
My version is 0.15.1.
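Not from the thread, but one quick way to narrow down this ImportError might be to print the installed GluonTS version and list what the gluonts.dataset.split module actually exports in that release; if TestData is not among the names, the notebook was likely written against a different GluonTS (and pytorch-ts) pin than 0.15.1:

```python
import gluonts
from gluonts.dataset import split

print("gluonts version:", gluonts.__version__)
# Names exported by the installed split module; if 'TestData' is missing here,
# the notebook expects a different GluonTS release than the one installed.
print([name for name in dir(split) if not name.startswith("_")])
```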
