
How can I apply the TimeGrad model to my own CSV dataset? #181

Open
youandyourself opened this issue Feb 9, 2025 · 2 comments

youandyourself commented Feb 9, 2025

After loading the dataset with pandas and converting it to the ListDataset format, I applied the MultivariateGrouper, but I am not sure how to specify target_dim and input_size for the TimeGrad model.
My dataset has only 5 variables and 1 timestamp column, yet the model requires input_size to be 36.
Here is my code:

```python
import numpy as np
import pandas as pd

from gluonts.dataset.common import ListDataset
from gluonts.dataset.multivariate_grouper import MultivariateGrouper
from pts import Trainer
from pts.model.time_grad import TimeGradEstimator


def load_data_from_csv(file_path, prediction_length=24):
    data = pd.read_csv(file_path)

    # Use the first column as the timestamp if a known time column name is present
    if any(col in data.columns for col in ('datetime', 'date', 'Time(s)', 'dt', '0')):
        time_column = data.columns[0]
        data[time_column] = pd.to_datetime(data[time_column], errors='coerce')
        data = data.set_index(time_column)
    else:
        # Otherwise assume a fixed start time (e.g., 2022-01-01 00:00:00) with hourly observations
        start_time = pd.to_datetime('2022-01-01 00:00:00')
        time_index = pd.date_range(start=start_time, periods=len(data), freq='h')
        data['timestamp'] = time_index
        data = data.set_index('timestamp')

    # Infer the data frequency, defaulting to hourly
    freq = pd.infer_freq(data.index)
    if freq is None:
        freq = 'H'

    numeric_columns = data.select_dtypes(include=[np.number]).columns
    feature_size = len(numeric_columns)

    # Split data into training and testing sets (the test set keeps the full series)
    train_length = int(len(data) * 0.8)
    train_df = data.iloc[:train_length]
    test_df = data
    print(data.shape[1])
    print(feature_size)

    # One univariate entry per numeric column; MultivariateGrouper stacks them afterwards
    dataset_train = ListDataset(
        [
            {"start": train_df.index[0], "target": train_df[feature].values}
            for feature in numeric_columns
        ],
        freq=freq,
    )

    # The test targets cover the full series, so they also start at the first timestamp
    dataset_test = ListDataset(
        [
            {"start": test_df.index[0], "target": test_df[feature].values}
            for feature in numeric_columns
        ],
        freq=freq,
    )

    grouper = MultivariateGrouper(max_target_dim=min(2000, int(feature_size)))
    dataset_train = grouper(dataset_train)
    dataset_test = grouper(dataset_test)
    return dataset_train, dataset_test, feature_size, freq


# target_dim, freq, and device are defined elsewhere (e.g., from load_data_from_csv and torch)
estimator = TimeGradEstimator(
    target_dim=target_dim,
    prediction_length=24,
    context_length=24,
    cell_type='GRU',
    input_size=5 * target_dim + 1,
    freq=freq,
    loss_type='l2',
    scaling=True,
    diff_steps=100,
    beta_end=0.1,
    beta_schedule="linear",
    trainer=Trainer(
        device=device,
        epochs=1,
        learning_rate=1e-3,
        num_batches_per_epoch=100,
        batch_size=32,
    ),
)
```
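Not from the original post, but since part of the question is what to pass as target_dim: after MultivariateGrouper runs, each entry's "target" should be a 2-D array of shape (target_dim, num_timesteps), so inspecting the first grouped entry is a quick way to confirm the dimension the estimator should be configured with. A minimal sketch, assuming the load_data_from_csv function above ("my_data.csv" is only a placeholder path):

```python
# Sanity check of the grouped dataset; "my_data.csv" is a placeholder path
dataset_train, dataset_test, feature_size, freq = load_data_from_csv("my_data.csv")

# After MultivariateGrouper, each entry's "target" should be a 2-D array of
# shape (target_dim, num_timesteps); the first axis is the value to pass as
# target_dim to TimeGradEstimator (it should match feature_size here).
first_entry = next(iter(dataset_train))
print("grouped target shape:", first_entry["target"].shape)
print("target_dim =", first_entry["target"].shape[0], "| freq =", freq)
```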

@kashif
Collaborator

kashif commented Feb 10, 2025 via email


Could you kindly have a look at using the 0.7.0 branch and see this
notebook for a running example:
https://github.com/kashif/time_match/blob/main/Time-Grad-Solar.ipynb

@youandyourself
Author

Thank you very much, but I ran into an error. Which version of GluonTS should I install?
ImportError: cannot import name 'TestData' from 'gluonts.dataset.split'
My version is 0.15.1.
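Not from the thread, but one quick way to narrow down this ImportError might be to print the installed GluonTS version and list what the gluonts.dataset.split module actually exports in that release; if TestData is not among the names, the notebook was likely written against a different GluonTS (and pytorch-ts) pin than 0.15.1:

```python
import gluonts
from gluonts.dataset import split

print("gluonts version:", gluonts.__version__)
# Names exported by the installed split module; if 'TestData' is missing here,
# the notebook expects a different GluonTS release than the one installed.
print([name for name in dir(split) if not name.startswith("_")])
```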
