How can I apply the TimeGrad model to my own CSV dataset? #181
Could you kindly have a look at using the 0.7.0 branch, and see this notebook for a running example:
https://github.com/kashif/time_match/blob/main/Time-Grad-Solar.ipynb
On Sun, Feb 9, 2025 at 6:35 AM Chongyang Zhong wrote:
After loading the dataset with pandas and converting it to ListDataset format, I applied the MultivariateGrouper, but I am not sure how to specify `target_dim` and `input_size` for the TimeGrad model. My dataset has only 5 variables and 1 timestamp column, but the model requires `input_size` to be 36.
Here's my code:
```python
import pandas as pd
from gluonts.dataset.common import ListDataset
from gluonts.dataset.multivariate_grouper import MultivariateGrouper


def load_data_from_csv(file_path, prediction_length=24):
    data = pd.read_csv(file_path)

    # Use the first recognised timestamp column (if any) as the index.
    time_candidates = ["datetime", "date", "Time(s)", "dt", "0"]
    time_columns = [c for c in time_candidates if c in data.columns]
    if time_columns:
        time_column = time_columns[0]
        data[time_column] = pd.to_datetime(data[time_column], errors="coerce")
        data = data.set_index(time_column)
    else:
        # No timestamp column: assume hourly observations starting at 2022-01-01 00:00:00.
        start_time = pd.Timestamp("2022-01-01 00:00:00")
        data.index = pd.date_range(start=start_time, periods=len(data), freq="h")

    # Infer the data frequency, defaulting to hourly.
    freq = pd.infer_freq(data.index)
    if freq is None:
        freq = "H"

    # Keep only numeric columns; each one becomes one dimension of the target.
    numeric_cols = [c for c in data.columns if pd.api.types.is_numeric_dtype(data[c])]
    feature_size = len(numeric_cols)

    # The first 80% is used for training; the test set holds the full series.
    train_length = int(len(data) * 0.8)
    train_df = data.iloc[:train_length]
    test_df = data

    # One univariate entry per feature; "start" is the timestamp of target[0].
    dataset_train = ListDataset(
        [{"start": train_df.index[0], "target": train_df[c].values} for c in numeric_cols],
        freq=freq,
    )
    dataset_test = ListDataset(
        [{"start": test_df.index[0], "target": test_df[c].values} for c in numeric_cols],
        freq=freq,
    )

    # Stack the univariate series into one multivariate series (target_dim == feature_size).
    grouper = MultivariateGrouper(max_target_dim=min(2000, feature_size))
    dataset_train = grouper(dataset_train)
    dataset_test = grouper(dataset_test)
    return dataset_train, dataset_test, feature_size, freq
```
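```python
import torch
from pts import Trainer
from pts.model.time_grad import TimeGradEstimator

# "my_data.csv" is a placeholder path for your own file.
dataset_train, dataset_test, target_dim, freq = load_data_from_csv("my_data.csv")
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

estimator = TimeGradEstimator(
    target_dim=target_dim,
    prediction_length=24,
    context_length=24,
    cell_type="GRU",
    input_size=5 * target_dim + 1,
    freq=freq,
    loss_type="l2",
    scaling=True,
    diff_steps=100,
    beta_end=0.1,
    beta_schedule="linear",
    trainer=Trainer(
        device=device,
        epochs=1,
        learning_rate=1e-3,
        num_batches_per_epoch=100,
        batch_size=32,
    ),
)
```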
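One way to reason about `input_size`, sketched under assumptions: in pytorch-ts's TimeGrad the per-step RNN input appears to be the concatenation of the lagged target values (`len(lags_seq) * target_dim` columns), a learned embedding per series (`embedding_dimension * target_dim` columns), and the time features. The helper name `timegrad_rnn_input_size` and all numbers in the usage line are hypothetical; check the lag sequence, embedding size and time-feature count against the installed version rather than trusting these values.

```python
def timegrad_rnn_input_size(target_dim: int, num_lags: int,
                            embedding_dimension: int, num_time_features: int) -> int:
    """Width of the per-step RNN input under the assumed concatenation
    [lagged targets | per-series embeddings | time features]."""
    lag_features = num_lags * target_dim               # one column per (lag, series)
    embedding_features = embedding_dimension * target_dim  # embedding for each series
    return lag_features + embedding_features + num_time_features


# Hypothetical values: 5 series, 3 lags, embedding size 1, 4 time features.
print(timegrad_rnn_input_size(target_dim=5, num_lags=3,
                              embedding_dimension=1, num_time_features=4))  # 24
```

PyTorch's RNN shape check reports both the expected and the actual last dimension of its input, so the actual value in that error message is another way to find the number `input_size` has to match.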
Thank you very much, but I ran into an error. Which version of GluonTS should I install?
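Version questions are easier to answer when the installed versions are reported alongside the error. A minimal stdlib sketch follows; the distribution names `gluonts`, `pytorchts`, `torch` and `pandas` are assumed to be the PyPI names, and `installed_versions` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it is absent."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


print(installed_versions(["gluonts", "pytorchts", "torch", "pandas"]))
```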