The difference in data #5

Open
JKZuo opened this issue Jan 25, 2021 · 5 comments

Comments

@JKZuo

JKZuo commented Jan 25, 2021

About the datasets: each comes with three named data items (tensor, random_tensor, and random_matrix).
What do these three stand for, and how do they differ?

@xinychen
Owner

Hello, thanks for this question! In each data folder, we give three data files:

  • tensor.mat is an M-by-I-by-J observation tensor;
  • random_tensor.mat is an M-by-I-by-J random tensor with entries uniformly distributed in [0, 1];
  • random_matrix.mat is an M-by-I random matrix with entries uniformly distributed in [0, 1].

Of course, you can remove both random_tensor.mat and random_matrix.mat and generate them with the following code instead:

import numpy as np

# Specify tensor size
M = 214 # Suppose 214 road segments
I = 61 # Suppose 61 days
J = 144 # Suppose 144 time slots per day

# Generate random matrix of size M-by-I
np.random.seed(1000) # Set random seed
random_matrix = np.random.rand(M, I)

# Or generate random tensor of size M-by-I-by-J
np.random.seed(1000) # Set random seed
random_tensor = np.random.rand(M, I, J)

Hope it can help you!

Best,
Xinyu
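
As a rough illustration of what the random tensor and matrix are for, the sketch below turns them into binary masks with the same rounding rule used in the masking line of the code posted later in this thread. The element-wise mask follows that line directly. Using random_matrix to drop whole (segment, day) slices is an assumption here and is not stated in the thread. The all-ones dense_tensor is placeholder data.

```python
import numpy as np

M, I, J = 214, 61, 144   # road segments, days, time slots per day
missing_rate = 0.2

np.random.seed(1000)
random_tensor = np.random.rand(M, I, J)
np.random.seed(1000)
random_matrix = np.random.rand(M, I)

# Placeholder observations (hypothetical; the real data come from tensor.mat)
dense_tensor = np.ones((M, I, J))

# Element-wise missing: each entry is dropped with probability ~missing_rate
binary_tensor = np.round(random_tensor + 0.5 - missing_rate)
sparse_elementwise = dense_tensor * binary_tensor

# Assumed day-level missing: one draw per (segment, day), broadcast over time slots
binary_matrix = np.round(random_matrix + 0.5 - missing_rate)
sparse_daywise = dense_tensor * binary_matrix[:, :, np.newaxis]

print('Element-wise missing fraction:', 1 - binary_tensor.mean())
print('Day-level missing fraction:', 1 - binary_matrix.mean())
```

Both printed fractions should come out close to the chosen missing_rate, since each mask entry is 0 exactly when the underlying uniform draw falls below missing_rate.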

@cq70605

cq70605 commented Nov 18, 2021

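Hello, I have a dataset (collected by sensors, with missing values) and would like to try LRTC-TNN to see how well it imputes the missing values, but the results I get seem to be wrong.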
`import numpy as np
import pandas as pd
from tqdm import tqdm
import time

# ten2mat and LRTC are assumed to be defined already (not shown in this comment)

r = 0.2
print('Missing rate = {}'.format(r))
missing_rate = r

file_path = ''
data_19111201984 = pd.read_csv(file_path, encoding='gbk')
data_19111201984 = data_19111201984[data_19111201984.day.isin([9, 10, 11, 12, 13, 14])]
data_list = []

# Collect one day of readings per group; the '温度' column holds temperature
for day, day_df in tqdm(data_19111201984.groupby('day')):
    data_list.append([day_df['温度'].values.tolist()])

dense_tensor = np.array([ten2mat(np.array(data_list), 2)])
print(dense_tensor.shape)  # (1, 1440, 6): (sensor_id, readings per day, 6 days)
dim1, dim2, dim3 = dense_tensor.shape
np.random.seed(1000)
# Zero out roughly missing_rate of the entries at random
sparse_tensor = dense_tensor * np.round(np.random.rand(dim1, dim2, dim3) + 0.5 - missing_rate)
print(sparse_tensor.shape)
start = time.time()
alpha = np.ones(3) / 3
rho = 1e-4
theta = 30
epsilon = 1e-4
maxiter = 100
LRTC(dense_tensor, sparse_tensor, alpha, rho, theta, epsilon, maxiter)
end = time.time()
print('Running time: %d seconds' % (end - start))
print()`

The output is:
`Missing rate = 0.2
100%|██████████████████████████████████████████████████████████████████████████████████| 6/6 [00:00<00:00, 3009.55it/s]
(1, 1440, 6)
(1, 1440, 6)
Total iteration: 2
Tolerance: 0.0
Imputation MAPE: 1.0
Imputation RMSE: 5.55775

Running time: 0 seconds`

@xinychen
Owner

Hello, thank you for this question! If your tensor data is of size 1-by-1440-by-6, this is really a matrix. Please consider a matrix completion model rather than tensor completion models.

Best regards,
Xinyu
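
To see why a 1-by-1440-by-6 tensor is effectively a matrix, it can help to look at its three mode unfoldings. The sketch below uses a small unfold helper written for this illustration (a hypothetical stand-in, not necessarily identical to the repository's ten2mat) and random placeholder values.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding: bring `mode` to the front, flatten the remaining axes."""
    return np.reshape(np.moveaxis(tensor, mode, 0), (tensor.shape[mode], -1), order='F')

rng = np.random.default_rng(0)
tensor = rng.random((1, 1440, 6))   # placeholder values, same shape as in the thread

shapes = [unfold(tensor, k).shape for k in range(3)]
print(shapes)  # [(1, 8640), (1440, 6), (6, 1440)]

# The mode-0 unfolding is a single row, so its rank is at most 1
print(np.linalg.matrix_rank(unfold(tensor, 0)))

# The other two unfoldings are the same 1440-by-6 matrix, up to a transpose
print(np.array_equal(unfold(tensor, 1), unfold(tensor, 2).T))
```

With only one sensor, the first unfolding carries no low-rank structure to exploit and the remaining two are the same matrix, so a tensor model has nothing beyond the matrix to work with.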

@cq70605

cq70605 commented Nov 18, 2021

> Hello, thank you for this question! If your tensor data is of size 1-by-1440-by-6, this is really a matrix. Please consider a matrix completion model rather than tensor completion models.
>
> Best regards, Xinyu

Thank you for your answer. At the moment I only use the data collected by a single sensor, so my tensor is of size 1-by-1440-by-6. Does that mean that if I use data collected by n sensors and get a tensor of size n-by-1440-by-6, I can consider a tensor completion model? Also, is there a matrix completion model you would recommend?

@xinychen
Owner

Yes, in that case you can consider a tensor completion model, but in LRTC-TNN, theta should be smaller than min{n, 1440, 6}.
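
The constraint above can be checked before running the model. The helper below is a hypothetical convenience written for this illustration, not part of the repository. The value n = 20 is an arbitrary example.

```python
def check_theta(shape, theta):
    """Return True if theta is strictly smaller than the smallest tensor dimension."""
    return theta < min(shape)

# Setting from the earlier comment: theta = 30 on a 1-by-1440-by-6 tensor
print(check_theta((1, 1440, 6), 30))    # False

# Hypothetical n = 20 sensors with a smaller theta
print(check_theta((20, 1440, 6), 5))    # True
```

Note that with only 6 days along the third dimension, min{n, 1440, 6} can never exceed 6, so theta has to stay below 6 no matter how many sensors are added.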
