The difference in data #5

JKZuo · 2021-01-25T13:09:12Z

A question about the data set. Each data set contains three named items: tensor, random_tensor, and random_matrix.
What do these three stand for, and is there any difference between them?

xinychen · 2021-01-29T10:04:41Z

Hello, thanks for this question! In each data folder, we give three data files:

tensor.mat is an M-by-I-by-J observation tensor;
random_tensor.mat is an M-by-I-by-J random tensor with entries uniformly distributed in the range [0, 1];
random_matrix.mat is an M-by-I random matrix with entries uniformly distributed in the range [0, 1].

Of course, you can remove both random_tensor.mat and random_matrix.mat and use the following code instead:

import numpy as np

# Specify tensor size
M = 214 # Suppose 214 road segments
I = 61 # Suppose 61 days
J = 144 # Suppose 144 time slots per day

# Generate random matrix of size M-by-I
np.random.seed(1000) # Set random seed
random_matrix = np.random.rand(M, I)

# Or generate random tensor of size M-by-I-by-J
np.random.seed(1000) # Set random seed
random_tensor = np.random.rand(M, I, J)
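
As a rough sketch of how these two random arrays can be turned into missing-data masks: the rounding trick below is the one used later in this thread for random missing entries. Using random_matrix to drop whole (road segment, day) fibers at once is an assumption about its intended purpose, and the small sizes and the stand-in dense_tensor are made up for illustration.

```python
import numpy as np

# Small illustrative sizes (assumed; the reply above uses M=214, I=61, J=144)
M, I, J = 4, 5, 6
missing_rate = 0.2

# Stand-in for tensor.mat with strictly positive entries (hypothetical data)
np.random.seed(1000)
dense_tensor = np.random.rand(M, I, J) + 1.0

# Random missing: every entry is dropped independently.
# round(x + 0.5 - r) is 0 when x < r and 1 otherwise.
np.random.seed(1000)
random_tensor = np.random.rand(M, I, J)
binary_tensor_rm = np.round(random_tensor + 0.5 - missing_rate)

# Fiber missing (assumed use of random_matrix): one draw per (m, i) pair
# decides whether the whole day of J time slots is kept or dropped.
np.random.seed(1000)
random_matrix = np.random.rand(M, I)
binary_matrix = np.round(random_matrix + 0.5 - missing_rate)
binary_tensor_nm = binary_matrix[:, :, None] * np.ones((1, 1, J))

# Missing entries are encoded as zeros in the sparse tensors
sparse_rm = dense_tensor * binary_tensor_rm
sparse_nm = dense_tensor * binary_tensor_nm
```

Setting the same seed before each draw keeps the masks reproducible, which is presumably why fixed random files are shipped with the data.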

Hope it can help you!

Best,
Xinyu

cq70605 · 2021-11-18T13:31:49Z

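Hello, I have a data set on hand (data collected by sensors, with missing values) and would like to try LRTC-TNN to see how well it imputes the missing values, but the results seem to be off.
`import numpy as np
import pandas as pd
from tqdm import tqdm
import time

# ten2mat and LRTC are assumed to be defined beforehand, as in the LRTC-TNN code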

r = 0.2
print('Missing rate = {}'.format(r))
missing_rate = r

file_path = ''
data_19111201984=pd.read_csv(file_path,encoding='gbk')
data_19111201984=data_19111201984[data_19111201984.day.isin([9,10,11,12,13,14])]
data_list = []

for day, day_df in tqdm(data_19111201984.groupby('day')):
    # '温度' is the temperature column
    data_list.append([day_df['温度'].values.tolist()])

dense_tensor = np.array([ten2mat(np.array(data_list), 2)])
print(dense_tensor.shape) # (1, 1440, 6) (sensor_id,num of data for one day,6 days)
dim1, dim2, dim3 = dense_tensor.shape
np.random.seed(1000)
sparse_tensor = dense_tensor * np.round(np.random.rand(dim1, dim2, dim3) + 0.5 - missing_rate)
print(sparse_tensor.shape)
start = time.time()
alpha = np.ones(3) / 3
rho = 1e-4
theta = 30
epsilon = 1e-4
maxiter = 100
LRTC(dense_tensor, sparse_tensor, alpha, rho, theta, epsilon, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))
print()`

The output is:
`Missing rate = 0.2
100%|██████████████████████████████████████████████████████████████████████████████████| 6/6 [00:00<00:00, 3009.55it/s]
(1, 1440, 6)
(1, 1440, 6)
Total iteration: 2
Tolerance: 0.0
Imputation MAPE: 1.0
Imputation RMSE: 5.55775

Running time: 0 seconds`

xinychen · 2021-11-18T13:43:33Z

Hello, thank you for this question! If your tensor data is of size 1-by-1440-by-6, this is really a matrix. Please consider a matrix completion model rather than tensor completion models.
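
To illustrate the point about a 1-by-1440-by-6 tensor being a matrix, here is a minimal sketch that drops the singleton dimension and imputes the resulting 1440-by-6 matrix with iterative truncated-SVD imputation. This is a generic low-rank baseline chosen for illustration, not a model named or recommended in this thread. The function name svd_impute, the rank, and the synthetic temperature-like data are all assumptions.

```python
import numpy as np

def svd_impute(sparse_mat, mask, rank=1, maxiter=50):
    """Fill entries where mask is False using iterative rank-`rank` SVD."""
    # Start by filling missing entries with the mean of the observed ones
    filled = np.where(mask, sparse_mat, sparse_mat[mask].mean())
    for _ in range(maxiter):
        u, s, vt = np.linalg.svd(filled, full_matrices=False)
        low_rank = (u[:, :rank] * s[:rank]) @ vt[:rank, :]
        # Keep observed entries, update only the missing ones
        filled = np.where(mask, sparse_mat, low_rank)
    return filled

# Synthetic stand-in for one sensor over 6 days (hypothetical data)
rng = np.random.default_rng(0)
daily = 20 + 5 * np.sin(np.linspace(0, 2 * np.pi, 1440))
day_scale = 1 + 0.05 * rng.standard_normal(6)
dense_tensor = (daily[:, None] * day_scale[None, :])[None, :, :]  # (1, 1440, 6)

dense_mat = dense_tensor[0]                 # drop the singleton mode: (1440, 6)
mask = rng.random(dense_mat.shape) > 0.2    # True where observed, ~20% missing
sparse_mat = dense_mat * mask

mat_hat = svd_impute(sparse_mat, mask, rank=1)
rmse = np.sqrt(np.mean((dense_mat[~mask] - mat_hat[~mask]) ** 2))
print('Imputation RMSE: {:.6}'.format(rmse))
```

Real sensor data will not be exactly low rank like this toy matrix, so the rank and the number of iterations would need tuning.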

Best regards,
Xinyu

cq70605 · 2021-11-18T14:05:21Z

Thank you for your answer. For now I only use the data collected by a single sensor, so my tensor is of size 1-by-1440-by-6. Does that mean that if I use data collected by n sensors and get a tensor of size n-by-1440-by-6, I can consider a tensor completion model? By the way, is there a matrix completion model you would recommend?

xinychen · 2021-11-18T14:08:08Z

Yes, in that case you can consider a tensor completion model, but in LRTC-TNN, theta should be smaller than min{n, 1440, 6}.
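
The constraint on theta can be checked before running the model. The helper below is a hypothetical convenience function, not part of the LRTC-TNN code. It only restates the rule above.

```python
def check_theta(shape, theta):
    """Return True if theta is strictly smaller than every tensor dimension."""
    return theta < min(shape)

# Single-sensor case from this thread: theta = 30, tensor of size 1-by-1440-by-6
print(check_theta((1, 1440, 6), 30))   # False

# Hypothetical case with 20 sensors and a smaller theta
print(check_theta((20, 1440, 6), 5))   # True
```

Note that with only 6 days of data, the bound min{n, 1440, 6} never exceeds 6, so theta has to stay below 6 no matter how many sensors are added.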

xinychen mentioned this issue May 17, 2021

数据集问题 (Data set question) #8
