Mkv #20

Merged: 81 commits, Sep 18, 2024

Changes from 1 commit (of 81 commits)
bbb1644
basic structure
KeplerC Aug 19, 2024
9309045
Refactor Trajectory class to improve frame encoding and add support f…
KeplerC Aug 20, 2024
d394d67
fix loading
KeplerC Aug 20, 2024
daaaa57
doesnt work, consider migrate robot data loader over
KeplerC Aug 20, 2024
70dc889
static works
KeplerC Aug 20, 2024
5b0b462
feat: Improve frame encoding and add support for different encodings …
KeplerC Aug 21, 2024
9bf9eea
decode
KeplerC Aug 21, 2024
9d66f32
h5 cache
KeplerC Aug 21, 2024
c389804
figure out the issue of remuxing due to the context
KeplerC Aug 21, 2024
d875a0a
it works without h264
KeplerC Aug 21, 2024
e9f051e
fix the decoding bug and silient logs
KeplerC Aug 21, 2024
32d3dac
Refactor Trajectory class to improve frame encoding and add support f…
KeplerC Aug 21, 2024
945ddb0
feat: Add support for pre-initialized H.264 video streams in Trajecto…
KeplerC Aug 21, 2024
a3e2c34
Refactor Trajectory class to remove commented code and improve code r…
KeplerC Aug 21, 2024
c7c9284
Refactor Trajectory class to remove commented code and improve code r…
KeplerC Aug 21, 2024
b627e75
Refactor Trajectory class to improve code readability and remove comm…
KeplerC Aug 21, 2024
00d3d5e
init robot data loader structure
KeplerC Aug 22, 2024
5cfa061
convert from openx
KeplerC Aug 22, 2024
0e5226f
Refactor Trajectory class to improve frame encoding and add support f…
KeplerC Aug 22, 2024
e41675a
feat: Add HDF5Loader to support loading HDF5 files in fog_x/loader/__…
KeplerC Aug 22, 2024
c34a7c0
add h5 accessing
KeplerC Aug 22, 2024
239c230
code formatting
KeplerC Aug 22, 2024
2f10e68
Refactor Trajectory class to remove commented code and improve code r…
KeplerC Aug 25, 2024
8f40ff8
benchmark code, missing container loader info
KeplerC Aug 25, 2024
301f385
Refactor Trajectory class to remove commented code and improve code r…
KeplerC Aug 25, 2024
4545447
Refactor Trajectory class to improve code readability and remove comm…
KeplerC Aug 25, 2024
d594e39
Refactor Trajectory class to add optional cache path for storing cach…
KeplerC Aug 25, 2024
430c73b
Refactor Trajectory class to clear cache directory and improve code r…
KeplerC Aug 25, 2024
0220880
Refactor Trajectory class for improved code readability and removal o…
KeplerC Aug 25, 2024
e280615
Refactor Trajectory class to fix cache directory path
KeplerC Aug 25, 2024
87fecf1
Refactor Trajectory class to add HDF5Handler for converting data to H…
KeplerC Aug 25, 2024
b4254e8
save stream to a different file doesnt work yet
KeplerC Aug 25, 2024
058fd5b
Refactor VLALoader class to update file path for loading VLA data
KeplerC Aug 25, 2024
1a3bee4
Refactor DatasetHandler to clear OS cache after loading data
KeplerC Aug 26, 2024
a8a05ef
Refactor Trajectory class to improve code readability and add lazy lo…
KeplerC Aug 26, 2024
c866200
Refactor prepare function for improved code readability and consistency
KeplerC Aug 26, 2024
6e9c5bf
Refactor VLAHandler.measure_loading_time() to recursively load h5 dat…
KeplerC Aug 26, 2024
1cfcc27
Refactor Trajectory class to improve code readability and add lazy lo…
KeplerC Aug 26, 2024
975c4e5
Refactor Trajectory class to improve code readability and add lazy lo…
KeplerC Aug 26, 2024
e83e6da
Refactor RLDSLoader class to improve code readability and add lazy lo…
KeplerC Aug 26, 2024
4ba6453
fix tf record's benchmark to read the data
KeplerC Aug 26, 2024
2c4d797
support no cache baseline
KeplerC Aug 26, 2024
1cb9ee5
add visualization results
KeplerC Aug 26, 2024
d14c4ab
add DL dataset
KeplerC Aug 27, 2024
d430709
add basic support for octo integration
KeplerC Aug 27, 2024
3467b3f
fix a bug in loading
KeplerC Aug 27, 2024
b79068a
octo dataloader working
KeplerC Aug 27, 2024
5600985
Refactor RLDSLoader and Trajectory classes to improve code readabilit…
KeplerC Aug 27, 2024
1f445df
Refactor RLDSLoader and Trajectory classes to improve code readabilit…
KeplerC Aug 27, 2024
0333a09
open x dataset converter streamline
KeplerC Aug 30, 2024
d5c7332
support both lossy and lossless compression
KeplerC Aug 30, 2024
1567b44
chore: Update DEFAULT_DATASET_NAMES in openx.py and add lossy_compres…
KeplerC Aug 30, 2024
3953f7f
chore: Update vla_to_h5.py to process VLA data and convert it to HDF5…
KeplerC Aug 30, 2024
3fcfc43
Refactor RLDSLoader and Trajectory classes to improve code readabilit…
KeplerC Aug 30, 2024
c0a840f
Refactor RLDSLoader and Trajectory classes for improved code readabil…
KeplerC Aug 30, 2024
8e0b188
fix the logging; before randomly access
KeplerC Aug 30, 2024
a0e813a
add random loading
KeplerC Aug 30, 2024
143a4fe
async write to cache
KeplerC Aug 31, 2024
b13ae79
support dataloader, still has bugs
KeplerC Aug 31, 2024
305d8b6
support lerobot
KeplerC Aug 31, 2024
604486c
Refactor RLDSLoader and Trajectory classes to improve code readabilit…
KeplerC Aug 31, 2024
466c5cb
write as pytorch dataloader
KeplerC Aug 31, 2024
eccf7b2
multi proces to sped up
KeplerC Aug 31, 2024
e4913b1
chore: Refactor Trajectory class for improved code readability and ef…
KeplerC Aug 31, 2024
01841ee
Refactor Trajectory and VLAIterableDataset classes for improved code …
KeplerC Aug 31, 2024
5f5d328
fix rlds and add debug
KeplerC Sep 1, 2024
c4d7150
Refactor DatasetHandler class for improved code readability and perfo…
KeplerC Sep 1, 2024
fcd8f2d
Refactor evaluation script for improved code organization and perform…
KeplerC Sep 2, 2024
3516491
Refactor evaluation script for improved code organization and perform…
KeplerC Sep 2, 2024
d264839
Refactor VLA loader to use PyTorch dataloader for improved code reada…
KeplerC Sep 2, 2024
79117e6
Refactor loader modules to improve code organization and performance
KeplerC Sep 2, 2024
0670995
Refactor LeRobotLoader to load one episode at a time for improved per…
KeplerC Sep 2, 2024
e618571
Refactor LeRobotLoader to load one episode at a time for improved per…
KeplerC Sep 2, 2024
066abe9
Refactor RLDSLoader and HDF5Loader to use smaller shuffle buffer size…
KeplerC Sep 2, 2024
74d6c38
fix bugs that prevents rlds and lr to move forward after iterating th…
KeplerC Sep 2, 2024
47bce7e
fix size issue
KeplerC Sep 3, 2024
f14a943
backward support octo and rlds conversion
KeplerC Sep 3, 2024
04f3b4a
fix bug of resizing width
KeplerC Sep 7, 2024
0a0542d
data, etc.
KeplerC Sep 10, 2024
a35a695
submitted version
KeplerC Sep 18, 2024
3868a85
Merge branch 'mkv' of https://github.com/BerkeleyAutomation/fog_x int…
KeplerC Sep 18, 2024
add random loading
KeplerC committed Aug 30, 2024
commit a0e813a5546490d5699f36475014f9da811f0369
147 changes: 117 additions & 30 deletions benchmarks/openx.py
@@ -13,9 +13,9 @@
# Constants
DEFAULT_EXP_DIR = "/mnt/data/fog_x/"
DEFAULT_NUMBER_OF_TRAJECTORIES = -1 # Load all trajectories
DEFAULT_DATASET_NAMES = ["nyu_door_opening_surprising_effectiveness", "berkeley_cable_routing", "berkeley_autolab_ur5", "bridge"]
#["nyu_door_opening_surprising_effectiveness"]
CACHE_DIR = "/mnt/data/fog_x/cache/"
# DEFAULT_DATASET_NAMES = ["nyu_door_opening_surprising_effectiveness", "berkeley_cable_routing", "berkeley_autolab_ur5", "bridge"]
DEFAULT_DATASET_NAMES = ["nyu_door_opening_surprising_effectiveness"]
CACHE_DIR = "/tmp/fog_x/cache/"
DEFAULT_LOG_FREQUENCY = 20

os.environ["TF_CPP_MIN_LOG_LEVEL"] = "3"
@@ -103,6 +103,25 @@ def measure_loading_time(self):
                print(f"RLDS - Loaded {i} trajectories, Time: {elapsed_time:.2f} s")
        return time.time() - start_time

    def measure_random_loading_time(self, num_loads):
        start_time = time.time()
        loader = RLDSLoader(self.dataset_dir, split="train")
        dataset_size = len(loader)
        num_loads = num_loads * dataset_size

        loader.ds = loader.ds.shuffle(buffer_size=num_loads)
        # shuffled_ds = shuffled_ds.take(num_loads)

        for i, data in enumerate(loader):
            self._recursively_load_data(data)

            elapsed_time = time.time() - start_time
            self.write_result(f"RLDS-RandomLoad", elapsed_time, i)
            if i % self.log_frequency == 0:
                print(f"RLDS-RandomLoad - Loaded {i} random trajectories, Time: {elapsed_time:.2f} s")

        return time.time() - start_time

class VLAHandler(DatasetHandler):
    def __init__(self, exp_dir, dataset_name, num_trajectories, log_frequency=DEFAULT_LOG_FREQUENCY):
        super().__init__(exp_dir, dataset_name, num_trajectories, dataset_type="vla", log_frequency=log_frequency)
@@ -124,6 +143,26 @@ def measure_loading_time(self, mode="no_cache"):
                print(f"Failed to load data: {e}")
        return time.time() - start_time

    def measure_random_loading_time(self, num_loads):
        start_time = time.time()
        loader = VLALoader(self.dataset_dir, cache_dir=CACHE_DIR)
        dataset_size = len(loader)
        num_loads = num_loads * dataset_size

        for i in range(num_loads):
            random_index = np.random.randint(0, dataset_size)
            data = loader[random_index]
            try:
                self._recursively_load_data(data.load(mode="cache"))
                elapsed_time = time.time() - start_time
                self.write_result(f"VLA-RandomLoad", elapsed_time, i + 1)
                if (i + 1) % self.log_frequency == 0:
                    print(f"VLA-RandomLoad - Loaded {i + 1} random trajectories, Time: {elapsed_time:.2f} s")
            except Exception as e:
                print(f"Failed to load data: {e}")

        return time.time() - start_time

class FFV1Handler(DatasetHandler):
    def __init__(self, exp_dir, dataset_name, num_trajectories, log_frequency=DEFAULT_LOG_FREQUENCY):
        super().__init__(exp_dir, dataset_name, num_trajectories, dataset_type="ffv1", log_frequency=log_frequency)
@@ -145,6 +184,26 @@ def measure_loading_time(self, mode="no_cache"):
                print(f"Failed to load data: {e}")
        return time.time() - start_time

    def measure_random_loading_time(self, num_loads):
        start_time = time.time()
        loader = VLALoader(self.dataset_dir, cache_dir=CACHE_DIR)
        dataset_size = len(loader)
        num_loads = num_loads * dataset_size

        for i in range(num_loads):
            random_index = np.random.randint(0, dataset_size)
            data = loader[random_index]
            try:
                self._recursively_load_data(data.load(mode="cache"))
                elapsed_time = time.time() - start_time
                self.write_result(f"FFV1-RandomLoad", elapsed_time, i + 1)
                if (i + 1) % self.log_frequency == 0:
                    print(f"FFV1-RandomLoad - Loaded {i + 1} random trajectories, Time: {elapsed_time:.2f} s")
            except Exception as e:
                print(f"Failed to load data: {e}")

        return time.time() - start_time


class HDF5Handler(DatasetHandler):
    def __init__(self, exp_dir, dataset_name, num_trajectories, log_frequency=DEFAULT_LOG_FREQUENCY):
@@ -164,6 +223,24 @@ def measure_loading_time(self):
                print(f"HDF5 - Loaded {i} trajectories, Time: {elapsed_time:.2f} s")
        return time.time() - start_time

    def measure_random_loading_time(self, num_loads):
        start_time = time.time()
        loader = HDF5Loader(path=os.path.join(self.dataset_dir, "*.h5"))
        dataset_size = len(loader)
        num_loads = num_loads * dataset_size

        for i in range(num_loads):
            random_index = np.random.randint(0, dataset_size)
            data = loader[random_index]
            self._recursively_load_data(data)

            elapsed_time = time.time() - start_time
            self.write_result(f"HDF5-RandomLoad", elapsed_time, i + 1)
            if (i + 1) % self.log_frequency == 0:
                print(f"HDF5-RandomLoad - Loaded {i + 1} random trajectories, Time: {elapsed_time:.2f} s")

        return time.time() - start_time

def prepare(args):
    # Clear the cache directory
    if os.path.exists(CACHE_DIR):
@@ -194,39 +271,48 @@ def evaluation(args):
    handler.clear_os_cache()

    avg_traj_size = handler.measure_average_trajectory_size()
    loading_time = handler.measure_loading_time()
    # loading_time = handler.measure_loading_time()

    # new_results.append({
    #     'Dataset': dataset_name,
    #     'Format': handler.dataset_type.upper(),
    #     'AverageTrajectorySize(MB)': avg_traj_size,
    #     'LoadingTime(s)': loading_time,
    # })

    # print(f"{handler.dataset_type.upper()} - Average Trajectory Size: {avg_traj_size:.2f} MB, Loading Time: {loading_time:.2f} s")

    random_load_time = handler.measure_random_loading_time(args.random_loads)
    new_results.append({
        'Dataset': dataset_name,
        'Format': handler.dataset_type.upper(),
        'Format': f"{handler.dataset_type.upper()}-RandomLoad",
        'AverageTrajectorySize(MB)': avg_traj_size,
        'LoadingTime(s)': loading_time,
        'LoadingTime(s)': random_load_time,
    })
    print(f"{handler.dataset_type.upper()}-RandomLoad - Average Trajectory Size: {avg_traj_size:.2f} MB, Loading Time: {random_load_time:.2f} s")

    # # Additional VLA measurements
    # vla_handler = handlers[1]
    # vla_handler.clear_cache()
    # vla_handler.clear_os_cache()
    # cold_cache_time = vla_handler.measure_loading_time(mode="cache")
    # hot_cache_time = vla_handler.measure_loading_time(mode="cache")

    # new_results.append({
    #     'Dataset': dataset_name,
    #     'Format': 'VLA-ColdCache',
    #     'AverageTrajectorySize(MB)': avg_traj_size,
    #     'LoadingTime(s)': cold_cache_time,
    # })

    print(f"{handler.dataset_type.upper()} - Average Trajectory Size: {avg_traj_size:.2f} MB, Loading Time: {loading_time:.2f} s")

    # Additional VLA measurements
    vla_handler = handlers[1]
    vla_handler.clear_cache()
    vla_handler.clear_os_cache()
    cold_cache_time = vla_handler.measure_loading_time(mode="cache")
    hot_cache_time = vla_handler.measure_loading_time(mode="cache")

    new_results.append({
        'Dataset': dataset_name,
        'Format': 'VLA-ColdCache',
        'AverageTrajectorySize(MB)': avg_traj_size,
        'LoadingTime(s)': cold_cache_time,
    })

    new_results.append({
        'Dataset': dataset_name,
        'Format': 'VLA-HotCache',
        'AverageTrajectorySize(MB)': avg_traj_size,
        'LoadingTime(s)': hot_cache_time,
    })
    print(f"VLA-ColdCache - Average Trajectory Size: {avg_traj_size:.2f} MB, Loading Time: {cold_cache_time:.2f} s")
    print(f"VLA-HotCache - Average Trajectory Size: {avg_traj_size:.2f} MB, Loading Time: {hot_cache_time:.2f} s")
    # new_results.append({
    #     'Dataset': dataset_name,
    #     'Format': 'VLA-HotCache',
    #     'AverageTrajectorySize(MB)': avg_traj_size,
    #     'LoadingTime(s)': hot_cache_time,
    # })
    # print(f"VLA-ColdCache - Average Trajectory Size: {avg_traj_size:.2f} MB, Loading Time: {cold_cache_time:.2f} s")
    # print(f"VLA-HotCache - Average Trajectory Size: {avg_traj_size:.2f} MB, Loading Time: {hot_cache_time:.2f} s")

    # Combine existing and new results
    all_results = existing_results + new_results
@@ -243,6 +329,7 @@ def evaluation(args):
    parser.add_argument("--dataset_names", nargs="+", default=DEFAULT_DATASET_NAMES, help="List of dataset names to evaluate.")
    parser.add_argument("--prepare", action="store_true", help="Prepare the datasets before evaluation.")
    parser.add_argument("--log_frequency", type=int, default=DEFAULT_LOG_FREQUENCY, help="Frequency of logging results.")
    parser.add_argument("--random_loads", type=int, default=2, help="Number of random loads to perform for each loader.")
    args = parser.parse_args()

    if args.prepare:
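Across the three handlers above, the new `measure_random_loading_time` methods share one pattern: multiply the requested pass count by the dataset size, draw a uniform random index each iteration, and report cumulative wall-clock time. A minimal runnable sketch of that pattern follows; `DummyLoader` and its in-memory items are hypothetical stand-ins for `VLALoader`/`HDF5Loader`, and `random.randrange` plays the role of `np.random.randint`:

```python
import random
import time


class DummyLoader:
    """Hypothetical stand-in for VLALoader/HDF5Loader: sized and indexable."""

    def __init__(self, items):
        self.items = items

    def __len__(self):
        return len(self.items)

    def __getitem__(self, idx):
        return self.items[idx]


def measure_random_loading_time(loader, num_passes, log_frequency=20):
    """Time num_passes * len(loader) uniformly random lookups."""
    start_time = time.time()
    num_loads = num_passes * len(loader)
    for i in range(num_loads):
        data = loader[random.randrange(len(loader))]  # uniform random index
        _ = data  # the real benchmark decodes this via _recursively_load_data
        if (i + 1) % log_frequency == 0:
            print(f"RandomLoad - Loaded {i + 1}, Time: {time.time() - start_time:.2f} s")
    return time.time() - start_time


elapsed = measure_random_loading_time(DummyLoader(list(range(50))), num_passes=2)
print(f"total: {elapsed:.4f} s")
```

The real benchmark additionally records a CSV row per lookup via `write_result`; this sketch keeps only the sampling-and-timing skeleton.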
7 changes: 6 additions & 1 deletion fog_x/loader/hdf5.py
@@ -30,6 +30,9 @@ def __init__(self, path, split = None):
        self.index = 0
        self.files = glob.glob(self.path, recursive=True)

    def __getitem__(self, idx):
        return self._read_hdf5(self.files[idx])

    def _read_hdf5(self, data_path):

        with h5py.File(data_path, "r") as f:
@@ -52,4 +55,6 @@ def __next__(self):
            self.index += 1
            return self._read_hdf5(file_path)
        raise StopIteration


    def __len__(self):
        return len(self.files)
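The `__getitem__` added here makes `HDF5Loader` usable both as an iterator and as a random-access sequence over the globbed file list. A self-contained sketch of that dual protocol, with the HDF5 parsing stubbed out (the class name and `_read` stub are illustrative, not part of the repository):

```python
import glob


class GlobIndexLoader:
    """Sketch of the HDF5Loader access pattern: glob a file set once, then
    support both sequential iteration and random indexing over it."""

    def __init__(self, pattern):
        self.files = sorted(glob.glob(pattern, recursive=True))
        self.index = 0

    def __len__(self):
        return len(self.files)

    def __getitem__(self, idx):
        # Random access, as added in the diff.
        return self._read(self.files[idx])

    def __iter__(self):
        self.index = 0
        return self

    def __next__(self):
        if self.index < len(self.files):
            path = self.files[self.index]
            self.index += 1
            return self._read(path)
        raise StopIteration

    def _read(self, path):
        # Stub: the real loader opens the file with h5py and returns its contents.
        return path
```

Because indexing and iteration share `self.files`, a benchmark can mix sequential passes and random lookups over the same loader instance.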
11 changes: 10 additions & 1 deletion fog_x/loader/rlds.py
@@ -23,6 +23,12 @@ def __init__(self, path, split):
        self.index = 0

    def __len__(self):
        try:
            import tensorflow as tf
            import tensorflow_datasets as tfds
        except ImportError:
            raise ImportError("Please install tensorflow and tensorflow_datasets to use rlds loader")

        return tf.data.experimental.cardinality(self.ds).numpy()

    def __iter__(self):
@@ -48,4 +54,7 @@ def __next__(self):
        except StopIteration:
            self.index = 0
            self.iterator = iter(self.ds)
            raise StopIteration
            raise StopIteration

    def __getitem__(self, idx):
        return next(iter(self.ds.skip(idx).take(1)))
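Note that `__getitem__` here is emulated with `skip(idx).take(1)`, which re-traverses the stream from the start, so each lookup costs O(idx) records rather than O(1). The same semantics can be sketched without TensorFlow using `itertools.islice` over a re-creatable stream (the helper name is hypothetical):

```python
from itertools import islice


def getitem_by_skip(make_stream, idx):
    """Plain-Python analogue of `next(iter(self.ds.skip(idx).take(1)))`:
    random access over a streaming dataset is emulated by skipping idx
    records from a fresh stream, so each lookup is O(idx)."""
    return next(islice(make_stream(), idx, idx + 1))


stream = lambda: (i * i for i in range(100))
print(getitem_by_skip(stream, 7))  # 49
```

For the random-load benchmark this means RLDS lookup cost grows with the sampled index, which is worth keeping in mind when comparing against the file-indexed HDF5 and VLA loaders.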
7 changes: 5 additions & 2 deletions fog_x/loader/vla.py
@@ -50,5 +50,8 @@ def __next__(self):
    def __len__(self):
        return len(self.files)

    def peak(self, index):
        return self._read_vla(self.files[index])
    def __getitem__(self, index):
        return self._read_vla(self.files[index])

    def peak(self):
        return self._read_vla(self.files[self.index])
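After this change, `__getitem__(index)` serves arbitrary random access while the argument-free `peak()` (presumably a spelling of "peek") previews the element at the current iteration position without advancing it. A sketch of the resulting interface, with `_read_vla` stubbed out (everything except the method names shown in the diff is illustrative):

```python
class PeekableLoader:
    """Hypothetical sketch of the VLALoader interface after the diff:
    __getitem__ for arbitrary indices, peak() to preview the element the
    next call to __next__ would return, without consuming it."""

    def __init__(self, files):
        self.files = files
        self.index = 0

    def __len__(self):
        return len(self.files)

    def __getitem__(self, index):
        return self._read(self.files[index])

    def peak(self):
        # Preview without advancing self.index.
        return self._read(self.files[self.index])

    def __iter__(self):
        return self

    def __next__(self):
        if self.index >= len(self.files):
            raise StopIteration
        item = self._read(self.files[self.index])
        self.index += 1
        return item

    def _read(self, f):
        # Stub for the real _read_vla, which decodes the trajectory file.
        return f
```

Splitting the old `peak(self, index)` into these two methods separates "inspect the cursor" from "fetch by index", which is what lets the benchmark's `loader[random_index]` calls work uniformly across loaders.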
4 changes: 2 additions & 2 deletions openx_to_vla.sh
@@ -31,7 +31,7 @@

# nyu_door_opening_surprising_effectiveness dataset
# python examples/openx_loader.py --data_dir /home/kych/datasets/rtx --dataset_name nyu_door_opening_surprising_effectiveness --destination_dir /mnt/data/fog_x/vla --version 0.1.0 --split train[0:] --max_workers 4
# python examples/openx_loader.py --data_dir /home/kych/datasets/rtx --dataset_name nyu_door_opening_surprising_effectiveness --destination_dir /mnt/data/fog_x/ffv1 --version 0.1.0 --split train[0:] --max_workers 4 --lossless
python examples/openx_loader.py --data_dir /home/kych/datasets/rtx --dataset_name nyu_door_opening_surprising_effectiveness --destination_dir /mnt/data/fog_x/ffv1 --version 0.1.0 --split train[0:] --max_workers 4 --lossless

# python examples/openx_loader.py --data_dir /home/kych/datasets/rtx --dataset_name bridge --destination_dir /mnt/data/fog_x/vla --version 0.1.0 --split train[0:] --max_workers 4
python examples/openx_loader.py --data_dir /home/kych/datasets/rtx --dataset_name bridge --destination_dir /mnt/data/fog_x/ffv1 --version 0.1.0 --split train[0:] --max_workers 4 --lossless
# python examples/openx_loader.py --data_dir /home/kych/datasets/rtx --dataset_name bridge --destination_dir /mnt/data/fog_x/ffv1 --version 0.1.0 --split train[0:] --max_workers 4 --lossless