RDF_on_Cloud

This repository contains the scripts, datasets, and results for an experimental project exploring RDF (Resource Description Framework) systems in a cloud environment. The objective is to test the performance, scalability, and efficiency of processing RDF datasets in the cloud on both single-machine and multi-machine systems, under various conditions including loading very large datasets and high-volume querying.

Repository Structure

The repository is organized into the following main directories:

datasets: Contains RDF datasets used in the experiment. These datasets vary in size and complexity to test scalability and performance under different conditions.
results: Stores the results of all tests conducted during the experiment, including the raw output data.
scripts: Includes all the scripts used for data processing, loading RDF datasets into the cloud, querying, and conducting stress tests.

Overview of the Experiment

The project aims to evaluate the following aspects of RDF processing in the cloud:

Loading Efficiency: Measuring the time and resources required to load RDF datasets into the cloud environment.
Querying Performance: Testing the efficiency of RDF queries on datasets of varying sizes.
Stress Testing: Simulating high-volume querying using multiprocessing to represent real-world clients accessing the database simultaneously.
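
As a rough illustration of the stress-testing idea above, the following sketch starts several worker processes that each play the role of one client and records the latency of every query. It is not the code in scripts/: the names run_query, client_session, stress_test, and summarize are hypothetical, and run_query only sleeps where a real client would send the query to the RDF store.

```python
import multiprocessing as mp
import statistics
import time


def run_query(query: str) -> float:
    """Run one query and return its latency in seconds.

    Placeholder (assumption): a real client would send `query` to the
    RDF store here instead of sleeping.
    """
    start = time.perf_counter()
    time.sleep(0.001)  # stand-in for the round trip to the database
    return time.perf_counter() - start


def client_session(queries: list) -> list:
    """Simulate one client issuing its queries one after another."""
    return [run_query(q) for q in queries]


def stress_test(queries: list, n_clients: int) -> list:
    """Run `n_clients` concurrent sessions and collect every latency."""
    with mp.Pool(processes=n_clients) as pool:
        per_client = pool.map(client_session, [queries] * n_clients)
    # Flatten the per-client lists into one list of latencies.
    return [latency for session in per_client for latency in session]


def summarize(latencies: list) -> dict:
    """Reduce raw latencies to a few headline numbers."""
    return {
        "count": len(latencies),
        "mean_s": statistics.mean(latencies),
        "max_s": max(latencies),
    }


if __name__ == "__main__":
    sample_queries = ["SELECT * WHERE { ?s ?p ?o } LIMIT 10"] * 3
    print(summarize(stress_test(sample_queries, n_clients=2)))
```

Using one process per simulated client keeps the clients independent of each other, which is closer to separate users hitting the database than threads sharing one interpreter would be.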

How to Use This Repository

Prerequisites

A cloud platform instance (GCP, AWS, Azure, etc.) running Ubuntu 20.04
Python 3.8+ with necessary libraries listed in scripts/requirements.txt

Key Findings

System-wise:

Several systems are outdated, with limited documentation and inactive communities, making it difficult to find resources for proper system setup.
While experimenting with various instance configurations, we encountered issues with systems built from machine images, which often required a complete system rebuild.

Performance-wise:

Preparing the data requires massive computational resources, and additional storage space is needed while converting raw data into a system-compatible structure.

Future Work

Current open-source systems present limitations, leaving room for future research:

Single-machine systems: While we focused on graph-based and relational databases, future work could explore systems like Hadoop, Spark, and cloud-based architectures.
Multi-machine systems: We did not fully test open-source multi-machine systems; future research should include comprehensive evaluations of these platforms.
Other systems: Neo4j, a non-RDF system, outperformed RDF-based systems in some cases, suggesting that future studies could investigate other systems that may offer better performance.

Related Paper and Authors

An Experimental Comparison of RDF Systems on Cloud

Yi Ding - [email protected]
Hualong Lin - [email protected]
Zhengyi Yang - [email protected]
Dong Wen - [email protected]
Xiaoyang Wang - [email protected]
Wenjie Zhang - [email protected]
