System Requirements

In this section, we discuss the hardware, software, and network requirements for SnappyData.

Hardware Requirements

SnappyData turns Apache Spark into a mission-critical, elastically scalable in-memory data store. This allows users to run both Spark workloads and classic database workloads on SnappyData.

Memory: SnappyData works well with anywhere from 8GB to several TBs of memory. Exact memory requirements depend on the end-user application, but we recommend allocating no more than 75% of a machine's memory to SnappyData and using machines with at least 8GB of RAM.

!!! Note
    It is recommended to have a minimum of 8GB memory for server-grade machines.
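The 75% guideline above can be turned into a quick calculation. The following is a minimal sketch and not part of SnappyData: the function names are hypothetical, and reading the total RAM assumes a Linux host where `os.sysconf` exposes `SC_PAGE_SIZE` and `SC_PHYS_PAGES`.

```python
import os

GIB = 1024 ** 3


def snappy_memory_budget(total_bytes, fraction=0.75):
    """Upper bound (in bytes) to allocate to SnappyData on one host.

    `fraction` defaults to the 75% guideline; the function name is
    hypothetical and only illustrates the arithmetic.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return int(total_bytes * fraction)


def total_physical_memory():
    """Total RAM in bytes (assumes Linux sysconf names are available)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


if __name__ == "__main__":
    total = total_physical_memory()
    budget = snappy_memory_budget(total)
    print("Total RAM: %.1f GiB" % (total / GIB))
    print("Suggested upper bound for SnappyData: %.1f GiB" % (budget / GIB))
```

For example, on a host with 64 GiB of RAM the sketch suggests an upper bound of 48 GiB across all SnappyData processes on that host.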

CPU Cores: SnappyData is a highly multi-threaded system and can take advantage of additional CPU cores to deliver higher throughput. It has been tested on multi-core, multi-CPU machines. We recommend using machines with at least 16 cores. The degree of parallelism you can achieve depends directly on the number of cores, so machines with more cores generally perform better than machines with fewer.
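A host can be checked against the 16-core recommendation with a few lines of Python. This is only a sketch with a hypothetical helper name. Note that `os.cpu_count()` reports logical CPUs (including hyper-threads) and may return `None` when the count cannot be determined.

```python
import os

RECOMMENDED_MIN_CORES = 16  # taken from the recommendation above


def has_recommended_cores(core_count, minimum=RECOMMENDED_MIN_CORES):
    """True if core_count is known and meets the recommended minimum."""
    return core_count is not None and core_count >= minimum


if __name__ == "__main__":
    cores = os.cpu_count()  # logical CPUs, or None if undetermined
    if has_recommended_cores(cores):
        print("OK: %d logical cores" % cores)
    else:
        print("Below the recommended %d cores: %s"
              % (RECOMMENDED_MIN_CORES, cores))
```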

Network: SnappyData is a clustered, scale-out in-memory data store, and both jobs and queries use the network extensively. Because data is mostly available in memory, queries and jobs typically become CPU-bound and/or network-bound. We recommend running SnappyData on at least a 1 Gbps network for testing and on a 10 Gbps network for production scenarios.

Disk: SnappyData overflows data to local disk files, and tables can be configured for persistence. We recommend flash storage for optimal performance of SnappyData's shared-nothing persistence. Data can also be saved to external stores such as HDFS and S3 using the SnappyData DataFrame APIs.

Operating Systems Supported

| Operating System | Version |
|------------------|---------|
| Red Hat Enterprise Linux | RHEL 6, 7 and later (Minimum recommended kernel version: 3.10.0-693.2.2.el7.x86_64) |
| Ubuntu | Ubuntu Server 14.04 and later |
| CentOS | CentOS 6, 7 and later (Minimum recommended kernel version: 3.10.0-693.2.2.el7.x86_64) |

Host Machine Requirements

Requirements for each host:

* A supported Oracle Java SE 8 JDK installation. The minimum required version is 1.8.0_144 (see SNAP-2017, SNAP-1999, SNAP-1911, and SNAP-1375 for crashes reported with earlier versions). The latest stable release is recommended.
* Alternatively, an equivalent Java version (>= 1.8.0_144) from an OpenJDK distribution (Linux vendor build, AdoptOpenJDK, or equivalent). The latest stable release is recommended. A full JDK installation is required.
* The latest version of the Bash shell.
* A file system that supports long file names.
* TCP/IP.
* System clock set to the correct time.
* For each Linux host, the hostname and host files must be properly configured. See the system manual pages for hostname and host settings.
* For each Linux host, configure swap in the range of 16-64GB to allow unused pages to be swapped out.
* A time synchronization service such as Network Time Protocol (NTP).
* cURL must be installed on lead nodes for the snappy scripts to work. On Red Hat based systems, install it using sudo yum install curl. On Debian/Ubuntu based systems, install it using sudo apt-get install curl.

!!! Note
    * For troubleshooting, you must run a time synchronization service on all hosts. Synchronized timestamps allow you to merge log messages from different hosts into an accurate chronological history of a distributed run.

    * If you deploy SnappyData on a virtualized host, consult the documentation provided with the platform for system requirements and recommended best practices for running Java and latency-sensitive workloads.
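The minimum Java version listed above (1.8.0_144) can be verified on each host before starting the cluster. The sketch below is an illustration, not a SnappyData tool: the helper names are hypothetical, and it assumes the usual Java 8 banner format printed by `java -version` (for example `openjdk version "1.8.0_292"`). Anything that is not a Java 8 build is reported as not matching.

```python
import re
import subprocess

MIN_UPDATE = 144  # from the required minimum version 1.8.0_144


def java8_update(version_text):
    """Return the Java 8 update number from a `java -version` banner.

    Returns None when the banner does not describe a 1.8.0_<n> build.
    """
    match = re.search(r'version "1\.8\.0_(\d+)', version_text)
    return int(match.group(1)) if match else None


def meets_minimum(version_text, min_update=MIN_UPDATE):
    """True if the banner describes Java 8 at or above min_update."""
    update = java8_update(version_text)
    return update is not None and update >= min_update


if __name__ == "__main__":
    try:
        # `java -version` writes its banner to stderr, not stdout.
        result = subprocess.run(["java", "-version"],
                                stdout=subprocess.PIPE,
                                stderr=subprocess.PIPE,
                                universal_newlines=True)
        print("Java OK" if meets_minimum(result.stderr)
              else "Java 8 update %d or later not found" % MIN_UPDATE)
    except FileNotFoundError:
        print("java is not on the PATH")
```

The check only inspects the version string. It does not confirm that a full JDK (rather than a JRE) is installed.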

Python Integration using pyspark

The Python pyspark module has the same requirements as in Apache Spark. The numpy package is required by many pyspark modules, including the examples shipped with SnappyData. On recent Red Hat based systems, it can be installed using the sudo yum install numpy or sudo yum install python2-numpy command. On Debian/Ubuntu based systems, use the sudo apt-get install python-numpy command.
Some of the Python APIs can use SciPy to optimize certain algorithms (in the linalg package), and others need Pandas. On recent Red Hat based systems, SciPy can be installed using the sudo yum install scipy command. On Debian/Ubuntu based systems, use the sudo apt-get install python-scipy command. Likewise, Pandas can be installed using the sudo yum install python-pandas command on recent Red Hat based systems and the sudo apt-get install python-pandas command on Debian/Ubuntu based systems.
On Red Hat based systems, some of the above Python packages may be available only after enabling the EPEL repository. If they are not available in the repositories for your OS version, or if using EPEL is not an option, you can use pip. Refer to the respective project documentation for details and for alternatives such as Anaconda.
Alternatively, Python 3 can be used, but the version must be <= 3.7. Consult your distribution's documentation to install the equivalent Python 3 packages for numpy, scipy, and pandas, or use conda/mamba to set up the required Python environment.
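To confirm that a Python 3 environment satisfies the constraints above, the interpreter version and the presence of numpy, scipy, and pandas can be checked directly. This is a minimal sketch with hypothetical helper names. It must itself be run with the Python 3 interpreter that pyspark will use, and it only tests whether the packages can be located, not which versions are installed.

```python
import importlib.util
import sys

PACKAGES = ("numpy", "scipy", "pandas")


def python3_version_supported(version_info=sys.version_info):
    """True for Python 3 interpreters up to and including 3.7."""
    major, minor = version_info[0], version_info[1]
    return major == 3 and minor <= 7


def missing_packages(names=PACKAGES):
    """Return the top-level packages that cannot be found for import."""
    return [name for name in names
            if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    if not python3_version_supported():
        print("Unsupported Python version: %d.%d" % sys.version_info[:2])
    missing = missing_packages()
    if missing:
        print("Missing packages: " + ", ".join(missing))
    else:
        print("numpy, scipy and pandas are available")
```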

Filesystem Type for Linux Platforms

For optimum disk-store performance, we recommend storing disk data on a local filesystem rather than over NFS.
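One way to check whether a directory intended for disk stores sits on NFS is to look up its mount point in `/proc/mounts`. The sketch below is illustrative only: the helper names and the example path are hypothetical, it assumes a Linux host, and it ignores mount points containing escaped characters (such as spaces written as `\040`).

```python
import os

NETWORK_FS_TYPES = {"nfs", "nfs4"}


def filesystem_type(path, mounts_text):
    """Return the filesystem type of the mount point containing path.

    mounts_text is the content of /proc/mounts; the longest matching
    mount point wins. Returns None if no mount point matches.
    """
    path = os.path.abspath(path)
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            if len(mount_point) > len(best_mount):
                best_mount, best_type = mount_point, fs_type
    return best_type


def is_on_nfs(path, mounts_text):
    """True if path resolves to an NFS mount."""
    return filesystem_type(path, mounts_text) in NETWORK_FS_TYPES


if __name__ == "__main__":
    with open("/proc/mounts") as mounts_file:
        mounts = mounts_file.read()
    disk_dir = "/var/snappydata"  # hypothetical disk-store directory
    print(disk_dir, "->", filesystem_type(disk_dir, mounts))
```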
