Skip to content

Latest commit

 

History

History
474 lines (335 loc) · 16.4 KB

README.md

File metadata and controls

474 lines (335 loc) · 16.4 KB

Redis via Node.js and Python

Objective

By the end of this module you will know how to:

  • Install the Redis Driver for Node.js
  • Install the Redis Driver for Python
  • Manipulate Redis Data Structures in Python
  • Cache a Query's results into Redis in Node
  • Count the Number of Unique IPs that Visited your site in Node

Introduction

As we learned in Module 2 and 3, Redis is a feature-packed cache with pub/sub capabilities along with advanced data structures that can keep application information.

In this module, we will use Redis via the Node.js and the Python drivers in a variety of examples which show off Redis features.

Getting Started

Before you get started with this session, you will need to install a few things:

You can also install all of these things with either Homebrew for OSX or Chocolatey for Windows. Package managers make installation and updating easy.

Redis Commands with Python

With Python, we'll show you how to use the redis-py client to manipulate a few Redis data structures.

Getting Started

To install the Redis client go ahead and execute:

# linux/mac
sudo pip install redis
# windows
pip install redis

Create a new Python file called datastructures.py with the following import:

import redis
r = redis.StrictRedis(host="localhost", port=6379, db=0)

Going forward the Python Redis Client implements nearly all the native Redis commands.

Hashes

Continuing from the examples in Module 2 and 3, we can access Redis' HSET command right from the Redis client object:

import redis
r = redis.StrictRedis(host="localhost", port=6379, db=0)
print "Creating Key first_name"
r.hset("person:0", "first_name", "Rami")
print "Creating Key last_name"
r.hset("person:0", "last_name", "Edouard")
# now print the hset
print hgetall("person:0")

The Python script creates a new person hash with an id of 0 with both the first_name and last_name fields set.

Just as on the command line, doing a GETALL will retrieve the entire hash set:

$ python datastructures.py
{'first_name': 'Rami', 'last_name': 'Edouard'}

Multiple HashSets

Using the client.hmset function, we can set a batch of key-value mappings for a hash set and send it all at once to Redis, instead of sending one at a time. This improves the performance of your Redis calls by reducing the overhead of individual interactions.

print "Creating a batch mapping of keys first_name, last_name and location"
r.hmset("person:1", {"first_name":"Rami", "last_name":"Edouard", "location":"CANADA"})
print r.hgetall("person:1")

Sets

Sets are just as easy to manipulate. Let's go off the Sets example from the previous section. If we wanted to create a set with the names of countries, we can do that with the setadd function:

print "Creating a set of countries in redis..."
r.sadd("countries", "USA")
r.sadd("countries", "Canada")
r.sadd("countries", "Mexico")
print "Getting contry set from redis..."
print r.smembers("countries")

This is the output that you'll get from running the above lines:

Creating a set of countries in redis...
Getting contry set from redis...
set(['Canada', 'USA', 'Mexico']

Caching Queries (Node.js)

One scenario for using Redis or other caches is storing the results of common queries for a limited amount of time for fast recall.

For example, a long running query that may be common to many users of your application can be cached so that the result is quicker for most who ask for the query.

The Crime Data Set

We're going to build off of our previous MongoDB MVA example dataset and use it to run a long running query. You don't need to know much about MongoDB but in order to run the example you'll have to download and install MongoDB.

Loadup the Data

Once you're running locally, we can download the dataset and quickly load up that .csv file with mongoimport.

mongoimport Crimes_-_2001_to_present.csv --type csv --headerline --collection crimes

This dataset contains about 5 million entries and represents crimes that have occurred in Chicago since 2001.

A typical record in this database looks like:

{ "_id" : ObjectId("5462725476ecd357dbbc721e"), "ID" : 9844675, "Case Number" : "HX494115", "Date" : "11/03/2014 11:51:00 PM", "Block" : "056XX S MORGAN ST", "IUCR" : 486, "Primary Type" : "BATTERY", "Description" : "DOMESTIC BATTERY SIMPLE", "Location Description" : "ALLEY", "Arrest" : "false", "Domestic" : "true", "Beat" : 712, "District" : 7, "Ward" : 16, "Community Area" : 68, "FBI Code" : "08B", "X Coordinate" : 1170654, "Y Coordinate" : 1867165, "Year" : 2014, "Updated On" : "11/10/2014 12:43:02 PM", "Latitude" : 41.790980835, "Longitude" : -87.649786614, "Location" : "(41.790980835, -87.649786614)" }

Querying the Database

Now that we have the data up in MongoDB, let's make an app that will query this data. We can call this query.js.

var MongoClient = require('mongodb').MongoClient
var url = 'mongodb://127.0.0.1:27017/test';
var assert = require('assert');

MongoClient.connect(url, function(err, db) {

  //ensure we've connected
  assert.equal(null, err);

  var crimes = db.collection('crimes');
  //start timer
  console.time('query_time');

    crimes.find({"Primary Type": "ROBBERY"}, function(err, data){

        if(err){
          return console.error(err);
        }

        data.count(function(err, count){
          console.log('Total ROBBERY Crimes: ' + count);
          return console.timeEnd('query_time');
        });
    });
});

We won't go into too much detail about how MongoDB works. If you want more detail there, just check out our MongoDB language drivers walkthrough.

This query executes on an unindexed field - Primary Type - of over 5 million records, this can take a while. Run the following commands to run query.js:

# install mongodb language driver
# Creates a package.json package manifest for you
npm init
...
# Answer all the questions or use defaults npm provides
npm install mongodb --save
# run the application
node query.js
Total ROBBERY Crimes: 212034
query_time: 15701ms

Notice how long the query took - 15 seconds! This was on a cold start on our machine and subsequent times may take around 2-3 seconds:

Total ROBBERY Crimes: 212034
query_time: 2757ms

Caching the Query Result

Eitherway, 2.7 seconds for a query is quite long for a user to have to wait especially if this is a commonly accessed query. We can save the result of this query into Redis with a TTL on the data so that the data is not only stored for fast access later, but also available for any other user who needs that same exact query.

First, we'll add the Redis Node.js language driver:

npm install redis --save

Now, create a new file called cached_query.js next to query.js and copy the contents of query.js into cached_query.js to get things started.

Add the Redis client by adding a require to the redis module and calling the createClient function on it:

var redis = require("redis");
var client = redis.createClient();

Now, we can use the client.set function to store something in the Redis cache. We can use this to stash the result of the query count:

data.count(function(err, count){
	console.log('Total ROBBERY Crimes: ' + count);
	console.timeEnd('query_time');

   //cache into redis
   client.set('longquery_result', count, function(err){

     if(err){
   		  return console.error(err);
     }

     //expire query in 20 seconds
     client.expire('longquery_result', 20);
              
     console.log('sucessfully cached query');
     //close the database connection
     return db.close();
          
  });

});

Now the result of the query has been stored into the key longquery_result. Redis stores all of its values as strings, so for example if you wanted to store a JavaScript object you'd have to use JSON.stringify to properly store it into Redis.

The client.expire function will attach a TTL of 20 seconds to the data which will cause Redis to delete the stored value after an elapsed time of 20 seconds.

Now, when we attempt to run the query, we need to first attempt to fetch the data out of Redis before running the query on MongoDB. We'll add this right after we start the query timer:

client.get('longquery_result', function(err,count){
	if(err){
		//unexpected redis error
		return console.error(err);
	}
	
	if(!count){ //cache miss!
		//run long running query on MongoDB
	}
	else{ //cache hit!
		console.log('Total ROBBERY Crimes: ' + count);
		console.timeEnd('query_time');
	}
	
	
});

We can see that the client.get function callback takes the standard err as the first parameter and the value of the key as the second parameter, in our case we call this count.

If the count variable is defined, we know we got the data and can quickly return it to the user. Otherwise, we'll have to run the query on MongoDB.

Finally the entire cached_query.js looks like the following:

var redis = require("redis");
var client = redis.createClient();
var MongoClient = require('mongodb').MongoClient
var url = 'mongodb://127.0.0.1:27017/test';
var assert = require('assert');
MongoClient.connect(url, function(err, db) {

  //ensure we've connected
  assert.equal(null, err);

  var crimes = db.collection('crimes');

  //start timer
  console.time('query_time');

  client.get('longquery_result', function(err,count){

    if(err){
      return console.error(err);
    }

    if(!count){ //cache miss!
      crimes.find({"Primary Type": "ROBBERY"}, function(err, data){

        if(err){
          return console.error(err);
        }

        data.count(function(err, count){
          console.log('Total ROBBERY Crimes: ' + count);
          console.timeEnd('query_time');

          //cache into redis
          client.set('longquery_result', count, function(err){

            if(err){
              return console.error(err);
            }

            //expire query in 20 seconds
            client.expire('longquery_result', 20);
              
            console.log('sucessfully cached query');
            //close the database connection
            return db.close();
          
          });

        });

        
      });
      
    }
    else{ // cache hit!!
      console.log('Total ROBBERY Crimes: ' + count);
      console.timeEnd('query_time');
    }
  });
  
  	

});

Now lets checkout the advantage of caching the query:

# Ran the first time:
$ node cached_query.js 
Total ROBBERY Crimes: 212034
query_time: 3911ms
sucessfully cached query
# Ran the second time:
$ node cached_query.js 
Total ROBBERY Crimes: 212034
query_time: 0ms
# Ran the third time:
$ node cached_query.js
Total ROBBERY Crimes: 212034
query_time: 1ms
# Ran after 20 seconds:
Total ROBBERY Crimes: 212034
query_time: 2380ms
sucessfully cached query

We can see that the first time our app executes the query it takes a long time, in our case about 3.9 seconds. After caching into Redis it takes .01-.02 seconds a tremedous improvement because we don't actually have to run the query and Redis holds the information in-memory for fast retrieval. Finally after the data expires, in our case a 20 second TTL, the queries run again and take a long time.

The important thing to highlight is that these cached key-values can be shared among all your servers in your deployment and therefore these performance gains can be experienced by any of your users.

Counting the Unique Number of IP Addresses that Connect to Our Site (Node.js)

An interesting feature of Redis is the algorithm HyperLogLog as a datastructure. It's an ingenious way to approximate count the number of unique items, without actually having to store what those items are. Traditionally this would require you to utilize an amount of space proportional to the number of unique items you need to calculate.

Redis exposes HyperLogLog as a data structure where you simply have to add a value to a key and redis will keep track of how many unique items you've added to that collection with a < 1% error.

Creating a quick web server

In order to do this we'll need to quickly create a web app. From the Node.js homepage we can copy the standard 'Hello World' server:

var http = require('http');
http.createServer(function (req, res) {
  res.writeHead(200, {'Content-Type': 'text/plain'});
  res.end('Hello World\n');
}).listen(1337, '127.0.0.1');
console.log('Server running at http://127.0.0.1:1337/');

Counting Unique Clients with Redis

Let's add our Redis client as before

var redis = require("redis");
var client = redis.createClient();
var http = require('http');
http.createServer(function (req, res) {
  res.writeHead(200, {'Content-Type': 'text/plain'});
  res.end('Hello World\n');
}).listen(1337, '127.0.0.1');
console.log('Server running at http://127.0.0.1:1337/');

We can use the HTTP req.connection.remoteAddress variable to get the client's IP address:

var redis = require("redis");
var client = redis.createClient();
var http = require('http');
http.createServer(function (req, res) {
  client.pfadd('clientips', req.connection.remoteAddress, function(err){
        if(err){
          return res.writeHead(500, {'Content-Type': 'text/plain'});
          res.end('Error talking to redis ' + err + '\n');
        }

        client.pfcount('clientips', function(err, count){
                res.writeHead(500, {'Content-Type': 'text/plain'});
                return res.end('Hello ' + req.connection.remoteAddress + '\n about ' + count + ' unique connections have visited this site!');
        });
  });


}).listen(1337, '127.0.0.1');
console.log('Server running at http://127.0.0.1:1337/');

Running this you should see the following output:

$ curl http://localhost:1337
Hello 127.0.0.1
about 1 unique connections have visited this site!

Deploying

We need to have this deployed to a cloud service provider in order for the app to actually count the number of unique clients. Let's do this by quickly creating an Azure site and deploying. First, intiialize an Azure Website, setup source control deployment and copy your Git url. You can use this guide which walks you through the process.

Add to your package.json in the scripts object a start field:

  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1",
    "start": "node count_server.js"
  },

This will tell Azure Websites to start the website using that command.

We'll have to tell the driver that it needs to connect to an Azure Redis Cache instead of the local one. We can do this by adding the credentials to the Redis client.

var client = redis.createClient(6379, "<redis cache name>.redis.cache.windows.net", {auth_pass: '<password>', return_buffers: true});

Initialize your Git repository, add Azure as a remote with the Git publish link you got from Azure Websites:

# initialize git repo
git init
# create a .gitignore and exclude the node modules repo
echo node_modules >> .gitignore
# add everything to the repo
git add .
# commit
git commit . -m 'initial commit'
# add the azure remote
git remote add azure <your azure git url>
# push to azure websites
git push azure master

You'll see the node packages get installed and your site deployed.

Finally, visit your website and have a friend visit it. You should see the following output on your browser:

Hello 208.72.141.54:54160
 about 5 unique connections have visited this site!

Because the app is deployed to a publicly accessible location, you can now count the number of unique connections from clients anywhere on the web!