Skip to content

Commit

Permalink
genetic algorithm et al
Browse files Browse the repository at this point in the history
  • Loading branch information
calebmadrigal committed Sep 20, 2016
1 parent d1089e1 commit c6c1688
Show file tree
Hide file tree
Showing 4 changed files with 1,251 additions and 0 deletions.
258 changes: 258 additions & 0 deletions Autocorrelation.ipynb
Original file line number Diff line number Diff line change
@@ -0,0 +1,258 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Title: Fuzzy sequence matching with cross-correlation\n",
"\n",
"## Cross-correlation\n",
"\n",
"I am going to generate a \"random file\" - an array with values ranging between -1 and 1, and then randomly choose a \"random slice\" - a subarray portion from the \"random file\". I'm then going to randomly mutate some of the values in teh subarray, and see if I can still find where in the \"random file\" the \"random slice\" came from."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import random\n",
"import numpy as np\n",
"\n",
"FILE_SIZE = 100000\n",
"SLICE_SIZE = 100\n",
"\n",
"random_file = np.asarray([random.uniform(-1,1) for i in range(FILE_SIZE)])\n",
"random_slice_index = random.randint(0, FILE_SIZE-SLICE_SIZE)\n",
"random_slice = random_file[random_slice_index:random_slice_index+SLICE_SIZE]\n",
"\n",
"# Randomly change the sample by about 50% - simulating a 50% noise\n",
"random_slice_mutated = np.asarray([i + random.uniform(-0.5,0.5) for i in random_slice])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"So now, we are going to try to find where in `random_file` the `random_slice_mutated` came from. We know it should be `random_slice_index`.\n",
"\n",
"### First, let's write our own cross-correlation function to find it."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Coding our own cross-correlation algorithm\n",
"\n",
"def find_sample_index(container, sample):\n",
" best_match_index = 0\n",
" largest_dot_product = 0\n",
" sample_size = len(sample)\n",
" for index in range(len(container) - sample_size):\n",
" dot = np.dot(container[index:index+sample_size], sample)\n",
" if dot > largest_dot_product:\n",
" best_match_index = index\n",
" largest_dot_product = dot\n",
" return (best_match_index, largest_dot_product)\n",
" "
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"(14108, 33.238798044766)"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"find_sample_index(random_file, random_slice_mutated)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Actual index (should match): 14108\n"
]
}
],
"source": [
"print(\"Actual index (should match):\", random_slice_index)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Now, let's use the `scipy.signal.correlate()` function instead of our own one.\n",
"\n",
"Note that the numpy implementation returns a vector of all of the correlation values, not just the largest one, so to find the largest index, we need to find the largest value."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"14108"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import scipy.signal\n",
"result = scipy.signal.correlate(random_file, random_slice)\n",
"max_correlation_index = np.argmax(result) - (SLICE_SIZE-1)\n",
"max_correlation_index"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Cross-correlation algorithm"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"array([ 3, 8, 14, 20, 26, 14, 5])"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a = np.array([1,2,3,4,5])\n",
"b = np.array([1,2,3])\n",
"scipy.signal.correlate(a,b)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For vectors:\n",
"\n",
"A=\n",
"<table>\n",
"<tr><td>1</td><td>2</td><td>3</td><td>4</td><td>5</td></tr>\n",
"</table>\n",
"\n",
"B=\n",
"<table>\n",
"<tr><td>1</td><td>2</td><td>3</td></tr>\n",
"</table>\n",
"\n",
"The cross-correlate algorithm zero-pads each vector as such (where the blank cells represent 0):\n",
"\n",
"#### Step 1\n",
"<table>\n",
"<tr><td></td><td></td><td>1</td><td>2</td><td>3</td><td>4</td><td>5</td></tr>\n",
"<tr><td>1</td><td>2</td><td>3</td><td></td><td></td><td></td><td></td></tr>\n",
"</table>\n",
"\n",
" 0*1 + 0*2 + 1*3 + 2*0 + 3*0 + 4*0 + 5*0 = 3\n",
"\n",
"#### Step 2\n",
"<table>\n",
"<tr><td></td><td>1</td><td>2</td><td>3</td><td>4</td><td>5</td></tr>\n",
"<tr><td>1</td><td>2</td><td>3</td><td></td><td></td><td></td></tr>\n",
"</table>\n",
"\n",
" 0*1 + 1*2 + 2*3 + 3*0 + 4*0 + 5*0 = 8\n",
"\n",
"#### Step 3\n",
"<table>\n",
"<tr><td>1</td><td>2</td><td>3</td><td>4</td><td>5</td></tr>\n",
"<tr><td>1</td><td>2</td><td>3</td><td></td><td></td></tr>\n",
"</table>\n",
"\n",
" 1*1 + 2*2 + 3*3 + 4*0 + 5*0 = 14\n",
"\n",
"## ...\n",
"\n",
"#### Last step\n",
"<table>\n",
"<tr><td>1</td><td>2</td><td>3</td><td>4</td><td>5</td><td></td><td></td></tr>\n",
"<tr><td></td><td></td><td></td><td></td><td>1</td><td>2</td><td>3</td></tr>\n",
"</table>\n",
"\n",
"So each step consists of doing the appropriate zero-padding and taking the dot product of the 2 vectors, and then moving to the right one spot."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.1"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
Loading

0 comments on commit c6c1688

Please sign in to comment.