{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "\n", "# Chapter 9: Introduction to Recommender Systems\n" ] },
{ "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "![image.png](images/recsys.png)" ] },
{ "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Programming Collective Intelligence\n", "\n", "> Collective intelligence means combining the behavior, preferences, or ideas of a group of people to create new insights. It is usually built either on clever algorithms (Netflix, Google) or on users who contribute content (Wikipedia).\n", "\n", "Programming collective intelligence focuses on the former: writing programs and building intelligent algorithms that collect and analyze user data in order to discover new information, and even new knowledge.\n", "\n", "Toby Segaran, 2007, Programming Collective Intelligence. O'Reilly. \n", "\n", "https://github.com/computational-class/programming-collective-intelligence-code/blob/master/chapter2/recommendations.py" ] },
{ "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Recommender Systems\n", "\n", "- Currently the most common form of intelligent product on the Internet.\n", "- The shift from the information age to the attention age:\n", "  - information overload\n", "  - scarcity of attention\n", "- The basic task of a recommender system is to connect users with items, helping users quickly discover useful information and thus addressing information overload.\n", "  - For long-tail distributions, it uncovers personalized needs and allocates resources more efficiently.\n", "\n" ] },
{ "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Types of Recommender Systems\n", "- Popularity-based recommendation\n", "- Social recommendation\n", "  - Friends help recommend items\n", "- Content-based filtering\n", "  - Recommends new items based on the content of items the user has already consumed, e.g. new films based on the directors and actors of films the user has watched.\n", "- Collaborative filtering\n", "  - Finds users whose past interests match the target user's, then recommends items based on the similarity between these users or between the items they have consumed.\n", "- Latent factor model\n", "- Random walk on graphs" ] },
{ "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Collaborative Filtering Algorithms\n", "\n", "- Neighborhood-based methods\n", "  - User-based filtering\n", "  - Item-based filtering\n" ] },
{ "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Comparing UserCF and ItemCF\n", "\n", "- UserCF is the older approach: it was used in 1992 in Tapestry, a personalized e-mail system, in 1994 for GroupLens personalized news recommendation, and was later adopted by Digg.\n", "  - Recommends items liked by users who share the target user's interests (group hotspots, social).\n", "  - Reflects how popular an item is within the user's small interest group.\n", "- ItemCF is relatively newer and is used by the e-commerce site Amazon and the DVD-rental site Netflix.\n", "  - Recommends items similar to those the user liked before (historical interests, personalized).\n", "  - Reflects the continuity of the user's own interests.\n", "- News is updated quickly, the number of items is huge, and item similarities change fast, so maintaining an item-similarity table is impractical; for movies, music, and books it is feasible.\n", "\n" ] },
{ "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Evaluating Recommender Systems\n", "- User satisfaction\n", "- Prediction accuracy\n", "\n", "  Let $r_{ui}$ be the actual rating of user $u$ for item $i$, $\\hat{r}_{ui}$ the rating predicted by the recommender, and $T$ the test set of (user, item) pairs, containing $|T|$ pairs.\n", "\n", "  - Root mean squared error (RMSE)\n", "\n", "    $RMSE = \\sqrt{\\frac{\\sum_{(u, i) \\in T} (r_{ui} - \\hat{r}_{ui})^2}{|T|}}$\n", "\n", "  - Mean absolute error (MAE)\n", "\n", "    $MAE = \\frac{\\sum_{(u, i) \\in T} \\left| r_{ui} - \\hat{r}_{ui} \\right|}{|T|}$" ] },
{ "cell_type": "code", "execution_count": 1, "metadata": { "ExecuteTime": { "end_time": "2020-06-13T06:30:51.537620Z", "start_time": "2020-06-13T06:30:51.526566Z" }, "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "# A dictionary of movie critics and their ratings of a small\n", "# set of movies\n", "critics={'Lisa Rose': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.5,\n", " 'Just My Luck': 3.0, 'Superman Returns': 3.5, 'You, Me and Dupree': 2.5,\n", " 'The Night Listener': 3.0},\n", " 'Gene Seymour': {'Lady in the Water': 3.0, 'Snakes on a Plane': 3.5,\n", " 'Just My Luck': 1.5, 'Superman Returns': 5.0, 'The Night Listener': 3.0,\n", " 'You, Me and Dupree': 3.5},\n", " 'Michael Phillips': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.0,\n", " 'Superman Returns': 3.5, 'The Night Listener': 4.0},\n", " 'Claudia Puig': {'Snakes on a Plane': 3.5, 'Just My Luck': 3.0,\n", " 'The Night Listener': 4.5, 'Superman Returns': 4.0,\n", " 'You, Me and Dupree': 2.5},\n", " 'Mick LaSalle': {'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0,\n", " 'Just My Luck': 2.0, 'Superman Returns': 3.0, 'The Night Listener': 3.0,\n", " 'You, Me and Dupree': 2.0},\n", " 'Jack Matthews': {'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0,\n", " 'The Night Listener': 3.0, 'Superman Returns': 5.0, 
'You, Me and Dupree': 3.5},\n", " 'Toby': {'Snakes on a Plane':4.5,'You, Me and Dupree':1.0,'Superman Returns':4.0}}" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "ExecuteTime": { "end_time": "2020-06-13T06:31:53.659215Z", "start_time": "2020-06-13T06:31:53.654866Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "2.5" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "critics['Lisa Rose']['Lady in the Water']" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "ExecuteTime": { "end_time": "2020-06-13T06:31:58.921456Z", "start_time": "2020-06-13T06:31:58.913864Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "4.5" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "critics['Toby']['Snakes on a Plane']" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "ExecuteTime": { "end_time": "2020-06-13T06:32:03.476879Z", "start_time": "2020-06-13T06:32:03.472952Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "{'Snakes on a Plane': 4.5, 'You, Me and Dupree': 1.0, 'Superman Returns': 4.0}" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "critics['Toby']" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "\n", "## 1. 
User-based filtering\n", "\n", "\n", "### 1.0 Finding similar users" ] },
{ "cell_type": "code", "execution_count": 6, "metadata": { "ExecuteTime": { "end_time": "2020-06-13T06:34:52.932021Z", "start_time": "2020-06-13T06:34:52.548845Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "3.1622776601683795" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Euclidean distance between two people whose ratings of two movies are (5, 4) and (4, 1)\n", "import numpy as np\n", "np.sqrt(np.power(5-4, 2) + np.power(4-1, 2))" ] },
{ "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "- This formula calculates the distance, which is smaller for people who are more similar. \n", "- However, you need a function that gives higher values for people who are more similar. \n", "- This can be done by adding 1 to the distance (so you don’t get a division-by-zero error when the distance is 0) and inverting it:" ] },
{ "cell_type": "code", "execution_count": 7, "metadata": { "ExecuteTime": { "end_time": "2020-06-13T06:36:22.370060Z", "start_time": "2020-06-13T06:36:22.346809Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "0.2402530733520421" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "1.0 /(1 + np.sqrt(np.power(5-4, 2) + np.power(4-1, 2)) )" ] },
{ "cell_type": "code", "execution_count": 8, "metadata": { "ExecuteTime": { "end_time": "2020-06-13T06:41:56.679054Z", "start_time": "2020-06-13T06:41:56.672608Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "# Returns a distance-based similarity score for person1 and person2\n", "def sim_distance(prefs,person1,person2):\n", "    # Get the list of shared_items\n", "    si={}\n", "    for item in prefs[person1]:\n", "        if item in prefs[person2]:\n", "            si[item]=1\n", "    # if they have no ratings in common, return 0\n", "    if len(si)==0: return 0\n", "    # Add up the squares of all the differences\n", "    sum_of_squares=np.sum([np.power(prefs[person1][item]-prefs[person2][item],2) for item in si])\n", "    return 1/(1+np.sqrt(sum_of_squares))" ] },
{ "cell_type": "code", "execution_count": 9, "metadata": { "ExecuteTime": { "end_time": "2020-06-13T06:42:13.195212Z", "start_time": "2020-06-13T06:42:13.186579Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "0.3483314773547883" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sim_distance(critics, 'Lisa Rose','Toby')" ] },
{ "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Pearson Correlation Coefficient" ] },
{ "cell_type": "code", "execution_count": 10, "metadata": { "ExecuteTime": { "end_time": "2020-06-13T06:50:04.179061Z", "start_time": "2020-06-13T06:50:04.167014Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "# Returns the Pearson correlation coefficient for p1 and p2\n", "def sim_pearson(prefs,p1,p2):\n", "    # Get the list of mutually rated items\n", "    si={}\n", "    for item in prefs[p1]:\n", "        if item in prefs[p2]: si[item]=1\n", "    # Find the number of elements\n", "    n=len(si)\n", "    # if there are no ratings in common, return 0\n", "    if n==0: return 0\n", "    # Add up all the preferences\n", "    sum1=np.sum([prefs[p1][it] for it in si])\n", "    sum2=np.sum([prefs[p2][it] for it in si])\n", "    # Sum up the squares\n", "    sum1Sq=np.sum([np.power(prefs[p1][it],2) for it in si])\n", "    sum2Sq=np.sum([np.power(prefs[p2][it],2) for it in si])\n", "    # Sum up the products\n", "    pSum=np.sum([prefs[p1][it]*prefs[p2][it] for it in si])\n", "    # Calculate Pearson score\n", "    num=pSum-(sum1*sum2/n)\n", "    den=np.sqrt((sum1Sq-np.power(sum1,2)/n)*(sum2Sq-np.power(sum2,2)/n))\n", "    if den==0: return 0\n", "    return num/den" ] },
{ "cell_type": "code", "execution_count": 22, "metadata": { "ExecuteTime": { "end_time": "2020-06-13T06:56:00.811865Z", "start_time": 
"2020-06-13T06:56:00.798655Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "0.9912407071619299" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sim_pearson(critics, 'Lisa Rose','Toby')\n" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "ExecuteTime": { "end_time": "2020-06-13T06:55:39.891497Z", "start_time": "2020-06-13T06:55:39.311790Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXIAAAD6CAYAAAC8sMwIAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjMsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+AADFEAAAQuUlEQVR4nO3df4xlZX3H8fdnoXRdhIDLRAjLsMAGlSy0xTGpxj9oYuJKG1JqjH9METUymtRIKUgpW01j2f6h/GHYBpONbZaSSatUY5V2pWBKLTRtGLDbCg1mU9ldyFYG2j+UFYq73/5x77aX8c7eO7tnZnx236/k5tz7PM895/vsj8+cee45M6kqJEntWrPaBUiSjo9BLkmNM8glqXEGuSQ1ziCXpMadutIHPOecc2rjxo0rfVhJatrjjz/+QlVNDOtb8SDfuHEjc3NzK31YSWpakr2L9bm0IkmNM8glqXEGuSQ1ziCXpMYZ5JLUOINckpbb7Cxs3Ahr1vS2s7Od7n6sIE9yWpKnknxxQfvOJM8l2dN/THZanSS1bnYWZmZg716o6m1nZjoN83HPyG8Hnlmkb7qqNvUf+7opS5JOEFu3wsGDr207eLDX3pGRQZ7kLcDbgC8f60GSzCSZSzI3Pz9/rLuRpPbsW+T8drH2Y3DUIE8S4C7gxkWGvArck+TJJDcvtp+q2lFVU1U1NTEx9A5TSToxTS6y4rxY+zEYdUb+MeDhqtozrLOqbqiqC4EtwA1J3tVZZZJ0Iti2Ddate23bunW99o6M+lkr1wFnJHkf8Abg9CRPV9XnBgdV1f4k9wObgYc6q06SWjc93dtu3dpbTpmc7IX4kfYOHDXIq+odR54n+SDwzsEQT7KpqvYkWU/vrPyjnVUmSSeK6elOg3uhJf/0wyTXApdU1Z3AXUkuA14BtlfVo10XKEk6urGDvKp2AjsXtF3dcT2SpCXyzk5JapxBLkmNM8glqXEGuSQ1ziCXpMYZ5JLUOINckhpnkEtS4wxySWqcQS5JjTPIJalxBrkkNc4gl6TGGeSS1DiDXJIaZ5BLUuMMcklqnEEuSY0zyCVpuc3OwsaNsGZNbzs72+nuxwryJKcleSrJFxe0b06yO8neJNuT+IVBkgbNzsLMDOzdC1W97cxMp2E+bvDeDjwzpP1u4DbgYuAK4JpuypKkE8TWrXDw4GvbDh7stXdkZJAneQvwNuDLC9ongIuqaldVHQJmgS2L7GMmyVySufn5+Q7KlqRG7Nu3tPZjcNQgTxLgLuDGId0bgMFKngXOG7afqtpRVVNVNTUxMXGstUpSeyYnl9Z+DEadkX8MeLiq9gzpOw04PPD6MHCoq8Ik6YSwbRusW/fatnXreu0dOXVE/3XAGUneB7wBOD3J01X1OeAAcP7A2A3A/s4qk6QTwfR0b7t1a285ZXKyF+JH2jtw1CCvqncceZ7kg8
A7+yFOVe1L8lKSq4B/oBf63a3eS9KJYnq60+BeaMmXCya5Nskt/ZfXA9vpXdHy7ap6pMPaJEljGLW08n+qaiewc0HbE8Dl3ZYkSVoKb+CRpMYZ5JLUOINckhpnkEtS4wxySWqcQS5JjTPIJalxBrkkNc4gl6TGGeSS1DiDXJIaZ5BLUuMMcklqnEEuSY0zyCWpcQa5JDXOIJekxhnkktQ4g1ySGjcyyJOsSfJgku8leTrJuxf070zyXJI9/cfk8pUrSVponF++XMAHqupAki3ANuCBBWOmq+rhrouTJI02MsirqoAD/ZcXAruXtSJJ0pKMtUae5NYkLwI3AZ9Z0P0qcE+SJ5PcvMj7Z5LMJZmbn58/voolSa+R3gn3mIOT3wD+CHhLLXhjkguAB4GPV9VDi+1jamqq5ubmjrFcSTo5JXm8qqaG9S3pqpWq+irwemD9kL79wP3A5mMpUpJ0bMa5auXiJOf2n78deLmqXhjo39Tfrge2AI8tU62SpCHGuWrlLOCbSU4Bngfen+Ra4JKquhO4K8llwCvA9qp6dPnKlSQtNM5VK08Aly5ofnyg/+qui5Ikjc87OyWpcQa5JDXOIJekxhnkktQ4g1ySGmeQS1LjDHJJapxBLkmNM8glqXEGuSQ1ziCXpMYZ5JLUOINckhpnkEtS4wxySWqcQS5JjTPIJalxBrkkNc4gl6TGjQzyJGuSPJjke0meTvLuBf2bk+xOsjfJ9iR+cZCkFTRO6Bbwgaq6FLgR2Lag/27gNuBi4Argmk4rlCQd1cggr54D/ZcXAruP9CWZAC6qql1VdQiYBbYsS6WSpKHGWgZJcmuSF4GbgM8MdG0A9g28fhY4b8j7Z5LMJZmbn58/nnolSQuMFeRV9dmqWg/cDjyQJP2u04DDA0MPA4eGvH9HVU1V1dTExMTx1ixJGrCkDyar6qvA64H1/aYDwPkDQzYA+7spTZI0jnGuWrk4ybn9528HXq6qFwCqah/wUpKrkpwCXAfct5wFS5Je69QxxpwFfLMf1M8D709yLXBJVd0JXA/c0x+3s6oeWbZqJUk/ZWSQV9UTwKULmh9f0H95x3VJksbkzTuS1DiDXJIaZ5BLUuMMcklqnEEuSY0zyCWpcQa5JDXOIJekxhnkktQ4g1ySGmeQS1LjDHJJapxBLkmNM8glqXEGuSQ1ziCXpMYZ5JLUOINckhpnkEtS4wxySWrcyCBPsjbJjiRPJ9mb5KYF/TuTPJdkT/8xuXzlSpIWOnWMMacDDwAfBdYDTyb5y6raPzBmuqoeXob6JEkjjDwjr6oXq+or1fMCsB84aykHSTKTZC7J3Pz8/LHWKkkaYklr5Ek2A2uB7w40vwrck+TJJDcPe19V7aiqqaqampiYOPZqJUk/ZZylFQCSnAPcC3yoqupIe1Xd0O+/AHgwye6qeqjzSiVJQ411Rp7kbOAbwO1V9diwMf018/uBzd2VJ0kaZZyrVs4Evg5sq6pdQ/o39bfrgS3A0KCXJC2PcZZWPgFcCXw+yef7bV8AUlV3AncluQx4BdheVY8uT6mSpGFGBnlV3QHccZT+qzutSJK0JN7ZKUmNM8glqXEGuSQ1ziCXpMYZ5JLUOINckhpnkEtS4wxySWqcQS5JjTPIJalxBrkkNc4gl6TGGeSS1DiDXJIaZ5BLUuMMcklqnEEuSY0zyCWpcQa5JDVuZJAnWZtkR5Knk+xNctOC/s1Jdvf7tifxi4MkraBxQvd04AHgzcBbgduSXDDQfzdwG3AxcAVwTddFSpIWNzLIq+rFqvpK9bwA7AfOAkgyAVxUVbuq6hAwC2xZuI8kM0nmkszNz893PAVJOrktaRkkyWZgLfDdftMGYN/AkGeB8xa+r6p2VNVUVU1NTEwca62SpCHGDvIk5wD3Ah+qquo3nwYcHhh2GDjUXXmSpFHGCvIkZwPfAG6vqscGug4A5w+83kBv6UWStELGuWrlTODrwLaq2jXYV1X7gJeSXJXkFOA64L5lqV
SSNNQ4Z+SfAK4EPp9kT/9xc5Jb+v3XA9uBZ4BvV9Ujy1OqJGmYU0cNqKo7gDuO0v8EcHmXRUmSxufNO5LUOINckhpnkEtS4wxySWqcQS5JjTPIJalxBrkkNc4gl6TGGeSS1DiDXJIaZ5BLUuMMcklqnEEuSY0zyCWpcQa5JDXOIJekxhnkktQ4g1ySGjd2kCd5XZJLl7MYSdLSjQzyJGcm+RrwA+DWIf07kzw38IuZJ5ejUEnScCN/+TJwGNgO3A/88iJjpqvq4a6KkiSNb+QZeVX9qKq+BfxkBeqRJC1RFx92vgrck+TJJDcPG5BkJslckrn5+fkODilJOuK4g7yqbqiqC4EtwA1J3jVkzI6qmqqqqYmJieM9pCRpQGeXH1bVfnrr6Ju72qckabTjDvIkm/rb9fTOyh873n1KksY38qqVJGcA3wHOANYmuQr4JHBJVd0J3JXkMuAVYHtVPbqM9UqSFhgZ5FX1Q2DTUfqv7rQiSdKSeIu+JDXOIJekxhnkktQ4g1ySGmeQS1LjDHJJapxBLkmNM8glqXEGuSQ1ziCXpMYZ5JLUOINckhpnkEtS4wxySWqcQS5JjTPIJalxBrkkNc4gl6TGjR3kSV6X5NLlLEaStHQjgzzJmUm+BvwAuHVI/+Yku5PsTbI9yfKc5c/OwsaNsGZNbzs7uyyHkaTWjBO6h4HtwO8s0n83cBtwMXAFcE03pQ2YnYWZGdi7F6p625kZw1ySGCPIq+pHVfUt4CcL+5JMABdV1a6qOgTMAls6r3LrVjh48LVtBw/22iXpJHe8yyAbgH0Dr58Fzls4KMlMkrkkc/Pz80s/yr59S2uXpJPI8Qb5afSWXo44DBxaOKiqdlTVVFVNTUxMLP0ok5NLa5ekk8jxBvkB4PyB1xuA/ce5z5+2bRusW/fatnXreu2SdJI7riCvqn3AS0muSnIKcB1wXyeVDZqehh074MILIeltd+zotUvSSe7UUQOSnAF8BzgDWJvkKuCTwCVVdSdwPXAPcBaws6oeWZZKp6cNbkkaYmSQV9UPgU1H6X8CuLzLoiRJ4/MWfUlqnEEuSY0zyCWpcQa5JDUuVbWyB0zmgb0retBunAO8sNpFrDDnfHI42ebc6nwvrKqhd1SueJC3KslcVU2tdh0ryTmfHE62OZ+I83VpRZIaZ5BLUuMM8vHtWO0CVoFzPjmcbHM+4ebrGrkkNc4zcklqnEEuSY0zyCWpcQb5gCRrk+xI8nSSvUluGjLmD5LsT/JMknesRp1dGjXnJO9K8m9Jvp/kT/s/d75pSdYkeTDJ9/rzfveC/s1Jdvf/PLYnaf7/yRhzvjHJv/f/Xd+bZORPRv1ZN2rOA+P+JMmela6vU1Xlo/8A1gPvBULv7q8fABcM9H8YuB94XX/M2tWueQXm/H1gM3AK8AjwntWuuYM5Bziv/3wLMLeg/9vAe/pz/nvg11e75hWY84fp/erGU4G/BaZXu+blnnO//VeAvwb2rHa9x/No/kyjS1X1YlV9pXpeoPdr684aGHIT8NtV9eP+mJdXp9LujDHnVwae/zzw/IoWuAz6cz3Qf3khsPtIX5IJ4KKq2lVVh4BZeiHQtKPNud//p1X1P1X1E+BfgTesdI1dGzXnJGuBPwQ+tdK1da35b5+WS5LNwFrgu/3XPwecC3w4yXuBp4CPVNWLq1dltxbOue83gS8BPwburarHV6O2riW5FfhdYB4Y/JZ7A7Bv4PWzwK+uYGnL5ihzHhyzjt5837OCpS2bEXP+NPAF4L9Wuq6ueUY+RJJzgHuBD1X/+y96yw5nA38HvJnef/atq1Nh9xaZM8AN9P6xfwJ4b5LJ1aiva1X12apaD9wOPJAk/a7TgMMDQw8Dh1a6vuVwlDkDvTVl4M+A7VX1zCqU2LnF5pzkcuAXqmp2VQvsiEG+QJKzgW8At1fVYwNdLwA/qqoH+0H3V8CbVqPGri025ySXAW+tqj+uqn8Evgb81iqVuSyq6qvA6+l9VgBwAD
h/YMgGestNJ4whc6YfcF8Enqqqu1ertuUyZM7XA5uS/AvwN8AFSb60WvUdL4N8QJIzga8D26pq12BfVb0K/HOSI+ulvwY8RuOONmd66+OTSd7YP1v7JeC/V7rGriW5OMm5/edvB17ufz5AVe0DXkpyVf8KneuA+1av2m4cbc59dwP/WVWfXpUCl8GIv+dbqupNVfWLwNXA/qp6/yqWe1y8RX9Akt8Hfo/eWdkRX6D353RnkovpLT+8kV6If6SqXlr5Srszxpxvobes8gr/P+eDK19pd5JcCfwFvatSngc+DkwCl/TnfCVwD70PfXdWVfMfhh1tzsA/0btS5z8G3vKpqvrzla6zS6P+ngfGbQQeqqpFf8n8zzqDXJIa59KKJDXOIJekxhnkktQ4g1ySGmeQS1LjDHJJapxBLkmNM8glqXH/Cz7OmT5JYtZHAAAAAElFTkSuQmCC\n", "text/plain": [ "<Figure size 432x288 with 1 Axes>" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "%matplotlib inline\n", "import matplotlib.pyplot as plt\n", "si = [item for item in critics['Lisa Rose'] if item in critics['Toby']]\n", "score1 = [critics['Lisa Rose'][i] for i in si]\n", "score2 = [critics['Toby'][i] for i in si]\n", "\n", "plt.plot(score1, score2, 'ro');" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "ExecuteTime": { "end_time": "2020-06-13T06:59:11.962629Z", "start_time": "2020-06-13T06:59:11.952880Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "# Returns the best matches for person from the prefs dictionary.\n", "# Number of results and similarity function are optional params.\n", "def topMatches(prefs,person,n=5,similarity=sim_pearson):\n", " scores=[(similarity(prefs,person,other),other)\n", " for other in prefs if other!=person]\n", " # Sort the list so the highest scores appear at the top \n", " scores.sort( )\n", " scores.reverse( )\n", " return scores[0:n]" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "ExecuteTime": { "end_time": "2020-06-13T06:59:30.391953Z", "start_time": "2020-06-13T06:59:30.378265Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "[(0.9912407071619299, 'Lisa Rose'),\n", " (0.9244734516419049, 'Mick LaSalle'),\n", " (0.8934051474415647, 'Claudia Puig')]" ] }, "execution_count": 24, "metadata": {}, 
"output_type": "execute_result" } ], "source": [ "topMatches(critics,'Toby',n=3) # topN" ] },
{ "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### 1.1 Recommending Items\n", "\n", "<img src=\"images/recsys2.png\" width =1000>" ] },
{ "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "- The five users most similar to Toby (Rose, Seymour, Puig, LaSalle, Matthews) and their similarities to him (0.99, 0.38, 0.89, 0.92, 0.66, respectively)\n", "- Their ratings of the three movies Toby has not seen (Night, Lady, Luck)\n", "  - e.g., Rose rated Night 3.0\n", "- S.xNight is the product of a user's similarity to Toby and that user's rating of Night\n", "  - e.g., Toby-Rose similarity (0.99) * Rose's rating of Night (3.0) = 2.97\n", "- Summing these products gives each movie's total score\n", "  - e.g., Night's total is 12.89 = 2.97+1.14+4.02+2.77+1.99\n", "- Each total is then divided by the sum of the similarities of the users who rated that movie\n", "  - e.g., Night's predicted score is 12.89/3.84; with unrounded similarities this comes to about 3.35, the value returned by `getRecommendations` below\n" ] },
{ "cell_type": "code", "execution_count": 25, "metadata": { "ExecuteTime": { "end_time": "2020-06-13T07:14:52.505189Z", "start_time": "2020-06-13T07:14:52.484149Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "# Gets recommendations for a person by using a weighted average\n", "# of every other user's rankings\n", "def getRecommendations(prefs,person,similarity=sim_pearson):\n", "    totals={}\n", "    simSums={}\n", "    for other in prefs:\n", "        # don't compare me to myself\n", "        if other==person: continue\n", "        sim=similarity(prefs,person,other)\n", "        # ignore scores of zero or lower\n", "        if sim<=0: continue\n", "        for item in prefs[other]:\n", "            # only score movies I haven't seen yet\n", "            if item not in prefs[person]:\n", "                # Similarity * Score\n", "                totals.setdefault(item,0)\n", "                totals[item]+=prefs[other][item]*sim\n", "                # Sum of similarities\n", "                simSums.setdefault(item,0)\n", "                simSums[item]+=sim\n", "    # Create the normalized list\n", "    rankings=[(total/simSums[item],item) for item,total in totals.items()]\n", "    # Return the sorted list\n", "    rankings.sort()\n", "    rankings.reverse()\n", "    return rankings" ] },
{ "cell_type": "code", 
"execution_count": 27, "metadata": { "ExecuteTime": { "end_time": "2020-06-13T07:15:06.527935Z", "start_time": "2020-06-13T07:15:06.505181Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "[(3.3477895267131017, 'The Night Listener'),\n", " (2.8325499182641614, 'Lady in the Water'),\n", " (2.530980703765565, 'Just My Luck')]" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Now you can find out what movies I should watch next:\n", "getRecommendations(critics,'Toby')" ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "ExecuteTime": { "end_time": "2020-06-13T07:16:02.611091Z", "start_time": "2020-06-13T07:16:02.590855Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "[(3.457128694491423, 'The Night Listener'),\n", " (2.778584003814924, 'Lady in the Water'),\n", " (2.422482042361917, 'Just My Luck')]" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# You’ll find that the results are only affected very slightly by the choice of similarity metric.\n", "getRecommendations(critics,'Toby',similarity=sim_distance)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "\n", "## 2. 
Item-based filtering\n", "\n", "\n", "Now you know how to find similar people and recommend products for a given person.\n", "\n", "**But what if you want to see which products are similar to each other?**\n", "\n", "This uses the same method we used earlier to determine similarity between people: you just need to swap the people and the items." ] },
{ "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Flip the user-item dictionary into an item-user dictionary" ] },
{ "cell_type": "code", "execution_count": 29, "metadata": { "ExecuteTime": { "end_time": "2020-06-13T07:20:02.381751Z", "start_time": "2020-06-13T07:20:02.371482Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "# Swap the people and the items: {person: {item: rating}} -> {item: {person: rating}}\n", "def transformPrefs(prefs):\n", "    result={}\n", "    for person in prefs:\n", "        for item in prefs[person]:\n", "            result.setdefault(item,{})\n", "            # Flip item and person\n", "            result[item][person]=prefs[person][item]\n", "    return result\n", "\n", "movies = transformPrefs(critics)\n" ] },
{ "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Computing item similarity" ] },
{ "cell_type": "code", "execution_count": 30, "metadata": { "ExecuteTime": { "end_time": "2020-06-13T07:20:55.450779Z", "start_time": "2020-06-13T07:20:55.437989Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "[(0.6579516949597695, 'You, Me and Dupree'),\n", " (0.4879500364742689, 'Lady in the Water'),\n", " (0.11180339887498941, 'Snakes on a Plane'),\n", " (-0.1798471947990544, 'The Night Listener'),\n", " (-0.42289003161103106, 'Just My Luck')]" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "topMatches(movies,'Superman Returns')" ] },
{ "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Building the item-similarity table" ] },
{ "cell_type": "code", "execution_count": 31, "metadata": { "ExecuteTime": { "end_time": "2020-06-13T07:24:49.898121Z", "start_time": "2020-06-13T07:24:49.876244Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "[(0.3090169943749474, 'Snakes on a Plane'),\n", " (0.252650308587072, 'The Night Listener'),\n", " (0.2402530733520421, 'Lady in the Water'),\n", " (0.20799159651347807, 'Just My Luck'),\n", " (0.1918253663634734, 'You, Me and Dupree')]" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def calculateSimilarItems(prefs,n=10):\n", "    # Create a dictionary of items showing which other items they\n", "    # are most similar to.\n", "    result={}\n", "    # Invert the preference matrix to be item-centric\n", "    itemPrefs=transformPrefs(prefs)\n", "    c=0\n", "    for item in itemPrefs:\n", "        # Status updates for large datasets\n", "        c+=1\n", "        if c%100==0:\n", "            print(\"%d / %d\" % (c,len(itemPrefs)))\n", "        # Find the most similar items to this one\n", "        scores=topMatches(itemPrefs,item,n=n,similarity=sim_distance)\n", "        result[item]=scores\n", "    return result\n", "\n", "itemsim=calculateSimilarItems(critics)\n", "itemsim['Superman Returns']" ] },
{ "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "<img src=\"images/recsys3.png\" width = 1200>" ] },
{ "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "- Toby has seen three movies (Snakes, Superman, Dupree) and rated them 4.5, 4.0, and 1.0, respectively\n", "- Table 2-3 gives the similarity of each of these three movies to the three movies Toby has not seen\n", "  - e.g., the similarity between Superman and Night is 0.103\n", "  - These values correspond to 1/(1 + sum of squared differences), without the square root that `sim_distance` above applies, so `itemsim` gives different numbers (e.g., about 0.253 for Superman and Night)\n", "- R.xNight is the product of Toby's rating of each movie he has seen and that movie's similarity to Night\n", "  - e.g., 0.818 = 4.5*0.182\n", "- Summing gives Toby's weighted total for Night: 0.818+0.412+0.148 = 1.378\n", "  - The sum of Night's similarities is 0.182+0.103+0.148 = 0.433\n", "  - So Toby's predicted rating for Night is 1.378/0.433 ≈ 3.18\n" ] },
{ "cell_type": "code", "execution_count": 32, "metadata": { "ExecuteTime": { "end_time": "2020-06-13T07:35:40.432626Z", "start_time": "2020-06-13T07:35:40.393254Z" }, "code_folding": [ 0, 5 ], "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "[(3.1667425234070894, 'The Night Listener'),\n", " (2.9366294028444346, 'Just My Luck'),\n", " (2.868767392626467, 'Lady in the Water')]" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def getRecommendedItems(prefs,itemMatch,user):\n", "    userRatings=prefs[user]\n", "    scores={}\n", "    totalSim={}\n", "    # Loop over items rated by this user\n", "    for (item,rating) in userRatings.items():\n", "        # Loop over items similar to this one\n", "        for (similarity,item2) in itemMatch[item]:\n", "            # Ignore if this user has already rated this item\n", "            if item2 in userRatings: continue\n", "            # Weighted sum of rating times similarity\n", "            scores.setdefault(item2,0)\n", "            scores[item2]+=similarity*rating\n", "            # Sum of all the similarities\n", "            totalSim.setdefault(item2,0)\n", "            totalSim[item2]+=similarity\n", "    # Divide each total score by total weighting to get an average\n", "    rankings=[(score/totalSim[item],item) for item,score in scores.items()]\n", "    # Return the rankings from highest to lowest\n", "    rankings.sort()\n", "    rankings.reverse()\n", "    return rankings\n", "\n", "getRecommendedItems(critics,itemsim,'Toby')" ] },
{ "cell_type": "code", "execution_count": 33, "metadata": { "ExecuteTime": { "end_time": "2020-06-13T07:36:33.702524Z", "start_time": "2020-06-13T07:36:33.692602Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "[(4.0, 'Michael Phillips'), (3.0, 'Jack Matthews')]" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Recommend critics for a movie they have not rated yet\n", "getRecommendations(movies,'Just My Luck')" ] },
{ "cell_type": "code", "execution_count": 34, "metadata": { "ExecuteTime": { "end_time": "2020-06-13T07:37:19.312286Z", "start_time": "2020-06-13T07:37:19.297469Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "[(3.1637361366111816, 'Michael Phillips')]" 
] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "getRecommendations(movies, 'You, Me and Dupree')" ] },
{ "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "<img src = './img/itemcfNetwork.png' width = 700px>\n", "\n", "**Network representation of item-based collaborative filtering**" ] },
{ "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "## Graph-based Models\n", "\n", "User behavior can be represented as a bipartite graph, with users on one side, items on the other, and an edge for each user-item interaction, so graph-based algorithms can be applied to recommender systems.\n", "\n", "<img src = './img/graphrec.png' width = 500px>" ] },
{ "cell_type": "code", "execution_count": 35, "metadata": { "ExecuteTime": { "end_time": "2020-06-13T07:40:53.240510Z", "start_time": "2020-06-13T07:40:53.225497Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "# https://github.com/ParticleWave/RecommendationSystemStudy/blob/d1960056b96cfaad62afbfe39225ff680240d37e/PersonalRank.py\n", "\n", "class Graph:\n", "    # Undirected, unweighted graph stored as an adjacency dict: {node: {neighbor: 1}}\n", "    def __init__(self):\n", "        self.G = dict()\n", "\n", "    def addEdge(self, p, q):\n", "        if p not in self.G: self.G[p] = dict()\n", "        if q not in self.G: self.G[q] = dict()\n", "        self.G[p][q] = 1\n", "        self.G[q][p] = 1\n", "\n", "    def getGraphMatrix(self):\n", "        return self.G" ] },
{ "cell_type": "code", "execution_count": 36, "metadata": { "ExecuteTime": { "end_time": "2020-06-13T07:41:17.803287Z", "start_time": "2020-06-13T07:41:17.791525Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "dict_keys(['A', 'a', 'c', 'B', 'b', 'd', 'C'])\n" ] } ], "source": [ "# Bipartite graph: users A, B, C (upper case) and items a, b, c, d (lower case)\n", "graph = Graph()\n", "graph.addEdge('A', 'a')\n", "graph.addEdge('A', 'c')\n", "graph.addEdge('B', 'a')\n", "graph.addEdge('B', 'b')\n", "graph.addEdge('B', 'c')\n", "graph.addEdge('B', 'd')\n", "graph.addEdge('C', 'c')\n", "graph.addEdge('C', 'd')\n", "G = graph.getGraphMatrix()\n", "print(G.keys())" ] },
{ "cell_type": "code", "execution_count": 37, "metadata": { "ExecuteTime": { "end_time": "2020-06-13T07:41:29.594553Z", "start_time": "2020-06-13T07:41:29.578141Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [ { "data": { "text/plain": [ "{'A': {'a': 1, 'c': 1},\n", " 'a': {'A': 1, 'B': 1},\n", " 'c': {'A': 1, 'B': 1, 'C': 1},\n", " 'B': {'a': 1, 'b': 1, 'c': 1, 'd': 1},\n", " 'b': {'B': 1},\n", " 'd': {'B': 1, 'C': 1},\n", " 'C': {'c': 1, 'd': 1}}" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "G" ] },
{ "cell_type": "code", "execution_count": 38, "metadata": { "ExecuteTime": { "end_time": "2020-06-13T07:41:51.228320Z", "start_time": "2020-06-13T07:41:51.194474Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "A a 1\n", "A c 1\n", "a A 1\n", "a B 1\n", "c A 1\n", "c B 1\n", "c C 1\n", "B a 1\n", "B b 1\n", "B c 1\n", "B d 1\n", "b B 1\n", "d B 1\n", "d C 1\n", "C c 1\n", "C d 1\n" ] } ], "source": [ "for i, ri in G.items():\n", "    for j, wij in ri.items():\n", "        print(i, j, wij)" ] },
{ "cell_type": "code", "execution_count": 39, "metadata": { "ExecuteTime": { "end_time": "2020-06-13T07:47:47.553766Z", "start_time": "2020-06-13T07:47:47.522736Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "def PersonalRank(G, alpha, root, max_step):\n", "    # G: bipartite graph of users and items, as an adjacency dict\n", "    # alpha: probability of continuing the random walk to a neighbor;\n", "    #        with probability 1 - alpha the walk jumps back to root\n", "    # root: the target user\n", "    # max_step: number of iterations\n", "    rank = {x:0.0 for x in G.keys()}\n", "    rank[root] = 1.0\n", "    for k in range(max_step):\n", "        tmp = {x:0.0 for x in G.keys()}\n", "        for i,ri in G.items():\n", "            for j,wij in ri.items():\n", "                if j not in tmp: tmp[j] = 0.0\n", "                # node i passes alpha * rank[i] evenly to its neighbors\n", "                tmp[j] += alpha * rank[i] / len(ri)\n", "                # This check runs once per edge into root, so root gains\n", "                # (1 - alpha) * degree(root) per iteration rather than (1 - alpha)\n", "                if j == root: tmp[j] += 1.0 - alpha\n", "        rank = tmp\n", "        print(k, rank)\n", "    return rank" ] },
{ "cell_type": "code", "execution_count": 40, "metadata": { "ExecuteTime": { 
"end_time": "2020-06-13T07:48:02.258913Z", "start_time": "2020-06-13T07:48:02.228561Z" }, "scrolled": true, "slideshow": { "slide_type": "skip" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0 {'A': 0.3999999999999999, 'a': 0.4, 'c': 0.4, 'B': 0.0, 'b': 0.0, 'd': 0.0, 'C': 0.0}\n", "1 {'A': 0.6666666666666666, 'a': 0.15999999999999998, 'c': 0.15999999999999998, 'B': 0.2666666666666667, 'b': 0.0, 'd': 0.0, 'C': 0.10666666666666669}\n", "2 {'A': 0.5066666666666666, 'a': 0.32, 'c': 0.3626666666666667, 'B': 0.10666666666666665, 'b': 0.053333333333333344, 'd': 0.09600000000000003, 'C': 0.04266666666666666}\n", "3 {'A': 0.624711111111111, 'a': 0.22399999999999998, 'c': 0.24106666666666665, 'B': 0.3057777777777778, 'b': 0.02133333333333333, 'd': 0.03839999999999999, 'C': 0.13511111111111113}\n", "4 {'A': 0.5538844444444444, 'a': 0.31104, 'c': 0.36508444444444443, 'B': 0.1863111111111111, 'b': 0.06115555555555557, 'd': 0.11520000000000002, 'C': 0.07964444444444443}\n", "5 {'A': 0.6217718518518518, 'a': 0.258816, 'c': 0.29067377777777775, 'B': 0.31677629629629633, 'b': 0.03726222222222222, 'd': 0.06911999999999999, 'C': 0.14343585185185187}\n", "6 {'A': 0.5810394074074073, 'a': 0.312064, 'c': 0.3694383407407408, 'B': 0.2384971851851852, 'b': 0.06335525925925926, 'd': 0.12072960000000002, 'C': 0.1051610074074074}\n", "7 {'A': 0.6233424908641975, 'a': 0.2801152, 'c': 0.322179602962963, 'B': 0.322318538271605, 'b': 0.047699437037037044, 'd': 0.08976384000000001, 'C': 0.14680873086419757}\n", "8 {'A': 0.5979606407901235, 'a': 0.313800704, 'c': 0.372524196345679, 'B': 0.27202572641975314, 'b': 0.06446370765432101, 'd': 0.12318720000000004, 'C': 0.12182009679012348}\n", "9 {'A': 0.6248600672921809, 'a': 0.2935894016, 'c': 0.34231744031604944, 'B': 0.32570591341563787, 'b': 0.05440514528395063, 'd': 0.10313318400000002, 'C': 0.14861466569218107}\n", "10 {'A': 0.6087204113909463, 'a': 0.31508520959999997, 'c': 0.3745310758768724, 'B': 0.29349780121810704, 
'b': 0.06514118268312757, 'd': 0.12458704896, 'C': 0.13253792435094652}\n", "11 {'A': 0.6259090374071659, 'a': 0.3021877248, 'c': 0.3552028945403786, 'B': 0.327856803137668, 'b': 0.05869956024362141, 'd': 0.11171472998400003, 'C': 0.14970977315116596}\n", "12 {'A': 0.6155958617974342, 'a': 0.31593497559039996, 'c': 0.37581888485086634, 'B': 0.3072414019859315, 'b': 0.0655713606275336, 'd': 0.125455269888, 'C': 0.13940666387103434}\n", "13 {'A': 0.6265923595297243, 'a': 0.30768662511615996, 'c': 0.3634492906645737, 'B': 0.32923155598695125, 'b': 0.0614482803971863, 'd': 0.11721094594560004, 'C': 0.15040047724876437}\n", "14 {'A': 0.6199944608903503, 'a': 0.31648325500928, 'c': 0.37664344590878573, 'B': 0.3160374635863394, 'b': 0.06584631119739025, 'd': 0.126006502096896, 'C': 0.14380418922212634}\n", "15 {'A': 0.6270315542460547, 'a': 0.311205277073408, 'c': 0.36872695276225853, 'B': 0.3301112040427255, 'b': 0.06320749271726787, 'd': 0.12072916840611841, 'C': 0.1508408530811013}\n", "16 {'A': 0.6228092982326321, 'a': 0.316834862506967, 'c': 0.37717120373940755, 'B': 0.3216669597688938, 'b': 0.0660222408085451, 'd': 0.12635858204098563, 'C': 0.14661885476571632}\n", "17 {'A': 0.6273129326666287, 'a': 0.3134571112468316, 'c': 0.37210465315311814, 'B': 0.33067415812985923, 'b': 0.06433339195377877, 'd': 0.1229809338600653, 'C': 0.15112242048023627}\n", "18 {'A': 0.6246107520062307, 'a': 0.3170600046926233, 'c': 0.3775089728847178, 'B': 0.32526983911327995, 'b': 0.06613483162597185, 'd': 0.12658379981806636, 'C': 0.1484202810515243}\n", "19 {'A': 0.627493061312974, 'a': 0.3148982686251483, 'c': 0.37426638104575805, 'B': 0.3310344465409781, 'b': 0.06505396782265599, 'd': 0.1244220802432657, 'C': 0.15130257936315128}\n" ] }, { "data": { "text/plain": [ "{'A': 0.627493061312974,\n", " 'a': 0.3148982686251483,\n", " 'c': 0.37426638104575805,\n", " 'B': 0.3310344465409781,\n", " 'b': 0.06505396782265599,\n", " 'd': 0.1244220802432657,\n", " 'C': 0.15130257936315128}" ] }, 
"execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "PersonalRank(G, 0.8, 'A', 20)\n", "# print(PersonalRank(G, 0.8, 'B', 20))\n", "# print(PersonalRank(G, 0.8, 'C', 20))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "\n", "## 3. MovieLens Recommender\n", "\n", "MovieLens is a real-world dataset of movie ratings, developed by the GroupLens research group at the University of Minnesota.\n", "\n", "### Data download\n", "http://grouplens.org/datasets/movielens/1m/\n", "\n", "> These files contain 1,000,209 anonymous ratings of approximately 3,900 movies \n", "made by 6,040 MovieLens users who joined MovieLens in 2000.\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "**Data format**\n", "\n", "All ratings are contained in the file \"ratings.dat\" and are in the following format:\n", "\n", "UserID::MovieID::Rating::Timestamp\n", "\n", "1::1193::5::978300760\n", "\n", "1::661::3::978302109\n", "\n", "1::914::3::978301968\n" ] }, { "cell_type": "code", "execution_count": 41, "metadata": { "ExecuteTime": { "end_time": "2020-06-13T07:52:18.464975Z", "start_time": "2020-06-13T07:52:18.448739Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "import os\n", "\n", "def loadMovieLens(path='/Users/datalab/bigdata/cjc/ml-1m/'):\n", "    # Map each MovieID to its title (titles contain non-ASCII characters, so the encoding is set explicitly)\n", "    movies = {}\n", "    with open(os.path.join(path, 'movies.dat'), encoding='iso-8859-15') as f:\n", "        for line in f:\n", "            (movie_id, title) = line.split('::')[0:2]\n", "            movies[movie_id] = title\n", "\n", "    # Load ratings into a nested dict: {user_id: {movie title: rating}}\n", "    prefs = {}\n", "    with open(os.path.join(path, 'ratings.dat')) as f:\n", "        for line in f:\n", "            (user, movieid, rating, ts) = line.split('::')\n", "            prefs.setdefault(user, {})\n", "            prefs[user][movies[movieid]] = float(rating)\n", "    return prefs" ] }, { "cell_type": "code", "execution_count": 42, "metadata": { "ExecuteTime": { "end_time": "2020-06-13T07:52:27.350084Z", "start_time": "2020-06-13T07:52:24.180778Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "{'X-Men (2000)': 4.0,\n", " 'Dear Diary 
(Caro Diario) (1994)': 1.0,\n", " 'Brady Bunch Movie, The (1995)': 1.0,\n", " 'Terminator 2: Judgment Day (1991)': 5.0,\n", " 'Last of the Mohicans, The (1992)': 1.0,\n", " 'Rock, The (1996)': 5.0,\n", " 'Star Wars: Episode V - The Empire Strikes Back (1980)': 5.0,\n", " 'Princess Bride, The (1987)': 3.0,\n", " 'Raiders of the Lost Ark (1981)': 4.0,\n", " 'Cyrano de Bergerac (1990)': 4.0,\n", " 'Chambermaid on the Titanic, The (1998)': 1.0,\n", " 'Diebinnen (1995)': 1.0,\n", " 'Indiana Jones and the Last Crusade (1989)': 4.0,\n", " 'Jaws (1975)': 5.0,\n", " 'Longest Day, The (1962)': 1.0,\n", " 'Untouchables, The (1987)': 5.0,\n", " 'Hunt for Red October, The (1990)': 5.0,\n", " 'Pope of Greenwich Village, The (1984)': 1.0,\n", " 'King Kong (1933)': 1.0,\n", " 'Hurricane, The (1999)': 5.0,\n", " 'Fast, Cheap & Out of Control (1997)': 1.0,\n", " 'Westworld (1973)': 1.0,\n", " 'Braveheart (1995)': 5.0,\n", " 'Planet of the Apes (1968)': 1.0,\n", " 'Escape from the Planet of the Apes (1971)': 1.0,\n", " 'Thelma & Louise (1991)': 1.0,\n", " 'Bad Boys (1995)': 5.0,\n", " 'Out of Sight (1998)': 1.0,\n", " 'Matrix, The (1999)': 5.0,\n", " 'Buffalo 66 (1998)': 1.0,\n", " 'Taking of Pelham One Two Three, The (1974)': 1.0,\n", " 'King of New York (1990)': 1.0,\n", " 'Star Wars: Episode IV - A New Hope (1977)': 5.0,\n", " 'Shanghai Noon (2000)': 1.0,\n", " 'U-571 (2000)': 5.0,\n", " 'Rocky (1976)': 5.0,\n", " 'Get Shorty (1995)': 1.0,\n", " \"On Her Majesty's Secret Service (1969)\": 1.0,\n", " 'Man with the Golden Gun, The (1974)': 5.0,\n", " 'Alice in Wonderland (1951)': 1.0,\n", " 'Die Hard (1988)': 3.0,\n", " 'Gladiator (2000)': 5.0,\n", " 'Cowboy Way, The (1994)': 1.0,\n", " 'Palookaville (1996)': 1.0,\n", " 'Speed (1994)': 1.0,\n", " 'Fugitive, The (1993)': 5.0,\n", " 'Lethal Weapon (1987)': 5.0,\n", " 'Good, The Bad and The Ugly, The (1966)': 4.0,\n", " 'Benji (1974)': 1.0,\n", " 'Mask of Zorro, The (1998)': 5.0,\n", " 'Goldfinger (1964)': 5.0,\n", " 'From Russia with 
Love (1963)': 1.0,\n", " 'Dr. No (1962)': 1.0,\n", " 'Faster Pussycat! Kill! Kill! (1965)': 1.0,\n", " 'Army of Darkness (1993)': 3.0,\n", " 'Saving Private Ryan (1998)': 4.0,\n", " 'Jurassic Park (1993)': 5.0,\n", " 'True Romance (1993)': 1.0,\n", " 'Terminator, The (1984)': 4.0}" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "prefs=loadMovieLens()\n", "prefs['87']" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### user-based filtering" ] }, { "cell_type": "code", "execution_count": 43, "metadata": { "ExecuteTime": { "end_time": "2020-06-13T07:53:58.902528Z", "start_time": "2020-06-13T07:53:55.260258Z" }, "scrolled": true, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "[(5.0, 'Time of the Gypsies (Dom za vesanje) (1989)'),\n", " (5.0, 'Tigrero: A Film That Was Never Made (1994)'),\n", " (5.0, 'Schlafes Bruder (Brother of Sleep) (1995)'),\n", " (5.0, 'Return with Honor (1998)'),\n", " (5.0, 'Lured (1947)'),\n", " (5.0, 'Identification of a Woman (Identificazione di una donna) (1982)'),\n", " (5.0, 'I Am Cuba (Soy Cuba/Ya Kuba) (1964)'),\n", " (5.0, 'Hour of the Pig, The (1993)'),\n", " (5.0, 'Gay Deceivers, The (1969)'),\n", " (5.0, 'Gate of Heavenly Peace, The (1995)'),\n", " (5.0, 'Foreign Student (1994)'),\n", " (5.0, 'Dingo (1992)'),\n", " (5.0, 'Dangerous Game (1993)'),\n", " (5.0, 'Callejón de los milagros, El (1995)'),\n", " (5.0, 'Bittersweet Motel (2000)'),\n", " (4.820460101722989, 'Apple, The (Sib) (1998)'),\n", " (4.738956184936385, 'Lamerica (1994)'),\n", " (4.681816541467396, 'Bells, The (1926)'),\n", " (4.664958072522234, 'Hurricane Streets (1998)'),\n", " (4.650741840804562, 'Sanjuro (1962)'),\n", " (4.649974172600346, 'On the Ropes (1999)'),\n", " (4.636825408739506, 'Shawshank Redemption, The (1994)'),\n", " (4.627888709544554, 'For All Mankind (1989)'),\n", " (4.582048349280509, 'Midaq Alley (Callejón de 
los milagros, El) (1995)'),\n", " (4.579778646871158, \"Schindler's List (1993)\"),\n", " (4.575199941037386,\n", " 'Seven Samurai (The Magnificent Seven) (Shichinin no samurai) (1954)'),\n", " (4.574904988403462, 'Godfather, The (1972)'),\n", " (4.5746840191882345, \"Ed's Next Move (1996)\"),\n", " (4.558519037147829, 'Hanging Garden, The (1997)'),\n", " (4.5277600427755935, 'Close Shave, A (1995)')]" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "getRecommendations(prefs,'87')[0:30]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Item-based filtering" ] }, { "cell_type": "markdown", "metadata": { "ExecuteTime": { "end_time": "2020-06-13T08:13:41.289867Z", "start_time": "2020-06-13T07:54:47.353302Z" }, "slideshow": { "slide_type": "fragment" } }, "source": [ "`itemsim=calculateSimilarItems(prefs,n=50)`" ] }, { "cell_type": "markdown", "metadata": { "scrolled": true, "slideshow": { "slide_type": "subslide" } }, "source": [ "`getRecommendedItems(prefs,itemsim,'87')[0:30]`" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Libraries:\n", "- [Surprise](https://github.com/NicolasHug/Surprise): a Python scikit for building and analyzing recommender systems that deal with explicit rating data.\n", "- [LightFM](https://github.com/lyst/lightfm): a Python implementation of a hybrid recommendation algorithm\n", "- [Python-recsys](https://github.com/ocelma/python-recsys): a Python library for implementing a recommender system\n", "\n", "Tutorial: https://realpython.com/build-recommendation-engine-collaborative-filtering/\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "![image.png](images/end.png)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "\n", "## Building a Recommendation System with Turicreate\n" ] }, { "cell_type": 
"markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "In this notebook we import Turicreate and use it to\n", "\n", "- train two models that can recommend new courses to users\n", "- compare the performance of the two models\n" ] }, { "cell_type": "code", "execution_count": 45, "metadata": { "ExecuteTime": { "end_time": "2020-06-13T08:23:50.783592Z", "start_time": "2020-06-13T08:23:48.939865Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "%matplotlib inline\n", "import turicreate as tc\n", "import matplotlib.pyplot as plt\n" ] }, { "cell_type": "code", "execution_count": 46, "metadata": { "ExecuteTime": { "end_time": "2020-06-13T08:24:01.844570Z", "start_time": "2020-06-13T08:24:01.820073Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/html": [ "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\"><table frame=\"box\" rules=\"cols\">\n", " <tr>\n", " <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">item_id</th>\n", " <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">rating</th>\n", " <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">user_id</th>\n", " </tr>\n", " <tr>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">a</td>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n", " </tr>\n", " <tr>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">b</td>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">3</td>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n", " </tr>\n", " <tr>\n", " <td style=\"padding-left: 1em; 
padding-right: 1em; text-align: center; vertical-align: top\">c</td>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">2</td>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n", " </tr>\n", " <tr>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">a</td>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">5</td>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n", " </tr>\n", " <tr>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">b</td>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">4</td>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n", " </tr>\n", " <tr>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">b</td>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">2</td>\n", " </tr>\n", " <tr>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">c</td>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">4</td>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">2</td>\n", " </tr>\n", " <tr>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">d</td>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">3</td>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">2</td>\n", " 
</tr>\n", "</table>\n", "[8 rows x 3 columns]<br/>\n", "</div>" ], "text/plain": [ "Columns:\n", "\titem_id\tstr\n", "\trating\tint\n", "\tuser_id\tstr\n", "\n", "Rows: 8\n", "\n", "Data:\n", "+---------+--------+---------+\n", "| item_id | rating | user_id |\n", "+---------+--------+---------+\n", "| a | 1 | 0 |\n", "| b | 3 | 0 |\n", "| c | 2 | 0 |\n", "| a | 5 | 1 |\n", "| b | 4 | 1 |\n", "| b | 1 | 2 |\n", "| c | 4 | 2 |\n", "| d | 3 | 2 |\n", "+---------+--------+---------+\n", "[8 rows x 3 columns]" ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sf = tc.SFrame({'user_id': [\"0\", \"0\", \"0\", \"1\", \"1\", \"2\", \"2\", \"2\"],\n", " 'item_id': [\"a\", \"b\", \"c\", \"a\", \"b\", \"b\", \"c\", \"d\"],\n", " 'rating': [1, 3, 2, 5, 4, 1, 4, 3]})\n", "sf" ] }, { "cell_type": "code", "execution_count": 47, "metadata": { "ExecuteTime": { "end_time": "2020-06-13T08:24:26.373168Z", "start_time": "2020-06-13T08:24:25.218227Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/html": [ "<pre>Preparing data set.</pre>" ], "text/plain": [ "Preparing data set." ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre> Data has 8 observations with 3 users and 4 items.</pre>" ], "text/plain": [ " Data has 8 observations with 3 users and 4 items." ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre> Data prepared in: 0.004622s</pre>" ], "text/plain": [ " Data prepared in: 0.004622s" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre>Training ranking_factorization_recommender for recommendations.</pre>" ], "text/plain": [ "Training ranking_factorization_recommender for recommendations." 
] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre>+--------------------------------+--------------------------------------------------+----------+</pre>" ], "text/plain": [ "+--------------------------------+--------------------------------------------------+----------+" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre>| Parameter | Description | Value |</pre>" ], "text/plain": [ "| Parameter | Description | Value |" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre>+--------------------------------+--------------------------------------------------+----------+</pre>" ], "text/plain": [ "+--------------------------------+--------------------------------------------------+----------+" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre>| num_factors | Factor Dimension | 32 |</pre>" ], "text/plain": [ "| num_factors | Factor Dimension | 32 |" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre>| regularization | L2 Regularization on Factors | 1e-09 |</pre>" ], "text/plain": [ "| regularization | L2 Regularization on Factors | 1e-09 |" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre>| solver | Solver used for training | sgd |</pre>" ], "text/plain": [ "| solver | Solver used for training | sgd |" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre>| linear_regularization | L2 Regularization on Linear Coefficients | 1e-09 |</pre>" ], "text/plain": [ "| linear_regularization | L2 Regularization on Linear Coefficients | 1e-09 |" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre>| ranking_regularization | Rank-based Regularization Weight | 0.25 |</pre>" ], "text/plain": [ "| ranking_regularization | Rank-based Regularization Weight | 0.25 |" ] }, "metadata": {}, "output_type": 
"display_data" }, { "data": { "text/html": [ "<pre>| max_iterations | Maximum Number of Iterations | 25 |</pre>" ], "text/plain": [ "| max_iterations | Maximum Number of Iterations | 25 |" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre>+--------------------------------+--------------------------------------------------+----------+</pre>" ], "text/plain": [ "+--------------------------------+--------------------------------------------------+----------+" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre> Optimizing model using SGD; tuning step size.</pre>" ], "text/plain": [ " Optimizing model using SGD; tuning step size." ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre> Using 8 / 8 points for tuning the step size.</pre>" ], "text/plain": [ " Using 8 / 8 points for tuning the step size." ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre>+---------+-------------------+------------------------------------------+</pre>" ], "text/plain": [ "+---------+-------------------+------------------------------------------+" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre>| Attempt | Initial Step Size | Estimated Objective Value |</pre>" ], "text/plain": [ "| Attempt | Initial Step Size | Estimated Objective Value |" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre>+---------+-------------------+------------------------------------------+</pre>" ], "text/plain": [ "+---------+-------------------+------------------------------------------+" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre>| 0 | 25 | Not Viable |</pre>" ], "text/plain": [ "| 0 | 25 | Not Viable |" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre>| 1 | 6.25 | Not Viable |</pre>" ], "text/plain": [ "| 1 | 6.25 | Not 
Viable |" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre>| 2 | 1.5625 | Not Viable |</pre>" ], "text/plain": [ "| 2 | 1.5625 | Not Viable |" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre>| 3 | 0.390625 | 2.83657 |</pre>" ], "text/plain": [ "| 3 | 0.390625 | 2.83657 |" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre>| 4 | 0.195312 | 2.74513 |</pre>" ], "text/plain": [ "| 4 | 0.195312 | 2.74513 |" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre>| 5 | 0.0976562 | 2.80074 |</pre>" ], "text/plain": [ "| 5 | 0.0976562 | 2.80074 |" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre>| 6 | 0.0488281 | 3.00431 |</pre>" ], "text/plain": [ "| 6 | 0.0488281 | 3.00431 |" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre>| 7 | 0.0244141 | 3.28335 |</pre>" ], "text/plain": [ "| 7 | 0.0244141 | 3.28335 |" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre>+---------+-------------------+------------------------------------------+</pre>" ], "text/plain": [ "+---------+-------------------+------------------------------------------+" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre>| Final | 0.195312 | 2.74513 |</pre>" ], "text/plain": [ "| Final | 0.195312 | 2.74513 |" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre>+---------+-------------------+------------------------------------------+</pre>" ], "text/plain": [ "+---------+-------------------+------------------------------------------+" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre>Starting Optimization.</pre>" ], "text/plain": [ "Starting Optimization." 
] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre>+---------+--------------+-------------------+-----------------------+-------------+</pre>" ], "text/plain": [ "+---------+--------------+-------------------+-----------------------+-------------+" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre>| Iter. | Elapsed Time | Approx. Objective | Approx. Training RMSE | Step Size |</pre>" ], "text/plain": [ "| Iter. | Elapsed Time | Approx. Objective | Approx. Training RMSE | Step Size |" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre>+---------+--------------+-------------------+-----------------------+-------------+</pre>" ], "text/plain": [ "+---------+--------------+-------------------+-----------------------+-------------+" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre>| Initial | 141us | 3.89999 | 1.3637 | |</pre>" ], "text/plain": [ "| Initial | 141us | 3.89999 | 1.3637 | |" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre>+---------+--------------+-------------------+-----------------------+-------------+</pre>" ], "text/plain": [ "+---------+--------------+-------------------+-----------------------+-------------+" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre>| 1 | 628us | 4.16499 | 1.66421 | 0.195312 |</pre>" ], "text/plain": [ "| 1 | 628us | 4.16499 | 1.66421 | 0.195312 |" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre>| 2 | 1.29ms | 3.17812 | 1.43798 | 0.116134 |</pre>" ], "text/plain": [ "| 2 | 1.29ms | 3.17812 | 1.43798 | 0.116134 |" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre>| 3 | 2.021ms | 2.6823 | 1.24864 | 0.0856819 |</pre>" ], "text/plain": [ "| 3 | 2.021ms | 2.6823 | 1.24864 | 0.0856819 |" ] }, "metadata": {}, "output_type": 
"display_data" }, { "data": { "text/html": [ "<pre>| 4 | 2.607ms | 2.49669 | 1.15702 | 0.0580668 |</pre>" ], "text/plain": [ "| 4 | 2.607ms | 2.49669 | 1.15702 | 0.0580668 |" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre>| 5 | 3.279ms | 2.42692 | 1.15987 | 0.0491185 |</pre>" ], "text/plain": [ "| 5 | 3.279ms | 2.42692 | 1.15987 | 0.0491185 |" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre>| 10 | 14.955ms | 2.28822 | 1.08066 | 0.029206 |</pre>" ], "text/plain": [ "| 10 | 14.955ms | 2.28822 | 1.08066 | 0.029206 |" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre>| 25 | 36.588ms | 2.20359 | 1.07144 | 0.0146899 |</pre>" ], "text/plain": [ "| 25 | 36.588ms | 2.20359 | 1.07144 | 0.0146899 |" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre>+---------+--------------+-------------------+-----------------------+-------------+</pre>" ], "text/plain": [ "+---------+--------------+-------------------+-----------------------+-------------+" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre>Optimization Complete: Maximum number of passes through the data reached.</pre>" ], "text/plain": [ "Optimization Complete: Maximum number of passes through the data reached." ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre>Computing final objective value and training RMSE.</pre>" ], "text/plain": [ "Computing final objective value and training RMSE." 
] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre> Final objective value: 2.79173</pre>" ], "text/plain": [ " Final objective value: 2.79173" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre> Final training RMSE: 1.04408</pre>" ], "text/plain": [ " Final training RMSE: 1.04408" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\"><table frame=\"box\" rules=\"cols\">\n", " <tr>\n", " <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">user_id</th>\n", " <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">item_id</th>\n", " <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">score</th>\n", " <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">rank</th>\n", " </tr>\n", " <tr>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0</td>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">d</td>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1.2694321870803833</td>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n", " </tr>\n", " <tr>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">c</td>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">3.9875572323799133</td>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n", " </tr>\n", " <tr>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n", " <td style=\"padding-left: 
1em; padding-right: 1em; text-align: center; vertical-align: top\">d</td>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">3.1200567483901978</td>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">2</td>\n", " </tr>\n", " <tr>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">2</td>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">a</td>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">2.483269453048706</td>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n", " </tr>\n", "</table>\n", "[4 rows x 4 columns]<br/>\n", "</div>" ], "text/plain": [ "Columns:\n", "\tuser_id\tstr\n", "\titem_id\tstr\n", "\tscore\tfloat\n", "\trank\tint\n", "\n", "Rows: 4\n", "\n", "Data:\n", "+---------+---------+--------------------+------+\n", "| user_id | item_id | score | rank |\n", "+---------+---------+--------------------+------+\n", "| 0 | d | 1.2694321870803833 | 1 |\n", "| 1 | c | 3.9875572323799133 | 1 |\n", "| 1 | d | 3.1200567483901978 | 2 |\n", "| 2 | a | 2.483269453048706 | 1 |\n", "+---------+---------+--------------------+------+\n", "[4 rows x 4 columns]" ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "m = tc.recommender.create(sf, target='rating')\n", "recs = m.recommend()\n", "recs" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### The CourseTalk dataset: loading and first look\n", "\n", "Load the CourseTalk ratings data and take a first look at its columns." 
] }, { "cell_type": "code", "execution_count": 48, "metadata": { "ExecuteTime": { "end_time": "2020-06-13T08:26:48.198494Z", "start_time": "2020-06-13T08:26:48.093742Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/html": [ "<pre>Materializing SFrame</pre>" ], "text/plain": [ "Materializing SFrame" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<html> <body> <iframe style=\"border:0;margin:0\" width=\"1000\" height=\"1200\" srcdoc='<html lang=\"en\"> <head> <script src=\"https://cdnjs.cloudflare.com/ajax/libs/vega/5.4.0/vega.js\"></script> <script src=\"https://cdnjs.cloudflare.com/ajax/libs/vega-embed/4.0.0/vega-embed.js\"></script> <script src=\"https://cdnjs.cloudflare.com/ajax/libs/vega-tooltip/0.5.1/vega-tooltip.min.js\"></script> <link rel=\"stylesheet\" type=\"text/css\" href=\"https://cdnjs.cloudflare.com/ajax/libs/vega-tooltip/0.5.1/vega-tooltip.min.css\"> <style> .vega-actions > a{ color:white; text-decoration: none; font-family: \"Arial\"; cursor:pointer; padding:5px; background:#AAAAAA; border-radius:4px; padding-left:10px; padding-right:10px; margin-right:5px; } .vega-actions{ margin-top:20px; text-align:center } .vega-actions > a{ background:#999999; } </style> </head> <body> <div id=\"vis\"> </div> <script> var vega_json = \"{\\\"$schema\\\": \\\"https://vega.github.io/schema/vega/v4.json\\\", \\\"metadata\\\": {\\\"bubbleOpts\\\": {\\\"showAllFields\\\": false, \\\"fields\\\": [{\\\"field\\\": \\\"left\\\"}, {\\\"field\\\": \\\"right\\\"}, {\\\"field\\\": \\\"count\\\"}, {\\\"field\\\": \\\"label\\\"}]}}, \\\"width\\\": 800, \\\"height\\\": 980, \\\"padding\\\": 8, \\\"data\\\": [{\\\"name\\\": \\\"pts_store\\\"}, {\\\"name\\\": \\\"source_2\\\", \\\"values\\\": [{\\\"a\\\": 0, \\\"title\\\": \\\"user_id\\\", \\\"num_row\\\": 2773, \\\"type\\\": \\\"integer\\\", \\\"num_unique\\\": 2017, \\\"num_missing\\\": 0, \\\"mean\\\": 1020.098089, \\\"min\\\": 1.0, \\\"max\\\": 2017.0, 
\\\"median\\\": 1103.0, \\\"stdev\\\": 555.490133, \\\"numeric\\\": [{\\\"left\\\": -30, \\\"right\\\": 74, \\\"count\\\": 152}, {\\\"left\\\": 74, \\\"right\\\": 178, \\\"count\\\": 105}, {\\\"left\\\": 178, \\\"right\\\": 282, \\\"count\\\": 107}, {\\\"left\\\": 282, \\\"right\\\": 386, \\\"count\\\": 107}, {\\\"left\\\": 386, \\\"right\\\": 490, \\\"count\\\": 108}, {\\\"left\\\": 490, \\\"right\\\": 594, \\\"count\\\": 128}, {\\\"left\\\": 594, \\\"right\\\": 698, \\\"count\\\": 146}, {\\\"left\\\": 698, \\\"right\\\": 802, \\\"count\\\": 155}, {\\\"left\\\": 802, \\\"right\\\": 906, \\\"count\\\": 151}, {\\\"left\\\": 906, \\\"right\\\": 1010, \\\"count\\\": 107}, {\\\"left\\\": 1010, \\\"right\\\": 1114, \\\"count\\\": 136}, {\\\"left\\\": 1114, \\\"right\\\": 1218, \\\"count\\\": 272}, {\\\"left\\\": 1218, \\\"right\\\": 1322, \\\"count\\\": 192}, {\\\"left\\\": 1322, \\\"right\\\": 1426, \\\"count\\\": 149}, {\\\"left\\\": 1426, \\\"right\\\": 1530, \\\"count\\\": 154}, {\\\"left\\\": 1530, \\\"right\\\": 1634, \\\"count\\\": 186}, {\\\"left\\\": 1634, \\\"right\\\": 1738, \\\"count\\\": 118}, {\\\"left\\\": 1738, \\\"right\\\": 1842, \\\"count\\\": 122}, {\\\"left\\\": 1842, \\\"right\\\": 1946, \\\"count\\\": 106}, {\\\"left\\\": 1946, \\\"right\\\": 2050, \\\"count\\\": 72}, {\\\"start\\\": -30, \\\"stop\\\": 2050, \\\"step\\\": 104}], \\\"categorical\\\": []}, {\\\"a\\\": 1, \\\"title\\\": \\\"course_id\\\", \\\"num_row\\\": 2773, \\\"type\\\": \\\"integer\\\", \\\"num_unique\\\": 214, \\\"num_missing\\\": 0, \\\"mean\\\": 46.83736, \\\"min\\\": 1.0, \\\"max\\\": 214.0, \\\"median\\\": 14.0, \\\"stdev\\\": 60.457737, \\\"numeric\\\": [{\\\"left\\\": -2, \\\"right\\\": 9, \\\"count\\\": 1223}, {\\\"left\\\": 9, \\\"right\\\": 20, \\\"count\\\": 316}, {\\\"left\\\": 20, \\\"right\\\": 31, \\\"count\\\": 134}, {\\\"left\\\": 31, \\\"right\\\": 42, \\\"count\\\": 154}, {\\\"left\\\": 42, \\\"right\\\": 53, \\\"count\\\": 93}, {\\\"left\\\": 53, 
\\\"right\\\": 64, \\\"count\\\": 87}, {\\\"left\\\": 64, \\\"right\\\": 75, \\\"count\\\": 37}, {\\\"left\\\": 75, \\\"right\\\": 86, \\\"count\\\": 11}, {\\\"left\\\": 86, \\\"right\\\": 97, \\\"count\\\": 83}, {\\\"left\\\": 97, \\\"right\\\": 108, \\\"count\\\": 32}, {\\\"left\\\": 108, \\\"right\\\": 119, \\\"count\\\": 134}, {\\\"left\\\": 119, \\\"right\\\": 130, \\\"count\\\": 38}, {\\\"left\\\": 130, \\\"right\\\": 141, \\\"count\\\": 91}, {\\\"left\\\": 141, \\\"right\\\": 152, \\\"count\\\": 87}, {\\\"left\\\": 152, \\\"right\\\": 163, \\\"count\\\": 14}, {\\\"left\\\": 163, \\\"right\\\": 174, \\\"count\\\": 69}, {\\\"left\\\": 174, \\\"right\\\": 185, \\\"count\\\": 50}, {\\\"left\\\": 185, \\\"right\\\": 196, \\\"count\\\": 68}, {\\\"left\\\": 196, \\\"right\\\": 207, \\\"count\\\": 33}, {\\\"left\\\": 207, \\\"right\\\": 218, \\\"count\\\": 19}, {\\\"start\\\": -2, \\\"stop\\\": 218, \\\"step\\\": 11}], \\\"categorical\\\": []}, {\\\"a\\\": 2, \\\"title\\\": \\\"rating\\\", \\\"num_row\\\": 2773, \\\"type\\\": \\\"float\\\", \\\"num_unique\\\": 10, \\\"num_missing\\\": 0, \\\"mean\\\": 4.503606, \\\"min\\\": 0.5, \\\"max\\\": 5.0, \\\"median\\\": 5.0, \\\"stdev\\\": 0.946545, \\\"numeric\\\": [{\\\"left\\\": 0.4554, \\\"right\\\": 0.6858, \\\"count\\\": 30}, {\\\"left\\\": 0.6858, \\\"right\\\": 0.9162, \\\"count\\\": 0}, {\\\"left\\\": 0.9162, \\\"right\\\": 1.1466, \\\"count\\\": 43}, {\\\"left\\\": 1.1466, \\\"right\\\": 1.377, \\\"count\\\": 0}, {\\\"left\\\": 1.377, \\\"right\\\": 1.6074, \\\"count\\\": 23}, {\\\"left\\\": 1.6074, \\\"right\\\": 1.8378, \\\"count\\\": 0}, {\\\"left\\\": 1.8378, \\\"right\\\": 2.0682, \\\"count\\\": 53}, {\\\"left\\\": 2.0682, \\\"right\\\": 2.2986, \\\"count\\\": 0}, {\\\"left\\\": 2.2986, \\\"right\\\": 2.529, \\\"count\\\": 35}, {\\\"left\\\": 2.529, \\\"right\\\": 2.7594, \\\"count\\\": 0}, {\\\"left\\\": 2.7594, \\\"right\\\": 2.9898, \\\"count\\\": 0}, {\\\"left\\\": 2.9898, \\\"right\\\": 3.2202, 
\\\"count\\\": 80}, {\\\"left\\\": 3.2202, \\\"right\\\": 3.4506, \\\"count\\\": 0}, {\\\"left\\\": 3.4506, \\\"right\\\": 3.681, \\\"count\\\": 94}, {\\\"left\\\": 3.681, \\\"right\\\": 3.9114, \\\"count\\\": 0}, {\\\"left\\\": 3.9114, \\\"right\\\": 4.1418, \\\"count\\\": 285}, {\\\"left\\\": 4.1418, \\\"right\\\": 4.3722, \\\"count\\\": 0}, {\\\"left\\\": 4.3722, \\\"right\\\": 4.6026, \\\"count\\\": 313}, {\\\"left\\\": 4.6026, \\\"right\\\": 4.833, \\\"count\\\": 0}, {\\\"left\\\": 4.833, \\\"right\\\": 5.0634, \\\"count\\\": 1817}, {\\\"start\\\": 0.4554, \\\"stop\\\": 5.0634, \\\"step\\\": 0.2304}], \\\"categorical\\\": []}]}, {\\\"name\\\": \\\"data_2\\\", \\\"source\\\": \\\"source_2\\\", \\\"transform\\\": [{\\\"type\\\": \\\"formula\\\", \\\"expr\\\": \\\"20\\\", \\\"as\\\": \\\"c_x_axis_back\\\"}, {\\\"type\\\": \\\"formula\\\", \\\"expr\\\": \\\"toNumber(datum[\\\\\\\"a\\\\\\\"])*300+66\\\", \\\"as\\\": \\\"c_main_background\\\"}, {\\\"type\\\": \\\"formula\\\", \\\"expr\\\": \\\"toNumber(datum[\\\\\\\"a\\\\\\\"])*300+43\\\", \\\"as\\\": \\\"c_top_bar\\\"}, {\\\"type\\\": \\\"formula\\\", \\\"expr\\\": \\\"toNumber(datum[\\\\\\\"a\\\\\\\"])*300+59\\\", \\\"as\\\": \\\"c_top_title\\\"}, {\\\"type\\\": \\\"formula\\\", \\\"expr\\\": \\\"toNumber(datum[\\\\\\\"a\\\\\\\"])*300+58\\\", \\\"as\\\": \\\"c_top_type\\\"}, {\\\"type\\\": \\\"formula\\\", \\\"expr\\\": \\\"toNumber(datum[\\\\\\\"a\\\\\\\"])*300+178\\\", \\\"as\\\": \\\"c_rule\\\"}, {\\\"type\\\": \\\"formula\\\", \\\"expr\\\": \\\"toNumber(datum[\\\\\\\"a\\\\\\\"])*300+106\\\", \\\"as\\\": \\\"c_num_rows\\\"}, {\\\"type\\\": \\\"formula\\\", \\\"expr\\\": \\\"toNumber(datum[\\\\\\\"a\\\\\\\"])*300+130\\\", \\\"as\\\": \\\"c_num_unique\\\"}, {\\\"type\\\": \\\"formula\\\", \\\"expr\\\": \\\"toNumber(datum[\\\\\\\"a\\\\\\\"])*300+154\\\", \\\"as\\\": \\\"c_missing\\\"}, {\\\"type\\\": \\\"formula\\\", \\\"expr\\\": \\\"toNumber(datum[\\\\\\\"a\\\\\\\"])*300+105\\\", \\\"as\\\": 
\\\"c_num_rows_val\\\"}, {\\\"type\\\": \\\"formula\\\", \\\"expr\\\": \\\"toNumber(datum[\\\\\\\"a\\\\\\\"])*300+130\\\", \\\"as\\\": \\\"c_num_unique_val\\\"}, {\\\"type\\\": \\\"formula\\\", \\\"expr\\\": \\\"toNumber(datum[\\\\\\\"a\\\\\\\"])*300+154\\\", \\\"as\\\": \\\"c_missing_val\\\"}, {\\\"type\\\": \\\"formula\\\", \\\"expr\\\": \\\"toNumber(datum[\\\\\\\"a\\\\\\\"])*300+195\\\", \\\"as\\\": \\\"c_frequent_items\\\"}, {\\\"type\\\": \\\"formula\\\", \\\"expr\\\": \\\"toNumber(datum[\\\\\\\"a\\\\\\\"])*300+218\\\", \\\"as\\\": \\\"c_first_item\\\"}, {\\\"type\\\": \\\"formula\\\", \\\"expr\\\": \\\"toNumber(datum[\\\\\\\"a\\\\\\\"])*300+235\\\", \\\"as\\\": \\\"c_second_item\\\"}, {\\\"type\\\": \\\"formula\\\", \\\"expr\\\": \\\"toNumber(datum[\\\\\\\"a\\\\\\\"])*300+252\\\", \\\"as\\\": \\\"c_third_item\\\"}, {\\\"type\\\": \\\"formula\\\", \\\"expr\\\": \\\"toNumber(datum[\\\\\\\"a\\\\\\\"])*300+269\\\", \\\"as\\\": \\\"c_fourth_item\\\"}, {\\\"type\\\": \\\"formula\\\", \\\"expr\\\": \\\"toNumber(datum[\\\\\\\"a\\\\\\\"])*300+286\\\", \\\"as\\\": \\\"c_fifth_item\\\"}, {\\\"type\\\": \\\"formula\\\", \\\"expr\\\": \\\"toNumber(datum[\\\\\\\"a\\\\\\\"])*300+200\\\", \\\"as\\\": \\\"c_mean\\\"}, {\\\"type\\\": \\\"formula\\\", \\\"expr\\\": \\\"toNumber(datum[\\\\\\\"a\\\\\\\"])*300+220\\\", \\\"as\\\": \\\"c_min\\\"}, {\\\"type\\\": \\\"formula\\\", \\\"expr\\\": \\\"toNumber(datum[\\\\\\\"a\\\\\\\"])*300+240\\\", \\\"as\\\": \\\"c_max\\\"}, {\\\"type\\\": \\\"formula\\\", \\\"expr\\\": \\\"toNumber(datum[\\\\\\\"a\\\\\\\"])*300+260\\\", \\\"as\\\": \\\"c_median\\\"}, {\\\"type\\\": \\\"formula\\\", \\\"expr\\\": \\\"toNumber(datum[\\\\\\\"a\\\\\\\"])*300+280\\\", \\\"as\\\": \\\"c_stdev\\\"}, {\\\"type\\\": \\\"formula\\\", \\\"expr\\\": \\\"toNumber(datum[\\\\\\\"a\\\\\\\"])*300+198\\\", \\\"as\\\": \\\"c_mean_val\\\"}, {\\\"type\\\": \\\"formula\\\", \\\"expr\\\": \\\"toNumber(datum[\\\\\\\"a\\\\\\\"])*300+218\\\", \\\"as\\\": \\\"c_min_val\\\"}, 
{\\\"type\\\": \\\"formula\\\", \\\"expr\\\": \\\"toNumber(datum[\\\\\\\"a\\\\\\\"])*300+238\\\", \\\"as\\\": \\\"c_max_val\\\"}, {\\\"type\\\": \\\"formula\\\", \\\"expr\\\": \\\"toNumber(datum[\\\\\\\"a\\\\\\\"])*300+258\\\", \\\"as\\\": \\\"c_median_val\\\"}, {\\\"type\\\": \\\"formula\\\", \\\"expr\\\": \\\"toNumber(datum[\\\\\\\"a\\\\\\\"])*300+278\\\", \\\"as\\\": \\\"c_stdev_val\\\"}, {\\\"type\\\": \\\"formula\\\", \\\"expr\\\": \\\"toNumber(datum[\\\\\\\"a\\\\\\\"])*300+106\\\", \\\"as\\\": \\\"graph_offset\\\"}, {\\\"type\\\": \\\"formula\\\", \\\"expr\\\": \\\"toNumber(datum[\\\\\\\"a\\\\\\\"])*300+132\\\", \\\"as\\\": \\\"graph_offset_categorical\\\"}, {\\\"type\\\": \\\"formula\\\", \\\"expr\\\": \\\"(toString(datum[\\\\\\\"type\\\\\\\"]) == \\\\\\\"integer\\\\\\\" || toString(datum[\\\\\\\"type\\\\\\\"]) == \\\\\\\"float\\\\\\\")?false:true\\\", \\\"as\\\": \\\"c_clip_val\\\"}, {\\\"type\\\": \\\"formula\\\", \\\"expr\\\": \\\"(toString(datum[\\\\\\\"type\\\\\\\"]) == \\\\\\\"integer\\\\\\\" || toString(datum[\\\\\\\"type\\\\\\\"]) == \\\\\\\"float\\\\\\\")?250:0\\\", \\\"as\\\": \\\"c_width_numeric_val\\\"}, {\\\"type\\\": \\\"formula\\\", \\\"expr\\\": \\\"(toString(datum[\\\\\\\"type\\\\\\\"]) == \\\\\\\"str\\\\\\\")?false:true\\\", \\\"as\\\": \\\"c_clip_val_cat\\\"}, {\\\"type\\\": \\\"formula\\\", \\\"expr\\\": \\\"(toString(datum[\\\\\\\"type\\\\\\\"]) == \\\\\\\"str\\\\\\\")?250:0\\\", \\\"as\\\": \\\"c_width_numeric_val_cat\\\"}]}], \\\"marks\\\": [{\\\"encode\\\": {\\\"enter\\\": {\\\"x\\\": {\\\"value\\\": 0}, \\\"width\\\": {\\\"value\\\": 734}, \\\"y\\\": {\\\"value\\\": 0}, \\\"height\\\": {\\\"value\\\": 366}, \\\"clip\\\": {\\\"value\\\": 0}, \\\"fill\\\": {\\\"value\\\": \\\"#ffffff\\\"}, \\\"fillOpacity\\\": {\\\"value\\\": 0}, \\\"stroke\\\": {\\\"value\\\": \\\"#000000\\\"}, \\\"strokeWidth\\\": {\\\"value\\\": 0}}}, \\\"marks\\\": [{\\\"encode\\\": {\\\"enter\\\": {\\\"x\\\": {\\\"value\\\": 0}, \\\"width\\\": {\\\"value\\\": 
734}, \\\"y\\\": {\\\"value\\\": 0}, \\\"height\\\": {\\\"value\\\": 366}, \\\"clip\\\": {\\\"value\\\": 0}, \\\"fill\\\": {\\\"value\\\": \\\"#ffffff\\\"}, \\\"fillOpacity\\\": {\\\"value\\\": 0}, \\\"stroke\\\": {\\\"value\\\": \\\"#000000\\\"}, \\\"strokeWidth\\\": {\\\"value\\\": 0}}}, \\\"scales\\\": [], \\\"axes\\\": [], \\\"marks\\\": [{\\\"from\\\": {\\\"data\\\": \\\"data_2\\\"}, \\\"encode\\\": {\\\"enter\\\": {\\\"x\\\": {\\\"value\\\": 33}, \\\"width\\\": {\\\"value\\\": 700}, \\\"y\\\": {\\\"value\\\": 66}, \\\"height\\\": {\\\"value\\\": 250}, \\\"fill\\\": {\\\"value\\\": \\\"#FEFEFE\\\"}, \\\"fillOpacity\\\": {\\\"value\\\": 1}, \\\"stroke\\\": {\\\"value\\\": \\\"#DEDEDE\\\"}, \\\"strokeWidth\\\": {\\\"value\\\": 0.5}}, \\\"update\\\": {\\\"x\\\": {\\\"signal\\\": \\\"datum[\\\\\\\"c_x_axis_back\\\\\\\"]\\\"}, \\\"y\\\": {\\\"field\\\": \\\"c_main_background\\\"}}}, \\\"type\\\": \\\"rect\\\"}, {\\\"from\\\": {\\\"data\\\": \\\"data_2\\\"}, \\\"encode\\\": {\\\"enter\\\": {\\\"x\\\": {\\\"value\\\": 33}, \\\"width\\\": {\\\"value\\\": 700}, \\\"y\\\": {\\\"value\\\": 43}, \\\"height\\\": {\\\"value\\\": 30}, \\\"fill\\\": {\\\"value\\\": \\\"#F5F5F5\\\"}, \\\"fillOpacity\\\": {\\\"value\\\": 1}, \\\"stroke\\\": {\\\"value\\\": \\\"#DEDEDE\\\"}, \\\"strokeWidth\\\": {\\\"value\\\": 0.5}}, \\\"update\\\": {\\\"x\\\": {\\\"signal\\\": \\\"datum[\\\\\\\"c_x_axis_back\\\\\\\"]\\\"}, \\\"y\\\": {\\\"field\\\": \\\"c_top_bar\\\"}}}, \\\"type\\\": \\\"rect\\\"}, {\\\"from\\\": {\\\"data\\\": \\\"data_2\\\"}, \\\"encode\\\": {\\\"enter\\\": {\\\"x\\\": {\\\"value\\\": 720}, \\\"y\\\": {\\\"value\\\": 58}, \\\"text\\\": {\\\"signal\\\": \\\"''+datum[\\\\\\\"type\\\\\\\"]\\\"}, \\\"align\\\": {\\\"value\\\": \\\"right\\\"}, \\\"baseline\\\": {\\\"value\\\": \\\"middle\\\"}, \\\"dx\\\": {\\\"value\\\": 0, \\\"offset\\\": 0}, \\\"dy\\\": {\\\"value\\\": 0, \\\"offset\\\": 0}, \\\"angle\\\": {\\\"value\\\": 0}, \\\"font\\\": {\\\"value\\\": 
\\\"AvenirNext-Medium\\\"}, \\\"fontSize\\\": {\\\"value\\\": 12}, \\\"fontWeight\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"fontStyle\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"fill\\\": {\\\"value\\\": \\\"#595859\\\"}}, \\\"update\\\": {\\\"x\\\": {\\\"signal\\\": \\\"datum[\\\\\\\"c_x_axis_back\\\\\\\"]+687\\\"}, \\\"y\\\": {\\\"field\\\": \\\"c_top_type\\\"}}}, \\\"type\\\": \\\"text\\\"}, {\\\"from\\\": {\\\"data\\\": \\\"data_2\\\"}, \\\"encode\\\": {\\\"enter\\\": {\\\"x\\\": {\\\"value\\\": 44}, \\\"y\\\": {\\\"value\\\": 59}, \\\"text\\\": {\\\"signal\\\": \\\"''+datum[\\\\\\\"title\\\\\\\"]\\\"}, \\\"align\\\": {\\\"value\\\": \\\"left\\\"}, \\\"baseline\\\": {\\\"value\\\": \\\"middle\\\"}, \\\"dx\\\": {\\\"value\\\": 0, \\\"offset\\\": 0}, \\\"dy\\\": {\\\"value\\\": 0, \\\"offset\\\": 0}, \\\"angle\\\": {\\\"value\\\": 0}, \\\"font\\\": {\\\"value\\\": \\\"AvenirNext-Medium\\\"}, \\\"fontSize\\\": {\\\"value\\\": 15}, \\\"fontWeight\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"fontStyle\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"fill\\\": {\\\"value\\\": \\\"#9B9B9B\\\"}}, \\\"update\\\": {\\\"x\\\": {\\\"signal\\\": \\\"datum[\\\\\\\"c_x_axis_back\\\\\\\"]+11\\\"}, \\\"y\\\": {\\\"field\\\": \\\"c_top_title\\\"}}}, \\\"type\\\": \\\"text\\\"}, {\\\"from\\\": {\\\"data\\\": \\\"data_2\\\"}, \\\"encode\\\": {\\\"enter\\\": {\\\"x\\\": {\\\"value\\\": 500}, \\\"y\\\": {\\\"value\\\": 178}, \\\"stroke\\\": {\\\"value\\\": \\\"#EDEDEB\\\"}, \\\"strokeWidth\\\": {\\\"value\\\": 1}, \\\"strokeCap\\\": {\\\"value\\\": \\\"butt\\\"}, \\\"x2\\\": {\\\"value\\\": 720}, \\\"y2\\\": {\\\"value\\\": 178}}, \\\"update\\\": {\\\"x\\\": {\\\"signal\\\": \\\"datum[\\\\\\\"c_x_axis_back\\\\\\\"]+467\\\"}, \\\"x2\\\": {\\\"signal\\\": \\\"datum[\\\\\\\"c_x_axis_back\\\\\\\"]+687\\\"}, \\\"y\\\": {\\\"field\\\": \\\"c_rule\\\"}, \\\"y2\\\": {\\\"field\\\": \\\"c_rule\\\"}}}, \\\"type\\\": \\\"rule\\\"}, {\\\"from\\\": {\\\"data\\\": \\\"data_2\\\"}, \\\"encode\\\": 
{\\\"enter\\\": {\\\"x\\\": {\\\"value\\\": 500}, \\\"y\\\": {\\\"value\\\": 106}, \\\"text\\\": {\\\"value\\\": \\\"Num. Rows:\\\"}, \\\"align\\\": {\\\"value\\\": \\\"left\\\"}, \\\"baseline\\\": {\\\"value\\\": \\\"middle\\\"}, \\\"dx\\\": {\\\"value\\\": 0, \\\"offset\\\": 0}, \\\"dy\\\": {\\\"value\\\": 0, \\\"offset\\\": 0}, \\\"angle\\\": {\\\"value\\\": 0}, \\\"font\\\": {\\\"value\\\": \\\"AvenirNext-Medium\\\"}, \\\"fontSize\\\": {\\\"value\\\": 12}, \\\"fontWeight\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"fontStyle\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"fill\\\": {\\\"value\\\": \\\"#4A4A4A\\\"}}, \\\"update\\\": {\\\"x\\\": {\\\"signal\\\": \\\"datum[\\\\\\\"c_x_axis_back\\\\\\\"]+467\\\"}, \\\"y\\\": {\\\"field\\\": \\\"c_num_rows\\\"}}}, \\\"type\\\": \\\"text\\\"}, {\\\"from\\\": {\\\"data\\\": \\\"data_2\\\"}, \\\"encode\\\": {\\\"enter\\\": {\\\"x\\\": {\\\"value\\\": 500}, \\\"y\\\": {\\\"value\\\": 130}, \\\"text\\\": {\\\"value\\\": \\\"Num. Unique:\\\"}, \\\"align\\\": {\\\"value\\\": \\\"left\\\"}, \\\"baseline\\\": {\\\"value\\\": \\\"middle\\\"}, \\\"dx\\\": {\\\"value\\\": 0, \\\"offset\\\": 0}, \\\"dy\\\": {\\\"value\\\": 0, \\\"offset\\\": 0}, \\\"angle\\\": {\\\"value\\\": 0}, \\\"font\\\": {\\\"value\\\": \\\"AvenirNext-Medium\\\"}, \\\"fontSize\\\": {\\\"value\\\": 12}, \\\"fontWeight\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"fontStyle\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"fill\\\": {\\\"value\\\": \\\"#4A4A4A\\\"}}, \\\"update\\\": {\\\"x\\\": {\\\"signal\\\": \\\"datum[\\\\\\\"c_x_axis_back\\\\\\\"]+467\\\"}, \\\"y\\\": {\\\"field\\\": \\\"c_num_unique\\\"}}}, \\\"type\\\": \\\"text\\\"}, {\\\"from\\\": {\\\"data\\\": \\\"data_2\\\"}, \\\"encode\\\": {\\\"enter\\\": {\\\"x\\\": {\\\"value\\\": 500}, \\\"y\\\": {\\\"value\\\": 154}, \\\"text\\\": {\\\"value\\\": \\\"Missing:\\\"}, \\\"align\\\": {\\\"value\\\": \\\"left\\\"}, \\\"baseline\\\": {\\\"value\\\": \\\"middle\\\"}, \\\"dx\\\": {\\\"value\\\": 0, \\\"offset\\\": 0}, 
\\\"dy\\\": {\\\"value\\\": 0, \\\"offset\\\": 0}, \\\"angle\\\": {\\\"value\\\": 0}, \\\"font\\\": {\\\"value\\\": \\\"AvenirNext-Medium\\\"}, \\\"fontSize\\\": {\\\"value\\\": 12}, \\\"fontWeight\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"fontStyle\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"fill\\\": {\\\"value\\\": \\\"#4A4A4A\\\"}}, \\\"update\\\": {\\\"x\\\": {\\\"signal\\\": \\\"datum[\\\\\\\"c_x_axis_back\\\\\\\"]+467\\\"}, \\\"y\\\": {\\\"field\\\": \\\"c_missing\\\"}}}, \\\"type\\\": \\\"text\\\"}, {\\\"from\\\": {\\\"data\\\": \\\"data_2\\\"}, \\\"encode\\\": {\\\"enter\\\": {\\\"x\\\": {\\\"value\\\": 700}, \\\"y\\\": {\\\"value\\\": 105}, \\\"text\\\": {\\\"signal\\\": \\\"toString(format(datum[\\\\\\\"num_row\\\\\\\"], \\\\\\\",\\\\\\\"))\\\"}, \\\"align\\\": {\\\"value\\\": \\\"right\\\"}, \\\"baseline\\\": {\\\"value\\\": \\\"middle\\\"}, \\\"dx\\\": {\\\"value\\\": 0, \\\"offset\\\": 0}, \\\"dy\\\": {\\\"value\\\": 0, \\\"offset\\\": 0}, \\\"angle\\\": {\\\"value\\\": 0}, \\\"font\\\": {\\\"value\\\": \\\"AvenirNext-Medium\\\"}, \\\"fontSize\\\": {\\\"value\\\": 12}, \\\"fontWeight\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"fontStyle\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"fill\\\": {\\\"value\\\": \\\"#5A5A5A\\\"}}, \\\"update\\\": {\\\"x\\\": {\\\"signal\\\": \\\"datum[\\\\\\\"c_x_axis_back\\\\\\\"]+667\\\"}, \\\"y\\\": {\\\"field\\\": \\\"c_num_rows_val\\\"}}}, \\\"type\\\": \\\"text\\\"}, {\\\"from\\\": {\\\"data\\\": \\\"data_2\\\"}, \\\"encode\\\": {\\\"enter\\\": {\\\"x\\\": {\\\"value\\\": 700}, \\\"y\\\": {\\\"value\\\": 130}, \\\"text\\\": {\\\"signal\\\": \\\"toString(format(datum[\\\\\\\"num_unique\\\\\\\"], \\\\\\\",\\\\\\\"))\\\"}, \\\"align\\\": {\\\"value\\\": \\\"right\\\"}, \\\"baseline\\\": {\\\"value\\\": \\\"middle\\\"}, \\\"dx\\\": {\\\"value\\\": 0, \\\"offset\\\": 0}, \\\"dy\\\": {\\\"value\\\": 0, \\\"offset\\\": 0}, \\\"angle\\\": {\\\"value\\\": 0}, \\\"font\\\": {\\\"value\\\": \\\"AvenirNext-Medium\\\"}, \\\"fontSize\\\": 
{\\\"value\\\": 12}, \\\"fontWeight\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"fontStyle\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"fill\\\": {\\\"value\\\": \\\"#5A5A5A\\\"}}, \\\"update\\\": {\\\"x\\\": {\\\"signal\\\": \\\"datum[\\\\\\\"c_x_axis_back\\\\\\\"]+667\\\"}, \\\"y\\\": {\\\"field\\\": \\\"c_num_unique_val\\\"}}}, \\\"type\\\": \\\"text\\\"}, {\\\"from\\\": {\\\"data\\\": \\\"data_2\\\"}, \\\"encode\\\": {\\\"enter\\\": {\\\"x\\\": {\\\"value\\\": 700}, \\\"y\\\": {\\\"value\\\": 154}, \\\"text\\\": {\\\"signal\\\": \\\"toString(format(datum[\\\\\\\"num_missing\\\\\\\"], \\\\\\\",\\\\\\\"))\\\"}, \\\"align\\\": {\\\"value\\\": \\\"right\\\"}, \\\"baseline\\\": {\\\"value\\\": \\\"middle\\\"}, \\\"dx\\\": {\\\"value\\\": 0, \\\"offset\\\": 0}, \\\"dy\\\": {\\\"value\\\": 0, \\\"offset\\\": 0}, \\\"angle\\\": {\\\"value\\\": 0}, \\\"font\\\": {\\\"value\\\": \\\"AvenirNext-Medium\\\"}, \\\"fontSize\\\": {\\\"value\\\": 12}, \\\"fontWeight\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"fontStyle\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"fill\\\": {\\\"value\\\": \\\"#5A5A5A\\\"}}, \\\"update\\\": {\\\"x\\\": {\\\"signal\\\": \\\"datum[\\\\\\\"c_x_axis_back\\\\\\\"]+667\\\"}, \\\"y\\\": {\\\"field\\\": \\\"c_missing_val\\\"}}}, \\\"type\\\": \\\"text\\\"}, {\\\"from\\\": {\\\"data\\\": \\\"data_2\\\"}, \\\"encode\\\": {\\\"enter\\\": {\\\"x\\\": {\\\"value\\\": 500}, \\\"y\\\": {\\\"value\\\": 200}, \\\"text\\\": {\\\"signal\\\": \\\"(toString(datum[\\\\\\\"type\\\\\\\"]) == \\\\\\\"str\\\\\\\")? 
\\\\\\\"Frequent Items\\\\\\\":\\\\\\\"\\\\\\\"\\\"}, \\\"align\\\": {\\\"value\\\": \\\"left\\\"}, \\\"baseline\\\": {\\\"value\\\": \\\"middle\\\"}, \\\"dx\\\": {\\\"value\\\": 0, \\\"offset\\\": 0}, \\\"dy\\\": {\\\"value\\\": 0, \\\"offset\\\": 0}, \\\"angle\\\": {\\\"value\\\": 0}, \\\"clip\\\": {\\\"value\\\": true}, \\\"font\\\": {\\\"value\\\": \\\"AvenirNext-Medium\\\"}, \\\"fontSize\\\": {\\\"value\\\": 11}, \\\"fontWeight\\\": {\\\"value\\\": \\\"bold\\\"}, \\\"fontStyle\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"fill\\\": {\\\"value\\\": \\\"#4A4A4A\\\"}}, \\\"update\\\": {\\\"x\\\": {\\\"signal\\\": \\\"datum[\\\\\\\"c_x_axis_back\\\\\\\"]+467\\\"}, \\\"y\\\": {\\\"field\\\": \\\"c_frequent_items\\\"}}}, \\\"type\\\": \\\"text\\\"}, {\\\"from\\\": {\\\"data\\\": \\\"data_2\\\"}, \\\"encode\\\": {\\\"enter\\\": {\\\"x\\\": {\\\"value\\\": 520}, \\\"y\\\": {\\\"value\\\": 200}, \\\"text\\\": {\\\"signal\\\": \\\"((datum[\\\\\\\"categorical\\\\\\\"].length >= 1) && (toString(datum[\\\\\\\"type\\\\\\\"]) == \\\\\\\"str\\\\\\\"))? 
toString(datum[\\\\\\\"categorical\\\\\\\"][0][\\\\\\\"label\\\\\\\"]):\\\\\\\"\\\\\\\"\\\"}, \\\"align\\\": {\\\"value\\\": \\\"left\\\"}, \\\"baseline\\\": {\\\"value\\\": \\\"middle\\\"}, \\\"dx\\\": {\\\"value\\\": 0, \\\"offset\\\": 0}, \\\"dy\\\": {\\\"value\\\": 0, \\\"offset\\\": 0}, \\\"angle\\\": {\\\"value\\\": 0}, \\\"clip\\\": {\\\"value\\\": true}, \\\"font\\\": {\\\"value\\\": \\\"AvenirNext-Medium\\\"}, \\\"fontSize\\\": {\\\"value\\\": 11}, \\\"fontWeight\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"fontStyle\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"fill\\\": {\\\"value\\\": \\\"#4A4A4A\\\"}}, \\\"update\\\": {\\\"x\\\": {\\\"signal\\\": \\\"datum[\\\\\\\"c_x_axis_back\\\\\\\"]+487\\\"}, \\\"y\\\": {\\\"field\\\": \\\"c_first_item\\\"}}}, \\\"type\\\": \\\"text\\\"}, {\\\"from\\\": {\\\"data\\\": \\\"data_2\\\"}, \\\"encode\\\": {\\\"enter\\\": {\\\"x\\\": {\\\"value\\\": 520}, \\\"y\\\": {\\\"value\\\": 200}, \\\"text\\\": {\\\"signal\\\": \\\"((datum[\\\\\\\"categorical\\\\\\\"].length >= 2) && (toString(datum[\\\\\\\"type\\\\\\\"]) == \\\\\\\"str\\\\\\\"))? 
toString(datum[\\\\\\\"categorical\\\\\\\"][1][\\\\\\\"label\\\\\\\"]):\\\\\\\"\\\\\\\"\\\"}, \\\"align\\\": {\\\"value\\\": \\\"left\\\"}, \\\"baseline\\\": {\\\"value\\\": \\\"middle\\\"}, \\\"dx\\\": {\\\"value\\\": 0, \\\"offset\\\": 0}, \\\"dy\\\": {\\\"value\\\": 0, \\\"offset\\\": 0}, \\\"angle\\\": {\\\"value\\\": 0}, \\\"clip\\\": {\\\"value\\\": true}, \\\"font\\\": {\\\"value\\\": \\\"AvenirNext-Medium\\\"}, \\\"fontSize\\\": {\\\"value\\\": 11}, \\\"fontWeight\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"fontStyle\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"fill\\\": {\\\"value\\\": \\\"#4A4A4A\\\"}}, \\\"update\\\": {\\\"x\\\": {\\\"signal\\\": \\\"datum[\\\\\\\"c_x_axis_back\\\\\\\"]+487\\\"}, \\\"y\\\": {\\\"field\\\": \\\"c_second_item\\\"}}}, \\\"type\\\": \\\"text\\\"}, {\\\"from\\\": {\\\"data\\\": \\\"data_2\\\"}, \\\"encode\\\": {\\\"enter\\\": {\\\"x\\\": {\\\"value\\\": 520}, \\\"y\\\": {\\\"value\\\": 200}, \\\"text\\\": {\\\"signal\\\": \\\"((datum[\\\\\\\"categorical\\\\\\\"].length >= 3) && (toString(datum[\\\\\\\"type\\\\\\\"]) == \\\\\\\"str\\\\\\\"))? 
toString(datum[\\\\\\\"categorical\\\\\\\"][2][\\\\\\\"label\\\\\\\"]):\\\\\\\"\\\\\\\"\\\"}, \\\"align\\\": {\\\"value\\\": \\\"left\\\"}, \\\"baseline\\\": {\\\"value\\\": \\\"middle\\\"}, \\\"dx\\\": {\\\"value\\\": 0, \\\"offset\\\": 0}, \\\"dy\\\": {\\\"value\\\": 0, \\\"offset\\\": 0}, \\\"angle\\\": {\\\"value\\\": 0}, \\\"clip\\\": {\\\"value\\\": true}, \\\"font\\\": {\\\"value\\\": \\\"AvenirNext-Medium\\\"}, \\\"fontSize\\\": {\\\"value\\\": 11}, \\\"fontWeight\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"fontStyle\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"fill\\\": {\\\"value\\\": \\\"#4A4A4A\\\"}}, \\\"update\\\": {\\\"x\\\": {\\\"signal\\\": \\\"datum[\\\\\\\"c_x_axis_back\\\\\\\"]+487\\\"}, \\\"y\\\": {\\\"field\\\": \\\"c_third_item\\\"}}}, \\\"type\\\": \\\"text\\\"}, {\\\"from\\\": {\\\"data\\\": \\\"data_2\\\"}, \\\"encode\\\": {\\\"enter\\\": {\\\"x\\\": {\\\"value\\\": 520}, \\\"y\\\": {\\\"value\\\": 200}, \\\"text\\\": {\\\"signal\\\": \\\"((datum[\\\\\\\"categorical\\\\\\\"].length >= 4) && (toString(datum[\\\\\\\"type\\\\\\\"]) == \\\\\\\"str\\\\\\\"))? 
toString(datum[\\\\\\\"categorical\\\\\\\"][3][\\\\\\\"label\\\\\\\"]):\\\\\\\"\\\\\\\"\\\"}, \\\"align\\\": {\\\"value\\\": \\\"left\\\"}, \\\"baseline\\\": {\\\"value\\\": \\\"middle\\\"}, \\\"dx\\\": {\\\"value\\\": 0, \\\"offset\\\": 0}, \\\"dy\\\": {\\\"value\\\": 0, \\\"offset\\\": 0}, \\\"angle\\\": {\\\"value\\\": 0}, \\\"clip\\\": {\\\"value\\\": true}, \\\"font\\\": {\\\"value\\\": \\\"AvenirNext-Medium\\\"}, \\\"fontSize\\\": {\\\"value\\\": 11}, \\\"fontWeight\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"fontStyle\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"fill\\\": {\\\"value\\\": \\\"#4A4A4A\\\"}}, \\\"update\\\": {\\\"x\\\": {\\\"signal\\\": \\\"datum[\\\\\\\"c_x_axis_back\\\\\\\"]+487\\\"}, \\\"y\\\": {\\\"field\\\": \\\"c_fourth_item\\\"}}}, \\\"type\\\": \\\"text\\\"}, {\\\"from\\\": {\\\"data\\\": \\\"data_2\\\"}, \\\"encode\\\": {\\\"enter\\\": {\\\"x\\\": {\\\"value\\\": 520}, \\\"y\\\": {\\\"value\\\": 200}, \\\"text\\\": {\\\"signal\\\": \\\"((datum[\\\\\\\"categorical\\\\\\\"].length >= 5) && (toString(datum[\\\\\\\"type\\\\\\\"]) == \\\\\\\"str\\\\\\\"))? 
toString(datum[\\\\\\\"categorical\\\\\\\"][4][\\\\\\\"label\\\\\\\"]):\\\\\\\"\\\\\\\"\\\"}, \\\"align\\\": {\\\"value\\\": \\\"left\\\"}, \\\"baseline\\\": {\\\"value\\\": \\\"middle\\\"}, \\\"dx\\\": {\\\"value\\\": 0, \\\"offset\\\": 0}, \\\"dy\\\": {\\\"value\\\": 0, \\\"offset\\\": 0}, \\\"angle\\\": {\\\"value\\\": 0}, \\\"clip\\\": {\\\"value\\\": true}, \\\"font\\\": {\\\"value\\\": \\\"AvenirNext-Medium\\\"}, \\\"fontSize\\\": {\\\"value\\\": 11}, \\\"fontWeight\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"fontStyle\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"fill\\\": {\\\"value\\\": \\\"#4A4A4A\\\"}}, \\\"update\\\": {\\\"x\\\": {\\\"signal\\\": \\\"datum[\\\\\\\"c_x_axis_back\\\\\\\"]+487\\\"}, \\\"y\\\": {\\\"field\\\": \\\"c_fifth_item\\\"}}}, \\\"type\\\": \\\"text\\\"}, {\\\"from\\\": {\\\"data\\\": \\\"data_2\\\"}, \\\"encode\\\": {\\\"enter\\\": {\\\"x\\\": {\\\"value\\\": 700}, \\\"y\\\": {\\\"value\\\": 200}, \\\"text\\\": {\\\"signal\\\": \\\"((datum[\\\\\\\"categorical\\\\\\\"].length >= 1) && (toString(datum[\\\\\\\"type\\\\\\\"]) == \\\\\\\"str\\\\\\\"))? 
toString(datum[\\\\\\\"categorical\\\\\\\"][0][\\\\\\\"count\\\\\\\"]):\\\\\\\"\\\\\\\"\\\"}, \\\"align\\\": {\\\"value\\\": \\\"right\\\"}, \\\"baseline\\\": {\\\"value\\\": \\\"middle\\\"}, \\\"dx\\\": {\\\"value\\\": 0, \\\"offset\\\": 0}, \\\"dy\\\": {\\\"value\\\": 0, \\\"offset\\\": 0}, \\\"angle\\\": {\\\"value\\\": 0}, \\\"clip\\\": {\\\"value\\\": true}, \\\"font\\\": {\\\"value\\\": \\\"AvenirNext-Medium\\\"}, \\\"fontSize\\\": {\\\"value\\\": 11}, \\\"fontWeight\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"fontStyle\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"fill\\\": {\\\"value\\\": \\\"#7A7A7A\\\"}}, \\\"update\\\": {\\\"x\\\": {\\\"signal\\\": \\\"datum[\\\\\\\"c_x_axis_back\\\\\\\"]+667\\\"}, \\\"y\\\": {\\\"field\\\": \\\"c_first_item\\\"}}}, \\\"type\\\": \\\"text\\\"}, {\\\"from\\\": {\\\"data\\\": \\\"data_2\\\"}, \\\"encode\\\": {\\\"enter\\\": {\\\"x\\\": {\\\"value\\\": 700}, \\\"y\\\": {\\\"value\\\": 200}, \\\"text\\\": {\\\"signal\\\": \\\"((datum[\\\\\\\"categorical\\\\\\\"].length >= 2) && (toString(datum[\\\\\\\"type\\\\\\\"]) == \\\\\\\"str\\\\\\\"))? 
toString(datum[\\\\\\\"categorical\\\\\\\"][1][\\\\\\\"count\\\\\\\"]):\\\\\\\"\\\\\\\"\\\"}, \\\"align\\\": {\\\"value\\\": \\\"right\\\"}, \\\"baseline\\\": {\\\"value\\\": \\\"middle\\\"}, \\\"dx\\\": {\\\"value\\\": 0, \\\"offset\\\": 0}, \\\"dy\\\": {\\\"value\\\": 0, \\\"offset\\\": 0}, \\\"angle\\\": {\\\"value\\\": 0}, \\\"clip\\\": {\\\"value\\\": true}, \\\"font\\\": {\\\"value\\\": \\\"AvenirNext-Medium\\\"}, \\\"fontSize\\\": {\\\"value\\\": 10}, \\\"fontWeight\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"fontStyle\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"fill\\\": {\\\"value\\\": \\\"#7A7A7A\\\"}}, \\\"update\\\": {\\\"x\\\": {\\\"signal\\\": \\\"datum[\\\\\\\"c_x_axis_back\\\\\\\"]+667\\\"}, \\\"y\\\": {\\\"field\\\": \\\"c_second_item\\\"}}}, \\\"type\\\": \\\"text\\\"}, {\\\"from\\\": {\\\"data\\\": \\\"data_2\\\"}, \\\"encode\\\": {\\\"enter\\\": {\\\"x\\\": {\\\"value\\\": 700}, \\\"y\\\": {\\\"value\\\": 200}, \\\"text\\\": {\\\"signal\\\": \\\"((datum[\\\\\\\"categorical\\\\\\\"].length >= 3) && (toString(datum[\\\\\\\"type\\\\\\\"]) == \\\\\\\"str\\\\\\\"))? 
toString(datum[\\\\\\\"categorical\\\\\\\"][2][\\\\\\\"count\\\\\\\"]):\\\\\\\"\\\\\\\"\\\"}, \\\"align\\\": {\\\"value\\\": \\\"right\\\"}, \\\"baseline\\\": {\\\"value\\\": \\\"middle\\\"}, \\\"dx\\\": {\\\"value\\\": 0, \\\"offset\\\": 0}, \\\"dy\\\": {\\\"value\\\": 0, \\\"offset\\\": 0}, \\\"angle\\\": {\\\"value\\\": 0}, \\\"clip\\\": {\\\"value\\\": true}, \\\"font\\\": {\\\"value\\\": \\\"AvenirNext-Medium\\\"}, \\\"fontSize\\\": {\\\"value\\\": 10}, \\\"fontWeight\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"fontStyle\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"fill\\\": {\\\"value\\\": \\\"#7A7A7A\\\"}}, \\\"update\\\": {\\\"x\\\": {\\\"signal\\\": \\\"datum[\\\\\\\"c_x_axis_back\\\\\\\"]+667\\\"}, \\\"y\\\": {\\\"field\\\": \\\"c_third_item\\\"}}}, \\\"type\\\": \\\"text\\\"}, {\\\"from\\\": {\\\"data\\\": \\\"data_2\\\"}, \\\"encode\\\": {\\\"enter\\\": {\\\"x\\\": {\\\"value\\\": 700}, \\\"y\\\": {\\\"value\\\": 200}, \\\"text\\\": {\\\"signal\\\": \\\"((datum[\\\\\\\"categorical\\\\\\\"].length >= 4) && (toString(datum[\\\\\\\"type\\\\\\\"]) == \\\\\\\"str\\\\\\\"))? 
toString(datum[\\\\\\\"categorical\\\\\\\"][3][\\\\\\\"count\\\\\\\"]):\\\\\\\"\\\\\\\"\\\"}, \\\"align\\\": {\\\"value\\\": \\\"right\\\"}, \\\"baseline\\\": {\\\"value\\\": \\\"middle\\\"}, \\\"dx\\\": {\\\"value\\\": 0, \\\"offset\\\": 0}, \\\"dy\\\": {\\\"value\\\": 0, \\\"offset\\\": 0}, \\\"angle\\\": {\\\"value\\\": 0}, \\\"clip\\\": {\\\"value\\\": true}, \\\"font\\\": {\\\"value\\\": \\\"AvenirNext-Medium\\\"}, \\\"fontSize\\\": {\\\"value\\\": 10}, \\\"fontWeight\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"fontStyle\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"fill\\\": {\\\"value\\\": \\\"#7A7A7A\\\"}}, \\\"update\\\": {\\\"x\\\": {\\\"signal\\\": \\\"datum[\\\\\\\"c_x_axis_back\\\\\\\"]+667\\\"}, \\\"y\\\": {\\\"field\\\": \\\"c_fourth_item\\\"}}}, \\\"type\\\": \\\"text\\\"}, {\\\"from\\\": {\\\"data\\\": \\\"data_2\\\"}, \\\"encode\\\": {\\\"enter\\\": {\\\"x\\\": {\\\"value\\\": 700}, \\\"y\\\": {\\\"value\\\": 200}, \\\"text\\\": {\\\"signal\\\": \\\"((datum[\\\\\\\"categorical\\\\\\\"].length >= 5) && (toString(datum[\\\\\\\"type\\\\\\\"]) == \\\\\\\"str\\\\\\\"))? 
toString(datum[\\\\\\\"categorical\\\\\\\"][4][\\\\\\\"count\\\\\\\"]):\\\\\\\"\\\\\\\"\\\"}, \\\"align\\\": {\\\"value\\\": \\\"right\\\"}, \\\"baseline\\\": {\\\"value\\\": \\\"middle\\\"}, \\\"dx\\\": {\\\"value\\\": 0, \\\"offset\\\": 0}, \\\"dy\\\": {\\\"value\\\": 0, \\\"offset\\\": 0}, \\\"angle\\\": {\\\"value\\\": 0}, \\\"clip\\\": {\\\"value\\\": true}, \\\"font\\\": {\\\"value\\\": \\\"AvenirNext-Medium\\\"}, \\\"fontSize\\\": {\\\"value\\\": 10}, \\\"fontWeight\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"fontStyle\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"fill\\\": {\\\"value\\\": \\\"#7A7A7A\\\"}}, \\\"update\\\": {\\\"x\\\": {\\\"signal\\\": \\\"datum[\\\\\\\"c_x_axis_back\\\\\\\"]+667\\\"}, \\\"y\\\": {\\\"field\\\": \\\"c_fifth_item\\\"}}}, \\\"type\\\": \\\"text\\\"}, {\\\"from\\\": {\\\"data\\\": \\\"data_2\\\"}, \\\"encode\\\": {\\\"enter\\\": {\\\"x\\\": {\\\"value\\\": 500}, \\\"y\\\": {\\\"value\\\": 200}, \\\"text\\\": {\\\"signal\\\": \\\"(toString(datum[\\\\\\\"type\\\\\\\"]) == \\\\\\\"integer\\\\\\\" || toString(datum[\\\\\\\"type\\\\\\\"]) == \\\\\\\"float\\\\\\\")? 
\\\\\\\"Mean:\\\\\\\":\\\\\\\"\\\\\\\"\\\"}, \\\"align\\\": {\\\"value\\\": \\\"left\\\"}, \\\"baseline\\\": {\\\"value\\\": \\\"middle\\\"}, \\\"dx\\\": {\\\"value\\\": 0, \\\"offset\\\": 0}, \\\"dy\\\": {\\\"value\\\": 0, \\\"offset\\\": 0}, \\\"angle\\\": {\\\"value\\\": 0}, \\\"clip\\\": {\\\"value\\\": true}, \\\"font\\\": {\\\"value\\\": \\\"AvenirNext-Medium\\\"}, \\\"fontSize\\\": {\\\"value\\\": 11}, \\\"fontWeight\\\": {\\\"value\\\": \\\"bold\\\"}, \\\"fontStyle\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"fill\\\": {\\\"value\\\": \\\"#4A4A4A\\\"}}, \\\"update\\\": {\\\"x\\\": {\\\"signal\\\": \\\"datum[\\\\\\\"c_x_axis_back\\\\\\\"]+467\\\"}, \\\"y\\\": {\\\"field\\\": \\\"c_mean\\\"}}}, \\\"type\\\": \\\"text\\\"}, {\\\"from\\\": {\\\"data\\\": \\\"data_2\\\"}, \\\"encode\\\": {\\\"enter\\\": {\\\"x\\\": {\\\"value\\\": 500}, \\\"y\\\": {\\\"value\\\": 220}, \\\"text\\\": {\\\"signal\\\": \\\"(toString(datum[\\\\\\\"type\\\\\\\"]) == \\\\\\\"integer\\\\\\\" || toString(datum[\\\\\\\"type\\\\\\\"]) == \\\\\\\"float\\\\\\\")? 
\\\\\\\"Min:\\\\\\\":\\\\\\\"\\\\\\\"\\\"}, \\\"align\\\": {\\\"value\\\": \\\"left\\\"}, \\\"baseline\\\": {\\\"value\\\": \\\"middle\\\"}, \\\"dx\\\": {\\\"value\\\": 0, \\\"offset\\\": 0}, \\\"dy\\\": {\\\"value\\\": 0, \\\"offset\\\": 0}, \\\"angle\\\": {\\\"value\\\": 0}, \\\"font\\\": {\\\"value\\\": \\\"AvenirNext-Medium\\\"}, \\\"fontSize\\\": {\\\"value\\\": 11}, \\\"fontWeight\\\": {\\\"value\\\": \\\"bold\\\"}, \\\"fontStyle\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"fill\\\": {\\\"value\\\": \\\"#4A4A4A\\\"}}, \\\"update\\\": {\\\"x\\\": {\\\"signal\\\": \\\"datum[\\\\\\\"c_x_axis_back\\\\\\\"]+467\\\"}, \\\"y\\\": {\\\"field\\\": \\\"c_min\\\"}}}, \\\"type\\\": \\\"text\\\"}, {\\\"from\\\": {\\\"data\\\": \\\"data_2\\\"}, \\\"encode\\\": {\\\"enter\\\": {\\\"x\\\": {\\\"value\\\": 500}, \\\"y\\\": {\\\"value\\\": 240}, \\\"text\\\": {\\\"signal\\\": \\\"(toString(datum[\\\\\\\"type\\\\\\\"]) == \\\\\\\"integer\\\\\\\" || toString(datum[\\\\\\\"type\\\\\\\"]) == \\\\\\\"float\\\\\\\")? 
\\\\\\\"Max:\\\\\\\":\\\\\\\"\\\\\\\"\\\"}, \\\"align\\\": {\\\"value\\\": \\\"left\\\"}, \\\"baseline\\\": {\\\"value\\\": \\\"middle\\\"}, \\\"dx\\\": {\\\"value\\\": 0, \\\"offset\\\": 0}, \\\"dy\\\": {\\\"value\\\": 0, \\\"offset\\\": 0}, \\\"angle\\\": {\\\"value\\\": 0}, \\\"font\\\": {\\\"value\\\": \\\"AvenirNext-Medium\\\"}, \\\"fontSize\\\": {\\\"value\\\": 11}, \\\"fontWeight\\\": {\\\"value\\\": \\\"bold\\\"}, \\\"fontStyle\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"fill\\\": {\\\"value\\\": \\\"#4A4A4A\\\"}}, \\\"update\\\": {\\\"x\\\": {\\\"signal\\\": \\\"datum[\\\\\\\"c_x_axis_back\\\\\\\"]+467\\\"}, \\\"y\\\": {\\\"field\\\": \\\"c_max\\\"}}}, \\\"type\\\": \\\"text\\\"}, {\\\"from\\\": {\\\"data\\\": \\\"data_2\\\"}, \\\"encode\\\": {\\\"enter\\\": {\\\"x\\\": {\\\"value\\\": 500}, \\\"y\\\": {\\\"value\\\": 260}, \\\"text\\\": {\\\"signal\\\": \\\"(toString(datum[\\\\\\\"type\\\\\\\"]) == \\\\\\\"integer\\\\\\\" || toString(datum[\\\\\\\"type\\\\\\\"]) == \\\\\\\"float\\\\\\\")? 
\\\\\\\"Median:\\\\\\\":\\\\\\\"\\\\\\\"\\\"}, \\\"align\\\": {\\\"value\\\": \\\"left\\\"}, \\\"baseline\\\": {\\\"value\\\": \\\"middle\\\"}, \\\"dx\\\": {\\\"value\\\": 0, \\\"offset\\\": 0}, \\\"dy\\\": {\\\"value\\\": 0, \\\"offset\\\": 0}, \\\"angle\\\": {\\\"value\\\": 0}, \\\"font\\\": {\\\"value\\\": \\\"AvenirNext-Medium\\\"}, \\\"fontSize\\\": {\\\"value\\\": 11}, \\\"fontWeight\\\": {\\\"value\\\": \\\"bold\\\"}, \\\"fontStyle\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"fill\\\": {\\\"value\\\": \\\"#4A4A4A\\\"}}, \\\"update\\\": {\\\"x\\\": {\\\"signal\\\": \\\"datum[\\\\\\\"c_x_axis_back\\\\\\\"]+467\\\"}, \\\"y\\\": {\\\"field\\\": \\\"c_median\\\"}}}, \\\"type\\\": \\\"text\\\"}, {\\\"from\\\": {\\\"data\\\": \\\"data_2\\\"}, \\\"encode\\\": {\\\"enter\\\": {\\\"x\\\": {\\\"value\\\": 500}, \\\"y\\\": {\\\"value\\\": 280}, \\\"text\\\": {\\\"signal\\\": \\\"(toString(datum[\\\\\\\"type\\\\\\\"]) == \\\\\\\"integer\\\\\\\" || toString(datum[\\\\\\\"type\\\\\\\"]) == \\\\\\\"float\\\\\\\")? \\\\\\\"St. 
Dev:\\\\\\\":\\\\\\\"\\\\\\\"\\\"}, \\\"align\\\": {\\\"value\\\": \\\"left\\\"}, \\\"baseline\\\": {\\\"value\\\": \\\"middle\\\"}, \\\"dx\\\": {\\\"value\\\": 0, \\\"offset\\\": 0}, \\\"dy\\\": {\\\"value\\\": 0, \\\"offset\\\": 0}, \\\"angle\\\": {\\\"value\\\": 0}, \\\"font\\\": {\\\"value\\\": \\\"AvenirNext-Medium\\\"}, \\\"fontSize\\\": {\\\"value\\\": 11}, \\\"fontWeight\\\": {\\\"value\\\": \\\"bold\\\"}, \\\"fontStyle\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"fill\\\": {\\\"value\\\": \\\"#4A4A4A\\\"}}, \\\"update\\\": {\\\"x\\\": {\\\"signal\\\": \\\"datum[\\\\\\\"c_x_axis_back\\\\\\\"]+467\\\"}, \\\"y\\\": {\\\"field\\\": \\\"c_stdev\\\"}}}, \\\"type\\\": \\\"text\\\"}, {\\\"from\\\": {\\\"data\\\": \\\"data_2\\\"}, \\\"encode\\\": {\\\"enter\\\": {\\\"x\\\": {\\\"value\\\": 700}, \\\"y\\\": {\\\"value\\\": 198}, \\\"text\\\": {\\\"signal\\\": \\\"(toString(datum[\\\\\\\"type\\\\\\\"]) == \\\\\\\"integer\\\\\\\" || toString(datum[\\\\\\\"type\\\\\\\"]) == \\\\\\\"float\\\\\\\")?toString(format(datum[\\\\\\\"mean\\\\\\\"], \\\\\\\",\\\\\\\")):\\\\\\\"\\\\\\\"\\\"}, \\\"align\\\": {\\\"value\\\": \\\"right\\\"}, \\\"baseline\\\": {\\\"value\\\": \\\"middle\\\"}, \\\"dx\\\": {\\\"value\\\": 0, \\\"offset\\\": 0}, \\\"dy\\\": {\\\"value\\\": 0, \\\"offset\\\": 0}, \\\"angle\\\": {\\\"value\\\": 0}, \\\"font\\\": {\\\"value\\\": \\\"AvenirNext-Medium\\\"}, \\\"fontSize\\\": {\\\"value\\\": 10}, \\\"fontWeight\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"fontStyle\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"fill\\\": {\\\"value\\\": \\\"#6A6A6A\\\"}}, \\\"update\\\": {\\\"x\\\": {\\\"signal\\\": \\\"datum[\\\\\\\"c_x_axis_back\\\\\\\"]+667\\\"}, \\\"y\\\": {\\\"field\\\": \\\"c_mean_val\\\"}}}, \\\"type\\\": \\\"text\\\"}, {\\\"from\\\": {\\\"data\\\": \\\"data_2\\\"}, \\\"encode\\\": {\\\"enter\\\": {\\\"x\\\": {\\\"value\\\": 700}, \\\"y\\\": {\\\"value\\\": 218}, \\\"text\\\": {\\\"signal\\\": \\\"(toString(datum[\\\\\\\"type\\\\\\\"]) == 
\\\\\\\"integer\\\\\\\" || toString(datum[\\\\\\\"type\\\\\\\"]) == \\\\\\\"float\\\\\\\")?toString(format(datum[\\\\\\\"min\\\\\\\"], \\\\\\\",\\\\\\\")):\\\\\\\"\\\\\\\"\\\"}, \\\"align\\\": {\\\"value\\\": \\\"right\\\"}, \\\"baseline\\\": {\\\"value\\\": \\\"middle\\\"}, \\\"dx\\\": {\\\"value\\\": 0, \\\"offset\\\": 0}, \\\"dy\\\": {\\\"value\\\": 0, \\\"offset\\\": 0}, \\\"angle\\\": {\\\"value\\\": 0}, \\\"font\\\": {\\\"value\\\": \\\"AvenirNext-Medium\\\"}, \\\"fontSize\\\": {\\\"value\\\": 10}, \\\"fontWeight\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"fontStyle\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"fill\\\": {\\\"value\\\": \\\"#6A6A6A\\\"}}, \\\"update\\\": {\\\"x\\\": {\\\"signal\\\": \\\"datum[\\\\\\\"c_x_axis_back\\\\\\\"]+667\\\"}, \\\"y\\\": {\\\"field\\\": \\\"c_min_val\\\"}}}, \\\"type\\\": \\\"text\\\"}, {\\\"from\\\": {\\\"data\\\": \\\"data_2\\\"}, \\\"encode\\\": {\\\"enter\\\": {\\\"x\\\": {\\\"value\\\": 700}, \\\"y\\\": {\\\"value\\\": 238}, \\\"text\\\": {\\\"signal\\\": \\\"(toString(datum[\\\\\\\"type\\\\\\\"]) == \\\\\\\"integer\\\\\\\" || toString(datum[\\\\\\\"type\\\\\\\"]) == \\\\\\\"float\\\\\\\")?toString(format(datum[\\\\\\\"max\\\\\\\"], \\\\\\\",\\\\\\\")):\\\\\\\"\\\\\\\"\\\"}, \\\"align\\\": {\\\"value\\\": \\\"right\\\"}, \\\"baseline\\\": {\\\"value\\\": \\\"middle\\\"}, \\\"dx\\\": {\\\"value\\\": 0, \\\"offset\\\": 0}, \\\"dy\\\": {\\\"value\\\": 0, \\\"offset\\\": 0}, \\\"angle\\\": {\\\"value\\\": 0}, \\\"font\\\": {\\\"value\\\": \\\"AvenirNext-Medium\\\"}, \\\"fontSize\\\": {\\\"value\\\": 10}, \\\"fontWeight\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"fontStyle\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"fill\\\": {\\\"value\\\": \\\"#6A6A6A\\\"}}, \\\"update\\\": {\\\"x\\\": {\\\"signal\\\": \\\"datum[\\\\\\\"c_x_axis_back\\\\\\\"]+667\\\"}, \\\"y\\\": {\\\"field\\\": \\\"c_max_val\\\"}}}, \\\"type\\\": \\\"text\\\"}, {\\\"from\\\": {\\\"data\\\": \\\"data_2\\\"}, \\\"encode\\\": {\\\"enter\\\": {\\\"x\\\": 
{\\\"value\\\": 700}, \\\"y\\\": {\\\"value\\\": 258}, \\\"text\\\": {\\\"signal\\\": \\\"(toString(datum[\\\\\\\"type\\\\\\\"]) == \\\\\\\"integer\\\\\\\" || toString(datum[\\\\\\\"type\\\\\\\"]) == \\\\\\\"float\\\\\\\")?toString(format(datum[\\\\\\\"median\\\\\\\"], \\\\\\\",\\\\\\\")):\\\\\\\"\\\\\\\"\\\"}, \\\"align\\\": {\\\"value\\\": \\\"right\\\"}, \\\"baseline\\\": {\\\"value\\\": \\\"middle\\\"}, \\\"dx\\\": {\\\"value\\\": 0, \\\"offset\\\": 0}, \\\"dy\\\": {\\\"value\\\": 0, \\\"offset\\\": 0}, \\\"angle\\\": {\\\"value\\\": 0}, \\\"font\\\": {\\\"value\\\": \\\"AvenirNext-Medium\\\"}, \\\"fontSize\\\": {\\\"value\\\": 10}, \\\"fontWeight\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"fontStyle\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"fill\\\": {\\\"value\\\": \\\"#6A6A6A\\\"}}, \\\"update\\\": {\\\"x\\\": {\\\"signal\\\": \\\"datum[\\\\\\\"c_x_axis_back\\\\\\\"]+667\\\"}, \\\"y\\\": {\\\"field\\\": \\\"c_median_val\\\"}}}, \\\"type\\\": \\\"text\\\"}, {\\\"from\\\": {\\\"data\\\": \\\"data_2\\\"}, \\\"encode\\\": {\\\"enter\\\": {\\\"x\\\": {\\\"value\\\": 700}, \\\"y\\\": {\\\"value\\\": 278}, \\\"text\\\": {\\\"signal\\\": \\\"(toString(datum[\\\\\\\"type\\\\\\\"]) == \\\\\\\"integer\\\\\\\" || toString(datum[\\\\\\\"type\\\\\\\"]) == \\\\\\\"float\\\\\\\")?toString(format(datum[\\\\\\\"stdev\\\\\\\"], \\\\\\\",\\\\\\\")):\\\\\\\"\\\\\\\"\\\"}, \\\"align\\\": {\\\"value\\\": \\\"right\\\"}, \\\"baseline\\\": {\\\"value\\\": \\\"middle\\\"}, \\\"dx\\\": {\\\"value\\\": 0, \\\"offset\\\": 0}, \\\"dy\\\": {\\\"value\\\": 0, \\\"offset\\\": 0}, \\\"angle\\\": {\\\"value\\\": 0}, \\\"font\\\": {\\\"value\\\": \\\"AvenirNext-Medium\\\"}, \\\"fontSize\\\": {\\\"value\\\": 10}, \\\"fontWeight\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"fontStyle\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"fill\\\": {\\\"value\\\": \\\"#6A6A6A\\\"}}, \\\"update\\\": {\\\"x\\\": {\\\"signal\\\": \\\"datum[\\\\\\\"c_x_axis_back\\\\\\\"]+667\\\"}, \\\"y\\\": {\\\"field\\\": 
\\\"c_stdev_val\\\"}}}, \\\"type\\\": \\\"text\\\"}, {\\\"from\\\": {\\\"facet\\\": {\\\"name\\\": \\\"new_data\\\", \\\"data\\\": \\\"data_2\\\", \\\"field\\\": \\\"numeric\\\"}}, \\\"encode\\\": {\\\"enter\\\": {\\\"x\\\": {\\\"value\\\": 120}, \\\"width\\\": {\\\"value\\\": 250}, \\\"y\\\": {\\\"field\\\": \\\"graph_offset\\\"}, \\\"height\\\": {\\\"value\\\": 150}, \\\"fill\\\": {\\\"value\\\": \\\"#ffffff\\\"}, \\\"fillOpacity\\\": {\\\"value\\\": 0}, \\\"stroke\\\": {\\\"value\\\": \\\"#000000\\\"}, \\\"strokeWidth\\\": {\\\"value\\\": 0}}, \\\"update\\\": {\\\"x\\\": {\\\"signal\\\": \\\"datum[\\\\\\\"c_x_axis_back\\\\\\\"]+87\\\"}, \\\"clip\\\": {\\\"field\\\": \\\"c_clip_val\\\"}, \\\"width\\\": {\\\"field\\\": \\\"c_width_numeric_val\\\"}}}, \\\"type\\\": \\\"group\\\", \\\"scales\\\": [{\\\"name\\\": \\\"x\\\", \\\"type\\\": \\\"linear\\\", \\\"domain\\\": {\\\"data\\\": \\\"new_data\\\", \\\"fields\\\": [\\\"left\\\", \\\"right\\\"], \\\"sort\\\": true}, \\\"range\\\": [0, {\\\"signal\\\": \\\"width\\\"}], \\\"nice\\\": true, \\\"zero\\\": true}, {\\\"name\\\": \\\"y\\\", \\\"type\\\": \\\"linear\\\", \\\"domain\\\": {\\\"data\\\": \\\"new_data\\\", \\\"field\\\": \\\"count\\\"}, \\\"range\\\": [{\\\"signal\\\": \\\"height\\\"}, 0], \\\"nice\\\": true, \\\"zero\\\": true}], \\\"axes\\\": [{\\\"title\\\": \\\"Values\\\", \\\"scale\\\": \\\"x\\\", \\\"labelOverlap\\\": true, \\\"orient\\\": \\\"bottom\\\", \\\"tickCount\\\": {\\\"signal\\\": \\\"ceil(width/40)\\\"}, \\\"zindex\\\": 1}, {\\\"scale\\\": \\\"x\\\", \\\"domain\\\": false, \\\"grid\\\": true, \\\"labels\\\": false, \\\"maxExtent\\\": 0, \\\"minExtent\\\": 0, \\\"orient\\\": \\\"bottom\\\", \\\"tickCount\\\": {\\\"signal\\\": \\\"ceil(width/40)\\\"}, \\\"ticks\\\": false, \\\"zindex\\\": 0, \\\"gridScale\\\": \\\"y\\\"}, {\\\"title\\\": \\\"Count\\\", \\\"scale\\\": \\\"y\\\", \\\"labelOverlap\\\": true, \\\"orient\\\": \\\"left\\\", \\\"tickCount\\\": {\\\"signal\\\": \\\"ceil(height/40)\\\"}, 
\\\"zindex\\\": 1}, {\\\"scale\\\": \\\"y\\\", \\\"domain\\\": false, \\\"grid\\\": true, \\\"labels\\\": false, \\\"maxExtent\\\": 0, \\\"minExtent\\\": 0, \\\"orient\\\": \\\"left\\\", \\\"tickCount\\\": {\\\"signal\\\": \\\"ceil(height/40)\\\"}, \\\"ticks\\\": false, \\\"zindex\\\": 0, \\\"gridScale\\\": \\\"x\\\"}], \\\"style\\\": \\\"cell\\\", \\\"signals\\\": [{\\\"name\\\": \\\"width\\\", \\\"update\\\": \\\"250\\\"}, {\\\"name\\\": \\\"height\\\", \\\"update\\\": \\\"150\\\"}], \\\"marks\\\": [{\\\"name\\\": \\\"marks\\\", \\\"type\\\": \\\"rect\\\", \\\"style\\\": [\\\"rect\\\"], \\\"from\\\": {\\\"data\\\": \\\"new_data\\\"}, \\\"encode\\\": {\\\"hover\\\": {\\\"fill\\\": {\\\"value\\\": \\\"#7EC2F3\\\"}}, \\\"update\\\": {\\\"x\\\": {\\\"scale\\\": \\\"x\\\", \\\"field\\\": \\\"left\\\"}, \\\"x2\\\": {\\\"scale\\\": \\\"x\\\", \\\"field\\\": \\\"right\\\"}, \\\"y\\\": {\\\"scale\\\": \\\"y\\\", \\\"field\\\": \\\"count\\\"}, \\\"y2\\\": {\\\"scale\\\": \\\"y\\\", \\\"value\\\": 0}, \\\"fill\\\": {\\\"value\\\": \\\"#108EE9\\\"}}}}]}, {\\\"from\\\": {\\\"facet\\\": {\\\"name\\\": \\\"data_5\\\", \\\"data\\\": \\\"data_2\\\", \\\"field\\\": \\\"categorical\\\"}}, \\\"encode\\\": {\\\"enter\\\": {\\\"x\\\": {\\\"value\\\": 170}, \\\"width\\\": {\\\"value\\\": 250}, \\\"y\\\": {\\\"field\\\": \\\"graph_offset_categorical\\\"}, \\\"height\\\": {\\\"value\\\": 150}, \\\"fill\\\": {\\\"value\\\": \\\"#ffffff\\\"}, \\\"fillOpacity\\\": {\\\"value\\\": 0}, \\\"stroke\\\": {\\\"value\\\": \\\"#000000\\\"}, \\\"strokeWidth\\\": {\\\"value\\\": 0}}, \\\"update\\\": {\\\"x\\\": {\\\"signal\\\": \\\"datum[\\\\\\\"c_x_axis_back\\\\\\\"]+137\\\"}, \\\"clip\\\": {\\\"field\\\": \\\"c_clip_val_cat\\\"}, \\\"width\\\": {\\\"field\\\": \\\"c_width_numeric_val_cat\\\"}}}, \\\"type\\\": \\\"group\\\", \\\"style\\\": \\\"cell\\\", \\\"signals\\\": [{\\\"name\\\": \\\"unit\\\", \\\"value\\\": {}, \\\"on\\\": [{\\\"events\\\": \\\"mousemove\\\", \\\"update\\\": 
\\\"isTuple(group()) ? group() : unit\\\"}]}, {\\\"name\\\": \\\"pts\\\", \\\"update\\\": \\\"data(\\\\\\\"pts_store\\\\\\\").length && {count: data(\\\\\\\"pts_store\\\\\\\")[0].values[0]}\\\"}, {\\\"name\\\": \\\"pts_tuple\\\", \\\"value\\\": {}, \\\"on\\\": [{\\\"events\\\": [{\\\"source\\\": \\\"scope\\\", \\\"type\\\": \\\"click\\\"}], \\\"update\\\": \\\"datum && item().mark.marktype !== 'group' ? {unit: \\\\\\\"\\\\\\\", encodings: [\\\\\\\"x\\\\\\\"], fields: [\\\\\\\"count\\\\\\\"], values: [datum[\\\\\\\"count\\\\\\\"]]} : null\\\", \\\"force\\\": true}]}, {\\\"name\\\": \\\"pts_modify\\\", \\\"on\\\": [{\\\"events\\\": {\\\"signal\\\": \\\"pts_tuple\\\"}, \\\"update\\\": \\\"modify(\\\\\\\"pts_store\\\\\\\", pts_tuple, true)\\\"}]}], \\\"marks\\\": [{\\\"name\\\": \\\"marks\\\", \\\"type\\\": \\\"rect\\\", \\\"style\\\": [\\\"bar\\\"], \\\"from\\\": {\\\"data\\\": \\\"data_5\\\"}, \\\"encode\\\": {\\\"hover\\\": {\\\"fill\\\": {\\\"value\\\": \\\"#7EC2F3\\\"}}, \\\"update\\\": {\\\"x\\\": {\\\"scale\\\": \\\"x\\\", \\\"field\\\": \\\"count\\\"}, \\\"x2\\\": {\\\"scale\\\": \\\"x\\\", \\\"value\\\": 0}, \\\"y\\\": {\\\"scale\\\": \\\"y\\\", \\\"field\\\": \\\"label\\\"}, \\\"height\\\": {\\\"scale\\\": \\\"y\\\", \\\"band\\\": true}, \\\"fill\\\": {\\\"value\\\": \\\"#108EE9\\\"}}}}], \\\"scales\\\": [{\\\"name\\\": \\\"x\\\", \\\"type\\\": \\\"linear\\\", \\\"domain\\\": {\\\"data\\\": \\\"data_5\\\", \\\"field\\\": \\\"count\\\"}, \\\"range\\\": [0, 250], \\\"nice\\\": true, \\\"zero\\\": true}, {\\\"name\\\": \\\"y\\\", \\\"type\\\": \\\"band\\\", \\\"domain\\\": {\\\"data\\\": \\\"data_5\\\", \\\"field\\\": \\\"label\\\", \\\"sort\\\": {\\\"op\\\": \\\"mean\\\", \\\"field\\\": \\\"label_idx\\\", \\\"order\\\": \\\"descending\\\"}}, \\\"range\\\": [150, 0], \\\"paddingInner\\\": 0.1, \\\"paddingOuter\\\": 0.05}], \\\"axes\\\": [{\\\"orient\\\": \\\"top\\\", \\\"scale\\\": \\\"x\\\", \\\"labelOverlap\\\": true, \\\"tickCount\\\": {\\\"signal\\\": 
\\\"ceil(width/40)\\\"}, \\\"title\\\": \\\"Count\\\", \\\"zindex\\\": 1}, {\\\"orient\\\": \\\"top\\\", \\\"scale\\\": \\\"x\\\", \\\"domain\\\": false, \\\"grid\\\": true, \\\"labels\\\": false, \\\"maxExtent\\\": 0, \\\"minExtent\\\": 0, \\\"tickCount\\\": {\\\"signal\\\": \\\"ceil(width/40)\\\"}, \\\"ticks\\\": false, \\\"zindex\\\": 0, \\\"gridScale\\\": \\\"y\\\"}, {\\\"scale\\\": \\\"y\\\", \\\"labelOverlap\\\": true, \\\"orient\\\": \\\"left\\\", \\\"title\\\": \\\"Label\\\", \\\"zindex\\\": 1}]}], \\\"type\\\": \\\"group\\\"}], \\\"type\\\": \\\"group\\\"}], \\\"config\\\": {\\\"axis\\\": {\\\"labelFont\\\": \\\"HelveticaNeue-Light, Arial\\\", \\\"labelFontSize\\\": 7, \\\"labelPadding\\\": 10, \\\"labelColor\\\": \\\"#595959\\\", \\\"titleFont\\\": \\\"HelveticaNeue-Light, Arial\\\", \\\"titleFontWeight\\\": \\\"normal\\\", \\\"titlePadding\\\": 9, \\\"titleFontSize\\\": 12, \\\"titleColor\\\": \\\"#595959\\\"}, \\\"axisY\\\": {\\\"minExtent\\\": 30}, \\\"style\\\": {\\\"rect\\\": {\\\"stroke\\\": \\\"rgba(200, 200, 200, 0.5)\\\"}, \\\"group-title\\\": {\\\"fontSize\\\": 20, \\\"font\\\": \\\"HelveticaNeue-Light, Arial\\\", \\\"fontWeight\\\": \\\"normal\\\", \\\"fill\\\": \\\"#595959\\\"}}}}\"; var vega_json_parsed = JSON.parse(vega_json); var toolTipOpts = { showAllFields: true }; if(vega_json_parsed[\"metadata\"] != null){ if(vega_json_parsed[\"metadata\"][\"bubbleOpts\"] != null){ toolTipOpts = vega_json_parsed[\"metadata\"][\"bubbleOpts\"]; }; }; vegaEmbed(\"#vis\", vega_json_parsed).then(function (result) { vegaTooltip.vega(result.view, toolTipOpts); }); </script> </body> </html>' src=\"demo_iframe_srcdoc.htm\"> <p>Your browser does not support iframes.</p> </iframe> </body> </html>" ], "text/plain": [ "<IPython.core.display.HTML object>" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "train_file = '../data/ratings.dat'\n", "sf = tc.SFrame.read_csv(train_file, header=False, \n", " delimiter='|', verbose=False)\n", "sf = 
sf.rename({'X1':'user_id', 'X2':'course_id', 'X3':'rating'})\n", "sf.show()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "In order to evaluate the performance of our model, we randomly split the observations in our data set into two partitions: we will use `train_set` when creating our model and `test_set` for evaluating its performance." ] }, { "cell_type": "code", "execution_count": 49, "metadata": { "ExecuteTime": { "end_time": "2020-06-13T08:27:25.724942Z", "start_time": "2020-06-13T08:27:25.701413Z" } }, "outputs": [ { "data": { "text/html": [ "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\"><table frame=\"box\" rules=\"cols\">\n", " <tr>\n", " <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">user_id</th>\n", " <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">course_id</th>\n", " <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">rating</th>\n", " </tr>\n", " <tr>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">5.0</td>\n", " </tr>\n", " <tr>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">2</td>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">5.0</td>\n", " </tr>\n", " <tr>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">3</td>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; 
vertical-align: top\">5.0</td>\n", " </tr>\n", " <tr>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">4</td>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">5.0</td>\n", " </tr>\n", " <tr>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">5</td>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">5.0</td>\n", " </tr>\n", " <tr>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">6</td>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">5.0</td>\n", " </tr>\n", " <tr>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">7</td>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">5.0</td>\n", " </tr>\n", " <tr>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">8</td>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">5.0</td>\n", " </tr>\n", " <tr>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">9</td>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n", " <td 
style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">5.0</td>\n", " </tr>\n", " <tr>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">10</td>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">5.0</td>\n", " </tr>\n", "</table>\n", "[2773 rows x 3 columns]<br/>Note: Only the head of the SFrame is printed.<br/>You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.\n", "</div>" ], "text/plain": [ "Columns:\n", "\tuser_id\tint\n", "\tcourse_id\tint\n", "\trating\tfloat\n", "\n", "Rows: 2773\n", "\n", "Data:\n", "+---------+-----------+--------+\n", "| user_id | course_id | rating |\n", "+---------+-----------+--------+\n", "| 1 | 1 | 5.0 |\n", "| 2 | 1 | 5.0 |\n", "| 3 | 1 | 5.0 |\n", "| 4 | 1 | 5.0 |\n", "| 5 | 1 | 5.0 |\n", "| 6 | 1 | 5.0 |\n", "| 7 | 1 | 5.0 |\n", "| 8 | 1 | 5.0 |\n", "| 9 | 1 | 5.0 |\n", "| 10 | 1 | 5.0 |\n", "+---------+-----------+--------+\n", "[2773 rows x 3 columns]\n", "Note: Only the head of the SFrame is printed.\n", "You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns." ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sf" ] }, { "cell_type": "code", "execution_count": 50, "metadata": { "ExecuteTime": { "end_time": "2020-06-13T08:27:36.184433Z", "start_time": "2020-06-13T08:27:36.175088Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "train_set, test_set = sf.random_split(0.8, seed=1)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Popularity model" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Create a model that makes recommendations using item popularity. 
When no target column is provided, the popularity is determined by the number of observations involving each item. When a target is provided, popularity is computed using the item’s mean target value. When the target column contains ratings, for example, the model computes the mean rating for each item and uses this to rank items for recommendations.\n", "\n", "It is common to start with a simple recommender that serves as a baseline and verifies that the rest of the pipeline works as expected. The `recommender` package has several models available for this purpose. For example, we can create a model that recommends courses based on their overall popularity across all users.\n" ] }, { "cell_type": "code", "execution_count": 51, "metadata": { "ExecuteTime": { "end_time": "2020-06-13T08:27:48.674044Z", "start_time": "2020-06-13T08:27:48.601904Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/html": [ "<pre>Preparing data set.</pre>" ], "text/plain": [ "Preparing data set." ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre> Data has 2202 observations with 1651 users and 201 items.</pre>" ], "text/plain": [ " Data has 2202 observations with 1651 users and 201 items." ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre> Data prepared in: 0.024186s</pre>" ], "text/plain": [ " Data prepared in: 0.024186s" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre>2202 observations to process; with 201 unique items.</pre>" ], "text/plain": [ "2202 observations to process; with 201 unique items." 
] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "popularity_model = tc.popularity_recommender.create(train_set, 'user_id', 'course_id', target='rating')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Item similarity Model" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "* [Collaborative filtering](http://en.wikipedia.org/wiki/Collaborative_filtering) methods make predictions for a given user based on the patterns of other users' activities. One common technique is to compare items based on their [Jaccard](http://en.wikipedia.org/wiki/Jaccard_index) similarity. For two items, this is a ratio: the number of users who interacted with both items, divided by the number of distinct users who interacted with either item.\n", "* A slightly more complicated alternative is [Cosine Similarity](http://en.wikipedia.org/wiki/Cosine_similarity), which also takes the rating values into account. \n", "\n", "If your data is implicit, i.e., you only observe interactions between users and items without a rating, use ItemSimilarityModel with Jaccard similarity. \n", "\n", "If your data is explicit, i.e., the observations include an actual rating given by the user, you have a wide array of options. ItemSimilarityModel with cosine or Pearson similarity can incorporate ratings. In addition, MatrixFactorizationModel, FactorizationModel, and LinearRegressionModel all support rating prediction. 
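To make the difference between the two measures concrete, here is a minimal pure-Python sketch on a hypothetical toy rating table (the ids u1, u2, u3 and c1, c2, c3 are invented for illustration and are not taken from ratings.dat). Jaccard similarity only checks which users rated both items, while cosine similarity also uses the rating values. The sketch assumes that a user who did not rate an item contributes 0 to that item's rating vector; Turi Create's own similarity computation may differ in details, so treat this as an illustration of the formulas rather than a reimplementation of the library.

```python
import math

# Hypothetical toy data: user -> {course: rating}. These ids are invented
# for illustration only.
ratings = {
    'u1': {'c1': 5.0, 'c2': 3.0},
    'u2': {'c1': 4.0, 'c2': 4.0, 'c3': 2.0},
    'u3': {'c2': 5.0, 'c3': 1.0},
}


def item_users(data, item):
    # Map every user who rated the given item to the rating they gave it.
    return {user: items[item] for user, items in data.items() if item in items}


def jaccard(data, a, b):
    # |users who rated both| / |users who rated either|; ratings are ignored.
    users_a, users_b = set(item_users(data, a)), set(item_users(data, b))
    union = users_a | users_b
    return len(users_a & users_b) / len(union) if union else 0.0


def cosine(data, a, b):
    # Treat each item as a vector of ratings indexed by user; a missing
    # rating counts as 0, so only shared users contribute to the dot product.
    vec_a, vec_b = item_users(data, a), item_users(data, b)
    dot = sum(vec_a[u] * vec_b[u] for u in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(r * r for r in vec_a.values()))
    norm_b = math.sqrt(sum(r * r for r in vec_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


print(jaccard(ratings, 'c1', 'c2'))           # 0.6666666666666666
print(round(cosine(ratings, 'c1', 'c2'), 4))  # 0.6847
```

Courses c1 and c2 share two of their three distinct raters, so their Jaccard similarity is 2/3; cosine then weights that overlap by how strongly each shared user rated the two courses.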
\n", "\n", "Suppose `data` contains three columns: 'user_id', 'item_id', and 'rating'. The two kinds of models could then be created as follows:\n", "\n", "    itemsim_cosine_model = tc.item_similarity_recommender.create(data, 'user_id', 'item_id', \n", "        target='rating', \n", "        similarity_type='cosine')\n", " \n", "    factorization_machine_model = tc.factorization_recommender.create(data, 'user_id', 'item_id', \n", "        target='rating')\n", "\n", "\n", "In the following code block, we compute all the item-item similarities and create an object that can be used for recommendations." ] }, { "cell_type": "code", "execution_count": 52, "metadata": { "ExecuteTime": { "end_time": "2020-06-13T08:28:06.988718Z", "start_time": "2020-06-13T08:28:06.920185Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/html": [ "<pre>Preparing data set.</pre>" ], "text/plain": [ "Preparing data set." ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre> Data has 2202 observations with 1651 users and 201 items.</pre>" ], "text/plain": [ " Data has 2202 observations with 1651 users and 201 items." ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre> Data prepared in: 0.021186s</pre>" ], "text/plain": [ " Data prepared in: 0.021186s" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre>Training model from provided data.</pre>" ], "text/plain": [ "Training model from provided data." ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre>Gathering per-item and per-user statistics.</pre>" ], "text/plain": [ "Gathering per-item and per-user statistics." 
] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre>+--------------------------------+------------+</pre>" ], "text/plain": [ "+--------------------------------+------------+" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre>| Elapsed Time (Item Statistics) | % Complete |</pre>" ], "text/plain": [ "| Elapsed Time (Item Statistics) | % Complete |" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre>+--------------------------------+------------+</pre>" ], "text/plain": [ "+--------------------------------+------------+" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre>| 3.805ms | 60.5 |</pre>" ], "text/plain": [ "| 3.805ms | 60.5 |" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre>| 23.105ms | 100 |</pre>" ], "text/plain": [ "| 23.105ms | 100 |" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre>+--------------------------------+------------+</pre>" ], "text/plain": [ "+--------------------------------+------------+" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre>Setting up lookup tables.</pre>" ], "text/plain": [ "Setting up lookup tables." ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre>Processing data in one pass using dense lookup tables.</pre>" ], "text/plain": [ "Processing data in one pass using dense lookup tables." 
] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre>+-------------------------------------+------------------+-----------------+</pre>" ], "text/plain": [ "+-------------------------------------+------------------+-----------------+" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre>| Elapsed Time (Constructing Lookups) | Total % Complete | Items Processed |</pre>" ], "text/plain": [ "| Elapsed Time (Constructing Lookups) | Total % Complete | Items Processed |" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre>+-------------------------------------+------------------+-----------------+</pre>" ], "text/plain": [ "+-------------------------------------+------------------+-----------------+" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre>| 26.4ms | 0 | 0 |</pre>" ], "text/plain": [ "| 26.4ms | 0 | 0 |" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre>| 32.479ms | 100 | 201 |</pre>" ], "text/plain": [ "| 32.479ms | 100 | 201 |" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre>+-------------------------------------+------------------+-----------------+</pre>" ], "text/plain": [ "+-------------------------------------+------------------+-----------------+" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre>Finalizing lookup tables.</pre>" ], "text/plain": [ "Finalizing lookup tables." ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre>Generating candidate set for working with new users.</pre>" ], "text/plain": [ "Generating candidate set for working with new users." 
] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre>Finished training in 0.037491s</pre>" ], "text/plain": [ "Finished training in 0.037491s" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "item_sim_model = tc.item_similarity_recommender.create(\n", " train_set, 'user_id', 'course_id', target='rating', \n", " similarity_type='cosine')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Factorization Recommender Model\n", "Create a FactorizationRecommender that learns latent factors for each user and item and uses them to make rating predictions. This covers both standard matrix factorization and factorization machine models (the latter when side data is available for users and/or items). [link](https://dato.com/products/create/docs/generated/graphlab.recommender.factorization_recommender.create.html#graphlab.recommender.factorization_recommender.create)" ] }, { "cell_type": "code", "execution_count": 53, "metadata": { "ExecuteTime": { "end_time": "2020-06-13T08:28:34.411557Z", "start_time": "2020-06-13T08:28:27.155572Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/html": [ "<pre>Preparing data set.</pre>" ], "text/plain": [ "Preparing data set." ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre> Data has 2202 observations with 1651 users and 201 items.</pre>" ], "text/plain": [ " Data has 2202 observations with 1651 users and 201 items." ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre> Data prepared in: 0.00781s</pre>" ], "text/plain": [ " Data prepared in: 0.00781s" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre>Training factorization_recommender for recommendations.</pre>" ], "text/plain": [ "Training factorization_recommender for recommendations." 
] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre>+--------------------------------+--------------------------------------------------+----------+</pre>" ], "text/plain": [ "+--------------------------------+--------------------------------------------------+----------+" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre>| Parameter | Description | Value |</pre>" ], "text/plain": [ "| Parameter | Description | Value |" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre>+--------------------------------+--------------------------------------------------+----------+</pre>" ], "text/plain": [ "+--------------------------------+--------------------------------------------------+----------+" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre>| num_factors | Factor Dimension | 8 |</pre>" ], "text/plain": [ "| num_factors | Factor Dimension | 8 |" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre>| regularization | L2 Regularization on Factors | 1e-08 |</pre>" ], "text/plain": [ "| regularization | L2 Regularization on Factors | 1e-08 |" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre>| solver | Solver used for training | sgd |</pre>" ], "text/plain": [ "| solver | Solver used for training | sgd |" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre>| linear_regularization | L2 Regularization on Linear Coefficients | 1e-10 |</pre>" ], "text/plain": [ "| linear_regularization | L2 Regularization on Linear Coefficients | 1e-10 |" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre>| max_iterations | Maximum Number of Iterations | 50 |</pre>" ], "text/plain": [ "| max_iterations | Maximum Number of Iterations | 50 |" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { 
"text/html": [ "<pre>+--------------------------------+--------------------------------------------------+----------+</pre>" ], "text/plain": [ "+--------------------------------+--------------------------------------------------+----------+" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre> Optimizing model using SGD; tuning step size.</pre>" ], "text/plain": [ " Optimizing model using SGD; tuning step size." ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre> Using 2202 / 2202 points for tuning the step size.</pre>" ], "text/plain": [ " Using 2202 / 2202 points for tuning the step size." ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre>+---------+-------------------+------------------------------------------+</pre>" ], "text/plain": [ "+---------+-------------------+------------------------------------------+" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre>| Attempt | Initial Step Size | Estimated Objective Value |</pre>" ], "text/plain": [ "| Attempt | Initial Step Size | Estimated Objective Value |" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre>+---------+-------------------+------------------------------------------+</pre>" ], "text/plain": [ "+---------+-------------------+------------------------------------------+" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre>| 0 | 25 | Not Viable |</pre>" ], "text/plain": [ "| 0 | 25 | Not Viable |" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre>| 1 | 6.25 | Not Viable |</pre>" ], "text/plain": [ "| 1 | 6.25 | Not Viable |" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre>| 2 | 1.5625 | Not Viable |</pre>" ], "text/plain": [ "| 2 | 1.5625 | Not Viable |" ] }, "metadata": {}, "output_type": "display_data" }, { 
"data": { "text/html": [ "<pre>| 3 | 0.390625 | 0.122277 |</pre>" ], "text/plain": [ "| 3 | 0.390625 | 0.122277 |" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre>| 4 | 0.195312 | 0.172024 |</pre>" ], "text/plain": [ "| 4 | 0.195312 | 0.172024 |" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre>| 5 | 0.0976562 | 0.235897 |</pre>" ], "text/plain": [ "| 5 | 0.0976562 | 0.235897 |" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre>| 6 | 0.0488281 | 0.338391 |</pre>" ], "text/plain": [ "| 6 | 0.0488281 | 0.338391 |" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre>+---------+-------------------+------------------------------------------+</pre>" ], "text/plain": [ "+---------+-------------------+------------------------------------------+" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre>| Final | 0.390625 | 0.122277 |</pre>" ], "text/plain": [ "| Final | 0.390625 | 0.122277 |" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre>+---------+-------------------+------------------------------------------+</pre>" ], "text/plain": [ "+---------+-------------------+------------------------------------------+" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre>Starting Optimization.</pre>" ], "text/plain": [ "Starting Optimization." ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre>+---------+--------------+-------------------+-----------------------+-------------+</pre>" ], "text/plain": [ "+---------+--------------+-------------------+-----------------------+-------------+" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre>| Iter. | Elapsed Time | Approx. Objective | Approx. Training RMSE | Step Size |</pre>" ], "text/plain": [ "| Iter. 
| Elapsed Time | Approx. Objective | Approx. Training RMSE | Step Size |" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre>+---------+--------------+-------------------+-----------------------+-------------+</pre>" ], "text/plain": [ "+---------+--------------+-------------------+-----------------------+-------------+" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre>| Initial | 238us | 0.891401 | 0.94414 | |</pre>" ], "text/plain": [ "| Initial | 238us | 0.891401 | 0.94414 | |" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre>+---------+--------------+-------------------+-----------------------+-------------+</pre>" ], "text/plain": [ "+---------+--------------+-------------------+-----------------------+-------------+" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre>| 1 | 285.573ms | 0.879224 | 0.937667 | 0.390625 |</pre>" ], "text/plain": [ "| 1 | 285.573ms | 0.879224 | 0.937667 | 0.390625 |" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre>| 2 | 352.797ms | 0.511761 | 0.715374 | 0.232267 |</pre>" ], "text/plain": [ "| 2 | 352.797ms | 0.511761 | 0.715374 | 0.232267 |" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre>| 3 | 444.885ms | 0.296781 | 0.544775 | 0.171364 |</pre>" ], "text/plain": [ "| 3 | 444.885ms | 0.296781 | 0.544775 | 0.171364 |" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre>| 4 | 535.596ms | 0.215641 | 0.464371 | 0.138107 |</pre>" ], "text/plain": [ "| 4 | 535.596ms | 0.215641 | 0.464371 | 0.138107 |" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre>| 5 | 594.038ms | 0.156399 | 0.395472 | 0.116824 |</pre>" ], "text/plain": [ "| 5 | 594.038ms | 0.156399 | 0.395472 | 0.116824 |" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { 
"text/html": [ "<pre>| 10 | 1.10s | 0.0319218 | 0.178663 | 0.069464 |</pre>" ], "text/plain": [ "| 10 | 1.10s | 0.0319218 | 0.178663 | 0.069464 |" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre>| 50 | 4.83s | 0.000601022 | 0.0244708 | 0.0207746 |</pre>" ], "text/plain": [ "| 50 | 4.83s | 0.000601022 | 0.0244708 | 0.0207746 |" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre>+---------+--------------+-------------------+-----------------------+-------------+</pre>" ], "text/plain": [ "+---------+--------------+-------------------+-----------------------+-------------+" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre>Optimization Complete: Maximum number of passes through the data reached.</pre>" ], "text/plain": [ "Optimization Complete: Maximum number of passes through the data reached." ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre>Computing final objective value and training RMSE.</pre>" ], "text/plain": [ "Computing final objective value and training RMSE." 
] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre> Final objective value: 0.000520035</pre>" ], "text/plain": [ " Final objective value: 0.000520035" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<pre> Final training RMSE: 0.0227559</pre>" ], "text/plain": [ " Final training RMSE: 0.0227559" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "factorization_machine_model = tc.recommender.factorization_recommender.create(\n", " train_set, 'user_id', 'course_id', \n", " target='rating')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Model Evaluation" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "It's straightforward to use Turi Create (`tc`, the open-source successor of GraphLab Create) to compare models on a sample of users in `test_set`. For each model, `compare_models` reports [precision-recall](http://en.wikipedia.org/wiki/Precision_and_recall) summary statistics at cutoffs 1 to 10, followed by RMSE statistics. Higher mean precision and recall indicate better rankings, so these tables show whether the similarity-based model actually improves on the baseline `popularity_model`.\n", "\n", "The following command evaluates `popularity_model` (M0), `item_sim_model` (M1) and `factorization_machine_model` (M2) on a random sample of about half of the users in `test_set` (`user_sample=.5`). Items a user already rated in `train_set` are excluded from that user's recommendations (`skip_set=train_set`)."
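, "\n", "\n", "Assuming the usual top-$k$ definitions, let $R_u(k)$ be the first $k$ courses recommended to user $u$ and $T_u$ the courses that $u$ rated in `test_set`. The `mean_precision` and `mean_recall` columns then average the following quantities over the sampled users:\n", "\n", "$$\\text{precision}@k = \\frac{|R_u(k) \\cap T_u|}{k}, \\qquad \\text{recall}@k = \\frac{|R_u(k) \\cap T_u|}{|T_u|}$$\n", "\n", "For example, under these definitions the value $0.004065 \\approx 1/246$ reported for `item_sim_model` (M1) at cutoff 1 means that exactly one of the 246 sampled users received a relevant course as the top recommendation."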
] }, { "cell_type": "code", "execution_count": 54, "metadata": { "ExecuteTime": { "end_time": "2020-06-13T08:29:09.953482Z", "start_time": "2020-06-13T08:28:59.320534Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "compare_models: using 246 users to estimate model performance\n", "PROGRESS: Evaluate model M0\n", "\n", "Precision and recall summary statistics by cutoff\n", "+--------+-----------------------+-----------------------+\n", "| cutoff | mean_precision | mean_recall |\n", "+--------+-----------------------+-----------------------+\n", "| 1 | 0.0 | 0.0 |\n", "| 2 | 0.0020325203252032514 | 0.0020325203252032514 |\n", "| 3 | 0.0013550135501355016 | 0.0020325203252032522 |\n", "| 4 | 0.0010162601626016257 | 0.0020325203252032514 |\n", "| 5 | 0.0008130081300813005 | 0.0020325203252032522 |\n", "| 6 | 0.0006775067750677507 | 0.002032520325203251 |\n", "| 7 | 0.0005807200929152149 | 0.0020325203252032522 |\n", "| 8 | 0.0010162601626016261 | 0.003048780487804878 |\n", "| 9 | 0.0009033423667570004 | 0.003048780487804878 |\n", "| 10 | 0.0008130081300813008 | 0.003048780487804878 |\n", "+--------+-----------------------+-----------------------+\n", "[10 rows x 3 columns]\n", "\n", "\n", "Overall RMSE: 0.7914933576079455\n", "\n", "Per User RMSE (best)\n", "+---------+------+-------+\n", "| user_id | rmse | count |\n", "+---------+------+-------+\n", "| 1491 | 0.0 | 1 |\n", "+---------+------+-------+\n", "[1 rows x 3 columns]\n", "\n", "\n", "Per User RMSE (worst)\n", "+---------+-------------------+-------+\n", "| user_id | rmse | count |\n", "+---------+-------------------+-------+\n", "| 1615 | 4.166666666666667 | 1 |\n", "+---------+-------------------+-------+\n", "[1 rows x 3 columns]\n", "\n", "\n", "Per Item RMSE (best)\n", "+-----------+------+-------+\n", "| course_id | rmse | count |\n", "+-----------+------+-------+\n", "| 100 | 0.0 | 1 |\n", "+-----------+------+-------+\n", "[1 rows 
x 3 columns]\n", "\n", "\n", "Per Item RMSE (worst)\n", "+-----------+-------------------+-------+\n", "| course_id | rmse | count |\n", "+-----------+-------------------+-------+\n", "| 207 | 3.006811989100817 | 1 |\n", "+-----------+-------------------+-------+\n", "[1 rows x 3 columns]\n", "\n", "PROGRESS: Evaluate model M1\n", "\n", "Precision and recall summary statistics by cutoff\n", "+--------+----------------------+----------------------+\n", "| cutoff | mean_precision | mean_recall |\n", "+--------+----------------------+----------------------+\n", "| 1 | 0.004065040650406504 | 0.004065040650406504 |\n", "| 2 | 0.004065040650406503 | 0.008130081300813006 |\n", "| 3 | 0.004065040650406504 | 0.012195121951219513 |\n", "| 4 | 0.006097560975609756 | 0.02032520325203252 |\n", "| 5 | 0.006504065040650404 | 0.026422764227642278 |\n", "| 6 | 0.007452574525745261 | 0.038617886178861804 |\n", "| 7 | 0.007549361207897791 | 0.0467479674796748 |\n", "| 8 | 0.006605691056910572 | 0.04674796747967479 |\n", "| 9 | 0.006775067750677508 | 0.052845528455284556 |\n", "| 10 | 0.006097560975609756 | 0.052845528455284556 |\n", "+--------+----------------------+----------------------+\n", "[10 rows x 3 columns]\n", "\n", "\n", "Overall RMSE: 4.563058442299027\n", "\n", "Per User RMSE (best)\n", "+---------+--------------------+-------+\n", "| user_id | rmse | count |\n", "+---------+--------------------+-------+\n", "| 1615 | 0.4626693296432495 | 1 |\n", "+---------+--------------------+-------+\n", "[1 rows x 3 columns]\n", "\n", "\n", "Per User RMSE (worst)\n", "+---------+------+-------+\n", "| user_id | rmse | count |\n", "+---------+------+-------+\n", "| 1704 | 5.0 | 1 |\n", "+---------+------+-------+\n", "[1 rows x 3 columns]\n", "\n", "\n", "Per Item RMSE (best)\n", "+-----------+--------------------+-------+\n", "| course_id | rmse | count |\n", "+-----------+--------------------+-------+\n", "| 212 | 0.9862414979934693 | 1 |\n", 
"+-----------+--------------------+-------+\n", "[1 rows x 3 columns]\n", "\n", "\n", "Per Item RMSE (worst)\n", "+-----------+------+-------+\n", "| course_id | rmse | count |\n", "+-----------+------+-------+\n", "| 30 | 5.0 | 1 |\n", "+-----------+------+-------+\n", "[1 rows x 3 columns]\n", "\n", "PROGRESS: Evaluate model M2\n", "\n", "Precision and recall summary statistics by cutoff\n", "+--------+-----------------------+-----------------------+\n", "| cutoff | mean_precision | mean_recall |\n", "+--------+-----------------------+-----------------------+\n", "| 1 | 0.0 | 0.0 |\n", "| 2 | 0.0020325203252032527 | 0.0010162601626016263 |\n", "| 3 | 0.0013550135501355016 | 0.0010162601626016263 |\n", "| 4 | 0.0020325203252032527 | 0.0023712737127371277 |\n", "| 5 | 0.002439024390243902 | 0.006436314363143631 |\n", "| 6 | 0.0020325203252032522 | 0.00643631436314363 |\n", "| 7 | 0.0017421602787456444 | 0.00643631436314363 |\n", "| 8 | 0.002540650406504065 | 0.014566395663956639 |\n", "| 9 | 0.0022583559168925034 | 0.01456639566395664 |\n", "| 10 | 0.0020325203252032527 | 0.014566395663956644 |\n", "+--------+-----------------------+-----------------------+\n", "[10 rows x 3 columns]\n", "\n", "\n", "Overall RMSE: 0.85318587520692\n", "\n", "Per User RMSE (best)\n", "+---------+-----------------------+-------+\n", "| user_id | rmse | count |\n", "+---------+-----------------------+-------+\n", "| 1755 | 0.0029666259439427023 | 1 |\n", "+---------+-----------------------+-------+\n", "[1 rows x 3 columns]\n", "\n", "\n", "Per User RMSE (worst)\n", "+---------+-------------------+-------+\n", "| user_id | rmse | count |\n", "+---------+-------------------+-------+\n", "| 1615 | 4.466020835193037 | 1 |\n", "+---------+-------------------+-------+\n", "[1 rows x 3 columns]\n", "\n", "\n", "Per Item RMSE (best)\n", "+-----------+-----------------------+-------+\n", "| course_id | rmse | count |\n", "+-----------+-----------------------+-------+\n", "| 125 | 
0.0029666259439427023 | 1 |\n", "+-----------+-----------------------+-------+\n", "[1 rows x 3 columns]\n", "\n", "\n", "Per Item RMSE (worst)\n", "+-----------+--------------------+-------+\n", "| course_id | rmse | count |\n", "+-----------+--------------------+-------+\n", "| 192 | 4.0101849712451685 | 1 |\n", "+-----------+--------------------+-------+\n", "[1 rows x 3 columns]\n", "\n" ] } ], "source": [ "result = tc.recommender.util.compare_models(\n", " test_set, [popularity_model, item_sim_model, factorization_machine_model],\n", " user_sample=.5, skip_set=train_set)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Now let's ask the item similarity model for course recommendations for several users. We first build an SArray, `users`, holding 100 distinct user ids taken from `sf`, and set `K = 10`, the number of recommendations to return per user." ] }, { "cell_type": "code", "execution_count": 41, "metadata": { "ExecuteTime": { "end_time": "2019-06-15T06:49:39.479200Z", "start_time": "2019-06-15T06:49:39.464634Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "dtype: int\n", "Rows: 100\n", "[232, 363, 431, 738, 1860, 732, 187, 1368, 1753, 764, 926, 1180, 1323, 1742, 1685, 1876, 1232, 614, 1573, 786, 1158, 1072, 863, 695, 454, 1211, 1404, 1242, 696, 444, 349, 1883, 499, 354, 573, 1531, 71, 1889, 312, 578, 1556, 1572, 79, 416, 848, 1805, 550, 1644, 1017, 521, 566, 703, 196, 1401, 533, 1054, 398, 1077, 341, 982, 516, 1473, 1982, 1670, 529, 1544, 1920, 1414, 223, 465, 376, 1231, 193, 202, 1128, 1853, 1907, 1331, 966, 810, 1895, 1704, 267, 1615, 350, 979, 69, 138, 1645, 837, 1841, 1801, 853, 803, 405, 1172, 112, 1653, 937, 1429]" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "K = 10\n", "users = tc.SArray(sf['user_id'].unique().head(100))\n", "users" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ 
"Next we call the model's `recommend()` method to get the top `K` recommendations for each user in `users`. The returned SFrame has four columns: `user_id`, `course_id`, the `score` the model assigned to that course for that user, and the course's `rank` (an integer from 1 to `K`). To see this, we can look at the first rows of `recs`:" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "ExecuteTime": { "end_time": "2019-06-14T16:46:10.176376Z", "start_time": "2019-06-14T16:46:10.152361Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/html": [ "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\"><table frame=\"box\" rules=\"cols\">\n", " <tr>\n", " <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">user_id</th>\n", " <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">course_id</th>\n", " <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">score</th>\n", " <th style=\"padding-left: 1em; padding-right: 1em; text-align: center\">rank</th>\n", " </tr>\n", " <tr>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">232</td>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">93</td>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0.21959692239761353</td>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">1</td>\n", " </tr>\n", " <tr>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">232</td>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">180</td>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0.2127474546432495</td>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">2</td>\n", " </tr>\n", " 
<tr>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">232</td>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">188</td>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0.20187050104141235</td>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">3</td>\n", " </tr>\n", " <tr>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">232</td>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">108</td>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0.17809301614761353</td>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">4</td>\n", " </tr>\n", " <tr>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">232</td>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">55</td>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0.1753571629524231</td>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">5</td>\n", " </tr>\n", " <tr>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">232</td>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">168</td>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0.16715019941329956</td>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">6</td>\n", " </tr>\n", " <tr>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">232</td>\n", " <td 
style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">133</td>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0.16673386096954346</td>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">7</td>\n", " </tr>\n", " <tr>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">232</td>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">186</td>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0.16153007745742798</td>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">8</td>\n", " </tr>\n", " <tr>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">232</td>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">164</td>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0.1601088047027588</td>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">9</td>\n", " </tr>\n", " <tr>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">232</td>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">187</td>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">0.15785843133926392</td>\n", " <td style=\"padding-left: 1em; padding-right: 1em; text-align: center; vertical-align: top\">10</td>\n", " </tr>\n", "</table>\n", "[10 rows x 4 columns]<br/>\n", "</div>" ], "text/plain": [ "Columns:\n", "\tuser_id\tint\n", "\tcourse_id\tint\n", "\tscore\tfloat\n", "\trank\tint\n", "\n", "Rows: 10\n", "\n", "Data:\n", 
"+---------+-----------+---------------------+------+\n", "| user_id | course_id | score | rank |\n", "+---------+-----------+---------------------+------+\n", "| 232 | 93 | 0.21959692239761353 | 1 |\n", "| 232 | 180 | 0.2127474546432495 | 2 |\n", "| 232 | 188 | 0.20187050104141235 | 3 |\n", "| 232 | 108 | 0.17809301614761353 | 4 |\n", "| 232 | 55 | 0.1753571629524231 | 5 |\n", "| 232 | 168 | 0.16715019941329956 | 6 |\n", "| 232 | 133 | 0.16673386096954346 | 7 |\n", "| 232 | 186 | 0.16153007745742798 | 8 |\n", "| 232 | 164 | 0.1601088047027588 | 9 |\n", "| 232 | 187 | 0.15785843133926392 | 10 |\n", "+---------+-----------+---------------------+------+\n", "[10 rows x 4 columns]" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "recs = item_sim_model.recommend(users=users, k=K)\n", "recs.head()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "To see which courses these ids refer to, we can join in the metadata for each course."
] }, { "cell_type": "code", "execution_count": 42, "metadata": { "ExecuteTime": { "end_time": "2019-06-15T06:51:05.644919Z", "start_time": "2019-06-15T06:51:05.291900Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/html": [ "<pre>Materializing SFrame</pre>" ], "text/plain": [ "Materializing SFrame" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "<html> <body> <iframe style=\"border:0;margin:0\" width=\"1000\" height=\"2400\" srcdoc='<html lang=\"en\"> <head> <script src=\"https://cdnjs.cloudflare.com/ajax/libs/vega/3.0.8/vega.js\"></script> <script src=\"https://cdnjs.cloudflare.com/ajax/libs/vega-embed/3.0.0-rc7/vega-embed.js\"></script> <script src=\"https://cdnjs.cloudflare.com/ajax/libs/vega-tooltip/0.5.1/vega-tooltip.min.js\"></script> <link rel=\"stylesheet\" type=\"text/css\" href=\"https://cdnjs.cloudflare.com/ajax/libs/vega-tooltip/0.5.1/vega-tooltip.min.css\"> <style> .vega-actions > a{ color:white; text-decoration: none; font-family: \"Arial\"; cursor:pointer; padding:5px; background:#AAAAAA; border-radius:4px; padding-left:10px; padding-right:10px; margin-right:5px; } .vega-actions{ margin-top:20px; text-align:center } .vega-actions > a{ background:#999999; } </style> </head> <body> <div id=\"vis\"> </div> <script> var vega_json = \"{\\\"metadata\\\": {\\\"bubbleOpts\\\": {\\\"showAllFields\\\": false, \\\"fields\\\": [{\\\"field\\\": \\\"left\\\"}, {\\\"field\\\": \\\"right\\\"}, {\\\"field\\\": \\\"count\\\"}, {\\\"field\\\": \\\"label\\\"}]}}, \\\"config\\\": {\\\"axis\\\": {\\\"labelPadding\\\": 10, \\\"labelFont\\\": \\\"HelveticaNeue-Light, Arial\\\", \\\"titleColor\\\": \\\"#595959\\\", \\\"titleFont\\\": \\\"HelveticaNeue-Light, Arial\\\", \\\"titleFontWeight\\\": \\\"normal\\\", \\\"labelFontSize\\\": 7, \\\"titleFontSize\\\": 12, \\\"titlePadding\\\": 9, \\\"labelColor\\\": \\\"#595959\\\"}, \\\"axisY\\\": {\\\"minExtent\\\": 30}, \\\"style\\\": {\\\"rect\\\": {\\\"stroke\\\": 
\\\"rgba(200, 200, 200, 0.5)\\\"}, \\\"group-title\\\": {\\\"fontWeight\\\": \\\"normal\\\", \\\"fontSize\\\": 20, \\\"fill\\\": \\\"#595959\\\", \\\"font\\\": \\\"HelveticaNeue-Light, Arial\\\"}}}, \\\"width\\\": 800, \\\"padding\\\": 8, \\\"data\\\": [{\\\"name\\\": \\\"pts_store\\\"}, {\\\"name\\\": \\\"source_2\\\", \\\"values\\\": [{\\\"num_unique\\\": 5597, \\\"num_row\\\": 5597, \\\"max\\\": 5597.0, \\\"stdev\\\": 1615.714703, \\\"median\\\": 2801.0, \\\"a\\\": 0, \\\"min\\\": 1.0, \\\"title\\\": \\\"course_id\\\", \\\"type\\\": \\\"integer\\\", \\\"numeric\\\": [{\\\"left\\\": -78, \\\"count\\\": 209, \\\"right\\\": 210}, {\\\"left\\\": 210, \\\"count\\\": 288, \\\"right\\\": 498}, {\\\"left\\\": 498, \\\"count\\\": 288, \\\"right\\\": 786}, {\\\"left\\\": 786, \\\"count\\\": 288, \\\"right\\\": 1074}, {\\\"left\\\": 1074, \\\"count\\\": 288, \\\"right\\\": 1362}, {\\\"left\\\": 1362, \\\"count\\\": 288, \\\"right\\\": 1650}, {\\\"left\\\": 1650, \\\"count\\\": 288, \\\"right\\\": 1938}, {\\\"left\\\": 1938, \\\"count\\\": 288, \\\"right\\\": 2226}, {\\\"left\\\": 2226, \\\"count\\\": 288, \\\"right\\\": 2514}, {\\\"left\\\": 2514, \\\"count\\\": 288, \\\"right\\\": 2802}, {\\\"left\\\": 2802, \\\"count\\\": 288, \\\"right\\\": 3090}, {\\\"left\\\": 3090, \\\"count\\\": 288, \\\"right\\\": 3378}, {\\\"left\\\": 3378, \\\"count\\\": 288, \\\"right\\\": 3666}, {\\\"left\\\": 3666, \\\"count\\\": 288, \\\"right\\\": 3954}, {\\\"left\\\": 3954, \\\"count\\\": 288, \\\"right\\\": 4242}, {\\\"left\\\": 4242, \\\"count\\\": 288, \\\"right\\\": 4530}, {\\\"left\\\": 4530, \\\"count\\\": 288, \\\"right\\\": 4818}, {\\\"left\\\": 4818, \\\"count\\\": 288, \\\"right\\\": 5106}, {\\\"left\\\": 5106, \\\"count\\\": 288, \\\"right\\\": 5394}, {\\\"left\\\": 5394, \\\"count\\\": 204, \\\"right\\\": 5682}, {\\\"stop\\\": 5682, \\\"step\\\": 288, \\\"start\\\": -78}], \\\"num_missing\\\": 0, \\\"mean\\\": 2799.0, \\\"categorical\\\": []}, {\\\"num_unique\\\": 5574, 
\\\"type\\\": \\\"str\\\", \\\"num_row\\\": 5597, \\\"numeric\\\": [], \\\"num_missing\\\": 0, \\\"a\\\": 1, \\\"categorical\\\": [{\\\"label_idx\\\": 0, \\\"percentage\\\": \\\"0.0357334%\\\", \\\"label\\\": \\\"CS-169.1x: Software as a Service\\\", \\\"count\\\": 2}, {\\\"label_idx\\\": 1, \\\"percentage\\\": \\\"0.0357334%\\\", \\\"label\\\": \\\"CSS Fundamentals\\\", \\\"count\\\": 2}, {\\\"label_idx\\\": 2, \\\"percentage\\\": \\\"0.0357334%\\\", \\\"label\\\": \\\"Creating a Responsive HTML Email\\\", \\\"count\\\": 2}, {\\\"label_idx\\\": 3, \\\"percentage\\\": \\\"0.0357334%\\\", \\\"label\\\": \\\"Differential Equations\\\", \\\"count\\\": 2}, {\\\"label_idx\\\": 4, \\\"percentage\\\": \\\"0.0357334%\\\", \\\"label\\\": \\\"Geometry\\\", \\\"count\\\": 2}, {\\\"label_idx\\\": 5, \\\"percentage\\\": \\\"0.0357334%\\\", \\\"label\\\": \\\"HTML\\\", \\\"count\\\": 2}, {\\\"label_idx\\\": 6, \\\"percentage\\\": \\\"0.0357334%\\\", \\\"label\\\": \\\"HTML5 for Beginners\\\", \\\"count\\\": 2}, {\\\"label_idx\\\": 7, \\\"percentage\\\": \\\"0.0357334%\\\", \\\"label\\\": \\\"How to Market Your Business\\\", \\\"count\\\": 2}, {\\\"label_idx\\\": 8, \\\"percentage\\\": \\\"0.0357334%\\\", \\\"label\\\": \\\"How to Start a Business\\\", \\\"count\\\": 2}, {\\\"label_idx\\\": 9, \\\"percentage\\\": \\\"0.0357334%\\\", \\\"label\\\": \\\"Introduction to Databases\\\", \\\"count\\\": 2}, {\\\"label_idx\\\": 10, \\\"percentage\\\": \\\"99.6427%\\\", \\\"label\\\": \\\"Other (5564 labels)\\\", \\\"count\\\": 5577}], \\\"title\\\": \\\"title\\\"}, {\\\"num_unique\\\": 32, \\\"num_row\\\": 5597, \\\"max\\\": 4.9, \\\"stdev\\\": 0.694036, \\\"median\\\": 4.1, \\\"a\\\": 2, \\\"min\\\": 1.3, \\\"title\\\": \\\"avg_rating\\\", \\\"type\\\": \\\"float\\\", \\\"numeric\\\": [{\\\"left\\\": 1.22402, \\\"count\\\": 2, \\\"right\\\": 1.41218}, {\\\"left\\\": 1.41218, \\\"count\\\": 0, \\\"right\\\": 1.60034}, {\\\"left\\\": 1.60034, \\\"count\\\": 0, \\\"right\\\": 1.7885}, 
{\\\"left\\\": 1.7885, \\\"count\\\": 3, \\\"right\\\": 1.97666}, {\\\"left\\\": 1.97666, \\\"count\\\": 2, \\\"right\\\": 2.16482}, {\\\"left\\\": 2.16482, \\\"count\\\": 1, \\\"right\\\": 2.35298}, {\\\"left\\\": 2.35298, \\\"count\\\": 2, \\\"right\\\": 2.54114}, {\\\"left\\\": 2.54114, \\\"count\\\": 4, \\\"right\\\": 2.7293}, {\\\"left\\\": 2.7293, \\\"count\\\": 4, \\\"right\\\": 2.91746}, {\\\"left\\\": 2.91746, \\\"count\\\": 8, \\\"right\\\": 3.10562}, {\\\"left\\\": 3.10562, \\\"count\\\": 4, \\\"right\\\": 3.29378}, {\\\"left\\\": 3.29378, \\\"count\\\": 13, \\\"right\\\": 3.48194}, {\\\"left\\\": 3.48194, \\\"count\\\": 21, \\\"right\\\": 3.6701}, {\\\"left\\\": 3.6701, \\\"count\\\": 12, \\\"right\\\": 3.85826}, {\\\"left\\\": 3.85826, \\\"count\\\": 29, \\\"right\\\": 4.04642}, {\\\"left\\\": 4.04642, \\\"count\\\": 23, \\\"right\\\": 4.23458}, {\\\"left\\\": 4.23458, \\\"count\\\": 31, \\\"right\\\": 4.42274}, {\\\"left\\\": 4.42274, \\\"count\\\": 28, \\\"right\\\": 4.6109}, {\\\"left\\\": 4.6109, \\\"count\\\": 13, \\\"right\\\": 4.79906}, {\\\"left\\\": 4.79906, \\\"count\\\": 14, \\\"right\\\": 4.98722}, {\\\"missing\\\": true, \\\"count\\\": 5383}, {\\\"stop\\\": 4.98722, \\\"step\\\": 0.18816, \\\"start\\\": 1.22402}], \\\"num_missing\\\": 5383, \\\"mean\\\": 3.937383, \\\"categorical\\\": []}, {\\\"num_unique\\\": 90, \\\"type\\\": \\\"str\\\", \\\"num_row\\\": 5597, \\\"numeric\\\": [], \\\"num_missing\\\": 0, \\\"a\\\": 3, \\\"categorical\\\": [{\\\"label_idx\\\": 0, \\\"percentage\\\": \\\"94.3541%\\\", \\\"label\\\": \\\"Self-paced\\\", \\\"count\\\": 5281}, {\\\"label_idx\\\": 1, \\\"percentage\\\": \\\"1.08987%\\\", \\\"label\\\": \\\"TBA\\\", \\\"count\\\": 61}, {\\\"label_idx\\\": 2, \\\"percentage\\\": \\\"4.55601%\\\", \\\"label\\\": \\\"Other (88 labels)\\\", \\\"count\\\": 255}], \\\"title\\\": \\\"workload\\\"}, {\\\"num_unique\\\": 83, \\\"type\\\": \\\"str\\\", \\\"num_row\\\": 5597, \\\"numeric\\\": [], \\\"num_missing\\\": 0, 
\\\"a\\\": 4, \\\"categorical\\\": [{\\\"label_idx\\\": 0, \\\"percentage\\\": \\\"94.5685%\\\", \\\"label\\\": \\\"\\\", \\\"count\\\": 5293}, {\\\"label_idx\\\": 1, \\\"percentage\\\": \\\"0.464535%\\\", \\\"label\\\": \\\"Stanford University\\\", \\\"count\\\": 26}, {\\\"label_idx\\\": 2, \\\"percentage\\\": \\\"4.96695%\\\", \\\"label\\\": \\\"Other (81 labels)\\\", \\\"count\\\": 278}], \\\"title\\\": \\\"university\\\"}, {\\\"num_unique\\\": 11, \\\"type\\\": \\\"str\\\", \\\"num_row\\\": 5597, \\\"numeric\\\": [], \\\"num_missing\\\": 0, \\\"a\\\": 5, \\\"categorical\\\": [{\\\"label_idx\\\": 0, \\\"percentage\\\": \\\"96.1408%\\\", \\\"label\\\": \\\"-\\\", \\\"count\\\": 5381}, {\\\"label_idx\\\": 1, \\\"percentage\\\": \\\"3.85921%\\\", \\\"label\\\": \\\"Other (10 labels)\\\", \\\"count\\\": 216}], \\\"title\\\": \\\"difficulty\\\"}, {\\\"num_unique\\\": 14, \\\"type\\\": \\\"str\\\", \\\"num_row\\\": 5597, \\\"numeric\\\": [], \\\"num_missing\\\": 0, \\\"a\\\": 6, \\\"categorical\\\": [{\\\"label_idx\\\": 0, \\\"percentage\\\": \\\"54.4756%\\\", \\\"label\\\": \\\"udemy\\\", \\\"count\\\": 3049}, {\\\"label_idx\\\": 1, \\\"percentage\\\": \\\"37.3593%\\\", \\\"label\\\": \\\"lynda\\\", \\\"count\\\": 2091}, {\\\"label_idx\\\": 2, \\\"percentage\\\": \\\"3.09094%\\\", \\\"label\\\": \\\"coursera\\\", \\\"count\\\": 173}, {\\\"label_idx\\\": 3, \\\"percentage\\\": \\\"1.30427%\\\", \\\"label\\\": \\\"edx\\\", \\\"count\\\": 73}, {\\\"label_idx\\\": 4, \\\"percentage\\\": \\\"3.76988%\\\", \\\"label\\\": \\\"Other (10 labels)\\\", \\\"count\\\": 211}], \\\"title\\\": \\\"provider\\\"}]}, {\\\"transform\\\": [{\\\"type\\\": \\\"formula\\\", \\\"expr\\\": \\\"20\\\", \\\"as\\\": \\\"c_x_axis_back\\\"}, {\\\"type\\\": \\\"formula\\\", \\\"expr\\\": \\\"toNumber(datum[\\\\\\\"a\\\\\\\"])*300+66\\\", \\\"as\\\": \\\"c_main_background\\\"}, {\\\"type\\\": \\\"formula\\\", \\\"expr\\\": \\\"toNumber(datum[\\\\\\\"a\\\\\\\"])*300+43\\\", \\\"as\\\": 
\\\"c_top_bar\\\"}, {\\\"type\\\": \\\"formula\\\", \\\"expr\\\": \\\"toNumber(datum[\\\\\\\"a\\\\\\\"])*300+59\\\", \\\"as\\\": \\\"c_top_title\\\"}, {\\\"type\\\": \\\"formula\\\", \\\"expr\\\": \\\"toNumber(datum[\\\\\\\"a\\\\\\\"])*300+58\\\", \\\"as\\\": \\\"c_top_type\\\"}, {\\\"type\\\": \\\"formula\\\", \\\"expr\\\": \\\"toNumber(datum[\\\\\\\"a\\\\\\\"])*300+178\\\", \\\"as\\\": \\\"c_rule\\\"}, {\\\"type\\\": \\\"formula\\\", \\\"expr\\\": \\\"toNumber(datum[\\\\\\\"a\\\\\\\"])*300+106\\\", \\\"as\\\": \\\"c_num_rows\\\"}, {\\\"type\\\": \\\"formula\\\", \\\"expr\\\": \\\"toNumber(datum[\\\\\\\"a\\\\\\\"])*300+130\\\", \\\"as\\\": \\\"c_num_unique\\\"}, {\\\"type\\\": \\\"formula\\\", \\\"expr\\\": \\\"toNumber(datum[\\\\\\\"a\\\\\\\"])*300+154\\\", \\\"as\\\": \\\"c_missing\\\"}, {\\\"type\\\": \\\"formula\\\", \\\"expr\\\": \\\"toNumber(datum[\\\\\\\"a\\\\\\\"])*300+105\\\", \\\"as\\\": \\\"c_num_rows_val\\\"}, {\\\"type\\\": \\\"formula\\\", \\\"expr\\\": \\\"toNumber(datum[\\\\\\\"a\\\\\\\"])*300+130\\\", \\\"as\\\": \\\"c_num_unique_val\\\"}, {\\\"type\\\": \\\"formula\\\", \\\"expr\\\": \\\"toNumber(datum[\\\\\\\"a\\\\\\\"])*300+154\\\", \\\"as\\\": \\\"c_missing_val\\\"}, {\\\"type\\\": \\\"formula\\\", \\\"expr\\\": \\\"toNumber(datum[\\\\\\\"a\\\\\\\"])*300+195\\\", \\\"as\\\": \\\"c_frequent_items\\\"}, {\\\"type\\\": \\\"formula\\\", \\\"expr\\\": \\\"toNumber(datum[\\\\\\\"a\\\\\\\"])*300+218\\\", \\\"as\\\": \\\"c_first_item\\\"}, {\\\"type\\\": \\\"formula\\\", \\\"expr\\\": \\\"toNumber(datum[\\\\\\\"a\\\\\\\"])*300+235\\\", \\\"as\\\": \\\"c_second_item\\\"}, {\\\"type\\\": \\\"formula\\\", \\\"expr\\\": \\\"toNumber(datum[\\\\\\\"a\\\\\\\"])*300+252\\\", \\\"as\\\": \\\"c_third_item\\\"}, {\\\"type\\\": \\\"formula\\\", \\\"expr\\\": \\\"toNumber(datum[\\\\\\\"a\\\\\\\"])*300+269\\\", \\\"as\\\": \\\"c_fourth_item\\\"}, {\\\"type\\\": \\\"formula\\\", \\\"expr\\\": \\\"toNumber(datum[\\\\\\\"a\\\\\\\"])*300+286\\\", \\\"as\\\": 
\\\"c_fifth_item\\\"}, {\\\"type\\\": \\\"formula\\\", \\\"expr\\\": \\\"toNumber(datum[\\\\\\\"a\\\\\\\"])*300+200\\\", \\\"as\\\": \\\"c_mean\\\"}, {\\\"type\\\": \\\"formula\\\", \\\"expr\\\": \\\"toNumber(datum[\\\\\\\"a\\\\\\\"])*300+220\\\", \\\"as\\\": \\\"c_min\\\"}, {\\\"type\\\": \\\"formula\\\", \\\"expr\\\": \\\"toNumber(datum[\\\\\\\"a\\\\\\\"])*300+240\\\", \\\"as\\\": \\\"c_max\\\"}, {\\\"type\\\": \\\"formula\\\", \\\"expr\\\": \\\"toNumber(datum[\\\\\\\"a\\\\\\\"])*300+260\\\", \\\"as\\\": \\\"c_median\\\"}, {\\\"type\\\": \\\"formula\\\", \\\"expr\\\": \\\"toNumber(datum[\\\\\\\"a\\\\\\\"])*300+280\\\", \\\"as\\\": \\\"c_stdev\\\"}, {\\\"type\\\": \\\"formula\\\", \\\"expr\\\": \\\"toNumber(datum[\\\\\\\"a\\\\\\\"])*300+198\\\", \\\"as\\\": \\\"c_mean_val\\\"}, {\\\"type\\\": \\\"formula\\\", \\\"expr\\\": \\\"toNumber(datum[\\\\\\\"a\\\\\\\"])*300+218\\\", \\\"as\\\": \\\"c_min_val\\\"}, {\\\"type\\\": \\\"formula\\\", \\\"expr\\\": \\\"toNumber(datum[\\\\\\\"a\\\\\\\"])*300+238\\\", \\\"as\\\": \\\"c_max_val\\\"}, {\\\"type\\\": \\\"formula\\\", \\\"expr\\\": \\\"toNumber(datum[\\\\\\\"a\\\\\\\"])*300+258\\\", \\\"as\\\": \\\"c_median_val\\\"}, {\\\"type\\\": \\\"formula\\\", \\\"expr\\\": \\\"toNumber(datum[\\\\\\\"a\\\\\\\"])*300+278\\\", \\\"as\\\": \\\"c_stdev_val\\\"}, {\\\"type\\\": \\\"formula\\\", \\\"expr\\\": \\\"toNumber(datum[\\\\\\\"a\\\\\\\"])*300+106\\\", \\\"as\\\": \\\"graph_offset\\\"}, {\\\"type\\\": \\\"formula\\\", \\\"expr\\\": \\\"toNumber(datum[\\\\\\\"a\\\\\\\"])*300+132\\\", \\\"as\\\": \\\"graph_offset_categorical\\\"}, {\\\"type\\\": \\\"formula\\\", \\\"expr\\\": \\\"(toString(datum[\\\\\\\"type\\\\\\\"]) == \\\\\\\"integer\\\\\\\" || toString(datum[\\\\\\\"type\\\\\\\"]) == \\\\\\\"float\\\\\\\")?false:true\\\", \\\"as\\\": \\\"c_clip_val\\\"}, {\\\"type\\\": \\\"formula\\\", \\\"expr\\\": \\\"(toString(datum[\\\\\\\"type\\\\\\\"]) == \\\\\\\"integer\\\\\\\" || toString(datum[\\\\\\\"type\\\\\\\"]) == 
\\\\\\\"float\\\\\\\")?250:0\\\", \\\"as\\\": \\\"c_width_numeric_val\\\"}, {\\\"type\\\": \\\"formula\\\", \\\"expr\\\": \\\"(toString(datum[\\\\\\\"type\\\\\\\"]) == \\\\\\\"str\\\\\\\")?false:true\\\", \\\"as\\\": \\\"c_clip_val_cat\\\"}, {\\\"type\\\": \\\"formula\\\", \\\"expr\\\": \\\"(toString(datum[\\\\\\\"type\\\\\\\"]) == \\\\\\\"str\\\\\\\")?250:0\\\", \\\"as\\\": \\\"c_width_numeric_val_cat\\\"}], \\\"name\\\": \\\"data_2\\\", \\\"source\\\": \\\"source_2\\\"}], \\\"height\\\": 2180, \\\"$schema\\\": \\\"https://vega.github.io/schema/vega/v4.json\\\", \\\"marks\\\": [{\\\"type\\\": \\\"group\\\", \\\"encode\\\": {\\\"enter\\\": {\\\"strokeWidth\\\": {\\\"value\\\": 0}, \\\"clip\\\": {\\\"value\\\": 0}, \\\"fill\\\": {\\\"value\\\": \\\"#ffffff\\\"}, \\\"width\\\": {\\\"value\\\": 734}, \\\"y\\\": {\\\"value\\\": 0}, \\\"fillOpacity\\\": {\\\"value\\\": 0}, \\\"stroke\\\": {\\\"value\\\": \\\"#000000\\\"}, \\\"height\\\": {\\\"value\\\": 366}, \\\"x\\\": {\\\"value\\\": 0}}}, \\\"marks\\\": [{\\\"type\\\": \\\"group\\\", \\\"encode\\\": {\\\"enter\\\": {\\\"strokeWidth\\\": {\\\"value\\\": 0}, \\\"clip\\\": {\\\"value\\\": 0}, \\\"fill\\\": {\\\"value\\\": \\\"#ffffff\\\"}, \\\"width\\\": {\\\"value\\\": 734}, \\\"y\\\": {\\\"value\\\": 0}, \\\"fillOpacity\\\": {\\\"value\\\": 0}, \\\"stroke\\\": {\\\"value\\\": \\\"#000000\\\"}, \\\"height\\\": {\\\"value\\\": 366}, \\\"x\\\": {\\\"value\\\": 0}}}, \\\"axes\\\": [], \\\"scales\\\": [], \\\"marks\\\": [{\\\"type\\\": \\\"rect\\\", \\\"encode\\\": {\\\"update\\\": {\\\"y\\\": {\\\"field\\\": \\\"c_main_background\\\"}, \\\"x\\\": {\\\"signal\\\": \\\"datum[\\\\\\\"c_x_axis_back\\\\\\\"]\\\"}}, \\\"enter\\\": {\\\"strokeWidth\\\": {\\\"value\\\": 0.5}, \\\"fill\\\": {\\\"value\\\": \\\"#FEFEFE\\\"}, \\\"width\\\": {\\\"value\\\": 700}, \\\"y\\\": {\\\"value\\\": 66}, \\\"fillOpacity\\\": {\\\"value\\\": 1}, \\\"stroke\\\": {\\\"value\\\": \\\"#DEDEDE\\\"}, \\\"height\\\": {\\\"value\\\": 250}, \\\"x\\\": 
{\\\"value\\\": 33}}}, \\\"from\\\": {\\\"data\\\": \\\"data_2\\\"}}, {\\\"type\\\": \\\"rect\\\", \\\"encode\\\": {\\\"update\\\": {\\\"y\\\": {\\\"field\\\": \\\"c_top_bar\\\"}, \\\"x\\\": {\\\"signal\\\": \\\"datum[\\\\\\\"c_x_axis_back\\\\\\\"]\\\"}}, \\\"enter\\\": {\\\"strokeWidth\\\": {\\\"value\\\": 0.5}, \\\"fill\\\": {\\\"value\\\": \\\"#F5F5F5\\\"}, \\\"width\\\": {\\\"value\\\": 700}, \\\"y\\\": {\\\"value\\\": 43}, \\\"fillOpacity\\\": {\\\"value\\\": 1}, \\\"stroke\\\": {\\\"value\\\": \\\"#DEDEDE\\\"}, \\\"height\\\": {\\\"value\\\": 30}, \\\"x\\\": {\\\"value\\\": 33}}}, \\\"from\\\": {\\\"data\\\": \\\"data_2\\\"}}, {\\\"type\\\": \\\"text\\\", \\\"encode\\\": {\\\"update\\\": {\\\"y\\\": {\\\"field\\\": \\\"c_top_type\\\"}, \\\"x\\\": {\\\"signal\\\": \\\"datum[\\\\\\\"c_x_axis_back\\\\\\\"]+687\\\"}}, \\\"enter\\\": {\\\"fontStyle\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"fill\\\": {\\\"value\\\": \\\"#595859\\\"}, \\\"baseline\\\": {\\\"value\\\": \\\"middle\\\"}, \\\"dx\\\": {\\\"offset\\\": 0, \\\"value\\\": 0}, \\\"y\\\": {\\\"value\\\": 58}, \\\"fontWeight\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"text\\\": {\\\"signal\\\": \\\"''+datum[\\\\\\\"type\\\\\\\"]\\\"}, \\\"x\\\": {\\\"value\\\": 720}, \\\"font\\\": {\\\"value\\\": \\\"AvenirNext-Medium\\\"}, \\\"align\\\": {\\\"value\\\": \\\"right\\\"}, \\\"fontSize\\\": {\\\"value\\\": 12}, \\\"dy\\\": {\\\"offset\\\": 0, \\\"value\\\": 0}, \\\"angle\\\": {\\\"value\\\": 0}}}, \\\"from\\\": {\\\"data\\\": \\\"data_2\\\"}}, {\\\"type\\\": \\\"text\\\", \\\"encode\\\": {\\\"update\\\": {\\\"y\\\": {\\\"field\\\": \\\"c_top_title\\\"}, \\\"x\\\": {\\\"signal\\\": \\\"datum[\\\\\\\"c_x_axis_back\\\\\\\"]+11\\\"}}, \\\"enter\\\": {\\\"fontStyle\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"fill\\\": {\\\"value\\\": \\\"#9B9B9B\\\"}, \\\"baseline\\\": {\\\"value\\\": \\\"middle\\\"}, \\\"dx\\\": {\\\"offset\\\": 0, \\\"value\\\": 0}, \\\"y\\\": {\\\"value\\\": 59}, \\\"fontWeight\\\": {\\\"value\\\": 
\\\"normal\\\"}, \\\"text\\\": {\\\"signal\\\": \\\"''+datum[\\\\\\\"title\\\\\\\"]\\\"}, \\\"x\\\": {\\\"value\\\": 44}, \\\"font\\\": {\\\"value\\\": \\\"AvenirNext-Medium\\\"}, \\\"align\\\": {\\\"value\\\": \\\"left\\\"}, \\\"fontSize\\\": {\\\"value\\\": 15}, \\\"dy\\\": {\\\"offset\\\": 0, \\\"value\\\": 0}, \\\"angle\\\": {\\\"value\\\": 0}}}, \\\"from\\\": {\\\"data\\\": \\\"data_2\\\"}}, {\\\"type\\\": \\\"rule\\\", \\\"encode\\\": {\\\"update\\\": {\\\"x2\\\": {\\\"signal\\\": \\\"datum[\\\\\\\"c_x_axis_back\\\\\\\"]+687\\\"}, \\\"y\\\": {\\\"field\\\": \\\"c_rule\\\"}, \\\"y2\\\": {\\\"field\\\": \\\"c_rule\\\"}, \\\"x\\\": {\\\"signal\\\": \\\"datum[\\\\\\\"c_x_axis_back\\\\\\\"]+467\\\"}}, \\\"enter\\\": {\\\"strokeWidth\\\": {\\\"value\\\": 1}, \\\"y2\\\": {\\\"value\\\": 178}, \\\"x2\\\": {\\\"value\\\": 720}, \\\"y\\\": {\\\"value\\\": 178}, \\\"strokeCap\\\": {\\\"value\\\": \\\"butt\\\"}, \\\"stroke\\\": {\\\"value\\\": \\\"#EDEDEB\\\"}, \\\"x\\\": {\\\"value\\\": 500}}}, \\\"from\\\": {\\\"data\\\": \\\"data_2\\\"}}, {\\\"type\\\": \\\"text\\\", \\\"encode\\\": {\\\"update\\\": {\\\"y\\\": {\\\"field\\\": \\\"c_num_rows\\\"}, \\\"x\\\": {\\\"signal\\\": \\\"datum[\\\\\\\"c_x_axis_back\\\\\\\"]+467\\\"}}, \\\"enter\\\": {\\\"fontStyle\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"fill\\\": {\\\"value\\\": \\\"#4A4A4A\\\"}, \\\"baseline\\\": {\\\"value\\\": \\\"middle\\\"}, \\\"dx\\\": {\\\"offset\\\": 0, \\\"value\\\": 0}, \\\"y\\\": {\\\"value\\\": 106}, \\\"fontWeight\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"text\\\": {\\\"value\\\": \\\"Num. 
Rows:\\\"}, \\\"x\\\": {\\\"value\\\": 500}, \\\"font\\\": {\\\"value\\\": \\\"AvenirNext-Medium\\\"}, \\\"align\\\": {\\\"value\\\": \\\"left\\\"}, \\\"fontSize\\\": {\\\"value\\\": 12}, \\\"dy\\\": {\\\"offset\\\": 0, \\\"value\\\": 0}, \\\"angle\\\": {\\\"value\\\": 0}}}, \\\"from\\\": {\\\"data\\\": \\\"data_2\\\"}}, {\\\"type\\\": \\\"text\\\", \\\"encode\\\": {\\\"update\\\": {\\\"y\\\": {\\\"field\\\": \\\"c_num_unique\\\"}, \\\"x\\\": {\\\"signal\\\": \\\"datum[\\\\\\\"c_x_axis_back\\\\\\\"]+467\\\"}}, \\\"enter\\\": {\\\"fontStyle\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"fill\\\": {\\\"value\\\": \\\"#4A4A4A\\\"}, \\\"baseline\\\": {\\\"value\\\": \\\"middle\\\"}, \\\"dx\\\": {\\\"offset\\\": 0, \\\"value\\\": 0}, \\\"y\\\": {\\\"value\\\": 130}, \\\"fontWeight\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"text\\\": {\\\"value\\\": \\\"Num. Unique:\\\"}, \\\"x\\\": {\\\"value\\\": 500}, \\\"font\\\": {\\\"value\\\": \\\"AvenirNext-Medium\\\"}, \\\"align\\\": {\\\"value\\\": \\\"left\\\"}, \\\"fontSize\\\": {\\\"value\\\": 12}, \\\"dy\\\": {\\\"offset\\\": 0, \\\"value\\\": 0}, \\\"angle\\\": {\\\"value\\\": 0}}}, \\\"from\\\": {\\\"data\\\": \\\"data_2\\\"}}, {\\\"type\\\": \\\"text\\\", \\\"encode\\\": {\\\"update\\\": {\\\"y\\\": {\\\"field\\\": \\\"c_missing\\\"}, \\\"x\\\": {\\\"signal\\\": \\\"datum[\\\\\\\"c_x_axis_back\\\\\\\"]+467\\\"}}, \\\"enter\\\": {\\\"fontStyle\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"fill\\\": {\\\"value\\\": \\\"#4A4A4A\\\"}, \\\"baseline\\\": {\\\"value\\\": \\\"middle\\\"}, \\\"dx\\\": {\\\"offset\\\": 0, \\\"value\\\": 0}, \\\"y\\\": {\\\"value\\\": 154}, \\\"fontWeight\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"text\\\": {\\\"value\\\": \\\"Missing:\\\"}, \\\"x\\\": {\\\"value\\\": 500}, \\\"font\\\": {\\\"value\\\": \\\"AvenirNext-Medium\\\"}, \\\"align\\\": {\\\"value\\\": \\\"left\\\"}, \\\"fontSize\\\": {\\\"value\\\": 12}, \\\"dy\\\": {\\\"offset\\\": 0, \\\"value\\\": 0}, \\\"angle\\\": {\\\"value\\\": 0}}}, 
\\\"from\\\": {\\\"data\\\": \\\"data_2\\\"}}, {\\\"type\\\": \\\"text\\\", \\\"encode\\\": {\\\"update\\\": {\\\"y\\\": {\\\"field\\\": \\\"c_num_rows_val\\\"}, \\\"x\\\": {\\\"signal\\\": \\\"datum[\\\\\\\"c_x_axis_back\\\\\\\"]+667\\\"}}, \\\"enter\\\": {\\\"fontStyle\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"fill\\\": {\\\"value\\\": \\\"#5A5A5A\\\"}, \\\"baseline\\\": {\\\"value\\\": \\\"middle\\\"}, \\\"dx\\\": {\\\"offset\\\": 0, \\\"value\\\": 0}, \\\"y\\\": {\\\"value\\\": 105}, \\\"fontWeight\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"text\\\": {\\\"signal\\\": \\\"toString(format(datum[\\\\\\\"num_row\\\\\\\"], \\\\\\\",\\\\\\\"))\\\"}, \\\"x\\\": {\\\"value\\\": 700}, \\\"font\\\": {\\\"value\\\": \\\"AvenirNext-Medium\\\"}, \\\"align\\\": {\\\"value\\\": \\\"right\\\"}, \\\"fontSize\\\": {\\\"value\\\": 12}, \\\"dy\\\": {\\\"offset\\\": 0, \\\"value\\\": 0}, \\\"angle\\\": {\\\"value\\\": 0}}}, \\\"from\\\": {\\\"data\\\": \\\"data_2\\\"}}, {\\\"type\\\": \\\"text\\\", \\\"encode\\\": {\\\"update\\\": {\\\"y\\\": {\\\"field\\\": \\\"c_num_unique_val\\\"}, \\\"x\\\": {\\\"signal\\\": \\\"datum[\\\\\\\"c_x_axis_back\\\\\\\"]+667\\\"}}, \\\"enter\\\": {\\\"fontStyle\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"fill\\\": {\\\"value\\\": \\\"#5A5A5A\\\"}, \\\"baseline\\\": {\\\"value\\\": \\\"middle\\\"}, \\\"dx\\\": {\\\"offset\\\": 0, \\\"value\\\": 0}, \\\"y\\\": {\\\"value\\\": 130}, \\\"fontWeight\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"text\\\": {\\\"signal\\\": \\\"toString(format(datum[\\\\\\\"num_unique\\\\\\\"], \\\\\\\",\\\\\\\"))\\\"}, \\\"x\\\": {\\\"value\\\": 700}, \\\"font\\\": {\\\"value\\\": \\\"AvenirNext-Medium\\\"}, \\\"align\\\": {\\\"value\\\": \\\"right\\\"}, \\\"fontSize\\\": {\\\"value\\\": 12}, \\\"dy\\\": {\\\"offset\\\": 0, \\\"value\\\": 0}, \\\"angle\\\": {\\\"value\\\": 0}}}, \\\"from\\\": {\\\"data\\\": \\\"data_2\\\"}}, {\\\"type\\\": \\\"text\\\", \\\"encode\\\": {\\\"update\\\": {\\\"y\\\": {\\\"field\\\": 
\\\"c_missing_val\\\"}, \\\"x\\\": {\\\"signal\\\": \\\"datum[\\\\\\\"c_x_axis_back\\\\\\\"]+667\\\"}}, \\\"enter\\\": {\\\"fontStyle\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"fill\\\": {\\\"value\\\": \\\"#5A5A5A\\\"}, \\\"baseline\\\": {\\\"value\\\": \\\"middle\\\"}, \\\"dx\\\": {\\\"offset\\\": 0, \\\"value\\\": 0}, \\\"y\\\": {\\\"value\\\": 154}, \\\"fontWeight\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"text\\\": {\\\"signal\\\": \\\"toString(format(datum[\\\\\\\"num_missing\\\\\\\"], \\\\\\\",\\\\\\\"))\\\"}, \\\"x\\\": {\\\"value\\\": 700}, \\\"font\\\": {\\\"value\\\": \\\"AvenirNext-Medium\\\"}, \\\"align\\\": {\\\"value\\\": \\\"right\\\"}, \\\"fontSize\\\": {\\\"value\\\": 12}, \\\"dy\\\": {\\\"offset\\\": 0, \\\"value\\\": 0}, \\\"angle\\\": {\\\"value\\\": 0}}}, \\\"from\\\": {\\\"data\\\": \\\"data_2\\\"}}, {\\\"type\\\": \\\"text\\\", \\\"encode\\\": {\\\"update\\\": {\\\"y\\\": {\\\"field\\\": \\\"c_frequent_items\\\"}, \\\"x\\\": {\\\"signal\\\": \\\"datum[\\\\\\\"c_x_axis_back\\\\\\\"]+467\\\"}}, \\\"enter\\\": {\\\"fontStyle\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"fill\\\": {\\\"value\\\": \\\"#4A4A4A\\\"}, \\\"baseline\\\": {\\\"value\\\": \\\"middle\\\"}, \\\"dx\\\": {\\\"offset\\\": 0, \\\"value\\\": 0}, \\\"y\\\": {\\\"value\\\": 200}, \\\"fontWeight\\\": {\\\"value\\\": \\\"bold\\\"}, \\\"text\\\": {\\\"signal\\\": \\\"(toString(datum[\\\\\\\"type\\\\\\\"]) == \\\\\\\"str\\\\\\\")? 
\\\\\\\"Frequent Items\\\\\\\":\\\\\\\"\\\\\\\"\\\"}, \\\"x\\\": {\\\"value\\\": 500}, \\\"clip\\\": {\\\"value\\\": true}, \\\"font\\\": {\\\"value\\\": \\\"AvenirNext-Medium\\\"}, \\\"align\\\": {\\\"value\\\": \\\"left\\\"}, \\\"fontSize\\\": {\\\"value\\\": 11}, \\\"dy\\\": {\\\"offset\\\": 0, \\\"value\\\": 0}, \\\"angle\\\": {\\\"value\\\": 0}}}, \\\"from\\\": {\\\"data\\\": \\\"data_2\\\"}}, {\\\"type\\\": \\\"text\\\", \\\"encode\\\": {\\\"update\\\": {\\\"y\\\": {\\\"field\\\": \\\"c_first_item\\\"}, \\\"x\\\": {\\\"signal\\\": \\\"datum[\\\\\\\"c_x_axis_back\\\\\\\"]+487\\\"}}, \\\"enter\\\": {\\\"fontStyle\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"fill\\\": {\\\"value\\\": \\\"#4A4A4A\\\"}, \\\"baseline\\\": {\\\"value\\\": \\\"middle\\\"}, \\\"dx\\\": {\\\"offset\\\": 0, \\\"value\\\": 0}, \\\"y\\\": {\\\"value\\\": 200}, \\\"fontWeight\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"text\\\": {\\\"signal\\\": \\\"((datum[\\\\\\\"categorical\\\\\\\"].length >= 1) && (toString(datum[\\\\\\\"type\\\\\\\"]) == \\\\\\\"str\\\\\\\"))? 
toString(datum[\\\\\\\"categorical\\\\\\\"][0][\\\\\\\"label\\\\\\\"]):\\\\\\\"\\\\\\\"\\\"}, \\\"x\\\": {\\\"value\\\": 520}, \\\"clip\\\": {\\\"value\\\": true}, \\\"font\\\": {\\\"value\\\": \\\"AvenirNext-Medium\\\"}, \\\"align\\\": {\\\"value\\\": \\\"left\\\"}, \\\"fontSize\\\": {\\\"value\\\": 11}, \\\"dy\\\": {\\\"offset\\\": 0, \\\"value\\\": 0}, \\\"angle\\\": {\\\"value\\\": 0}}}, \\\"from\\\": {\\\"data\\\": \\\"data_2\\\"}}, {\\\"type\\\": \\\"text\\\", \\\"encode\\\": {\\\"update\\\": {\\\"y\\\": {\\\"field\\\": \\\"c_second_item\\\"}, \\\"x\\\": {\\\"signal\\\": \\\"datum[\\\\\\\"c_x_axis_back\\\\\\\"]+487\\\"}}, \\\"enter\\\": {\\\"fontStyle\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"fill\\\": {\\\"value\\\": \\\"#4A4A4A\\\"}, \\\"baseline\\\": {\\\"value\\\": \\\"middle\\\"}, \\\"dx\\\": {\\\"offset\\\": 0, \\\"value\\\": 0}, \\\"y\\\": {\\\"value\\\": 200}, \\\"fontWeight\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"text\\\": {\\\"signal\\\": \\\"((datum[\\\\\\\"categorical\\\\\\\"].length >= 2) && (toString(datum[\\\\\\\"type\\\\\\\"]) == \\\\\\\"str\\\\\\\"))? 
toString(datum[\\\\\\\"categorical\\\\\\\"][1][\\\\\\\"label\\\\\\\"]):\\\\\\\"\\\\\\\"\\\"}, \\\"x\\\": {\\\"value\\\": 520}, \\\"clip\\\": {\\\"value\\\": true}, \\\"font\\\": {\\\"value\\\": \\\"AvenirNext-Medium\\\"}, \\\"align\\\": {\\\"value\\\": \\\"left\\\"}, \\\"fontSize\\\": {\\\"value\\\": 11}, \\\"dy\\\": {\\\"offset\\\": 0, \\\"value\\\": 0}, \\\"angle\\\": {\\\"value\\\": 0}}}, \\\"from\\\": {\\\"data\\\": \\\"data_2\\\"}}, {\\\"type\\\": \\\"text\\\", \\\"encode\\\": {\\\"update\\\": {\\\"y\\\": {\\\"field\\\": \\\"c_third_item\\\"}, \\\"x\\\": {\\\"signal\\\": \\\"datum[\\\\\\\"c_x_axis_back\\\\\\\"]+487\\\"}}, \\\"enter\\\": {\\\"fontStyle\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"fill\\\": {\\\"value\\\": \\\"#4A4A4A\\\"}, \\\"baseline\\\": {\\\"value\\\": \\\"middle\\\"}, \\\"dx\\\": {\\\"offset\\\": 0, \\\"value\\\": 0}, \\\"y\\\": {\\\"value\\\": 200}, \\\"fontWeight\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"text\\\": {\\\"signal\\\": \\\"((datum[\\\\\\\"categorical\\\\\\\"].length >= 3) && (toString(datum[\\\\\\\"type\\\\\\\"]) == \\\\\\\"str\\\\\\\"))? 
toString(datum[\\\\\\\"categorical\\\\\\\"][2][\\\\\\\"label\\\\\\\"]):\\\\\\\"\\\\\\\"\\\"}, \\\"x\\\": {\\\"value\\\": 520}, \\\"clip\\\": {\\\"value\\\": true}, \\\"font\\\": {\\\"value\\\": \\\"AvenirNext-Medium\\\"}, \\\"align\\\": {\\\"value\\\": \\\"left\\\"}, \\\"fontSize\\\": {\\\"value\\\": 11}, \\\"dy\\\": {\\\"offset\\\": 0, \\\"value\\\": 0}, \\\"angle\\\": {\\\"value\\\": 0}}}, \\\"from\\\": {\\\"data\\\": \\\"data_2\\\"}}, {\\\"type\\\": \\\"text\\\", \\\"encode\\\": {\\\"update\\\": {\\\"y\\\": {\\\"field\\\": \\\"c_fourth_item\\\"}, \\\"x\\\": {\\\"signal\\\": \\\"datum[\\\\\\\"c_x_axis_back\\\\\\\"]+487\\\"}}, \\\"enter\\\": {\\\"fontStyle\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"fill\\\": {\\\"value\\\": \\\"#4A4A4A\\\"}, \\\"baseline\\\": {\\\"value\\\": \\\"middle\\\"}, \\\"dx\\\": {\\\"offset\\\": 0, \\\"value\\\": 0}, \\\"y\\\": {\\\"value\\\": 200}, \\\"fontWeight\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"text\\\": {\\\"signal\\\": \\\"((datum[\\\\\\\"categorical\\\\\\\"].length >= 4) && (toString(datum[\\\\\\\"type\\\\\\\"]) == \\\\\\\"str\\\\\\\"))? 
toString(datum[\\\\\\\"categorical\\\\\\\"][3][\\\\\\\"label\\\\\\\"]):\\\\\\\"\\\\\\\"\\\"}, \\\"x\\\": {\\\"value\\\": 520}, \\\"clip\\\": {\\\"value\\\": true}, \\\"font\\\": {\\\"value\\\": \\\"AvenirNext-Medium\\\"}, \\\"align\\\": {\\\"value\\\": \\\"left\\\"}, \\\"fontSize\\\": {\\\"value\\\": 11}, \\\"dy\\\": {\\\"offset\\\": 0, \\\"value\\\": 0}, \\\"angle\\\": {\\\"value\\\": 0}}}, \\\"from\\\": {\\\"data\\\": \\\"data_2\\\"}}, {\\\"type\\\": \\\"text\\\", \\\"encode\\\": {\\\"update\\\": {\\\"y\\\": {\\\"field\\\": \\\"c_fifth_item\\\"}, \\\"x\\\": {\\\"signal\\\": \\\"datum[\\\\\\\"c_x_axis_back\\\\\\\"]+487\\\"}}, \\\"enter\\\": {\\\"fontStyle\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"fill\\\": {\\\"value\\\": \\\"#4A4A4A\\\"}, \\\"baseline\\\": {\\\"value\\\": \\\"middle\\\"}, \\\"dx\\\": {\\\"offset\\\": 0, \\\"value\\\": 0}, \\\"y\\\": {\\\"value\\\": 200}, \\\"fontWeight\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"text\\\": {\\\"signal\\\": \\\"((datum[\\\\\\\"categorical\\\\\\\"].length >= 5) && (toString(datum[\\\\\\\"type\\\\\\\"]) == \\\\\\\"str\\\\\\\"))? 
toString(datum[\\\\\\\"categorical\\\\\\\"][4][\\\\\\\"label\\\\\\\"]):\\\\\\\"\\\\\\\"\\\"}, \\\"x\\\": {\\\"value\\\": 520}, \\\"clip\\\": {\\\"value\\\": true}, \\\"font\\\": {\\\"value\\\": \\\"AvenirNext-Medium\\\"}, \\\"align\\\": {\\\"value\\\": \\\"left\\\"}, \\\"fontSize\\\": {\\\"value\\\": 11}, \\\"dy\\\": {\\\"offset\\\": 0, \\\"value\\\": 0}, \\\"angle\\\": {\\\"value\\\": 0}}}, \\\"from\\\": {\\\"data\\\": \\\"data_2\\\"}}, {\\\"type\\\": \\\"text\\\", \\\"encode\\\": {\\\"update\\\": {\\\"y\\\": {\\\"field\\\": \\\"c_first_item\\\"}, \\\"x\\\": {\\\"signal\\\": \\\"datum[\\\\\\\"c_x_axis_back\\\\\\\"]+667\\\"}}, \\\"enter\\\": {\\\"fontStyle\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"fill\\\": {\\\"value\\\": \\\"#7A7A7A\\\"}, \\\"baseline\\\": {\\\"value\\\": \\\"middle\\\"}, \\\"dx\\\": {\\\"offset\\\": 0, \\\"value\\\": 0}, \\\"y\\\": {\\\"value\\\": 200}, \\\"fontWeight\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"text\\\": {\\\"signal\\\": \\\"((datum[\\\\\\\"categorical\\\\\\\"].length >= 1) && (toString(datum[\\\\\\\"type\\\\\\\"]) == \\\\\\\"str\\\\\\\"))? 
toString(datum[\\\\\\\"categorical\\\\\\\"][0][\\\\\\\"count\\\\\\\"]):\\\\\\\"\\\\\\\"\\\"}, \\\"x\\\": {\\\"value\\\": 700}, \\\"clip\\\": {\\\"value\\\": true}, \\\"font\\\": {\\\"value\\\": \\\"AvenirNext-Medium\\\"}, \\\"align\\\": {\\\"value\\\": \\\"right\\\"}, \\\"fontSize\\\": {\\\"value\\\": 11}, \\\"dy\\\": {\\\"offset\\\": 0, \\\"value\\\": 0}, \\\"angle\\\": {\\\"value\\\": 0}}}, \\\"from\\\": {\\\"data\\\": \\\"data_2\\\"}}, {\\\"type\\\": \\\"text\\\", \\\"encode\\\": {\\\"update\\\": {\\\"y\\\": {\\\"field\\\": \\\"c_second_item\\\"}, \\\"x\\\": {\\\"signal\\\": \\\"datum[\\\\\\\"c_x_axis_back\\\\\\\"]+667\\\"}}, \\\"enter\\\": {\\\"fontStyle\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"fill\\\": {\\\"value\\\": \\\"#7A7A7A\\\"}, \\\"baseline\\\": {\\\"value\\\": \\\"middle\\\"}, \\\"dx\\\": {\\\"offset\\\": 0, \\\"value\\\": 0}, \\\"y\\\": {\\\"value\\\": 200}, \\\"fontWeight\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"text\\\": {\\\"signal\\\": \\\"((datum[\\\\\\\"categorical\\\\\\\"].length >= 2) && (toString(datum[\\\\\\\"type\\\\\\\"]) == \\\\\\\"str\\\\\\\"))? 
toString(datum[\\\\\\\"categorical\\\\\\\"][1][\\\\\\\"count\\\\\\\"]):\\\\\\\"\\\\\\\"\\\"}, \\\"x\\\": {\\\"value\\\": 700}, \\\"clip\\\": {\\\"value\\\": true}, \\\"font\\\": {\\\"value\\\": \\\"AvenirNext-Medium\\\"}, \\\"align\\\": {\\\"value\\\": \\\"right\\\"}, \\\"fontSize\\\": {\\\"value\\\": 10}, \\\"dy\\\": {\\\"offset\\\": 0, \\\"value\\\": 0}, \\\"angle\\\": {\\\"value\\\": 0}}}, \\\"from\\\": {\\\"data\\\": \\\"data_2\\\"}}, {\\\"type\\\": \\\"text\\\", \\\"encode\\\": {\\\"update\\\": {\\\"y\\\": {\\\"field\\\": \\\"c_third_item\\\"}, \\\"x\\\": {\\\"signal\\\": \\\"datum[\\\\\\\"c_x_axis_back\\\\\\\"]+667\\\"}}, \\\"enter\\\": {\\\"fontStyle\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"fill\\\": {\\\"value\\\": \\\"#7A7A7A\\\"}, \\\"baseline\\\": {\\\"value\\\": \\\"middle\\\"}, \\\"dx\\\": {\\\"offset\\\": 0, \\\"value\\\": 0}, \\\"y\\\": {\\\"value\\\": 200}, \\\"fontWeight\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"text\\\": {\\\"signal\\\": \\\"((datum[\\\\\\\"categorical\\\\\\\"].length >= 3) && (toString(datum[\\\\\\\"type\\\\\\\"]) == \\\\\\\"str\\\\\\\"))? 
toString(datum[\\\\\\\"categorical\\\\\\\"][2][\\\\\\\"count\\\\\\\"]):\\\\\\\"\\\\\\\"\\\"}, \\\"x\\\": {\\\"value\\\": 700}, \\\"clip\\\": {\\\"value\\\": true}, \\\"font\\\": {\\\"value\\\": \\\"AvenirNext-Medium\\\"}, \\\"align\\\": {\\\"value\\\": \\\"right\\\"}, \\\"fontSize\\\": {\\\"value\\\": 10}, \\\"dy\\\": {\\\"offset\\\": 0, \\\"value\\\": 0}, \\\"angle\\\": {\\\"value\\\": 0}}}, \\\"from\\\": {\\\"data\\\": \\\"data_2\\\"}}, {\\\"type\\\": \\\"text\\\", \\\"encode\\\": {\\\"update\\\": {\\\"y\\\": {\\\"field\\\": \\\"c_fourth_item\\\"}, \\\"x\\\": {\\\"signal\\\": \\\"datum[\\\\\\\"c_x_axis_back\\\\\\\"]+667\\\"}}, \\\"enter\\\": {\\\"fontStyle\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"fill\\\": {\\\"value\\\": \\\"#7A7A7A\\\"}, \\\"baseline\\\": {\\\"value\\\": \\\"middle\\\"}, \\\"dx\\\": {\\\"offset\\\": 0, \\\"value\\\": 0}, \\\"y\\\": {\\\"value\\\": 200}, \\\"fontWeight\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"text\\\": {\\\"signal\\\": \\\"((datum[\\\\\\\"categorical\\\\\\\"].length >= 4) && (toString(datum[\\\\\\\"type\\\\\\\"]) == \\\\\\\"str\\\\\\\"))? 
toString(datum[\\\\\\\"categorical\\\\\\\"][3][\\\\\\\"count\\\\\\\"]):\\\\\\\"\\\\\\\"\\\"}, \\\"x\\\": {\\\"value\\\": 700}, \\\"clip\\\": {\\\"value\\\": true}, \\\"font\\\": {\\\"value\\\": \\\"AvenirNext-Medium\\\"}, \\\"align\\\": {\\\"value\\\": \\\"right\\\"}, \\\"fontSize\\\": {\\\"value\\\": 10}, \\\"dy\\\": {\\\"offset\\\": 0, \\\"value\\\": 0}, \\\"angle\\\": {\\\"value\\\": 0}}}, \\\"from\\\": {\\\"data\\\": \\\"data_2\\\"}}, {\\\"type\\\": \\\"text\\\", \\\"encode\\\": {\\\"update\\\": {\\\"y\\\": {\\\"field\\\": \\\"c_fifth_item\\\"}, \\\"x\\\": {\\\"signal\\\": \\\"datum[\\\\\\\"c_x_axis_back\\\\\\\"]+667\\\"}}, \\\"enter\\\": {\\\"fontStyle\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"fill\\\": {\\\"value\\\": \\\"#7A7A7A\\\"}, \\\"baseline\\\": {\\\"value\\\": \\\"middle\\\"}, \\\"dx\\\": {\\\"offset\\\": 0, \\\"value\\\": 0}, \\\"y\\\": {\\\"value\\\": 200}, \\\"fontWeight\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"text\\\": {\\\"signal\\\": \\\"((datum[\\\\\\\"categorical\\\\\\\"].length >= 5) && (toString(datum[\\\\\\\"type\\\\\\\"]) == \\\\\\\"str\\\\\\\"))? 
toString(datum[\\\\\\\"categorical\\\\\\\"][4][\\\\\\\"count\\\\\\\"]):\\\\\\\"\\\\\\\"\\\"}, \\\"x\\\": {\\\"value\\\": 700}, \\\"clip\\\": {\\\"value\\\": true}, \\\"font\\\": {\\\"value\\\": \\\"AvenirNext-Medium\\\"}, \\\"align\\\": {\\\"value\\\": \\\"right\\\"}, \\\"fontSize\\\": {\\\"value\\\": 10}, \\\"dy\\\": {\\\"offset\\\": 0, \\\"value\\\": 0}, \\\"angle\\\": {\\\"value\\\": 0}}}, \\\"from\\\": {\\\"data\\\": \\\"data_2\\\"}}, {\\\"type\\\": \\\"text\\\", \\\"encode\\\": {\\\"update\\\": {\\\"y\\\": {\\\"field\\\": \\\"c_mean\\\"}, \\\"x\\\": {\\\"signal\\\": \\\"datum[\\\\\\\"c_x_axis_back\\\\\\\"]+467\\\"}}, \\\"enter\\\": {\\\"fontStyle\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"fill\\\": {\\\"value\\\": \\\"#4A4A4A\\\"}, \\\"baseline\\\": {\\\"value\\\": \\\"middle\\\"}, \\\"dx\\\": {\\\"offset\\\": 0, \\\"value\\\": 0}, \\\"y\\\": {\\\"value\\\": 200}, \\\"fontWeight\\\": {\\\"value\\\": \\\"bold\\\"}, \\\"text\\\": {\\\"signal\\\": \\\"(toString(datum[\\\\\\\"type\\\\\\\"]) == \\\\\\\"integer\\\\\\\" || toString(datum[\\\\\\\"type\\\\\\\"]) == \\\\\\\"float\\\\\\\")? 
\\\\\\\"Mean:\\\\\\\":\\\\\\\"\\\\\\\"\\\"}, \\\"x\\\": {\\\"value\\\": 500}, \\\"clip\\\": {\\\"value\\\": true}, \\\"font\\\": {\\\"value\\\": \\\"AvenirNext-Medium\\\"}, \\\"align\\\": {\\\"value\\\": \\\"left\\\"}, \\\"fontSize\\\": {\\\"value\\\": 11}, \\\"dy\\\": {\\\"offset\\\": 0, \\\"value\\\": 0}, \\\"angle\\\": {\\\"value\\\": 0}}}, \\\"from\\\": {\\\"data\\\": \\\"data_2\\\"}}, {\\\"type\\\": \\\"text\\\", \\\"encode\\\": {\\\"update\\\": {\\\"y\\\": {\\\"field\\\": \\\"c_min\\\"}, \\\"x\\\": {\\\"signal\\\": \\\"datum[\\\\\\\"c_x_axis_back\\\\\\\"]+467\\\"}}, \\\"enter\\\": {\\\"fontStyle\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"fill\\\": {\\\"value\\\": \\\"#4A4A4A\\\"}, \\\"baseline\\\": {\\\"value\\\": \\\"middle\\\"}, \\\"dx\\\": {\\\"offset\\\": 0, \\\"value\\\": 0}, \\\"y\\\": {\\\"value\\\": 220}, \\\"fontWeight\\\": {\\\"value\\\": \\\"bold\\\"}, \\\"text\\\": {\\\"signal\\\": \\\"(toString(datum[\\\\\\\"type\\\\\\\"]) == \\\\\\\"integer\\\\\\\" || toString(datum[\\\\\\\"type\\\\\\\"]) == \\\\\\\"float\\\\\\\")? 
\\\\\\\"Min:\\\\\\\":\\\\\\\"\\\\\\\"\\\"}, \\\"x\\\": {\\\"value\\\": 500}, \\\"font\\\": {\\\"value\\\": \\\"AvenirNext-Medium\\\"}, \\\"align\\\": {\\\"value\\\": \\\"left\\\"}, \\\"fontSize\\\": {\\\"value\\\": 11}, \\\"dy\\\": {\\\"offset\\\": 0, \\\"value\\\": 0}, \\\"angle\\\": {\\\"value\\\": 0}}}, \\\"from\\\": {\\\"data\\\": \\\"data_2\\\"}}, {\\\"type\\\": \\\"text\\\", \\\"encode\\\": {\\\"update\\\": {\\\"y\\\": {\\\"field\\\": \\\"c_max\\\"}, \\\"x\\\": {\\\"signal\\\": \\\"datum[\\\\\\\"c_x_axis_back\\\\\\\"]+467\\\"}}, \\\"enter\\\": {\\\"fontStyle\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"fill\\\": {\\\"value\\\": \\\"#4A4A4A\\\"}, \\\"baseline\\\": {\\\"value\\\": \\\"middle\\\"}, \\\"dx\\\": {\\\"offset\\\": 0, \\\"value\\\": 0}, \\\"y\\\": {\\\"value\\\": 240}, \\\"fontWeight\\\": {\\\"value\\\": \\\"bold\\\"}, \\\"text\\\": {\\\"signal\\\": \\\"(toString(datum[\\\\\\\"type\\\\\\\"]) == \\\\\\\"integer\\\\\\\" || toString(datum[\\\\\\\"type\\\\\\\"]) == \\\\\\\"float\\\\\\\")? 
\\\\\\\"Max:\\\\\\\":\\\\\\\"\\\\\\\"\\\"}, \\\"x\\\": {\\\"value\\\": 500}, \\\"font\\\": {\\\"value\\\": \\\"AvenirNext-Medium\\\"}, \\\"align\\\": {\\\"value\\\": \\\"left\\\"}, \\\"fontSize\\\": {\\\"value\\\": 11}, \\\"dy\\\": {\\\"offset\\\": 0, \\\"value\\\": 0}, \\\"angle\\\": {\\\"value\\\": 0}}}, \\\"from\\\": {\\\"data\\\": \\\"data_2\\\"}}, {\\\"type\\\": \\\"text\\\", \\\"encode\\\": {\\\"update\\\": {\\\"y\\\": {\\\"field\\\": \\\"c_median\\\"}, \\\"x\\\": {\\\"signal\\\": \\\"datum[\\\\\\\"c_x_axis_back\\\\\\\"]+467\\\"}}, \\\"enter\\\": {\\\"fontStyle\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"fill\\\": {\\\"value\\\": \\\"#4A4A4A\\\"}, \\\"baseline\\\": {\\\"value\\\": \\\"middle\\\"}, \\\"dx\\\": {\\\"offset\\\": 0, \\\"value\\\": 0}, \\\"y\\\": {\\\"value\\\": 260}, \\\"fontWeight\\\": {\\\"value\\\": \\\"bold\\\"}, \\\"text\\\": {\\\"signal\\\": \\\"(toString(datum[\\\\\\\"type\\\\\\\"]) == \\\\\\\"integer\\\\\\\" || toString(datum[\\\\\\\"type\\\\\\\"]) == \\\\\\\"float\\\\\\\")? 
\\\\\\\"Median:\\\\\\\":\\\\\\\"\\\\\\\"\\\"}, \\\"x\\\": {\\\"value\\\": 500}, \\\"font\\\": {\\\"value\\\": \\\"AvenirNext-Medium\\\"}, \\\"align\\\": {\\\"value\\\": \\\"left\\\"}, \\\"fontSize\\\": {\\\"value\\\": 11}, \\\"dy\\\": {\\\"offset\\\": 0, \\\"value\\\": 0}, \\\"angle\\\": {\\\"value\\\": 0}}}, \\\"from\\\": {\\\"data\\\": \\\"data_2\\\"}}, {\\\"type\\\": \\\"text\\\", \\\"encode\\\": {\\\"update\\\": {\\\"y\\\": {\\\"field\\\": \\\"c_stdev\\\"}, \\\"x\\\": {\\\"signal\\\": \\\"datum[\\\\\\\"c_x_axis_back\\\\\\\"]+467\\\"}}, \\\"enter\\\": {\\\"fontStyle\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"fill\\\": {\\\"value\\\": \\\"#4A4A4A\\\"}, \\\"baseline\\\": {\\\"value\\\": \\\"middle\\\"}, \\\"dx\\\": {\\\"offset\\\": 0, \\\"value\\\": 0}, \\\"y\\\": {\\\"value\\\": 280}, \\\"fontWeight\\\": {\\\"value\\\": \\\"bold\\\"}, \\\"text\\\": {\\\"signal\\\": \\\"(toString(datum[\\\\\\\"type\\\\\\\"]) == \\\\\\\"integer\\\\\\\" || toString(datum[\\\\\\\"type\\\\\\\"]) == \\\\\\\"float\\\\\\\")? \\\\\\\"St. 
Dev:\\\\\\\":\\\\\\\"\\\\\\\"\\\"}, \\\"x\\\": {\\\"value\\\": 500}, \\\"font\\\": {\\\"value\\\": \\\"AvenirNext-Medium\\\"}, \\\"align\\\": {\\\"value\\\": \\\"left\\\"}, \\\"fontSize\\\": {\\\"value\\\": 11}, \\\"dy\\\": {\\\"offset\\\": 0, \\\"value\\\": 0}, \\\"angle\\\": {\\\"value\\\": 0}}}, \\\"from\\\": {\\\"data\\\": \\\"data_2\\\"}}, {\\\"type\\\": \\\"text\\\", \\\"encode\\\": {\\\"update\\\": {\\\"y\\\": {\\\"field\\\": \\\"c_mean_val\\\"}, \\\"x\\\": {\\\"signal\\\": \\\"datum[\\\\\\\"c_x_axis_back\\\\\\\"]+667\\\"}}, \\\"enter\\\": {\\\"fontStyle\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"fill\\\": {\\\"value\\\": \\\"#6A6A6A\\\"}, \\\"baseline\\\": {\\\"value\\\": \\\"middle\\\"}, \\\"dx\\\": {\\\"offset\\\": 0, \\\"value\\\": 0}, \\\"y\\\": {\\\"value\\\": 198}, \\\"fontWeight\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"text\\\": {\\\"signal\\\": \\\"(toString(datum[\\\\\\\"type\\\\\\\"]) == \\\\\\\"integer\\\\\\\" || toString(datum[\\\\\\\"type\\\\\\\"]) == \\\\\\\"float\\\\\\\")?toString(format(datum[\\\\\\\"mean\\\\\\\"], \\\\\\\",\\\\\\\")):\\\\\\\"\\\\\\\"\\\"}, \\\"x\\\": {\\\"value\\\": 700}, \\\"font\\\": {\\\"value\\\": \\\"AvenirNext-Medium\\\"}, \\\"align\\\": {\\\"value\\\": \\\"right\\\"}, \\\"fontSize\\\": {\\\"value\\\": 10}, \\\"dy\\\": {\\\"offset\\\": 0, \\\"value\\\": 0}, \\\"angle\\\": {\\\"value\\\": 0}}}, \\\"from\\\": {\\\"data\\\": \\\"data_2\\\"}}, {\\\"type\\\": \\\"text\\\", \\\"encode\\\": {\\\"update\\\": {\\\"y\\\": {\\\"field\\\": \\\"c_min_val\\\"}, \\\"x\\\": {\\\"signal\\\": \\\"datum[\\\\\\\"c_x_axis_back\\\\\\\"]+667\\\"}}, \\\"enter\\\": {\\\"fontStyle\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"fill\\\": {\\\"value\\\": \\\"#6A6A6A\\\"}, \\\"baseline\\\": {\\\"value\\\": \\\"middle\\\"}, \\\"dx\\\": {\\\"offset\\\": 0, \\\"value\\\": 0}, \\\"y\\\": {\\\"value\\\": 218}, \\\"fontWeight\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"text\\\": {\\\"signal\\\": \\\"(toString(datum[\\\\\\\"type\\\\\\\"]) == 
\\\\\\\"integer\\\\\\\" || toString(datum[\\\\\\\"type\\\\\\\"]) == \\\\\\\"float\\\\\\\")?toString(format(datum[\\\\\\\"min\\\\\\\"], \\\\\\\",\\\\\\\")):\\\\\\\"\\\\\\\"\\\"}, \\\"x\\\": {\\\"value\\\": 700}, \\\"font\\\": {\\\"value\\\": \\\"AvenirNext-Medium\\\"}, \\\"align\\\": {\\\"value\\\": \\\"right\\\"}, \\\"fontSize\\\": {\\\"value\\\": 10}, \\\"dy\\\": {\\\"offset\\\": 0, \\\"value\\\": 0}, \\\"angle\\\": {\\\"value\\\": 0}}}, \\\"from\\\": {\\\"data\\\": \\\"data_2\\\"}}, {\\\"type\\\": \\\"text\\\", \\\"encode\\\": {\\\"update\\\": {\\\"y\\\": {\\\"field\\\": \\\"c_max_val\\\"}, \\\"x\\\": {\\\"signal\\\": \\\"datum[\\\\\\\"c_x_axis_back\\\\\\\"]+667\\\"}}, \\\"enter\\\": {\\\"fontStyle\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"fill\\\": {\\\"value\\\": \\\"#6A6A6A\\\"}, \\\"baseline\\\": {\\\"value\\\": \\\"middle\\\"}, \\\"dx\\\": {\\\"offset\\\": 0, \\\"value\\\": 0}, \\\"y\\\": {\\\"value\\\": 238}, \\\"fontWeight\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"text\\\": {\\\"signal\\\": \\\"(toString(datum[\\\\\\\"type\\\\\\\"]) == \\\\\\\"integer\\\\\\\" || toString(datum[\\\\\\\"type\\\\\\\"]) == \\\\\\\"float\\\\\\\")?toString(format(datum[\\\\\\\"max\\\\\\\"], \\\\\\\",\\\\\\\")):\\\\\\\"\\\\\\\"\\\"}, \\\"x\\\": {\\\"value\\\": 700}, \\\"font\\\": {\\\"value\\\": \\\"AvenirNext-Medium\\\"}, \\\"align\\\": {\\\"value\\\": \\\"right\\\"}, \\\"fontSize\\\": {\\\"value\\\": 10}, \\\"dy\\\": {\\\"offset\\\": 0, \\\"value\\\": 0}, \\\"angle\\\": {\\\"value\\\": 0}}}, \\\"from\\\": {\\\"data\\\": \\\"data_2\\\"}}, {\\\"type\\\": \\\"text\\\", \\\"encode\\\": {\\\"update\\\": {\\\"y\\\": {\\\"field\\\": \\\"c_median_val\\\"}, \\\"x\\\": {\\\"signal\\\": \\\"datum[\\\\\\\"c_x_axis_back\\\\\\\"]+667\\\"}}, \\\"enter\\\": {\\\"fontStyle\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"fill\\\": {\\\"value\\\": \\\"#6A6A6A\\\"}, \\\"baseline\\\": {\\\"value\\\": \\\"middle\\\"}, \\\"dx\\\": {\\\"offset\\\": 0, \\\"value\\\": 0}, \\\"y\\\": {\\\"value\\\": 258}, 
\\\"fontWeight\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"text\\\": {\\\"signal\\\": \\\"(toString(datum[\\\\\\\"type\\\\\\\"]) == \\\\\\\"integer\\\\\\\" || toString(datum[\\\\\\\"type\\\\\\\"]) == \\\\\\\"float\\\\\\\")?toString(format(datum[\\\\\\\"median\\\\\\\"], \\\\\\\",\\\\\\\")):\\\\\\\"\\\\\\\"\\\"}, \\\"x\\\": {\\\"value\\\": 700}, \\\"font\\\": {\\\"value\\\": \\\"AvenirNext-Medium\\\"}, \\\"align\\\": {\\\"value\\\": \\\"right\\\"}, \\\"fontSize\\\": {\\\"value\\\": 10}, \\\"dy\\\": {\\\"offset\\\": 0, \\\"value\\\": 0}, \\\"angle\\\": {\\\"value\\\": 0}}}, \\\"from\\\": {\\\"data\\\": \\\"data_2\\\"}}, {\\\"type\\\": \\\"text\\\", \\\"encode\\\": {\\\"update\\\": {\\\"y\\\": {\\\"field\\\": \\\"c_stdev_val\\\"}, \\\"x\\\": {\\\"signal\\\": \\\"datum[\\\\\\\"c_x_axis_back\\\\\\\"]+667\\\"}}, \\\"enter\\\": {\\\"fontStyle\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"fill\\\": {\\\"value\\\": \\\"#6A6A6A\\\"}, \\\"baseline\\\": {\\\"value\\\": \\\"middle\\\"}, \\\"dx\\\": {\\\"offset\\\": 0, \\\"value\\\": 0}, \\\"y\\\": {\\\"value\\\": 278}, \\\"fontWeight\\\": {\\\"value\\\": \\\"normal\\\"}, \\\"text\\\": {\\\"signal\\\": \\\"(toString(datum[\\\\\\\"type\\\\\\\"]) == \\\\\\\"integer\\\\\\\" || toString(datum[\\\\\\\"type\\\\\\\"]) == \\\\\\\"float\\\\\\\")?toString(format(datum[\\\\\\\"stdev\\\\\\\"], \\\\\\\",\\\\\\\")):\\\\\\\"\\\\\\\"\\\"}, \\\"x\\\": {\\\"value\\\": 700}, \\\"font\\\": {\\\"value\\\": \\\"AvenirNext-Medium\\\"}, \\\"align\\\": {\\\"value\\\": \\\"right\\\"}, \\\"fontSize\\\": {\\\"value\\\": 10}, \\\"dy\\\": {\\\"offset\\\": 0, \\\"value\\\": 0}, \\\"angle\\\": {\\\"value\\\": 0}}}, \\\"from\\\": {\\\"data\\\": \\\"data_2\\\"}}, {\\\"marks\\\": [{\\\"name\\\": \\\"marks\\\", \\\"encode\\\": {\\\"hover\\\": {\\\"fill\\\": {\\\"value\\\": \\\"#7EC2F3\\\"}}, \\\"update\\\": {\\\"x2\\\": {\\\"scale\\\": \\\"x\\\", \\\"field\\\": \\\"right\\\"}, \\\"y\\\": {\\\"scale\\\": \\\"y\\\", \\\"field\\\": \\\"count\\\"}, \\\"y2\\\": 
{\\\"scale\\\": \\\"y\\\", \\\"value\\\": 0}, \\\"x\\\": {\\\"scale\\\": \\\"x\\\", \\\"field\\\": \\\"left\\\"}, \\\"fill\\\": {\\\"value\\\": \\\"#108EE9\\\"}}}, \\\"style\\\": [\\\"rect\\\"], \\\"from\\\": {\\\"data\\\": \\\"new_data\\\"}, \\\"type\\\": \\\"rect\\\"}], \\\"type\\\": \\\"group\\\", \\\"axes\\\": [{\\\"zindex\\\": 1, \\\"scale\\\": \\\"x\\\", \\\"orient\\\": \\\"bottom\\\", \\\"labelOverlap\\\": true, \\\"tickCount\\\": {\\\"signal\\\": \\\"ceil(width/40)\\\"}, \\\"title\\\": \\\"Values\\\"}, {\\\"labels\\\": false, \\\"scale\\\": \\\"x\\\", \\\"orient\\\": \\\"bottom\\\", \\\"grid\\\": true, \\\"gridScale\\\": \\\"y\\\", \\\"zindex\\\": 0, \\\"maxExtent\\\": 0, \\\"tickCount\\\": {\\\"signal\\\": \\\"ceil(width/40)\\\"}, \\\"ticks\\\": false, \\\"minExtent\\\": 0, \\\"domain\\\": false}, {\\\"zindex\\\": 1, \\\"scale\\\": \\\"y\\\", \\\"orient\\\": \\\"left\\\", \\\"labelOverlap\\\": true, \\\"tickCount\\\": {\\\"signal\\\": \\\"ceil(height/40)\\\"}, \\\"title\\\": \\\"Count\\\"}, {\\\"labels\\\": false, \\\"scale\\\": \\\"y\\\", \\\"orient\\\": \\\"left\\\", \\\"grid\\\": true, \\\"gridScale\\\": \\\"x\\\", \\\"zindex\\\": 0, \\\"maxExtent\\\": 0, \\\"tickCount\\\": {\\\"signal\\\": \\\"ceil(height/40)\\\"}, \\\"ticks\\\": false, \\\"minExtent\\\": 0, \\\"domain\\\": false}], \\\"style\\\": \\\"cell\\\", \\\"from\\\": {\\\"facet\\\": {\\\"name\\\": \\\"new_data\\\", \\\"data\\\": \\\"data_2\\\", \\\"field\\\": \\\"numeric\\\"}}, \\\"encode\\\": {\\\"update\\\": {\\\"clip\\\": {\\\"field\\\": \\\"c_clip_val\\\"}, \\\"x\\\": {\\\"signal\\\": \\\"datum[\\\\\\\"c_x_axis_back\\\\\\\"]+87\\\"}, \\\"width\\\": {\\\"field\\\": \\\"c_width_numeric_val\\\"}}, \\\"enter\\\": {\\\"strokeWidth\\\": {\\\"value\\\": 0}, \\\"fill\\\": {\\\"value\\\": \\\"#ffffff\\\"}, \\\"width\\\": {\\\"value\\\": 250}, \\\"y\\\": {\\\"field\\\": \\\"graph_offset\\\"}, \\\"fillOpacity\\\": {\\\"value\\\": 0}, \\\"stroke\\\": {\\\"value\\\": \\\"#000000\\\"}, \\\"height\\\": 
{\\\"value\\\": 150}, \\\"x\\\": {\\\"value\\\": 120}}}, \\\"scales\\\": [{\\\"type\\\": \\\"linear\\\", \\\"range\\\": [0, {\\\"signal\\\": \\\"width\\\"}], \\\"name\\\": \\\"x\\\", \\\"domain\\\": {\\\"fields\\\": [\\\"left\\\", \\\"right\\\"], \\\"data\\\": \\\"new_data\\\", \\\"sort\\\": true}, \\\"zero\\\": true, \\\"nice\\\": true}, {\\\"type\\\": \\\"linear\\\", \\\"range\\\": [{\\\"signal\\\": \\\"height\\\"}, 0], \\\"name\\\": \\\"y\\\", \\\"domain\\\": {\\\"data\\\": \\\"new_data\\\", \\\"field\\\": \\\"count\\\"}, \\\"zero\\\": true, \\\"nice\\\": true}], \\\"signals\\\": [{\\\"name\\\": \\\"width\\\", \\\"update\\\": \\\"250\\\"}, {\\\"name\\\": \\\"height\\\", \\\"update\\\": \\\"150\\\"}]}, {\\\"type\\\": \\\"group\\\", \\\"axes\\\": [{\\\"zindex\\\": 1, \\\"scale\\\": \\\"x\\\", \\\"orient\\\": \\\"top\\\", \\\"labelOverlap\\\": true, \\\"tickCount\\\": {\\\"signal\\\": \\\"ceil(width/40)\\\"}, \\\"title\\\": \\\"Count\\\"}, {\\\"labels\\\": false, \\\"scale\\\": \\\"x\\\", \\\"orient\\\": \\\"top\\\", \\\"grid\\\": true, \\\"gridScale\\\": \\\"y\\\", \\\"zindex\\\": 0, \\\"maxExtent\\\": 0, \\\"tickCount\\\": {\\\"signal\\\": \\\"ceil(width/40)\\\"}, \\\"ticks\\\": false, \\\"minExtent\\\": 0, \\\"domain\\\": false}, {\\\"scale\\\": \\\"y\\\", \\\"orient\\\": \\\"left\\\", \\\"title\\\": \\\"Label\\\", \\\"labelOverlap\\\": true, \\\"zindex\\\": 1}], \\\"style\\\": \\\"cell\\\", \\\"from\\\": {\\\"facet\\\": {\\\"name\\\": \\\"data_5\\\", \\\"data\\\": \\\"data_2\\\", \\\"field\\\": \\\"categorical\\\"}}, \\\"scales\\\": [{\\\"type\\\": \\\"linear\\\", \\\"range\\\": [0, 250], \\\"name\\\": \\\"x\\\", \\\"domain\\\": {\\\"data\\\": \\\"data_5\\\", \\\"field\\\": \\\"count\\\"}, \\\"zero\\\": true, \\\"nice\\\": true}, {\\\"paddingInner\\\": 0.1, \\\"type\\\": \\\"band\\\", \\\"paddingOuter\\\": 0.05, \\\"range\\\": [150, 0], \\\"name\\\": \\\"y\\\", \\\"domain\\\": {\\\"data\\\": \\\"data_5\\\", \\\"field\\\": \\\"label\\\", \\\"sort\\\": 
{\\\"field\\\": \\\"label_idx\\\", \\\"order\\\": \\\"descending\\\", \\\"op\\\": \\\"mean\\\"}}}], \\\"encode\\\": {\\\"update\\\": {\\\"clip\\\": {\\\"field\\\": \\\"c_clip_val_cat\\\"}, \\\"x\\\": {\\\"signal\\\": \\\"datum[\\\\\\\"c_x_axis_back\\\\\\\"]+137\\\"}, \\\"width\\\": {\\\"field\\\": \\\"c_width_numeric_val_cat\\\"}}, \\\"enter\\\": {\\\"strokeWidth\\\": {\\\"value\\\": 0}, \\\"fill\\\": {\\\"value\\\": \\\"#ffffff\\\"}, \\\"width\\\": {\\\"value\\\": 250}, \\\"y\\\": {\\\"field\\\": \\\"graph_offset_categorical\\\"}, \\\"fillOpacity\\\": {\\\"value\\\": 0}, \\\"stroke\\\": {\\\"value\\\": \\\"#000000\\\"}, \\\"height\\\": {\\\"value\\\": 150}, \\\"x\\\": {\\\"value\\\": 170}}}, \\\"marks\\\": [{\\\"name\\\": \\\"marks\\\", \\\"encode\\\": {\\\"hover\\\": {\\\"fill\\\": {\\\"value\\\": \\\"#7EC2F3\\\"}}, \\\"update\\\": {\\\"x2\\\": {\\\"scale\\\": \\\"x\\\", \\\"value\\\": 0}, \\\"y\\\": {\\\"scale\\\": \\\"y\\\", \\\"field\\\": \\\"label\\\"}, \\\"height\\\": {\\\"scale\\\": \\\"y\\\", \\\"band\\\": true}, \\\"x\\\": {\\\"scale\\\": \\\"x\\\", \\\"field\\\": \\\"count\\\"}, \\\"fill\\\": {\\\"value\\\": \\\"#108EE9\\\"}}}, \\\"style\\\": [\\\"bar\\\"], \\\"from\\\": {\\\"data\\\": \\\"data_5\\\"}, \\\"type\\\": \\\"rect\\\"}], \\\"signals\\\": [{\\\"name\\\": \\\"unit\\\", \\\"on\\\": [{\\\"events\\\": \\\"mousemove\\\", \\\"update\\\": \\\"isTuple(group()) ? group() : unit\\\"}], \\\"value\\\": {}}, {\\\"name\\\": \\\"pts\\\", \\\"update\\\": \\\"data(\\\\\\\"pts_store\\\\\\\").length && {count: data(\\\\\\\"pts_store\\\\\\\")[0].values[0]}\\\"}, {\\\"name\\\": \\\"pts_tuple\\\", \\\"on\\\": [{\\\"events\\\": [{\\\"type\\\": \\\"click\\\", \\\"source\\\": \\\"scope\\\"}], \\\"force\\\": true, \\\"update\\\": \\\"datum && item().mark.marktype !== 'group' ? 
{unit: \\\\\\\"\\\\\\\", encodings: [\\\\\\\"x\\\\\\\"], fields: [\\\\\\\"count\\\\\\\"], values: [datum[\\\\\\\"count\\\\\\\"]]} : null\\\"}], \\\"value\\\": {}}, {\\\"name\\\": \\\"pts_modify\\\", \\\"on\\\": [{\\\"events\\\": {\\\"signal\\\": \\\"pts_tuple\\\"}, \\\"update\\\": \\\"modify(\\\\\\\"pts_store\\\\\\\", pts_tuple, true)\\\"}]}]}]}]}]}\"; var vega_json_parsed = JSON.parse(vega_json); var toolTipOpts = { showAllFields: true }; if(vega_json_parsed[\"metadata\"] != null){ if(vega_json_parsed[\"metadata\"][\"bubbleOpts\"] != null){ toolTipOpts = vega_json_parsed[\"metadata\"][\"bubbleOpts\"]; }; }; vegaEmbed(\"#vis\", vega_json_parsed).then(function (result) { vegaTooltip.vega(result.view, toolTipOpts); }); </script> </body> </html>' src=\"demo_iframe_srcdoc.htm\"> <p>Your browser does not support iframes.</p> </iframe> </body> </html>" ], "text/plain": [ "<IPython.core.display.HTML object>" ] }, "metadata": {}, "output_type": "display_data" }, { "ename": "RuntimeError", "evalue": "Column name course_id does not exist.", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mRuntimeError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m~/Applications/anaconda/lib/python3.5/site-packages/turicreate/data_structures/sframe.py\u001b[0m in \u001b[0;36mjoin\u001b[0;34m(self, right, on, how)\u001b[0m\n\u001b[1;32m 4351\u001b[0m \u001b[0;32mwith\u001b[0m \u001b[0mcython_context\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 4352\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mSFrame\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0m_proxy\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m__proxy__\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mjoin\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mright\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m__proxy__\u001b[0m\u001b[0;34m,\u001b[0m 
\u001b[0mhow\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mjoin_keys\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 4353\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32mturicreate/_cython/cy_sframe.pyx\u001b[0m in \u001b[0;36mturicreate._cython.cy_sframe.UnitySFrameProxy.join\u001b[0;34m()\u001b[0m\n", "\u001b[0;32mturicreate/_cython/cy_sframe.pyx\u001b[0m in \u001b[0;36mturicreate._cython.cy_sframe.UnitySFrameProxy.join\u001b[0;34m()\u001b[0m\n", "\u001b[0;31mRuntimeError\u001b[0m: Column name course_id does not exist.", "\nDuring handling of the above exception, another exception occurred:\n", "\u001b[0;31mRuntimeError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m<ipython-input-42-09739df6b78f>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m()\u001b[0m\n\u001b[1;32m 6\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 7\u001b[0m \u001b[0mcourses\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mcourses\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m'course_id'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'title'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'provider'\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 8\u001b[0;31m \u001b[0mresults\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mrecs\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mjoin\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mcourses\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mon\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m'course_id'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mhow\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m'inner'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 9\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 10\u001b[0m \u001b[0;31m#Populate observed user-course data with course info\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", 
"\u001b[0;32m~/Applications/anaconda/lib/python3.5/site-packages/turicreate/data_structures/sframe.py\u001b[0m in \u001b[0;36mjoin\u001b[0;34m(self, right, on, how)\u001b[0m\n\u001b[1;32m 4350\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 4351\u001b[0m \u001b[0;32mwith\u001b[0m \u001b[0mcython_context\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 4352\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mSFrame\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0m_proxy\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m__proxy__\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mjoin\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mright\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m__proxy__\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mhow\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mjoin_keys\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 4353\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 4354\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mfilter_by\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mvalues\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mcolumn_name\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mexclude\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mFalse\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m~/Applications/anaconda/lib/python3.5/site-packages/turicreate/_cython/context.py\u001b[0m in \u001b[0;36m__exit__\u001b[0;34m(self, exc_type, exc_value, traceback)\u001b[0m\n\u001b[1;32m 47\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mshow_cython_trace\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 48\u001b[0m \u001b[0;31m# To hide cython trace, we re-raise from here\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 49\u001b[0;31m 
\u001b[0;32mraise\u001b[0m \u001b[0mexc_type\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mexc_value\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 50\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 51\u001b[0m \u001b[0;31m# To show the full trace, we do nothing and let exception propagate\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;31mRuntimeError\u001b[0m: Column name course_id does not exist." ] } ], "source": [ "# Get the meta data of the courses\n", "courses = tc.SFrame.read_csv('../data/cursos.dat', header=False, delimiter='|', verbose=False)\n", "courses =courses.rename({'X1':'course_id', 'X2':'title', 'X3':'avg_rating', \n", " 'X4':'workload', 'X5':'university', 'X6':'difficulty', 'X7':'provider'})\n", "courses.show()\n", "\n", "courses = courses[['course_id', 'title', 'provider']]\n", "results = recs.join(courses, on='course_id', how='inner')\n", "\n", "#Populate observed user-course data with course info\n", "userset = frozenset(users)\n", "ix = sf['user_id'].apply(lambda x: x in userset, int) \n", "user_data = sf[ix]\n", "user_data = user_data.join(courses, on='course_id')[['user_id', 'title', 'provider']]" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "ExecuteTime": { "end_time": "2019-06-14T16:47:43.829359Z", "start_time": "2019-06-14T16:47:43.778230Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "User: 1\n", "We were told that the user liked these courses: \n", "+-------------------------------+----------+\n", "| title | provider |\n", "+-------------------------------+----------+\n", "| An Introduction to Interac... 
| coursera |\n", "+-------------------------------+----------+\n", "[1 rows x 2 columns]\n", "\n", "We recommend these other courses:\n", "+-------+----------+\n", "| title | provider |\n", "+-------+----------+\n", "+-------+----------+\n", "[0 rows x 2 columns]\n", "\n", "\n", "User: 2\n", "We were told that the user liked these courses: \n", "+-------------------------------+----------+\n", "| title | provider |\n", "+-------------------------------+----------+\n", "| An Introduction to Interac... | coursera |\n", "+-------------------------------+----------+\n", "[1 rows x 2 columns]\n", "\n", "We recommend these other courses:\n", "+-------+----------+\n", "| title | provider |\n", "+-------+----------+\n", "+-------+----------+\n", "[0 rows x 2 columns]\n", "\n", "\n", "User: 3\n", "We were told that the user liked these courses: \n", "+-------------------------------+----------+\n", "| title | provider |\n", "+-------------------------------+----------+\n", "| An Introduction to Interac... | coursera |\n", "+-------------------------------+----------+\n", "[1 rows x 2 columns]\n", "\n", "We recommend these other courses:\n", "+-------+----------+\n", "| title | provider |\n", "+-------+----------+\n", "+-------+----------+\n", "[0 rows x 2 columns]\n", "\n", "\n", "User: 4\n", "We were told that the user liked these courses: \n", "+-------------------------------+----------+\n", "| title | provider |\n", "+-------------------------------+----------+\n", "| A Beginner's Guide to ... 
| coursera |\n", "| Gamification | coursera |\n", "+-------------------------------+----------+\n", "[2 rows x 2 columns]\n", "\n", "We recommend these other courses:\n", "+-------+----------+\n", "| title | provider |\n", "+-------+----------+\n", "+-------+----------+\n", "[0 rows x 2 columns]\n", "\n", "\n", "User: 5\n", "We were told that the user liked these courses: \n", "+-------------------------------+----------+\n", "| title | provider |\n", "+-------------------------------+----------+\n", "| Web Intelligence and Big Data | coursera |\n", "+-------------------------------+----------+\n", "[1 rows x 2 columns]\n", "\n", "We recommend these other courses:\n", "+-------+----------+\n", "| title | provider |\n", "+-------+----------+\n", "+-------+----------+\n", "[0 rows x 2 columns]\n", "\n", "\n" ] } ], "source": [ "# Print out some recommendations \n", "for i in range(5):\n", " user = list(users)[i]\n", " print(\"User: \" + str(i + 1))\n", " user_obs = user_data[user_data['user_id'] == user].head(K)\n", " del user_obs['user_id']\n", " user_recs = results[results['user_id'] == str(user)][['title', 'provider']]\n", "\n", " print(\"We were told that the user liked these courses: \")\n", " print (user_obs.head(K))\n", "\n", " print (\"We recommend these other courses:\")\n", " print (user_recs.head(K))\n", "\n", " print (\"\")" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Readings\n", "- (Looking for more details about the modules and functions? Check out the <a href=\"https://dato.com/products/create/docs/\">API docs</a>.)\n", "- Toby Segaran, 2007, Programming Collective Intelligence. O'Reilly. 
Chapter 2: Making Recommendations\n", "    - programming-collective-intelligence-code/blob/master/chapter2/recommendations.py\n", "- 项亮 (Xiang Liang), 2012, 推荐系统实践 (Recommender System Practice). 人民邮电出版社 (Posts & Telecom Press)." ] } ], "metadata": { "celltoolbar": "Slideshow", "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" }, "latex_envs": { "LaTeX_envs_menu_present": true, "autoclose": false, "autocomplete": true, "bibliofile": "biblio.bib", "cite_by": "apalike", "current_citInitial": 1, "eqLabelWithNumbers": true, "eqNumInitial": 0, "hotkeys": { "equation": "Ctrl-E", "itemize": "Ctrl-I" }, "labels_anchors": false, "latex_user_defs": false, "report_style_numbering": false, "user_envs_cfg": false }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": false, "sideBar": false, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": { "height": "48px", "left": "1382.98px", "top": "61.5313px", "width": "164px" }, "toc_section_display": false, "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 1 }