{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Using agate in a Jupyter notebook\n", "\n", "First we import agate. Then we create an agate Table by loading data from a CSV file." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "<agate.table.Table object at 0x10f091b00>\n" ] } ], "source": [ "import agate\n", "\n", "table = agate.Table.from_csv('examples/realdata/ks_1033_data.csv')\n", "\n", "table" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Question 1: What was the total cost to Kansas City area counties?\n", "\n", "To answer this question, we first must filter the table to only those rows which refer to a Kansas City area `county`." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1575\n", "15\n" ] } ], "source": [ "kansas_city = table.where(lambda r: r['county'] in ('JACKSON', 'CLAY', 'CASS', 'PLATTE'))\n", "\n", "print(len(table.rows))\n", "print(len(kansas_city.rows))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can then print the Sum of the costs of all those rows. (The cost column is named `total_cost`.)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "$3716\n" ] } ], "source": [ "print('$%d' % kansas_city.aggregate(agate.Sum('total_cost')))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Question 2: Which counties spent the most?\n", "\n", "This question is more complicated. First we group the data by `county`, which gives us a TableSet named `counties`. A TableSet is a group of tables with the same columns." ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "('ALLEN', 'ANDERSON', 'BARTON', 'BOURBON', 'BROWN', 'BUTLER', 'CHAUTAUQUA', 'CHEROKEE', 'CHEYENNE', 'CLAY', 'COFFEY', 'COMANCHE', 'COWLEY', 'CRAWFORD', 'DECATUR', 'DICKINSON', 'DOUGLAS', 'ELLIS', 'ELLSWORTH', 'FINNEY', 'FORD', 'FRANKLIN', 'GEARY', 'PRATT', 'GRAHAM', 'GRANT', 'GRAY', 'GREELEY', 'GREENWOOD', 'HAMILTON', 'HARVEY', 'HASKELL', 'JACKSON', 'JEFFERSON', 'JOHNSON', 'KIOWA', 'LABETTE', 'LEAVENWORTH', 'LINN', 'LOGAN', 'LYON', 'MARION', 'MARSHALL', 'MCPHERSON', 'MEADE', 'MIAMI', 'MONTGOMERY', 'NEMAHA', 'NEOSHO', 'NORTON', 'OSAGE', 'OTTAWA', 'PAWNEE', 'POTTAWATOMIE', 'RAWLINS', 'RENO', 'RICE', 'RILEY', 'ROOKS', 'SALINE', 'SCOTT', 'SEDGWICK', 'SHAWNEE', 'SHERMAN', 'SMITH', 'STAFFORD', 'SUMNER', 'THOMAS', 'WABAUNSEE', 'WICHITA', 'WILSON', 'WOODSON', 'WYANDOTTE')\n" ] } ], "source": [ "# Group by county\n", "counties = table.group_by('county')\n", "\n", "print(counties.keys())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We then use the `aggregate` function to sum the `total_cost` column for each table in the group. The resulting values are collapsed into a new table, `totals`, which has a row for each county and a column named `total_cost_sum` containing the new total." ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "('county', 'total_cost_sum')\n" ] } ], "source": [ "# Aggregate totals for all counties\n", "totals = counties.aggregate([\n", " ('total_cost_sum', agate.Sum('total_cost'),)\n", "])\n", "\n", "print(totals.column_names)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, we sort the counties by their total cost, limit the results to the top 10 and then print the results as a text bar chart." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "county total_cost_sum\n", "SEDGWICK 977,174.45 ▓░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ \n", "COFFEY 691,749.03 ▓░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ \n", "MONTGOMERY 447,581.20 ▓░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ \n", "JOHNSON 420,628.00 ▓░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ \n", "SALINE 245,450.24 ▓░░░░░░░░░░░░░░░░░░ \n", "FINNEY 171,862.20 ▓░░░░░░░░░░░░░ \n", "BROWN 145,254.96 ▓░░░░░░░░░░░ \n", "KIOWA 97,974.00 ▓░░░░░░░ \n", "WILSON 74,747.10 ▓░░░░░ \n", "FORD 70,780.00 ▓░░░░░ \n", "GREENWOOD 69,722.00 ▓░░░░░ \n", "DOUGLAS 68,069.42 ▓░░░░░ \n", "MIAMI 64,691.09 ▓░░░░░ \n", "LYON 51,236.00 ▓░░░░ \n", "HAMILTON 47,989.00 ▓░░░░ \n", "WYANDOTTE 45,259.50 ▓░░░ \n", "CHEYENNE 35,970.92 ▓░░░ \n", "PRATT 35,745.60 ▓░░░ \n", "FRANKLIN 34,309.00 ▓░░░ \n", "HARVEY 27,387.42 ▓░░ \n", " +-----------------+-----------------+------------------+-----------------+\n", " 0 250,000 500,000 750,000 1,000,000\n" ] } ], "source": [ "totals.order_by('total_cost_sum', reverse=True).limit(20).print_bars('county', 'total_cost_sum', width=100)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.0" } }, "nbformat": 4, "nbformat_minor": 0 }