add --json option to csvstats with frequency details #1216

Closed
tacman opened this issue Oct 14, 2023 · 3 comments

tacman commented Oct 14, 2023

First, a big thanks for publishing csvkit! I only wish I had discovered it earlier.

I like the --csv option for csvstat, and would love to see a --json (or --format=json|csv|text) option. That would allow piping the results to another script, since importing JSON is much easier and faster than importing CSV.

I know there are various CSV-to-JSON scripts out there, but as I'm integrating this toolkit into our workflow I'd like to avoid installing another tool.

Thanks for your consideration.

@tacman tacman changed the title add --json option to csvstats (or --format=json) document csvstat with json Oct 14, 2023
Author

tacman commented Oct 14, 2023

DOH! This is what I was looking for:

csvstat data.csv --csv | csvjson | jq
[
  {
    "column_id": 1,
    "column_name": "state",
    "type": "Text",
    "nulls": false,
    "unique": 1,
    "min": null,
    "max": null,
    "sum": null,
    "mean": null,
    "median": null,
    "stdev": null,
    "len": 2,
    "freq": "NE"
  },
  {
    "column_id": 2,
    "column_name": "county",
    "type": "Text",
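
For piping into another script, the JSON produced by this pipeline can be read with a few lines of Python. This is only a sketch: the helper name `column_types` is made up for illustration, and the sample embeds just the first record shown above so that it runs on its own.

```python
import json

# First record from the csvstat --csv | csvjson output above,
# embedded here so the sketch is self-contained.
SAMPLE = """
[
  {
    "column_id": 1,
    "column_name": "state",
    "type": "Text",
    "nulls": false,
    "unique": 1,
    "min": null,
    "max": null,
    "sum": null,
    "mean": null,
    "median": null,
    "stdev": null,
    "len": 2,
    "freq": "NE"
  }
]
"""


def column_types(stats_json):
    """Map each column name to the type reported for it (hypothetical helper)."""
    records = json.loads(stats_json)
    return {record["column_name"]: record["type"] for record in records}


print(column_types(SAMPLE))  # {'state': 'Text'}
```

In a real pipeline the JSON text would presumably come from standard input (for example `sys.stdin.read()`) rather than from an embedded string.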

@tacman tacman closed this as completed Oct 14, 2023
Author

tacman commented Oct 14, 2023

Actually, I'm going to reopen this, and ask that you consider adding a --json option that includes the frequency stats.

csvstat data/subtitles_day.tsv 

15. "movieKind"

	Type of data:          Text
	Contains null values:  False
	Unique values:         2
	Longest value:         5 characters
	Most common values:    tv (1443x)
	                       movie (413x)

csvstat data/subtitles_day.tsv --csv | csvjson | jq

  {
    "column_id": 15,
    "column_name": "movieKind",
    "type": "Text",
    "nulls": false,
    "unique": 2,
    "min": null,
    "max": null,
    "sum": null,
    "mean": null,
    "median": null,
    "stdev": null,
    "len": 5,
    "freq": "tv, movie"
  },
  {

So the counts from the text report are lost: the "freq" field keeps only the values (tv, movie), not how often each occurs. Yeah, it's a pain to represent that in CSV, but it'd be natural in JSON, e.g.

{ "freq": { "tv": 1443, "movie": 413 }}
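
To make the gap concrete, the counts can be scraped from the text report into that shape. The following is a rough sketch, not a supported interface: it assumes the report layout is exactly as pasted above ("value (NNNx)" at the end of a line), and the names `PAIR` and `most_common` are invented for this example.

```python
import re

# The text report for one column, as pasted above.
REPORT = """\
15. "movieKind"

\tType of data:          Text
\tContains null values:  False
\tUnique values:         2
\tLongest value:         5 characters
\tMost common values:    tv (1443x)
\t                       movie (413x)
"""

# Assumed layout: an optional "Most common values:" label, then "value (123x)".
PAIR = re.compile(r"^\s*(?:Most common values:)?\s*(.+?) \((\d+)x\)\s*$")


def most_common(report):
    """Collect {value: count} from lines that end in "(NNNx)"."""
    counts = {}
    for line in report.splitlines():
        match = PAIR.match(line)
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


print(most_common(REPORT))  # {'tv': 1443, 'movie': 413}
```

Scraping like this is fragile (a value that itself ends in something like "(3x)" would confuse it), which is part of why a structured output is preferable.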

@tacman tacman reopened this Oct 14, 2023
@tacman tacman changed the title document csvstat with json add --json option to csvstats with frequency details Oct 14, 2023
Member

jpmckinney commented Oct 17, 2023

Added

csvstat --json examples/realdata/ks_1033_data.csv -i 2

...
  {
    "column_id": 14,
    "column_name": "federal_supply_class_name",
    "type": "Text",
    "nulls": false,
    "nonnulls": 1575,
    "unique": 30,
    "len": 63,
    "freq": [
      {
        "value": "Guns, through 30 mm",
        "count": 1427
      },
      {
        "value": "Trucks and Truck Tractors, Wheeled",
        "count": 33
      },
      {
        "value": "Vehicular Cab, Body, and Frame Structural Components",
        "count": 28
      },
      {
        "value": "Surface Use Explosive Ordnance Disposal Tools and Equipment",
        "count": 14
      },
      {
        "value": "Medical and Surgical Instruments, Equipment, and Supplies",
        "count": 12
      }
    ]
  }
]
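
Note that "freq" here is a list of value/count objects rather than the single object suggested earlier in the thread. If the mapping form is more convenient downstream, it can be rebuilt in one step. A minimal sketch, using an abridged copy of the record above; `freq_as_mapping` is a hypothetical helper name, not part of csvkit.

```python
import json

# Abridged copy of the record above (first two "freq" entries only).
SAMPLE = """
{
  "column_id": 14,
  "column_name": "federal_supply_class_name",
  "type": "Text",
  "freq": [
    {"value": "Guns, through 30 mm", "count": 1427},
    {"value": "Trucks and Truck Tractors, Wheeled", "count": 33}
  ]
}
"""


def freq_as_mapping(column):
    """Turn [{"value": v, "count": n}, ...] into {v: n, ...}."""
    return {item["value"]: item["count"] for item in column["freq"]}


column = json.loads(SAMPLE)
print(freq_as_mapping(column))
# {'Guns, through 30 mm': 1427, 'Trucks and Truck Tractors, Wheeled': 33}
```

Python dicts keep insertion order, so the most-common-first ordering of the list carries over to the mapping.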
