Add Data Dictionary Resource View #106

Closed
jqnatividad opened this issue Nov 12, 2014 · 4 comments

Comments

@jqnatividad
Contributor

It'd be nice if datasets had a "Data Dictionary" resource view. Basically, it would have:

  • the data types of each column in the datastore
  • the range of each column (maybe even computed from the current values in the datastore)
  • the top n values (again, computed)
  • the ability for the administrator to add descriptive text for each column (markdown-enabled)
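
As a rough illustration of the computed parts of the list above, here is a minimal sketch that derives a type, a range, and the top n values per column from rows already held in memory. The function name `summarize_columns`, the sample rows, and the shape of the result are assumptions made for illustration only; none of this is an existing CKAN or DataStore API.

```python
from collections import Counter


def summarize_columns(rows, top_n=3):
    """Build a per-column summary: inferred type, min/max range, top-n values."""
    summary = {}
    columns = list(rows[0].keys()) if rows else []
    for col in columns:
        # Ignore missing values so they do not distort the type or the range.
        values = [r[col] for r in rows if r.get(col) is not None]
        type_names = {type(v).__name__ for v in values}
        if not values:
            inferred = "unknown"
        elif len(type_names) == 1:
            inferred = next(iter(type_names))
        else:
            inferred = "mixed"
        comparable = inferred not in ("unknown", "mixed")
        summary[col] = {
            "type": inferred,
            "range": (min(values), max(values)) if comparable else None,
            "top_values": Counter(values).most_common(top_n),
            # Placeholder for the administrator's free-text description.
            "description": "",
        }
    return summary


# Hypothetical sample rows, standing in for records read from a datastore.
rows = [
    {"habitat": "forest", "diam_1": 12},
    {"habitat": "forest", "diam_1": 30},
    {"habitat": "grass", "diam_1": 12},
]
dictionary = summarize_columns(rows)
```

The `description` slot is left empty on purpose: it is the only part that cannot be computed and would have to come from the administrator.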

Not only will it be a great reference for dataset users, it can even double as a facility for the admin to unambiguously declare datastore datatypes. For example, the field "date_created" is epoch time, so treat it as a date, not just a number; conversely, this "value" field is a number, not a timestamp (see ckan/ckan#1964 and, as per @amercader, the datastore guessed types in ckan/ckan#1794).
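
To make the type-declaration idea concrete, here is a small sketch in which an admin-supplied mapping decides how a raw value is interpreted. The mapping `DECLARED_TYPES`, the type labels, and the helper `apply_declared_type` are hypothetical names chosen for this example, not part of CKAN.

```python
from datetime import datetime, timezone

# Hypothetical admin declarations: column name -> declared type.
DECLARED_TYPES = {"date_created": "timestamp", "value": "numeric"}


def apply_declared_type(column, raw):
    """Interpret a raw value according to the declared type of its column."""
    declared = DECLARED_TYPES.get(column)
    if declared == "timestamp":
        # Epoch seconds are converted to a timezone-aware UTC datetime.
        return datetime.fromtimestamp(raw, tz=timezone.utc)
    if declared == "numeric":
        return float(raw)
    # No declaration: leave the value untouched.
    return raw


# The same integer is read as a date in one column and a number in another.
created = apply_declared_type("date_created", 1415750400)
value = apply_declared_type("value", 1415750400)
```

With the declaration in place, the identical integer becomes a date for "date_created" and stays a plain number for "value", so no guessing is involved.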

A ckanext-datadictionary extension perhaps?

@gjlawran

gjlawran commented Jan 7, 2015

+1 @jqnatividad to having a standard structure for a data dictionary for all resources, perhaps starting with DataStore-hosted resources (http://docs.ckan.org/en/ckan-2.2/datastore.html). In particular, I am interested in making this data dictionary content easily searchable, so that it is easier to discover resources that may share common keys for integration (mashup) purposes.

I'm not as concerned with markdown, data types, and statistics, but field names and descriptions would be great, especially if they are easily searchable.
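
A minimal sketch of what such searching could look like, assuming the dictionaries are available as a plain mapping from resource id to field descriptions. The resource ids, field names, and both helper functions are invented for illustration.

```python
from collections import defaultdict

# Hypothetical data dictionaries: resource id -> {field name: description}.
DICTIONARIES = {
    "plots-raw": {
        "tag": "Unique metal tag identifier",
        "sample_date": "Date of sampling",
    },
    "plots-comp": {
        "tag": "Unique metal tag identifier",
        "habitat": "Habitat of the patch",
    },
}


def search_fields(dictionaries, term):
    """Return (resource id, field name) pairs whose name or description matches."""
    term = term.lower()
    hits = []
    for resource_id, fields in dictionaries.items():
        for name, description in fields.items():
            if term in name.lower() or term in description.lower():
                hits.append((resource_id, name))
    return hits


def shared_fields(dictionaries):
    """Map each field name used by more than one resource to those resources."""
    index = defaultdict(set)
    for resource_id, fields in dictionaries.items():
        for name in fields:
            index[name.lower()].add(resource_id)
    return {name: sorted(ids) for name, ids in index.items() if len(ids) > 1}


common_keys = shared_fields(DICTIONARIES)
sampling_hits = search_fields(DICTIONARIES, "sampling")
```

Here `shared_fields` is the part aimed at mashups: it surfaces field names that occur in more than one resource and are therefore candidate join keys.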

@Aaron-M

Aaron-M commented Feb 18, 2015

+1 @jqnatividad We are working on an Excel add-in that helps users do some basic data QA and record metadata about their dataset, including creating a Data Dictionary as you describe. We are also working on enabling it to post the data into CKAN, either as tab-separated files (one for each worksheet) or as a complete xlsx. Once we have it working to our satisfaction, we will release it to the community for others to use (we are hopefully pretty close now).
See #3

It would be good if any work on a ckan-datadictionary extension tied in with the data dictionary format that we are using, e.g.:

| Worksheet | WS_Description | Field | Description | Type | Units |
|---|---|---|---|---|---|
| Raw data | Data as entered off the plot sheets with notes | Tree tag | Unique identifier (metal tag with unique letter and number combination) that marks each ISObas patch | Text | |
| Raw data | Data as entered off the plot sheets with notes | Sample_Date | Date of sampling | DateTime | |
| Raw data | Data as entered off the plot sheets with notes | Habitat | Habitat that patch was growing in after the experiment was set up (i.e. after the reciprocal transplant) | Text | |
| Raw data | Data as entered off the plot sheets with notes | Control | y/n for whether the ISObas patch was a control = y (no exclosure), or = n (exclosure) - the opposite of Exclosure | Text | |
| Raw data | Data as entered off the plot sheets with notes | Exclosure | y/n for whether the ISObas patch had an exclosure = y (exclosure), = n (no exclosure) - the opposite of Control | Text | |
| Raw data | Data as entered off the plot sheets with notes | Diam_1 | Diameter at widest point leaf to leaf (these two diameter measurements may be reversed, i.e. diameter 1 might be the smaller number) | Numeric | mm |
| Raw data | Data as entered off the plot sheets with notes | Diam_2 | Diameter at next widest point, leaf to leaf, perpendicular to Diam. 1 (these two diameter measurements may be reversed, i.e. diameter 1 might be the smaller number) | Numeric | mm |
| Raw data | Data as entered off the plot sheets with notes | No_spike | Number of spikelets per ISObas patch | Numeric | Number |
| Comp exp | Raw data reformatted for saving to "Comp exp.txt" as input for "Population monitoring plots script.R" | Tag | Unique identifier (metal tag with unique letter and number combination) that marks each ISObas patch | Text | |
| Comp exp | Etc etc | | | | |
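
For anyone wanting to consume this layout programmatically, here is a minimal sketch that reads it with Python's standard `csv` module and groups the fields by worksheet. It assumes the dictionary is exported as tab-separated text with the six column headers above; the abbreviated sample text and the function name `load_dictionary` are illustrative only.

```python
import csv
import io

# Hypothetical tab-separated export using the six column headers above.
SAMPLE_TSV = (
    "Worksheet\tWS_Description\tField\tDescription\tType\tUnits\n"
    "Raw data\tData as entered off the plot sheets with notes\t"
    "Sample_Date\tDate of sampling\tDateTime\t\n"
    "Raw data\tData as entered off the plot sheets with notes\t"
    "Diam_1\tDiameter at widest point leaf to leaf\tNumeric\tmm\n"
)


def load_dictionary(tsv_text):
    """Group field definitions by worksheet name."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    by_sheet = {}
    for row in reader:
        by_sheet.setdefault(row["Worksheet"], []).append(
            {
                "field": row["Field"],
                "description": row["Description"],
                "type": row["Type"],
                "units": row["Units"],
            }
        )
    return by_sheet


parsed = load_dictionary(SAMPLE_TSV)
```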

@vithalalapati

Did anyone ever get the dictionary working at the resource level?

@jqnatividad
Contributor Author

This is now available :)
ckan/ckan#3414

BTW @Aaron-M, did you finish the work on the Excel add-in? I think it will still be useful!
