Context
The City of Toronto has led the way in Canada with the initial creation of its data portal and its dataset offerings. In recent years, the open data community has felt that there's been some stagnation in the City's open data policies, and in the departmental embrace of the underlying principles. There is renewed debate in Toronto about how the City can do better.
During these conversations, City staff stakeholders (in particular Harvey Low) have repeatedly expressed frustration at the metric with which the community shallowly compares progress between cities -- often dataset counts. They've rightfully brought up that dataset organization greatly colours any comparison. For example, it's frequently mentioned that the City of Toronto packages city-wide data together as a single dataset, whereas NYC often releases borough-specific datasets.
To be clear, the community critique of City of Toronto open data policy is more nuanced than criticism of the dataset count (e.g. it looks at the value of datasets to citizens, rather than numerical criteria). But the City staff definitely have a point: having dataset count as the only overall metric with which to compare cities does a disservice to the conversation.
It would be great to use the Data Package Spec as a launch point to discuss better metrics, so that this criticism can be accounted for when comparing open data policy between cities.
Solution
I feel the following would work to resolve the above concerns for Tabular Data Packages:
Add a boolean dataColumn property to mark specific columns as data columns.
Add an integer dataPointCount property to resource metadata (and perhaps summed in overall data package metadata).
Since the columns that contain significantly countable data are labelled as such, we can easily script the generation of the data point count. At the portal level, we could then have a much better basis of comparison both within cities (i.e. city departments, districts, stewards, etc.) and between cities themselves.
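To make the idea concrete, here is a rough sketch of what such a script might look like. It is only an illustration under stated assumptions: dataColumn and dataPointCount are the properties proposed above and are not part of the current spec, the helper names and the sample data are hypothetical, and "data point" is assumed here to mean a non-empty cell in a flagged column.

```python
import csv
import io


def count_data_points(rows, fields):
    """Count non-empty cells in columns flagged with the proposed dataColumn property."""
    # Assumption: dataColumn is a boolean on each field descriptor (not in the current spec).
    data_columns = [f["name"] for f in fields if f.get("dataColumn") is True]
    count = 0
    for row in rows:
        for name in data_columns:
            value = row.get(name)
            # Assumption: empty or missing cells do not count as data points.
            if value is not None and value.strip() != "":
                count += 1
    return count


def annotate_package(descriptor, tables):
    """Return a copy of the descriptor with the proposed dataPointCount property
    set on every resource and summed at the package level.

    `tables` maps resource name -> CSV text; it stands in for reading each
    resource's file, to keep the sketch self-contained.
    """
    total = 0
    resources = []
    for resource in descriptor["resources"]:
        reader = csv.DictReader(io.StringIO(tables[resource["name"]]))
        n = count_data_points(reader, resource["schema"]["fields"])
        resources.append({**resource, "dataPointCount": n})
        total += n
    return {**descriptor, "resources": resources, "dataPointCount": total}


# Hypothetical sample package with two resources.
descriptor = {
    "name": "example-city-data",
    "resources": [
        {
            "name": "wards",
            "schema": {"fields": [
                {"name": "ward", "type": "integer"},
                {"name": "year", "type": "year"},
                {"name": "population", "type": "integer", "dataColumn": True},
                {"name": "median_income", "type": "integer", "dataColumn": True},
            ]},
        },
        {
            "name": "parks",
            "schema": {"fields": [
                {"name": "ward", "type": "integer"},
                {"name": "park_count", "type": "integer", "dataColumn": True},
            ]},
        },
    ],
}
tables = {
    "wards": "ward,year,population,median_income\n1,2016,50000,61000\n2,2016,48000,\n",
    "parks": "ward,park_count\n1,12\n2,9\n",
}

annotated = annotate_package(descriptor, tables)
for res in annotated["resources"]:
    print(res["name"], res["dataPointCount"])
print("package total", annotated["dataPointCount"])
```

With counts like these, a city-wide dataset and a set of borough-specific datasets holding the same data would come out roughly equal, regardless of how they are packaged.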
Would the above suggestion be something we'd consider adding to the spec? Obviously, I'm interested in further conversation and other ideas :)
We are doing lots of work on data quality tooling and specs, which I know you know as you are using goodtables.
I'm super interested in codifying data points other than the raw count of published datasets, as part of a much wider discussion around open data portals and so on.
In terms of what can be specified in these specs, let's continue this discussion over at #364 and I'll close this for now as a duplicate.