Variables and associated values coded for D-PLACE societies are organized into datasets, according to their source.
Each dataset is identified by a short textual ID, e.g. "EA" for data originating from the
Ethnographic Atlas or "Binford" for the data from the Hunter- and Gatherer Database. The
data files for a dataset are kept in a subdirectory of datasets
named after the dataset ID, and must
consist of the following files:
variables.csv
: The list of variables coded in a dataset; must contain columns
id
: D-PLACE-wide unique identifier for the variable
title
definition
type
: one of Ordinal, Continuous, Categorical
category
: comma-separated list of categories a variable belongs to.
units
source
changes
notes
data.csv
: The coded values; must contain columns
soc_id
: Reference to a D-PLACE society ID.
var_id
: Reference to a D-PLACE variable ID.
code
: Reference to a categorical value described in codes.csv or a literal value.
sub_case
year
comment
references
: Semicolon-separated list of reference keys.
source_coded_data
admin_comment
and may optionally also provide files:
codes.csv
: A list of category descriptions for categorical variables:
var_id
: Reference to a D-PLACE variable ID.
code
description
name
references.csv
: A list of references:
key
: The key used to refer to this source in the data
citation
: The full citation.
societies.csv
: A list of additional societies coded in the dataset with columns:
soc_id
: D-PLACE-wide unique identifier for the society
xd_id
pref_name_for_society
ORIG_name_and_ID_in_this_dataset
alt_names_by_society
main_focal_year
HRAF_name_ID
HRAF_link
origLat
origLong
Lat
Long
Comment
: Comment on location
glottocode
: Code for the most specific Glottolog languoid which can be assigned to this society.
glottocode_comment
: Comment on the assignment of a glottocode to this society.
societies_mapping.csv
: A CSV file mapping society IDs to similar societies in other datasets.
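The column requirements listed above lend themselves to a simple header check. The following is a minimal sketch, assuming that every column listed for variables.csv and data.csv is required; the names REQUIRED_COLUMNS, missing_columns and split_list are hypothetical helpers for illustration and are not part of any D-PLACE tooling.

```python
import csv
import io

# Required columns as listed above for the two mandatory files.
# Assumption: every listed column must be present in the header row.
REQUIRED_COLUMNS = {
    "variables.csv": ["id", "title", "definition", "type", "category",
                      "units", "source", "changes", "notes"],
    "data.csv": ["soc_id", "var_id", "code", "sub_case", "year", "comment",
                 "references", "source_coded_data", "admin_comment"],
}


def missing_columns(filename, text):
    """Return the required columns absent from the header row of CSV text."""
    header = next(csv.reader(io.StringIO(text)), [])
    return [col for col in REQUIRED_COLUMNS[filename] if col not in header]


def split_list(value, sep):
    """Split a multi-valued cell (e.g. references, category) into clean items."""
    return [item.strip() for item in value.split(sep) if item.strip()]


# An incomplete data.csv header, made up for illustration.
sample_header = "soc_id,var_id,code,year,comment,references\n"
print(missing_columns("data.csv", sample_header))
```

The same split_list helper covers both the semicolon-separated references column and the comma-separated category column, for example split_list(row["references"], ";").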
If a dataset provides societies (possibly exclusively), it is considered a "society set" as well (or exclusively). While the D-PLACE web interface distinguishes these two ways of contributing to D-PLACE, the data model does not, because this property can be computed from the files a dataset provides.
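One way this property could be computed is sketched below. It assumes that a dataset counts as a "society set" when its societies.csv has at least one row, and as a contributor of coded data when its data.csv has at least one row; the function names and the returned labels are hypothetical, and the actual D-PLACE import may decide differently.

```python
import csv
import io


def count_rows(csv_text):
    """Number of data rows (header excluded) in CSV text."""
    return len(list(csv.DictReader(io.StringIO(csv_text))))


def contribution_kinds(societies_csv="", data_csv=""):
    """Derive how a dataset contributes from the content of its files.

    Assumption: non-empty data.csv means coded data is contributed,
    non-empty societies.csv means the dataset is a society set.
    """
    kinds = []
    if count_rows(data_csv):
        kinds.append("dataset")
    if count_rows(societies_csv):
        kinds.append("society set")
    return kinds


# Made-up file contents; the IDs are placeholders, not real D-PLACE IDs.
societies = "soc_id,xd_id\nX1,xd1\n"
data = "soc_id,var_id,code\nX1,V1,2\n"
print(contribution_kinds(societies_csv=societies, data_csv=data))
```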
For a dataset to be considered for import into D-PLACE it must be registered, i.e. listed in the file index.csv, which also provides additional metadata for the dataset. index.csv has the following columns:
id
: The dataset ID, i.e. the name of the subdirectory of datasets the data is kept in.
type
: one of environmental, cultural.
name
description
year
author
reference
: Full citation of the source
Explicit registration is somewhat redundant, since it keeps the dataset ID in two places (the registry and the directory name), but it allows for better control over what is considered ready for import. This makes it possible to work on datasets in their "final place" until they are finished.
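The relation between the registry and the directory names can be checked mechanically. The sketch below compares the id column of index.csv with a list of subdirectory names; the function names and the status labels are hypothetical, and "draft" is a made-up directory used only to illustrate work in progress.

```python
import csv
import io


def registered_ids(index_csv_text):
    """Dataset IDs listed in the id column of index.csv."""
    return [row["id"] for row in csv.DictReader(io.StringIO(index_csv_text))]


def registration_status(index_csv_text, subdirs):
    """Compare registered dataset IDs with the subdirectories that exist."""
    registered = set(registered_ids(index_csv_text))
    present = set(subdirs)
    return {
        # Registered and present: candidates for import.
        "ready": sorted(registered & present),
        # Present but not registered: still being worked on.
        "unregistered": sorted(present - registered),
        # Registered but without a directory: likely a mistake.
        "missing_directory": sorted(registered - present),
    }


# Illustrative registry content; only a subset of the columns is shown.
index_text = "id,type,name\nEA,cultural,Ethnographic Atlas\nBinford,cultural,Binford\n"
print(registration_status(index_text, ["EA", "Binford", "draft"]))
```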
Each dataset may contribute its own set of societies. Relations among the societies from different datasets are stored in a CSV mapping file societies_mapping.csv in the form
id,related
<soc-id>,<qualified-soc-id>[;<qualified-soc-id>]*
where <qualified-soc-id> is a string composed as <dataset-id>: <original name> [<soc-id>].
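The format of the related column can be parsed with a small regular expression. This is a minimal sketch, assuming that dataset IDs contain no colon, that society IDs contain no square brackets, and that original names contain no semicolon; QualifiedSocId and parse_related are hypothetical names, and the society names and IDs in the example are made up.

```python
import re
from typing import NamedTuple


class QualifiedSocId(NamedTuple):
    dataset_id: str
    original_name: str
    soc_id: str


# <dataset-id>: <original name> [<soc-id>]
# The society ID is taken from the last bracketed group, so brackets
# inside the original name do not confuse the match.
_QUALIFIED = re.compile(
    r"^\s*(?P<dataset>[^:]+):\s*(?P<name>.*?)\s*\[(?P<soc>[^\[\]]+)\]\s*$"
)


def parse_related(related):
    """Parse the 'related' cell of societies_mapping.csv into tuples."""
    result = []
    for part in related.split(";"):
        match = _QUALIFIED.match(part)
        if match is None:
            raise ValueError(f"malformed qualified society ID: {part!r}")
        result.append(
            QualifiedSocId(match["dataset"].strip(), match["name"], match["soc"])
        )
    return result


# Placeholder names and IDs, not taken from real D-PLACE data.
cell = "EA: Example People [X1];Binford: Example People (north) [Y2]"
for item in parse_related(cell):
    print(item)
```

Raising on malformed entries rather than skipping them is a design choice of this sketch; it surfaces typos in the mapping file early.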
Currently the only type of relation specified in the data is "equivalence". This may be a misnomer, since equivalence would imply that the sets of equivalent societies form a partition of the set of all societies, which is not the case.
Note that changing the xd_id of a society requires re-computing the D-PLACE internal society relations.