Work with documents
Adrian Viehweger edited this page Mar 22, 2017 · 4 revisions
Documents are nested hash maps. In zoo, they appear in three forms:
- DotMap object for object-like navigation and quick changes
- dict, where we make changes to a schema more explicit
- JSON formatted string for data movement (backup, sharing etc.)
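As a minimal illustration of the dict and JSON forms, the sketch below round-trips a small nested document through Python's standard json module. The field names are borrowed from the examples on this page; the values are placeholders chosen for illustration. The DotMap form is left out here because it needs the third-party dotmap package.

```python
import json

# A small nested document; values are illustrative placeholders.
doc = {
    '_id': 'example-id',
    'metadata': {'host': 'human', 'location': 'Hong Kong'},
    'relative': {'taxonomy': {'subtype': 'H3N2'}},
}

# dict -> JSON formatted string (e.g. for backup or sharing)
as_json = json.dumps(doc, sort_keys=True)

# JSON formatted string -> dict
restored = json.loads(as_json)
```

The round trip is lossless as long as the document only contains JSON-compatible types (strings, numbers, lists, nested dicts and so on).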
We'll create the following document using both a dict and a DotMap object. Note that if we wanted to enforce a schema, DotMap is not ideal, because it allows assignment to non-existing keys as well as replacement of existing ones. If schema enforcement is desired, we use zoo.utils.deep_set() and deep_get(), with explicit arguments for key creation (force) and replacement (replace). For further details on how to use these two functions, look at the zoo.utils test file.
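To make the idea of force and replace concrete, here is a rough sketch of how dotted-path accessors of this kind could work. It is not zoo's implementation: the names deep_get_sketch and deep_set_sketch are hypothetical, and the exact semantics (force permits creating missing keys, replace permits overwriting a non-empty value) are assumptions inferred from the description above.

```python
from functools import reduce


def deep_get_sketch(d, path):
    """Return the value at a dotted path such as 'metadata.host'."""
    return reduce(lambda node, key: node[key], path.split('.'), d)


def deep_set_sketch(d, path, value, force=False, replace=False):
    """Set the value at a dotted path in a nested dict.

    force:   allow creation of keys that do not exist yet (assumed meaning)
    replace: allow overwriting a non-empty value (assumed meaning)
    """
    *parents, last = path.split('.')
    node = d
    for key in parents:
        if key not in node:
            if not force:
                raise KeyError(f'{key!r} not in schema, pass force=True')
            node[key] = {}
        node = node[key]
    if last not in node and not force:
        raise KeyError(f'{last!r} not in schema, pass force=True')
    # Empty placeholders ('', None, [], {}) count as "not yet set".
    if node.get(last) and not replace:
        raise ValueError(f'{path!r} already set, pass replace=True')
    node[last] = value


# Hypothetical miniature schema, for illustration only.
doc = {'metadata': {'host': '', 'alt_id': []}}

deep_set_sketch(doc, 'metadata.host', 'human')                # fill empty field
deep_set_sketch(doc, 'metadata.host', 'swine', replace=True)  # overwrite
deep_set_sketch(doc, 'derivative.segment_number', 4, force=True)  # new keys
deep_get_sketch(doc, 'metadata.alt_id').append({'genbank': 'placeholder'})
```

The point of the two flags is that every departure from the schema becomes visible at the call site: a typo in a key raises an error instead of silently creating a new field.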
Enforcing a schema:
from copy import deepcopy
from itertools import islice
import re
from uuid import uuid4

from zoo.utils import deep_get, deep_set
# Assumed to exist already: df (a DataFrame of records), schema (the nested
# dict template), and the parsers parse_date and parse_nomenclature_iav.

for _, j in islice(df.iterrows(), 50):  # first 50 rows only
    if 'influenza b' not in j.isolate.lower():
        d = deepcopy(schema)  # important, otherwise schema is modified
        entries = {
            '_id': str(uuid4()),
            'metadata.host': j.host.lower(),
            'metadata.location': j.country,
            'metadata.date': parse_date(j.date),
        }
        for k, v in entries.items():
            # print(k)
            deep_set(d, k, v, replace=True)

        deep_get(d, 'metadata.alt_id').append({'genbank': j.genbank})
        deep_set(d, 'relative.taxonomy.subtype', j.subtype)
        # force=True: this key is not part of the schema and is created here
        deep_set(d, 'derivative.segment_number', j.segment_number, force=True)
        deep_set(
            d, 'relative.taxonomy.nomenclature',
            re.search(r'\((.*)\)', j.isolate).group(1))
        # format: 'Influenza A virus (A/Hong Kong/1/1968(H3N2))'
        # returns: 'A/Hong Kong/1/1968(H3N2)'
        # stackoverflow, 15864800
        deep_set(
            d,
            'metadata.host_detail',
            parse_nomenclature_iav(
                deep_get(d, 'relative.taxonomy.nomenclature')
            )['host'].lower(),
            force=True)
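The regular expression used above can be tried on its own. The short sketch below applies it to the isolate string quoted in the code comments; the None fallback is an addition for illustration and is not part of the original loop, which would raise an AttributeError on an isolate name without parentheses.

```python
import re

isolate = 'Influenza A virus (A/Hong Kong/1/1968(H3N2))'

# The greedy '.*' spans from the first '(' to the last ')', so the nested
# '(H3N2)' stays inside the captured group.
match = re.search(r'\((.*)\)', isolate)
nomenclature = match.group(1) if match else None

print(nomenclature)  # A/Hong Kong/1/1968(H3N2)
```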
Being more relaxed about the schema (and prone to accidental key assignments):
from dotmap import DotMap

# j is one row of df, as in the loop above
d = DotMap(schema)
d._id = str(uuid4())
d.metadata.ids.append({'genbank': j.genbank})
d.metadata.host = j.host.lower()
d.metadata.location = j.country

# Parsers for common tasks
d.metadata.date = parse_date(j.date)

# Create attributes that are not present in schema? No problem.
d.metadata.segment_number = j.segment_number
d.relative.taxonomy.subtype = j.subtype
d.derivative.update({'seqlen': j.seqlen})
d.relative.taxonomy.nomenclature = re.search(
    r'\((.*)\)', j.isolate).group(1)
# format: 'Influenza A virus (A/Hong Kong/1/1968(H3N2))'
# returns: 'A/Hong Kong/1/1968(H3N2)'
# stackoverflow, 15864800
d.metadata.host_detail = parse_nomenclature_iav(
    d.relative.taxonomy.nomenclature)['host'].lower()

# easy transformation between DotMap and dict
dm = DotMap(d)
d = dm.toDict()