
For devs, have a way to only import some of the data set to their local machine #285

Open
ghost opened this issue Sep 8, 2021 · 3 comments


ghost commented Sep 8, 2021

Is your feature request related to a problem? Please describe.

Importing the full data set takes a long time and uses a lot of disk space.

For devs, it would be great if there were a flag that let them import just a sample set.

Describe the solution you’d like

Some kind of easy flag added to the usual procedure, e.g.

iati crawler download-and-update --sample

I hacked this: in iati_datastore/iatilib/crawler.py, in the update_dataset function, add the parameter at_front=True to the call to queue.enqueue( update_activities ... ). Then run the worker as normal, and activities will start to appear in your database straight away. Just stop the worker after a bit and you have your sample data set of only some activities. (Without this hack it tries to process all datasets before any activities come in, so you sit at activities=0 for a long time.)
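For anyone puzzling over why at_front=True changes the behaviour: assuming queue is an rq-style job queue (where enqueue appends by default and at_front=True prepends), the activity jobs jump ahead of the remaining dataset-metadata jobs. A minimal, self-contained sketch of the ordering effect, using collections.deque to stand in for the job queue (the job names are illustrative, not the crawler's real ones):

```python
from collections import deque

# Jobs already waiting: the remaining dataset-metadata updates.
pending = deque(["update_dataset_2", "update_dataset_3", "update_dataset_4"])

# Default enqueue semantics: append, so the activities job only runs
# after every pending dataset job has finished.
appended = deque(pending)
appended.append("update_activities_1")

# at_front=True semantics: prepend, so activities start flowing into
# the local database immediately.
prepended = deque(pending)
prepended.appendleft("update_activities_1")

print(list(appended))   # activities job is last in line
print(list(prepended))  # activities job is first in line
```

This is why the hack gets you non-zero activity counts quickly: the worker pops from the front of the queue, so prepended jobs run first.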

Describe alternatives you’ve considered

n/a

Additional context

n/a

@andylolz (Member) commented Sep 8, 2021

I think I usually do the following:

iati crawler download

…followed by:

iati crawler update --dataset=[dataset name]

I think that will just update a single named dataset?

It does download everything, though – it would be nice to have a workaround for that.


> in the call to queue.enqueue( update_activities ... add the parameter at_front=True.

Ooh – that’s really nice! I wonder if there’s any reason we shouldn’t make that modification?! It would be much nicer to prepend queue items than to append them.

@ghost (Author) commented Sep 9, 2021

iati crawler download
iati crawler update --dataset="aidenvironment-2020-01-activities"
iati queue background

Doesn't work for me on a fresh box:

  File "/vagrant/iati_datastore/iatilib/crawler.py", line 284, in update_dataset
    fetch_dataset_metadata(dataset)
  File "/vagrant/iati_datastore/iatilib/crawler.py", line 80, in fetch_dataset_metadata
    d = iatikit.data().datasets.get(dataset.name)
AttributeError: 'NoneType' object has no attribute 'name'

Indeed, the datasets table in the database is empty; is there another stage that needs to be done?


> Ooh – that’s really nice! I wonder if there’s any reason we shouldn’t make that modification?! It would be much nicer to prepend queue items than to append them.

Probably fine to make; it would improve "time till new activity data appears in the site" (but do nothing for "total time of data load"), if that's a priority?

@ghost (Author) commented Jun 15, 2022

> is there another stage that needs to be done?

It might be crawler fetch-dataset-list (run after download and before update); maybe it only has to be run once.
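Putting the thread together, the presumed working sequence on a fresh box would be the following. Note that the fetch-dataset-list step is the commenter's guess, not confirmed anywhere in this thread, and the dataset name is just the example used above:

```shell
iati crawler download                  # fetch raw data from the registry
iati crawler fetch-dataset-list        # populate the datasets table (guess; may only need running once)
iati crawler update --dataset="aidenvironment-2020-01-activities"
iati queue background                  # start a worker to process the queued jobs
```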
