how to pull from pubdb

Alex Ma edited this page Jan 17, 2023 · 20 revisions

Pulling objects from PubDB (i.e., updating the catalog so that it has the latest changes from PubDB) requires downloading two files (pubdb_output__papers.json and pubdb_output__presentations.json), rerunning the data processing scripts, and then opening a pull request against the master branch.

This process requires pip3 and virtualenv to be installed, either on your local machine or on cider.caida.org.
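
A quick way to confirm the prerequisites before starting is sketched below. The function name check_tools is hypothetical, and checking for git and python3 alongside pip3 and virtualenv is an assumption based on the commands used in the steps that follow.

```shell
# Report whether each named tool is on PATH.
# Prints one line per tool and returns non-zero if any is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Assumption: git and python3 are needed in addition to pip3 and virtualenv.
check_tools git python3 pip3 virtualenv || echo "Install the missing tools before continuing."
```

If any tool is reported as missing, install it (or switch to a machine that has it) before cloning the repository.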

  1. Clone this (catalog-data) repository to a machine that has pip3 and virtualenv installed.

  2. Run git checkout master to make sure you're on the master branch.

  3. Run git pull to get the latest version of master.

  4. Run git checkout -b pubdb to create a new branch named pubdb.

  5. Download the two PubDB export files and save them under the following names:

    • Save the output of https://staff.caida.org/cgi-bin/publicationsdb/editing/PANDA-Papers-json.pl as pubdb_output__papers.json
    • Save the output of https://staff.caida.org/cgi-bin/publicationsdb/editing/PANDA-Presentations-json.pl as pubdb_output__presentations.json
  6. From the repo root directory, set up the environment and run the build to check for error messages. The build needs several Python 3 modules, which you can install inside a virtualenv.

    • To initialize virtualenv on your machine:
      # First time initial installation only
      virtualenv env
      source env/bin/activate
      pip3 install bs4
      pip3 install nltk
      pip3 install unidecode
      pip3 install pyyaml
      pip3 install requests
      pip3 install jsonschema
    • In each new bash shell, activate the virtualenv if you haven't already, then run make to test the build:
      # Activate the virtualenv (needed only once per shell)
      source env/bin/activate

      # Run the build and watch for error messages
      make
  7. If there aren't any error messages related to your recent changes:

    1. Run git commit -a to commit the two updated files to the pubdb branch, with a commit message indicating the last change(s) to PubDB that you're merging in. If either file is not yet tracked, git add it first.
    2. Run git push -u origin pubdb to push the changes to the origin repo.
    3. Create a pull request to merge pubdb into master (base: master <-- compare: pubdb).
      1. Confirm the merge.
      2. Switch back to master (git checkout master), then run git branch -d pubdb to delete the local pubdb branch.
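
Before committing in step 7, it can help to confirm that the two files downloaded in step 5 actually parse as JSON, since a failed download may leave an empty file or an error page saved under the expected name. The following is a minimal sketch; the function name validate_json is hypothetical, and it assumes python3 is available and that you run it from the directory where the two files were saved.

```shell
# Return 0 if the file exists, is non-empty, and parses as JSON.
validate_json() {
    [ -s "$1" ] && python3 -m json.tool "$1" >/dev/null 2>&1
}

# Assumption: the exports are in the current directory.
for f in pubdb_output__papers.json pubdb_output__presentations.json; do
    if validate_json "$f"; then
        echo "ok: $f"
    else
        echo "problem: $f is missing, empty, or not valid JSON"
    fi
done
```

A "problem" line suggests downloading that file again before running make.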