Implementing multiple citations per post

cjlee112 edited this page Oct 8, 2013 · 3 revisions

Background

Discussed in issue 80.

Stage 1: basic support for multiple citations

Outline: add citationType to Post and a citations array to Paper. Add support in spnet.incoming for multiple citations, following the rule above for finding the "primary" paper. Edit the template to display citations from Paper.citations.

What information should the Paper.citations array store? There is a tradeoff between displaying other citations efficiently (without running extra queries) and having to update multiple records whenever these data change. Candidates:

  • citationType: specifies relation to this paper
  • author: name and ID? (duplicated on Post record)
  • post title (duplicated on Post record)
  • post date (duplicated on Post record)

Of these, only the title seems like something a user might usefully change. And a potential inconsistency between the title shown on a cited paper and the title on the main Post wouldn't be that big a problem.

In general, we're moving towards just showing a summary line for each post (possibly with a Show Text toggle). The above list seems adequate for that summary line.
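As a minimal sketch of what one entry in Paper.citations might look like with the candidate fields above, and how a summary line could be rendered from it without an extra query: every key name other than citationType, both helper functions, and the sample values are hypothetical illustrations, not spnet's actual schema.

```python
from datetime import datetime

def make_citation_entry(post, citation_type):
    """Build the denormalized summary stored on a cited paper (hypothetical layout)."""
    return {
        'post': post['id'],                # link back to the full Post record
        'citationType': citation_type,     # relation of the post to this paper
        'authorName': post['authorName'],  # duplicated from the Post record
        'authorID': post['authorID'],      # duplicated from the Post record
        'title': post['title'],            # duplicated; may go stale if the post title is edited
        'date': post['date'],              # duplicated from the Post record
    }

def summary_line(entry):
    """Render a one-line summary using only the data stored on the paper."""
    return '%s: "%s" by %s (%s)' % (entry['citationType'], entry['title'],
                                    entry['authorName'],
                                    entry['date'].strftime('%Y-%m-%d'))

# Made-up sample records, for illustration only.
post = {'id': 'post1', 'authorName': 'A. Author', 'authorID': 'author1',
        'title': 'Comparing two approaches', 'date': datetime(2013, 10, 8)}
paper = {'id': 'paper2', 'citations': []}
paper['citations'].append(make_citation_entry(post, 'discuss'))
line = summary_line(paper['citations'][0])
```

The cost of this layout is the one noted above: if a post's title changes, every paper that stores a copy of it would need updating, or would show the old title.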

I'm imagining this as looking like a little "bibliography" at the bottom of the page...

Stage 2: Refactoring Recommendations vs Posts

OK, I've implemented multi-paper citation support and citationType, in the form of a new Citation class. An obvious corollary of this work is that we should get rid of the Recommendation class: it should just be a Post with citationType indicating a recommendation. The fact that we have these two nearly identical classes, which then have to be handled individually by various pieces of code, has resulted in a lot of extra complexity and redundant code. Getting rid of this will make the code both simpler and more general, so I see this as win-win. Shouldn't be too hard: the two classes are virtually identical.

A few decisions to make:

  • should we retain a separate template for displaying recommendations, or just use one template? Ah, the joy of diff: the current get_rec.html and get_post.html templates are almost identical. So merge to one template.
  • should we retain the /papers/PAPERID/recs/AUTHORID REST interface? It's possible that people have linked to those URLs, so perhaps we shouldn't break them? Note however one minor catch: going forward, it will now be possible for someone to write more than one rec for the same paper (whereas this REST interface assumes only ONE rec from that person). Even if we retain this REST URI, it should be considered deprecated; all spnet templates and code would only generate /posts/POSTID URIs. Hmm, this all hardly seems worth the trouble. How many people out there have linked to this URI? I suspect most people link only to paper URI (e.g. /arxiv/ID). Decision: ask people whether this is really needed.
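If the old recs URI were retained as a deprecated alias, the catch noted above (possibly more than one rec per author per paper) could be handled along these lines. This is only a sketch under assumptions: the function name, the dict keys, and the choice to return every matching /posts/POSTID URI rather than pick one are all hypothetical, not existing spnet behavior.

```python
import re

# Pattern for the deprecated REST URI /papers/PAPERID/recs/AUTHORID
LEGACY_REC_URI = re.compile(r'^/papers/([^/]+)/recs/([^/]+)$')

def resolve_legacy_rec_uri(uri, posts):
    """Return the /posts/POSTID URIs that a deprecated recs URI could refer to.

    posts: iterable of dicts with hypothetical keys 'id', 'paper',
    'author' and 'citationType'.
    """
    m = LEGACY_REC_URI.match(uri)
    if m is None:
        return []  # not a legacy recs URI
    paper_id, author_id = m.groups()
    return ['/posts/' + p['id'] for p in posts
            if p['paper'] == paper_id and p['author'] == author_id
            and p['citationType'] in ('recommend', 'mustread')]

# Made-up sample records, for illustration only.
posts = [
    {'id': 'p1', 'paper': 'paperA', 'author': 'author1', 'citationType': 'recommend'},
    {'id': 'p2', 'paper': 'paperA', 'author': 'author1', 'citationType': 'mustread'},
    {'id': 'p3', 'paper': 'paperA', 'author': 'author1', 'citationType': 'discuss'},
    {'id': 'p4', 'paper': 'paperB', 'author': 'author1', 'citationType': 'recommend'},
]
targets = resolve_legacy_rec_uri('/papers/paperA/recs/author1', posts)
```

With a single match the caller could redirect to it; with several, it would have to choose one or list them all, which is part of why the old URI may not be worth keeping.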

Additional code to refactor:

  • core: delete Recommendation class (also in connect);
  • incoming:
  • bulk: easy, just use Post instead of Recommendation.
  • apptree: get rid of rootRecs; simplify PostCollection._search(); retain old recs URI?
  • dbclean: write code to mark all existing posts as citationType="discuss" and to convert all recs into posts with citationType of either recommend or mustread.
  • tests: switch tests to use Post instead of Recommendation.
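The dbclean item above can be sketched on plain dicts, leaving out the database layer entirely. This is not the actual dbclean code: the function name is hypothetical, and so is the `isMustRead` flag used here to tell the two kinds of rec apart; how the real records make that distinction is not specified above.

```python
def unify_records(posts, recs):
    """Return one list of post dicts, each tagged with a citationType."""
    unified = []
    for post in posts:
        converted = dict(post)  # copy so the input records stay untouched
        converted['citationType'] = 'discuss'  # all pre-existing posts become "discuss"
        unified.append(converted)
    for rec in recs:
        converted = dict(rec)
        # ASSUMPTION: a hypothetical boolean 'isMustRead' marks must-read recs
        is_mustread = converted.pop('isMustRead', False)
        converted['citationType'] = 'mustread' if is_mustread else 'recommend'
        unified.append(converted)
    return unified

# Made-up sample records, for illustration only.
old_posts = [{'id': 'p1'}]
old_recs = [{'id': 'r1', 'isMustRead': True}, {'id': 'r2'}]
unified = unify_records(old_posts, old_recs)
```

After a conversion like this, code that used to handle Recommendation separately would only need to check citationType on a Post.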

Initial Testing

After using mongodump to get a copy of the latest database from selectedpapers.net and extracting it to my local ./dump directory, I restored it into my test platform's mongodb via:

/path/to/bin/mongorestore --drop

Then within the Python interpreter I ran the full update procedure to update the database to the new unified posts schema (after first initializing the database connection):

>>> import connect
>>> dbconn = connect.init_connection()
>>> import dbclean
>>> dbclean.unified_posts()
converting recs to posts...
deleting old rec records...
updating deliveries received...
adding multiple citations...
multiple citations for z123hhgg3ymexdd0u22mft5y5vnsdttmb: primary /arxiv/1211.0763
  added citation to /arxiv/1305.6050
...

The last step rescans all existing posts for multiple paper citations. Try taking a look at these posts and papers to see that the interface properly displays these citations (in both directions), e.g. URLs like