
nimh-dsst/gitpubs


This repository contains the data and code developed for a project assessing GitHub usage in PubMed papers. This work was (mostly) completed during the OHBM 2019 hackathon.

The data directory contains a series of CSV files describing basic details of the PubMed papers. The papers themselves can be obtained from the BioC API.
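A minimal sketch of fetching a single paper from the BioC API with only the standard library. It assumes the NCBI BioC endpoint for PMC open-access articles follows the pattern `pmcoa.cgi/BioC_[format]/[ID]/[encoding]` (check the NCBI BioC documentation before relying on it). `bioc_url` and `fetch_bioc_json` are hypothetical helpers, not part of this repository, and which CSV column holds the paper IDs is not stated here.

```python
import json
import urllib.request

# Assumed BioC endpoint for PMC open-access articles (verify against NCBI docs)
BIOC_BASE = "https://www.ncbi.nlm.nih.gov/research/bionlp/RESTful/pmcoa.cgi"


def bioc_url(article_id, fmt="json", encoding="unicode"):
    """Build a BioC request URL for a PMID or PMCID (hypothetical helper)."""
    if fmt not in ("json", "xml"):
        raise ValueError("fmt must be 'json' or 'xml'")
    return f"{BIOC_BASE}/BioC_{fmt}/{article_id}/{encoding}"


def fetch_bioc_json(article_id, timeout=30):
    """Download one paper from the BioC API and parse the JSON response."""
    with urllib.request.urlopen(bioc_url(article_id), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Fetching one request at a time like this is simple but slow across tens of thousands of papers, which is why the pre-downloaded subset described next can save time.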

Because retrieving papers from the API takes a while, you can also download a subset of the full-text papers from this Google Drive link (a large-ish download, ~300 MB). Every paper in this subset contains the text string "github". The original corpus has about 20k papers matching this string; at present, the zipped directory contains ~17k of them. Each file in the zipped directory holds 100 papers stored as a JSON list of objects and can be read as follows:

import json

# papes_0.txt stores 100 papers as a single JSON list
with open('git_papes/papes_0.txt') as infile:
    dat = json.load(infile)
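To work with the whole subset at once, a sketch like the one below loads every chunk into one list. It assumes the unzipped files all sit in `git_papes/` and follow the `papes_<n>.txt` naming of the example above; `load_all_papers` and `mentions_github` are hypothetical helpers. Because the fields inside each paper object are not documented here, the check searches the serialized JSON rather than a specific text field.

```python
import glob
import json
import os


def load_all_papers(directory="git_papes"):
    """Load every papes_*.txt chunk in `directory` into one flat list of papers."""
    papers = []
    # Sorted for a stable, lexicographic load order (papes_0, papes_1, papes_10, ...)
    for path in sorted(glob.glob(os.path.join(directory, "papes_*.txt"))):
        with open(path) as infile:
            papers.extend(json.load(infile))  # each file holds a JSON list of papers
    return papers


def mentions_github(paper):
    """Case-insensitive check for 'github' anywhere in a paper's serialized JSON."""
    return "github" in json.dumps(paper).lower()
```

For example, `papers = load_all_papers()` followed by `sum(mentions_github(p) for p in papers)` should count every loaded paper, since the subset was selected on that string.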
