zverok edited this page Jun 18, 2015 · 8 revisions

Retrieving Wikipedia pages with infoboxer

From Wikipedia

# One page
Infoboxer.wikipedia.get('Argentina')
# also aliased as Infoboxer.wp
Infoboxer.wp.get('Argentina')

# Several pages (in one API request)
Infoboxer.wikipedia.get('Argentina', 'Bolivia', 'Chile')
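
To give a feel for what "one API request" can mean at the MediaWiki level, here is a small sketch using only Ruby's standard library. It is an illustration rather than Infoboxer's internal code: `batch_query_url` is a hypothetical helper, and the parameter set (`action=query`, `prop=revisions`, pipe-separated `titles`) is an assumption about a typical MediaWiki content query.

```ruby
require 'uri'

# Hypothetical helper (not part of Infoboxer): builds one MediaWiki API URL
# that asks for the wikitext of several pages at once.
def batch_query_url(api_url, *titles)
  params = {
    action: 'query',
    prop: 'revisions',
    rvprop: 'content',
    format: 'json',
    titles: titles.join('|') # MediaWiki separates multiple titles with "|"
  }
  "#{api_url}?#{URI.encode_www_form(params)}"
end

puts batch_query_url('https://en.wikipedia.org/w/api.php',
                     'Argentina', 'Bolivia', 'Chile')
```

All three titles travel in a single query string, so the server is contacted once instead of three times.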

# From non-English Wikipedia
Infoboxer.wikipedia('fr').get('Argentine')
# or, if it looks cleaner for you
Infoboxer.wp(:fr).get('Argentine')

From sister projects

Wikimedia sister projects are all the publicly available wikis operated by the Wikimedia Foundation, including Wikipedia.

Infoboxer.wikivoyage.get('Chiang Mai')

From Wikia wikis

Wikia hosts a lot of interesting wikis, all published under copyleft licenses and very interesting to study. So, Infoboxer provides a shortcut for these, too:

# Default language
Infoboxer.wikia('tardis').get('Eleventh Doctor')

# Other language:
Infoboxer.wikia('tardis', :fr).get('Onzième Docteur')

From any MediaWiki installation

As simple as that:

Infoboxer.wiki('…').get('My Product')

Note: this assumes you have api.php installed as usual at /w/api.php. If that is not the case, use the slightly more verbose version with the full API URL:

Infoboxer.wiki('…').get('My Product')
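
The /w/api.php convention mentioned in the note can be sketched with the standard library alone. `default_api_url` below is a hypothetical helper, not an Infoboxer method, and `wiki.example.org` is a placeholder host; the sketch only shows how a base address maps to the conventional endpoint.

```ruby
require 'uri'

# Hypothetical helper (not part of Infoboxer): given a wiki's base URL,
# returns the conventional MediaWiki API endpoint at /w/api.php.
def default_api_url(base_url)
  # The leading slash makes the path absolute, so any path already present
  # in base_url is replaced rather than extended.
  URI.join(base_url, '/w/api.php').to_s
end

puts default_api_url('https://wiki.example.org')
```

If a wiki serves its API somewhere else, a derived URL like this will not work, and the full API URL has to be given explicitly.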

Setting User-Agent header

(You should do this before any significant amount of data extraction, per the Wikipedia terms and conditions):

UA = 'MyCoolTool/1.1 (; [email protected])'

# All requests to all wikis will be with your User-Agent:
Infoboxer.user_agent = UA

# or, alternatively, just for one target site:
client = Infoboxer.wikipedia(user_agent: UA)
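
A descriptive User-Agent usually names the tool, its version, and a way to reach its maintainer. The sketch below assembles such a string; `build_user_agent` is a hypothetical helper, not part of Infoboxer, and the URL and address are placeholders.

```ruby
# Hypothetical helper (not part of Infoboxer): formats a User-Agent string
# as "Name/Version (URL; contact)".
def build_user_agent(name:, version:, url:, contact:)
  "#{name}/#{version} (#{url}; #{contact})"
end

ua = build_user_agent(
  name: 'MyCoolTool',
  version: '1.1',
  url: 'https://example.org/mycooltool', # placeholder URL
  contact: 'admin@example.org'           # placeholder address
)
puts ua

# With Infoboxer loaded, the string would then be assigned as shown above:
#   Infoboxer.user_agent = ua
```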

Next: Extracting information