zverok edited this page Jun 18, 2015 · 8 revisions

Retrieving Wikipedia pages with infoboxer

From Wikipedia

# One page
Infoboxer.wikipedia.get('Argentina')
# also aliased as Infoboxer.wp

# Several pages (in one API request)
Infoboxer.wikipedia.get('Argentina', 'Bolivia', 'Chile')

# From non-English Wikipedia
Infoboxer.wikipedia('fr').get('Argentine')
# or, if it looks cleaner for you
Infoboxer.wikipedia(:fr).get('Argentine')
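
For context, fetching several pages in one request maps onto the MediaWiki API, which accepts multiple titles in a single `titles` parameter separated by `|`. The sketch below builds such a request URL using only Ruby's standard library. The helper name `wikipedia_query_url` and the chosen parameter set are assumptions made for illustration; this is not Infoboxer's internal code.

```ruby
require 'uri'

# Hypothetical helper (not part of Infoboxer): builds a MediaWiki API URL
# asking for the wikitext of one or more pages in a single request.
def wikipedia_query_url(*titles, lang: 'en')
  params = {
    action: 'query',
    prop: 'revisions',
    rvprop: 'content',
    format: 'json',
    titles: titles.join('|') # several titles are separated by "|"
  }
  # Each language edition lives on its own subdomain.
  "https://#{lang}.wikipedia.org/w/api.php?#{URI.encode_www_form(params)}"
end

puts wikipedia_query_url('Argentina', 'Bolivia', 'Chile')
puts wikipedia_query_url('Argentine', lang: :fr)
```

The language can be given as a string or a symbol because it is only interpolated into the host name.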

From sister projects

Wikimedia sister projects are all the publicly available wikis operated by the Wikimedia Foundation, including Wikipedia.

Infoboxer.wiktionary.get('test')
Infoboxer.wikiquote.get('Vonnegut')
Infoboxer.commons.get('Category:Kittens')
Infoboxer.wikivoyage.get('Chiang Mai')
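
Each of these shortcuts targets a different public wiki. As a rough illustration of where the English-language requests could end up, the sketch below maps shortcut names to the projects' public API endpoints. The constant `SISTER_PROJECT_HOSTS` and the helper `sister_project_api` are hypothetical names, and the mapping is an assumption about the targets, not a copy of Infoboxer's own configuration.

```ruby
# Assumed mapping from shortcut name to the project's public host
# (hypothetical, for illustration only).
SISTER_PROJECT_HOSTS = {
  wiktionary: 'en.wiktionary.org',
  wikiquote: 'en.wikiquote.org',
  commons: 'commons.wikimedia.org',
  wikivoyage: 'en.wikivoyage.org'
}.freeze

# Returns the API endpoint for a known project, or raises ArgumentError.
def sister_project_api(project)
  host = SISTER_PROJECT_HOSTS.fetch(project.to_sym) do
    raise ArgumentError, "unknown project: #{project}"
  end
  "https://#{host}/w/api.php"
end

puts sister_project_api(:commons)
puts sister_project_api('wikivoyage')
```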

From Wikia wikis

Wikia hosts many interesting wikis, all published under copyleft licenses and well worth studying, so Infoboxer provides a shortcut for them too:

# Default language
Infoboxer.wikia('tardis').get('Eleventh Doctor')

# Other language:
Infoboxer.wikia('tardis', :fr).get('Onzième Docteur')

From any MediaWiki installation

As simple as that:

Infoboxer.wiki('http://mydomain.com').get('My Product')

Note: this assumes api.php is available at the usual path, /w/api.php. If it is not, use the slightly more verbose form with the full API URL:

Infoboxer.wiki('http://mydomain.com/myapipath/api.php').get('My Product')
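
The two forms can be thought of as one rule: a URL that already ends in api.php is used as given, and anything else gets the conventional /w/api.php path. A minimal sketch of that rule follows, assuming a hypothetical helper named `resolve_api_url`; Infoboxer's actual resolution logic may differ.

```ruby
require 'uri'

# Hypothetical helper: keeps a URL that already points at api.php,
# otherwise replaces its path with the conventional /w/api.php.
def resolve_api_url(url)
  uri = URI.parse(url)
  return uri.to_s if uri.path.end_with?('api.php')

  uri.path = '/w/api.php' # note: any other path in the input is discarded
  uri.to_s
end

puts resolve_api_url('http://mydomain.com')
puts resolve_api_url('http://mydomain.com/myapipath/api.php')
```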

Setting User-Agent header

(You should do this before extracting any significant amount of data, per the Wikipedia terms and conditions):

UA = 'MyCoolTool/1.1 (http://example.com/MyCoolTool/; [email protected])'

# All requests to all wikis will be sent with your User-Agent:
Infoboxer.user_agent = UA

# or, alternatively, just for one target site:
client = Infoboxer.wikipedia(user_agent: UA)
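
The string above follows the common "Name/version (URL; contact)" layout. As an illustration, the sketch below assembles such a string and attaches it to a plain `Net::HTTP` request object without sending anything. The helper `build_user_agent` and the contact address are placeholders invented for this example, not part of Infoboxer.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: assembles "Name/version (url; contact)".
def build_user_agent(name:, version:, url:, contact:)
  "#{name}/#{version} (#{url}; #{contact})"
end

ua = build_user_agent(
  name: 'MyCoolTool',
  version: '1.1',
  url: 'http://example.com/MyCoolTool/',
  contact: 'tools@example.com' # placeholder address, replace with your own
)

# Build (but do not send) a request carrying the header.
request = Net::HTTP::Get.new(URI('https://en.wikipedia.org/w/api.php'))
request['User-Agent'] = ua

puts request['User-Agent']
```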

Next: Extracting information