Wikipedia terms and conditions

When using Infoboxer for large-scale data extraction from Wikipedia, keep the following in mind:

Before using the data, check Wikipedia's license. Wikipedia provides an explanation of how to properly reuse its content.
There are no official API request limits, and the documentation explicitly states:

"If you make your requests in series rather than in parallel (i.e. wait for the one request to finish before sending a new request, such that you're never making more than one request at the same time), then you should definitely be fine."
The official documentation explicitly requires you to specify a User-Agent header. Infoboxer provides a default header, but the docs say:

"Don't use the default User-Agent provided by your client library, but make up a custom header that identifies your script or service and provides some type of means of contacting you (e.g., an e-mail address)."

With Infoboxer, you set a custom User-Agent like this:

UA = 'MyCoolTool/1.1 (http://example.com/MyCoolTool/; [email protected])'

# All requests to all wikis will be with your User-Agent:
Infoboxer.user_agent = UA

# or, alternatively, just for one target site:
client = Infoboxer.wikipedia(user_agent: UA)
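
The advice about making requests in series can be illustrated with a small sketch. The helper `each_in_series` below is hypothetical and not part of Infoboxer; it simply runs a block for each page title one after another, so that no more than one request is ever in flight, and optionally pauses between calls. The `client.get(title)` call in the usage comment is an assumption about how you would fetch a single page with Infoboxer.

```ruby
# Hypothetical helper (not part of Infoboxer): runs the given block for each
# title strictly one after another, optionally pausing between calls.
def each_in_series(titles, pause: 0)
  titles.map do |title|
    result = yield(title)            # returns only when this request is done
    sleep(pause) if pause.positive?  # optional extra politeness delay, in seconds
    result
  end
end

# Assumed usage with Infoboxer (needs the gem and network access):
#   client = Infoboxer.wikipedia(user_agent: UA)
#   pages  = each_in_series(%w[Argentina Chile], pause: 1) { |title| client.get(title) }
```

A plain sequential loop like this is usually enough; the thing to avoid is spawning threads or processes that each hit the API at the same time.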

(copyleft) 2015 Victor 'Zverok' Shepelev

