Wikipedia terms and conditions
zverok edited this page Jun 17, 2015 · 4 revisions
When using Infoboxer for massive data extraction from Wikipedia, keep the following in mind:
- Before using the data, check Wikipedia's license. Wikipedia publishes an explanation of how to properly reuse its content.
- There are no official API request limits, but the documentation explicitly states:
"If you make your requests in series rather than in parallel (i.e. wait for the one request to finish before sending a new request, such that you're never making more than one request at the same time), then you should definitely be fine."
- The official documentation explicitly requires you to specify a User-Agent header. Infoboxer provides a default one, but the docs say:
"Don't use the default User-Agent provided by your client library, but make up a custom header that identifies your script or service and provides some type of means of contacting you (e.g., an e-mail address)."
With Infoboxer, you can set a custom User-Agent like this:
UA = 'MyCoolTool/1.1 (http://example.com/MyCoolTool/; [email protected])'
# All requests to all wikis will be sent with your User-Agent:
Infoboxer.user_agent = UA
# or, alternatively, just for one target site:
client = Infoboxer.wikipedia(user_agent: UA)
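The "requests in series" advice quoted above can be sketched roughly as follows. This is only an illustration, not part of Infoboxer: `fetch_in_series` and its `pause` parameter are hypothetical names, and the block stands in for whatever call actually performs the request (for example `client.get(title)`, assuming a client created as shown above).

```ruby
# Hypothetical helper (not part of Infoboxer): calls the block for each title
# strictly one at a time, so there is never more than one request in flight.
# `pause` is an optional delay in seconds between consecutive requests.
def fetch_in_series(titles, pause: 0)
  results = {}
  titles.each_with_index do |title, index|
    sleep(pause) if pause.positive? && index.positive?
    results[title] = yield(title) # returns only after this request finishes
  end
  results
end

# Assumed usage with an Infoboxer client (requires network access):
#   client = Infoboxer.wikipedia(user_agent: UA)
#   pages  = fetch_in_series(%w[Argentina Chile]) { |title| client.get(title) }
```

Because each block call has to return before the next one starts, the loop never issues two requests at the same time. The optional pause just adds extra politeness for long-running jobs.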