When run locally, this application starts a server, pulls 10 random pages from Wikipedia, and indexes them for searching. The server then accepts REST requests over HTTP to search the indexed data.
- JDK SE 1.8 (Higher should be fine but has not been tested)
- Maven (or simply `brew install maven`)
- Git
On the command line, cd into the directory where you want the project to live, then run the following command:
git clone https://github.com/adam-bates/wikisearch.git
Now that we have the project locally, let's go in and build it:
cd wikisearch && mvn clean package
And to run the application, assuming we're in the project's root directory, we'll run:
mvn spring-boot:run
Note: If the above command isn't working, try:
java -jar target/wikisearch-0.0.1-SNAPSHOT.jar
So we know what the application is, and we're able to run it locally, but we still need to know how to use it.
I've defined the API spec below for what is currently implemented, although I'm not formally versioning the project, and there could be updates in the future.
GET /wiki/pages
Returns the list of indexed wiki pages, each with its search score for the query.
Param | Type | Required | Default Value | Description |
---|---|---|---|---|
query | string | false | null | If present, acts as a search query to filter results. Uses Apache Lucene query syntax. |
field | enum | false | content | Data field to search on. Must be "content", "title", or "id" |
limit | integer | false | 10 | Positive integer used to limit results |
{
pagesReturned: integer,
pages: [
{
id: integer,
score: float,
wikiPageId: integer,
wikiPageLink: string,
title: string,
content: string
}
]
}
Example Request: GET /wiki/pages?query=funny&limit=2
Example Response:
{
pagesReturned: 2,
pages: [
{
id: 0,
score: 0.121,
wikiPageId: 111111,
wikiPageLink: "https://en.wikipedia.org/wiki/Example_Page_1",
title: "Example Page 1",
content: "Some example page content."
},
{
id: 1,
score: 0.089,
wikiPageId: 222222,
wikiPageLink: "https://en.wikipedia.org/wiki/Example_Page_2",
title: "Example Page 2",
content: "Some example page content."
}
]
}
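To make the query parameters above a bit more concrete, here is a minimal sketch of how a client might build the request URL. `WikiPagesUrl` and its `build` method are hypothetical names and not part of this project, and the base URL assumes the server is running at http://localhost:8080. It sticks to JDK 1.8 APIs and URL-encodes the query, which matters once you use Lucene syntax with quotes or spaces.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Arrays;
import java.util.List;

// Hypothetical client-side helper, not part of the wikisearch project.
class WikiPagesUrl {
    // Values accepted by the "field" parameter, per the table above.
    static final List<String> FIELDS = Arrays.asList("content", "title", "id");

    // Builds a GET /wiki/pages URL. Null arguments are left out of the
    // URL so that the server-side defaults apply.
    static String build(String baseUrl, String query, String field, Integer limit) {
        if (field != null && !FIELDS.contains(field)) {
            throw new IllegalArgumentException("field must be one of " + FIELDS);
        }
        if (limit != null && limit <= 0) {
            throw new IllegalArgumentException("limit must be a positive integer");
        }
        StringBuilder url = new StringBuilder(baseUrl).append("/wiki/pages");
        String separator = "?";
        if (query != null) {
            url.append(separator).append("query=").append(encode(query));
            separator = "&";
        }
        if (field != null) {
            url.append(separator).append("field=").append(field);
            separator = "&";
        }
        if (limit != null) {
            url.append(separator).append("limit=").append(limit);
        }
        return url.toString();
    }

    // Form-encodes a parameter value (spaces become '+', quotes become %22).
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    public static void main(String[] args) {
        // Assumed base URL; adjust if your server runs elsewhere.
        String base = "http://localhost:8080";
        System.out.println(build(base, "funny", null, 2));
        // prints http://localhost:8080/wiki/pages?query=funny&limit=2
        System.out.println(build(base, "\"solar system\" AND planet", "title", null));
        // prints http://localhost:8080/wiki/pages?query=%22solar+system%22+AND+planet&field=title
    }
}
```

The second call shows a phrase query combined with `AND`, which is standard Lucene query syntax; whether a given query returns results depends on which random pages were indexed on startup.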
GET /wiki/pages/{id}
Returns the wiki page specified by the id.
Variable | Type | Description |
---|---|---|
id | integer | ID of the expected wiki page result. |
{
id: integer,
score: float,
wikiPageId: integer,
wikiPageLink: string,
title: string,
content: string
}
Example Request: GET /wiki/pages/4
Example Response:
{
id: 4,
score: 1,
wikiPageId: 111111,
wikiPageLink: "https://en.wikipedia.org/wiki/Example_Page_1",
title: "Example Page 1",
content: "Some example page content."
}
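For anyone who would rather call this endpoint from code than from a browser or curl, here is a rough sketch using only JDK 1.8 classes. `WikiPageFetcher` is a hypothetical name, the base URL assumes the default http://localhost:8080, and since error responses aren't documented here, the sketch simply treats any non-200 status as a failure.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical client, not part of the wikisearch project.
class WikiPageFetcher {
    // Builds the GET /wiki/pages/{id} URL for the given id.
    static String pageUrl(String baseUrl, int id) {
        return baseUrl + "/wiki/pages/" + id;
    }

    // Reads the whole stream into a UTF-8 string.
    static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int read;
        while ((read = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, read);
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    // Requests a single page and returns the raw JSON response body.
    static String fetchPage(String baseUrl, int id) throws IOException {
        HttpURLConnection connection =
                (HttpURLConnection) new URL(pageUrl(baseUrl, id)).openConnection();
        connection.setRequestMethod("GET");
        connection.setRequestProperty("Accept", "application/json");
        try {
            int status = connection.getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                // Assumption: anything other than 200 is treated as an error.
                throw new IOException("Unexpected HTTP status " + status);
            }
            try (InputStream body = connection.getInputStream()) {
                return readAll(body);
            }
        } finally {
            connection.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // Requires the server to be running locally.
        System.out.println(fetchPage("http://localhost:8080", 4));
    }
}
```

The body is returned as a plain string because JDK 1.8 has no built-in JSON parser; in practice you would hand it to whatever JSON library your client already uses.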
GET /wiki/terms
Returns a breakdown of indexed terms, including each term's total frequency in the index and the number of indexed wiki pages that contain it.
Param | Type | Required | Default Value | Description |
---|---|---|---|---|
limit | integer | false | 10 | Positive integer used to limit results |
{
totalTermsIndexed: long,
terms: [
{
totalFrequency: long,
wikiPageFrequency: integer,
term: string
}
]
}
Example Request: GET /wiki/terms?limit=2
Example Response:
{
totalTermsIndexed: 1234,
terms: [
{
totalFrequency: 150,
wikiPageFrequency: 10,
term: "the"
},
{
totalFrequency: 120,
wikiPageFrequency: 9,
term: "of"
}
]
}
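The difference between `totalFrequency` and `wikiPageFrequency` is easiest to see on a tiny example. The sketch below is an illustration only: it is not how the application computes these numbers (that is left to Lucene), `TermStatsSketch` is a hypothetical name, and the tokenizer is a crude lowercase-and-split that only approximates what a real analyzer does. `totalTermsIndexed` is left out of the sketch.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Illustration only, not the application's Lucene-based implementation.
class TermStatsSketch {
    static final class TermStat {
        long totalFrequency;   // occurrences across all pages
        int wikiPageFrequency; // number of pages containing the term
    }

    static Map<String, TermStat> termStats(List<String> pages) {
        Map<String, TermStat> stats = new HashMap<>();
        for (String page : pages) {
            // Tracks which terms were already counted for this page.
            Set<String> seenOnThisPage = new HashSet<>();
            // Simplified tokenizer: lowercase, split on non-word characters.
            for (String term : page.toLowerCase(Locale.ROOT).split("\\W+")) {
                if (term.isEmpty()) {
                    continue; // leading separators produce an empty token
                }
                TermStat stat = stats.computeIfAbsent(term, t -> new TermStat());
                stat.totalFrequency++;
                if (seenOnThisPage.add(term)) {
                    stat.wikiPageFrequency++; // first occurrence on this page
                }
            }
        }
        return stats;
    }

    public static void main(String[] args) {
        List<String> pages = Arrays.asList(
                "The history of the moon",
                "The phases of Venus");
        TermStat the = termStats(pages).get("the");
        System.out.println("the: totalFrequency=" + the.totalFrequency
                + ", wikiPageFrequency=" + the.wikiPageFrequency);
        // prints the: totalFrequency=3, wikiPageFrequency=2
    }
}
```

Here "the" appears three times in total but on only two pages, which is the same distinction the endpoint reports for each term.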
The application starts a local server (most likely at http://localhost:8080) using the Spring framework for Java, and was bootstrapped with Spring Boot.
On startup of the server, MediaWiki's API is used to pull 10 random pages from Wikipedia and extract their content. It then uses Apache Lucene to index and store each page's content along with its title and page ID.
On API requests (see the API definition above), it responds with indexed data matching the request.
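As a rough idea of what the random-page step on startup could look like, the sketch below builds a MediaWiki Action API URL that asks for random article titles and IDs. This README doesn't say which API parameters the application actually sends, so treat the choice of `list=random` and the class name `RandomPagesUrl` as assumptions. It only covers picking the pages; fetching their content and indexing them with Lucene are separate steps not shown here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; the real application may use different parameters.
class RandomPagesUrl {
    static final String API = "https://en.wikipedia.org/w/api.php";

    // Builds a URL asking the MediaWiki Action API for `count` random pages.
    static String build(int count) {
        // LinkedHashMap keeps the parameters in insertion order.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("action", "query");
        params.put("format", "json");
        params.put("list", "random");
        params.put("rnnamespace", "0"); // main (article) namespace only
        params.put("rnlimit", String.valueOf(count));

        StringBuilder url = new StringBuilder(API);
        char separator = '?';
        for (Map.Entry<String, String> param : params.entrySet()) {
            // Values here are plain ASCII, so no URL encoding is needed.
            url.append(separator).append(param.getKey()).append('=').append(param.getValue());
            separator = '&';
        }
        return url.toString();
    }

    public static void main(String[] args) {
        System.out.println(build(10));
        // prints https://en.wikipedia.org/w/api.php?action=query&format=json&list=random&rnnamespace=0&rnlimit=10
    }
}
```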