This repository has been archived by the owner on Dec 5, 2020. It is now read-only.

Delete page forbidden #270

Open
amihaiemil opened this issue Sep 18, 2017 · 8 comments

Comments

@amihaiemil
Member

Deleting a page does not work; AWS returns status 403 FORBIDDEN: http://charles.amihaiemil.com/logs.html?log=/d0181af5-5eda-444c-8654-eb488275935a.log

@amihaiemil
Member Author

@charlesmike delete this page please

@charlesmike
Member

@charlesmike delete this page please

@amihaiemil Some steps failed when processing your command. See logs for details.
Try again, and if the error persists, please open an issue.

@amihaiemil
Member Author

@charlesmike delete this page please

@charlesmike
Member

@charlesmike delete this page please

@amihaiemil Some steps failed when processing your command. See logs for details.
Try again, and if the error persists, please open an issue.

@amihaiemil
Member Author

@SherifWaly I will explain what the problem is later today.

@SherifWaly
Contributor

@amihaiemil What is the problem here?

@amihaiemil
Member Author

@SherifWaly When we index each page, we use its plain URL as the document ID (e.g. the ID of an indexed document is http://example.com/path/to/page.html).

We then use this ID for deletion, and I think the problem is that, because of the special characters the URL contains (e.g. /), the AWS signature is not generated correctly, so we get 403 FORBIDDEN when trying to perform the operation.

We need to stop using the plain URL as the ID and turn it into a Base64-encoded string instead. I will come back with the details in about an hour :)

@amihaiemil
Member Author

amihaiemil commented Oct 13, 2017

@SherifWaly The problem is quite straightforward: everywhere we use the ID, we have the URL of the page and need to turn it into a Base64-encoded string. See an example of encoding here (first answer). We don't have Java 8, so use the Base64 class from Apache Commons Codec (if we don't have the dependency, declare it with Maven).
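A minimal sketch of the encoding step, assuming a JDK 8+ runtime only so that the example is self-contained. Since the project is not on Java 8, the real change would presumably use Apache Commons Codec, where `org.apache.commons.codec.binary.Base64.encodeBase64URLSafeString(byte[])` produces the same URL-safe, unpadded output. The class name `UrlId` is hypothetical. One detail matters here: the standard Base64 alphabet itself contains `/` and `+` (plus `=` for padding), so a plain Base64 ID could still put a `/` into the request path. The URL-safe variant (RFC 4648, section 5) uses `-` and `_` instead.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/**
 * Hypothetical helper (not part of the project) that turns a page URL
 * into a document ID safe to place in an Elasticsearch request path.
 */
public final class UrlId {

    private UrlId() {
    }

    /**
     * Encodes the URL with the URL-safe Base64 alphabet ('-' and '_'
     * instead of '+' and '/') and without '=' padding, so the ID never
     * contains characters that are special in a URL path.
     */
    public static String encode(final String url) {
        return Base64.getUrlEncoder()
            .withoutPadding()
            .encodeToString(url.getBytes(StandardCharsets.UTF_8));
    }

    /** Reverses encode(); the URL decoder accepts unpadded input. */
    public static String decode(final String id) {
        return new String(
            Base64.getUrlDecoder().decode(id), StandardCharsets.UTF_8
        );
    }

    public static void main(final String[] args) {
        final String url = "http://example.com/path/to/page.html";
        final String id = UrlId.encode(url);
        System.out.println(id);
        // Prints the original URL again.
        System.out.println(UrlId.decode(id));
    }
}
```

Decoding is only needed if some code must recover the URL from an ID; keeping the plain URL in the indexed document itself would avoid relying on it.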

So far, the ID is used in 2 places:

1) When indexing the page/pages

When we index the pages, we turn them into a JSON "bulk", specific to Elasticsearch (more details about the Elasticsearch _bulk API here, if you're curious).

The class responsible for turning page(s) into the bulk object is EsBulkJson. There, in the method preparePage, you have to encode the page's URL before assigning it as the ID.

2) When we perform the delete page operation

In class AmazonElasticSearch, method delete(final String type, final String id): there, the ID also has to be encoded.
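Both changes come down to applying the same encoding wherever the ID is produced. The sketch below is hypothetical: `PageIdSketch`, `bulkIndexAction` and `deletePath` are illustrative names, not the project's `EsBulkJson.preparePage` or `AmazonElasticSearch.delete`, and it assumes an Elasticsearch version that still has mapping types (hence `_type` and the `/{index}/{type}/{id}` path). It also hints at why the plain URL is a poor ID: placed in the delete path, its `/` characters split into extra path segments, which fits the suspicion that the signed request no longer matches what AWS expects.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/**
 * Hypothetical sketch of the two places where the page ID is used.
 * Method names are illustrative, not the real project code.
 */
public final class PageIdSketch {

    private PageIdSketch() {
    }

    /** URL-safe, unpadded Base64 of the page URL: no '/', '+' or '='. */
    public static String encodeId(final String url) {
        return Base64.getUrlEncoder()
            .withoutPadding()
            .encodeToString(url.getBytes(StandardCharsets.UTF_8));
    }

    /**
     * 1) Indexing: one _bulk "index" action, i.e. a metadata line carrying
     * the encoded _id followed by the document source, each line ending
     * in a newline (the _bulk body is newline-delimited JSON). Assumes the
     * index and type names need no JSON escaping and that pageSource is an
     * already-serialized JSON object.
     */
    public static String bulkIndexAction(final String index,
        final String type, final String url, final String pageSource) {
        return "{\"index\":{\"_index\":\"" + index
            + "\",\"_type\":\"" + type
            + "\",\"_id\":\"" + encodeId(url) + "\"}}\n"
            + pageSource + "\n";
    }

    /** 2) Deleting: the path of DELETE /{index}/{type}/{id}, ID encoded. */
    public static String deletePath(final String index, final String type,
        final String url) {
        return "/" + index + "/" + type + "/" + encodeId(url);
    }

    public static void main(final String[] args) {
        final String url = "http://example.com/path/to/page.html";
        // The document source keeps the plain URL; only the _id is encoded.
        System.out.print(bulkIndexAction("mysite", "page", url,
            "{\"url\":\"" + url + "\"}"));
        // The encoded ID stays a single path segment after the type;
        // the raw URL would have added several '/'-separated segments.
        System.out.println(deletePath("mysite", "page", url));
    }
}
```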

Just these 2 changes, plus fixing any unit tests that fail as a result.

Don't hurry with this one; you can also do it next week (this weekend I won't have my laptop with me anyway, until Sunday evening).
