Also:
- Fix a bug that broke extracting the quotation from a text fragment link because it was normalized first
- Fix a bug that broke extracting the quotation from a link that had a document fragment
- Log an error rather than throwing when the UI constructs a text fragment link using a URL that already has a text fragment, since some persisted URLs have not been normalized (#494)
- Fix a bug that overwrote MediaExcerpt and UrlLocator entities (without their customizations) because we hadn't configured a MediaExcerpt basis type for our Justification normalization schema.
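
To illustrate the first two fixes, here is a minimal sketch of pulling the quotation out of a text fragment link. The function name `extractQuotationSketch` and the choice to join the start and end terms with " ... " are assumptions for illustration, not the project's actual code. It works on the raw URL string, before any normalization, and tolerates an ordinary document fragment ahead of the `:~:` delimiter.

```typescript
// Hypothetical helper: extract the quoted text from a text fragment link.
// Text directive syntax: #[fragment]:~:text=[prefix-,]start[,end][,-suffix]
function extractQuotationSketch(rawUrl: string): string | undefined {
  const hashIndex = rawUrl.indexOf("#");
  if (hashIndex < 0) return undefined;
  const fragment = rawUrl.slice(hashIndex + 1);

  // Anything before ":~:" is a regular document fragment and is skipped.
  const delimiterIndex = fragment.indexOf(":~:");
  if (delimiterIndex < 0) return undefined;

  const directives = fragment.slice(delimiterIndex + 3).split("&");
  const textDirective = directives.find((d) => d.startsWith("text="));
  if (textDirective === undefined) return undefined;

  const terms = textDirective
    .slice("text=".length)
    .split(",")
    // A prefix ends with "-" and a suffix starts with "-"; neither is quoted text.
    .filter((term) => !term.endsWith("-") && !term.startsWith("-"))
    .map((term) => decodeURIComponent(term));

  return terms.length > 0 ? terms.join(" ... ") : undefined;
}
```

For example, `extractQuotationSketch("https://example.com/page#heading:~:text=hello%20world")` returns `"hello world"`.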
---------
Signed-off-by: Carl Gieringer <[email protected]>
When I backfilled URL-normalization, I did so with a version of normalizeUrl that always appended a slash to the path if it was missing. This normalized index.html to index.html/ which is not what we want. I had missed this caveat from https://en.wikipedia.org/wiki/URI_normalization#Normalization_process:
However, there is no way to know if a URI path component represents a directory or not. RFC 3986 notes that if the former URI redirects to the latter URI, then that is an indication that they are equivalent.
We should re-run URL normalization without this mistake. Before that, we should probably store both the original URL and the normalized URL, so that if normalization loses information we can still recover from bugs like this in the future.
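
A minimal sketch of the intended behavior, assuming the standard WHATWG `URL` class is available. The name `normalizeUrlSketch` and the returned `{url, normalizedUrl}` shape are hypothetical and do not reflect the project's real normalizeUrl. The point is that the path is left untouched, so index.html never becomes index.html/, and the original URL is kept next to the normalized one.

```typescript
// Hypothetical result shape: keep the original next to the normalized form
// so that a lossy normalization can be detected and redone later.
interface NormalizedUrlSketch {
  url: string;
  normalizedUrl: string;
}

function normalizeUrlSketch(rawUrl: string): NormalizedUrlSketch {
  // The URL parser lowercases the scheme and host and drops default ports.
  const parsed = new URL(rawUrl);
  // Deliberately do NOT append "/" to a non-empty path: we cannot know
  // whether the last path segment names a directory or a file.
  return { url: rawUrl, normalizedUrl: parsed.toString() };
}

// The file-like path keeps its exact form.
console.log(normalizeUrlSketch("HTTPS://Example.com:443/index.html").normalizedUrl);
// prints "https://example.com/index.html"
```

An empty path still becomes "/" (for example `https://example.com` becomes `https://example.com/`), because the URL parser itself does that for http and https URLs.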
#492 added URL normalization and the requesting of canonical URLs. We should backfill these procedures to existing URLs:
- Update url to normalizeUrl(url).
- Check canonical_url_confirmations for each URL. Request the confirmation if none is present.

See also #496.
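
The backfill could be sketched roughly as follows. Everything here is an assumption for illustration: the `UrlRow` and `BackfillDeps` interfaces, the function names, and the synchronous in-memory style stand in for whatever database access and canonical URL requests the real code uses.

```typescript
// Hypothetical row and dependency shapes for the sketch.
interface UrlRow {
  id: number;
  url: string;
}

interface BackfillDeps {
  normalizeUrl(url: string): string;
  hasCanonicalUrlConfirmation(url: string): boolean;
  requestCanonicalUrlConfirmation(url: string): void;
}

// Returns the rows with normalized URLs, requesting a canonical URL
// confirmation for every URL that does not have one yet.
function backfillUrlsSketch(rows: UrlRow[], deps: BackfillDeps): UrlRow[] {
  return rows.map((row) => {
    const normalized = deps.normalizeUrl(row.url);
    if (!deps.hasCanonicalUrlConfirmation(normalized)) {
      deps.requestCanonicalUrlConfirmation(normalized);
    }
    return { ...row, url: normalized };
  });
}
```

Passing the dependencies in as an object keeps the sketch testable with in-memory stubs; a real backfill would more likely be asynchronous and batched.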