Switch from tika-parsers to tika-core (#5217)
1 parent c6ded5d, commit 29cf4f2. Showing 5 changed files with 66 additions and 9 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,44 @@ | ||
# Fully Support UTF-8 Only For LaTeX Files | ||
|
||
## Context and Problem Statement | ||
|
||
The feature [search for citations](https://github.com/JabRef/help.jabref.org/issues/210) displays the content of LaTeX files. | ||
The LaTeX files are text files and might be encoded arbitrarily. | ||
|
||
## Considered Options | ||
|
||
* Support UTF-8 encoding only | ||
* Support ASCII encoding only | ||
* Support (nearly) all encodings | ||
|
||
## Decision Outcome | ||
|
||
Chosen option: "Support UTF-8 encoding only", because it comes out best (see below). | ||
|
||
### Positive Consequences | ||
|
||
* All content of UTF-8 encoded LaTeX files is displayed in JabRef | ||
|
||
### Negative Consequences | ||
|
||
* When a LaTeX file is encoded in another encoding, the user might see strange characters in JabRef | ||
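
The "strange characters" mentioned above can be illustrated with a minimal sketch that uses only the JDK. The class and method names (`Utf8OnlyDemo`, `decodeAsUtf8`) are hypothetical and are not taken from the JabRef code base; the sketch only assumes that the file content is decoded as UTF-8 regardless of its real encoding.

```java
import java.nio.charset.StandardCharsets;

public class Utf8OnlyDemo {

    /**
     * Decodes raw bytes the way a UTF-8-only reader would.
     * Hypothetical helper, not JabRef's actual implementation.
     */
    static String decodeAsUtf8(byte[] raw) {
        // String(byte[], Charset) replaces malformed input with U+FFFD
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "\u00FC" is the letter u with diaeresis, written as an escape
        // so that the source file itself stays pure ASCII
        String line = "\\cite{M\u00FCller2019}";

        byte[] utf8Bytes = line.getBytes(StandardCharsets.UTF_8);
        byte[] latin1Bytes = line.getBytes(StandardCharsets.ISO_8859_1);

        // Round trip through UTF-8: the text is preserved
        System.out.println(decodeAsUtf8(utf8Bytes));

        // Latin-1 bytes read as UTF-8: the umlaut becomes U+FFFD
        System.out.println(decodeAsUtf8(latin1Bytes));
    }
}
```

In the second case the single Latin-1 byte for the umlaut is not valid UTF-8, so the decoder substitutes the replacement character and the rest of the line stays readable.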
|
||
## Pros and Cons of the Options | ||
|
||
### Support UTF-8 encoding only | ||
|
||
* Good, because it covers most TeX file encodings | ||
* Good, because it is easy to implement | ||
* Bad, because it does not support the encodings commonly used before around 2010 | ||
|
||
### Support ASCII encoding only | ||
|
||
* Good, because it is easy to implement | ||
* Bad, because it does not support any encoding beyond ASCII, so non-ASCII characters cannot be displayed | ||
|
||
### Support (nearly) all encodings | ||
|
||
* Good, because it is easy to implement | ||
* Bad, because it relies on Apache Tika's `CharsetDetector`, which resides in `tika-parsers`. | ||
This causes issues during compilation (see https://github.com/JabRef/jabref/pull/3421#issuecomment-524532832). | ||
Example: `error: module java.xml.bind reads package javax.activation from both java.activation and jakarta.activation`. |
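
For comparison, the chosen UTF-8-only option needs no charset detection library. The following is a rough sketch, assuming a reader wants to notice non-UTF-8 input instead of silently showing replacement characters; `StrictUtf8Reader` and `decodeStrict` are hypothetical names, and nothing here is claimed to be what JabRef does.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

public class StrictUtf8Reader {

    /**
     * Returns the decoded text, or an empty Optional if the bytes are
     * not valid UTF-8. Hypothetical helper for illustration only.
     */
    static Optional<String> decodeStrict(byte[] raw) {
        try {
            String text = StandardCharsets.UTF_8.newDecoder()
                    // REPORT makes the decoder throw instead of substituting U+FFFD
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(raw))
                    .toString();
            return Optional.of(text);
        } catch (CharacterCodingException e) {
            // The input is encoded in something other than UTF-8
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        byte[] valid = "\\cite{M\u00FCller2019}".getBytes(StandardCharsets.UTF_8);
        byte[] invalid = "\\cite{M\u00FCller2019}".getBytes(StandardCharsets.ISO_8859_1);

        System.out.println(decodeStrict(valid).isPresent());
        System.out.println(decodeStrict(invalid).isPresent());
    }
}
```

A caller could use the empty result to warn the user that the file is not UTF-8 encoded. This stays within the JDK's `java.nio.charset` package and so avoids the module conflict quoted above.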