Huge text handling #3121
This actually looks good; I only skimmed through it quickly. My one concern: I think we should log at WARN instead of FINE if a file is skipped because of limits (unless FINE is printed to the log file by default, but even then I'd like to see those as WARN on the console too). |
What happens if an index is created with particular limits and then the limits are changed? |
No file is skipped. It is still included but under the |
I tried to describe that above, but to clarify: You can change If you change
|
Sorry, I meant "trimmed" down, not skipped. |
@tarzanek , that's done. |
I suppose it would be straight-forward to store a value for uncompressed size in the |
Oh but that would mean decompressing entirely. Probably not a good idea. |
5dca6b3 to 1d15e51
cdcbea3 to 7559eb7
Just rebased on master since this needed revision to accommodate the |
I will take a look; also needs rebase. |
7559eb7 to 8ae1950
Just trivial conflicts upon rebase |
1b51b08 to 2fb1b2e
5403dfd to 29aad0e
Just rebasing for trivial conflicts related to R analyzer and then again after parallel detection merged |
opengrok-indexer/src/test/java/org/opengrok/indexer/index/HugeTextTest.java
29aad0e to 358d2f6
Rebased for trivial conflict in search.jsp |
Also, move some logic properly to AnalyzerGuru that had crept into IndexDatabase.
f6bdc40 to 36245a5
Rebased for PageConfig.java re-lo, and git automatic-merge took care of it |
Hello,

Please consider for integration this patch to add Huge Text file handling.

`Indexer` and `Configuration` get two new settings, `hugeTextThresholdBytes` (default 1_000_000) and `hugeTextLimitCharacters` (default 5_000_000). The threshold determines when OpenGrok will override a `PLAIN` genre file as a `hugetext` `DATA` file instead. The character limit determines how much to read and index for `hugetext` (with contextless truncation); the limit may be zero.

`hugeTextThresholdBytes` is checked for applicable files with each run, while no state for `hugeTextLimitCharacters` is stored. Changing `hugeTextLimitCharacters` after indexing would require touching affected source code files to revise the index.

For affected gzip and bzip2 files, changes to either `hugeTextThresholdBytes` or `hugeTextLimitCharacters` would require touching affected compressed files to revise the index.

Thank you.
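
To illustrate how the two settings interact, here is a minimal sketch; it is not the code in this patch. The class `HugeTextSketch` and its methods `isHugeText` and `readLimited` are hypothetical names used only for illustration. Whether the threshold comparison is inclusive is also an assumption, since the description does not say.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Hypothetical illustration only; these are not OpenGrok's actual classes. */
class HugeTextSketch {
    // Defaults as stated in the description above.
    static final long DEFAULT_THRESHOLD_BYTES = 1_000_000L;
    static final int DEFAULT_LIMIT_CHARACTERS = 5_000_000;

    /**
     * Decides whether a PLAIN-genre file of the given size would be
     * handled as huge text. ASSUMPTION: the comparison is inclusive.
     */
    static boolean isHugeText(long fileSizeBytes, long thresholdBytes) {
        return fileSizeBytes >= thresholdBytes;
    }

    /**
     * Reads at most limitChars characters and stops there, without regard
     * to line or token boundaries (contextless truncation). A limit of
     * zero yields an empty string.
     */
    static String readLimited(Reader in, int limitChars) {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[8192];
        int remaining = limitChars;
        try {
            while (remaining > 0) {
                int n = in.read(buf, 0, Math.min(buf.length, remaining));
                if (n < 0) {
                    break; // end of input reached before the limit
                }
                sb.append(buf, 0, n);
                remaining -= n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        long size = 2_500_000L; // pretend file size in bytes
        if (isHugeText(size, DEFAULT_THRESHOLD_BYTES)) {
            // Only the first few characters are kept for indexing.
            String indexed = readLimited(new StringReader("some very long text"), 9);
            System.out.println(indexed);
        }
    }
}
```

In this sketch the threshold is compared against the size in bytes, while the limit counts characters read, mirroring the units in the two setting names.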