Would estimates really be more helpful than lower bounds when a timeout occurs? I made a proposal to change Krill's response so that the lower bound becomes more explicit: set the results to -1 and add a separate result key, such as total_till_time_exceeded or similar. Would that help?
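To make the proposal concrete, here is a minimal sketch of how a client might read such a response. It is not Krill's actual response format: the key name `totalResults` and the `meta` dictionary layout are assumptions for illustration, and `total_till_time_exceeded` is only the tentative name suggested above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FrequencyResult:
    total: Optional[int]        # exact count, or None if the query timed out
    lower_bound: Optional[int]  # hits counted before the timeout, if reported


def parse_total(meta: dict) -> FrequencyResult:
    """Interpret a (hypothetical) response meta block.

    Assumed convention: "totalResults" is -1 when the timeout was hit, and
    "total_till_time_exceeded" then carries the partial count.
    """
    total = meta.get("totalResults", -1)
    if total >= 0:
        # No timeout: the count is exact, so it is also its own lower bound.
        return FrequencyResult(total=total, lower_bound=total)
    # Timeout: no exact count, only a lower bound (if the server sent one).
    return FrequencyResult(total=None,
                           lower_bound=meta.get("total_till_time_exceeded"))
```

With this shape, a client cannot mistake a partial count for an exact frequency, because `total` is simply absent after a timeout.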
Estimation could be added, but it wouldn't be very sophisticated for VCs: it could only estimate based on the whole corpus.
Estimating frequencies, or some other workaround, is required wherever the frequency query is buried deep inside other functionality, as in all collocation analysis functions, but also for simple frequency queries over vectors of queries and VCs.
Lower bounds alone would render the whole API client idea useless, unless perhaps timeouts happen rarely or can be resolved by a retry or something similar.
I label this as an enhancement, as estimation would be a completely different feature and, I guess, would need to be implemented on Krill's side, at least to the extent of returning the necessary numbers.
As I said: it may not work well with the numbers we currently get. I am not an expert in this field of statistics, but I would assume that, to get a reasonable estimate, we would need a rough percentage of how much of the data in question we have already searched when the timeout hits, and how much is left. We can give this information for the whole index (i.e. how many documents we have passed in relation to the whole corpus), but as far as I can see, we can't give it for a VC for now, because a VC is not evenly distributed over the whole corpus/index.
To be able to do that, we would have to calculate the number of documents in the VC and the number of documents in the VC we have already passed (at least roughly). We could do that in a single run.
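As a rough illustration of what those two numbers would allow, the following sketch extrapolates a total linearly from the share of VC documents already passed. This is a naive estimator, not something Krill provides: the function and parameter names are made up for this example, and it assumes hits are spread evenly over the documents of the VC, which may well not hold.

```python
from typing import Optional


def estimate_total_hits(hits_so_far: int,
                        docs_passed: int,
                        docs_total: int) -> Optional[float]:
    """Linearly extrapolate the hit count seen before a timeout.

    hits_so_far: matches counted until the timeout (the lower bound)
    docs_passed: documents of the VC already searched (assumed available)
    docs_total:  documents in the VC overall (assumed available)
    Returns None if nothing was searched, as no estimate is possible then.
    """
    if docs_passed < 0 or docs_total < 0 or docs_passed > docs_total:
        raise ValueError("document counts are inconsistent")
    if docs_passed == 0:
        return None
    # Scale the partial count by the inverse of the fraction searched.
    return hits_so_far * docs_total / docs_passed


# 120 hits after passing 3000 of 12000 VC documents
print(estimate_total_hits(120, 3000, 12000))  # 480.0
```

The lower bound itself stays valid regardless of the estimate, so a response could reasonably carry both values.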
I see three options for this:
1. Doing it in the first run every time. This would slow down all searches.
2. Doing it only if an "estimation" flag is set.
3. Doing it after a timeout. This, however, would defeat the purpose of the timeout, as the calculation in a redundant run could be quite costly.
At the moment, the warnings are formulated rather unspecifically:
and do not point out the consequences.
When a timeout occurs, the returned frequency values are currently to be understood as lower bounds, not as estimates, but this is not made explicit.
Estimating frequencies probably also requires changes in Krill.