Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve the recording of the search history #9685

Closed
Siedlerchr opened this issue Mar 18, 2023 · 21 comments · Fixed by #9794
Closed

Improve the recording of the search history #9685

Siedlerchr opened this issue Mar 18, 2023 · 21 comments · Fixed by #9794
Assignees
Labels
FirstTimeCodeContribution Triggers GitHub Greeter Workflow good first issue An issue intended for project-newcomers. Varies in difficulty. search

Comments

@Siedlerchr
Copy link
Member

Siedlerchr commented Mar 18, 2023

I played around with this feature locally and I noticed that sometimes if I type the search query slowly enough, some of the prefixes of the query are recorded along the actual query. Not having a search button that starts the search task made it hard to select the correct time to perform the search. Currently, JabRef waits for the search box to be idle for 400ms and then performs the query. This is a good time to show the search results because JabRef will feel more interactive, but not so good for adding the query to the history because the user may be unhappy with the results and continue to modify the query, or he may simply be a slow typist like myself :^)

I gave this some thought and these are a few suggestions:

1. Record query when the search box loses focus
Thinking about the search workflow, it would look something like this:

  1. User wants to search for something
  2. User derives a query that matches the pattern he/she has in mind
  3. User types the query and modifies it until it shows the intended results
  4. User selects the entry he/she is searching for

Looking at the workflow above, it seems best to record the search query after step 4: User selects the entry he/she is searching for a.k.a the search box loses focus.

2. Merge recorded search queries with a certain similarity score that were recorded in a close timestamp
If I want to search for 'hello', in the worst case the search history would record all of the prefixes: 'h', 'he', 'hel', 'hell' and 'hello'. We can develop an algorithm that can understand that all of these queries refer to the same query which is 'hello'. We already have a utility class to calculate the similarity between two strings: StringSimilarity, so that's one piece of the problem solved.

After a query is recorded the algorithm would loop through the list of the recorded queries and decides if the new query could be an update to an older query. It goes like this:

  1. The algorithm notices that 'h' and 'he' were recorded 1s apart and they are similar enough so it would remove 'h' and insert 'he'
  2. The algorithm notices that 'he' and 'hel' were recorded 1.5s apart and they are similar enough so it would remove 'he' and insert 'hel'
  3. etc...

Originally posted by @HoussemNasri in #9647 (comment)

@dkokkotas
Copy link
Contributor

Hello there,

Currently being an academic student in the technology & software engineering field, I pursue my first contribution to OSS, and conceive the present issue as an ideal opportunity to start my journey to contributing.
Therefore, based on your labelling of the issue as a good first issue and my status as a "newcomer", I'm willing to get assigned and handle the issue.

Below, my knowledge background, relatively to the issue's software implementation needs.

  • hold experience in java software development
  • be familiar with
    • build & design and use of databases
    • SQL language

Getting started with JabRef, I have already overviewed the devs documentation, built from source and run successfully the app.

Any further guidelines & directions for the issue's software requirements, appreciated!

Thanks in advance,
Dimitrios K.

@Siedlerchr
Copy link
Member Author

@demetres12 Welcome! Thanks for your interest in JabRef. This sounds great already.
For this issue, I suggest you take a look at the linked PR to see what code is related to this issue. I think @HoussemNasri already did a good job at defining the requirements. I would start with the first task :) That should be relatively straightforward.

If you have any questions just ask here

#9647

@dkokkotas
Copy link
Contributor

@Siedlerchr Thank you for the prompt response! I will keep you informed on the progress.

@ThiloteE ThiloteE moved this from Free to take to Reserved in Good First Issues Mar 20, 2023
@ThiloteE ThiloteE moved this from Free to take to Reserved in Candidates for University Projects Mar 20, 2023
@Alexandra-Stath
Copy link
Contributor

Hello,
I am a student of a technology field University and for my semester course I have been required to work on an Open Source Software project. I would like to contribute to this issue in collaboration with Dimitrios by taking on the second task.
Please assign me this issue

Alexandra Stathopoulou

@HoussemNasri
Copy link
Member

HoussemNasri commented Mar 22, 2023

@Alexandra-Stath These are more like solution suggestions, more than tasks. They can be implemented together, but I think it would be overkill. I say we implement one of them and wait for user feedback. If someone asks for it then we shall provide.

In the meantime, you are welcome to check other good first issues; although most of them are assigned. Have a look at the issues and if you find an assigned issue that had no activity for like a month mention me and I'll assign you.

@dkokkotas
Copy link
Contributor

What?

Something else we could take into consideration, relatively to the search recording (this time ui-oriented issue) can be described generally as the addition of a pop-up window, providing info about:

  • previous searches' content
  • date & time of searching
  • a column of checkboxes, in case user desires to remove some of them
  • any other option we consider as not redundant.

How?

So, how i conceive that step-by-step in a simplistic way:

  1. User searches for sth --> finds desirable results --> selects & completes search --> search recorded
  2. In case he/she needs to track the search history --> selects "Search from history..."
  3. A pop-up window surfaces, (1) presenting basic data referring to the search log (2) allowing user to edit accordingly the log

Why?

  • improve and enhance UX
  • increase interactivity and user-friendliness
  • right now the user cannot clear a part of a history, but only remove the whole history

Thanks in advance for your time.
Dimitrios K.

@HoussemNasri
Copy link
Member

Hi @demetres12

Thank you for giving this more thought

The pop-up window approach was suggested in a recent PR #9659. The PR was not merged because it was a duplicate. Carl had a few comments about the approach, but nothing that would have prevented merging it. He also suggested using an auto-complete pop-up below the search bar.

However, I must point out that the scope of the issue is smaller than you suggest. Of course, you are free to put on more work on this; it will certainly make our users happy, but you don't have to.

Thanks again!

@dkokkotas
Copy link
Contributor

my wrong, didn't check if the issue was referred elsewhere.

@dkokkotas
Copy link
Contributor

@yenniejunvu
As you continuously intervene & attempt to get involved in ALREADY assigned issues (you also created a pr for the #9699, when Alexanda-Stath stated that as a part of a uni project it is not allowed to cooperate with other contributors).
I am wondering if you consider your actions as proper and fair.
Obviously, open source projects are about teamwork, collaboration and a code of conduct. I strongly believe you don't accept and respect work ethics. Our time and effort.

@ThiloteE "In principle, yes you can submit a PR for this issue, even though it has been reserved for another student or group of students.". How we define reserved and assigned?

Regarding the above situation, I request @Siedlerchr, @HoussemNasri, @koppor comment on the aforementioned situation.

Looking forward to seeing your actions.

@yenniejunvu
Copy link
Contributor

@demetres12 I am so sorry about that, for this issue I did not see that you were assigned as I was just perusing good first issues. That is completely my bad. Merging my branches are not necessary / do not require review. I can delete the branch if you would like.

As for the PR for #9699, I naively assumed that you guys were not working on it anymore. I kept my PR as a draft and let your partner know that I would not let it be merged unless she was okay with it, but did not get a response. Still, I should not have made that assumption, and for that I apologize.

I will no longer work on this current branch, but feel free to take a look at the draft I have so far if you'd like. Again, sorry about that, I will be more aware in the future.

@dkokkotas
Copy link
Contributor

@yenniejunvu
Thank you for the quick response.
I would appreciate if you delete the branch as you mentioned, because I need to show my progress from scratch.

@yenniejunvu
Copy link
Contributor

Will do. Again, I apologize. Good luck in your uni course. :)

Additionally, if you are looking for other good first issues for your class, I recommend the issues where you refactor from a JAXB dependancy to STaX. I believe there are a couple of those issues still open. It is not too difficult once you are familiar with both libraries.

@ThiloteE
Copy link
Member

ThiloteE commented Apr 18, 2023

Thank you @yenniejunvu for your cooperation. I hope this was enough to settle the issues.

For context/clarification.
This is my full statement (#9699 (comment)), which has been cited here:

In principle, yes you can submit a PR for this issue, even though it has been reserved for another student or group of students.
We at JabRef are happy this issue is of interest to multiple parties and of course would review all PRs, if there are any.
Collaboration will certainly improve and certify your teamwork skills.

Regardless, I am concerned, since both of you seem to be students. Students are special:

  • Not all grading schemes take teamwork into account --> Might be harder for your lecturer to grade you.
  • Multiple people working on the same issue might produce multiple pull-requests. --> JabRef developers may not merge all of them, if they provide similar functionality --> Will this affect your grade?

For the above concerns, unless you explicitly can come to terms with these limitations and Alexandra-Stath also agrees, I would advise to choose another issue. One that is not yet reserved.

Because I have been cited to support an argument, I want to explain myself. The above and following lines are my (current) personal opinion, which may or may not represent other JabRef maintainers. I will make sure to raise this issue in the next developers call.

  1. About Communication

    I think it is important for anybody who is not assigned to inquire about the current state of affairs of an issue, if somebody is already assigned. To honour unknown variables, out of respect and fairness, communication is necessary to come to a common understanding.

    What I imagine should happen ideally, would probably be something along those lines:

    A: Gets assigned to fix this issue
    B: (not assigned): can we do this together?
    A: No, sorry. (optional: because of reasons x, y, z)
    B: ok, I will look for another issue.
    --> Pull-request by A.

    OR

    A: Gets assigned to fix this issue
    B: (not assigned): Can we do this together?
    A: Yes, I am happy to. Let's do a voice/video call. Here is how you can reach me: Text me via JabRef's Gitter.
    B: Cool! I will text you.
    B: Gets assigned to fix this issue too.
    --> Either a common pull-request
    --> or two pull-requests and the better one gets merged.

  2. Assignees / reserved

    We at JabRef try to keep trouble away from students and their lecturers alike. Multiple people working on fixing the same issue may not be troublesome. In some circumstances it may even be desired, but unfortunately, it CAN be troublesome in others. Especially if grading is involved. Hence, I am personally of the opinion, it is good, if students have their "own" issue and support students that want to keep "their" issue exclusive. For this reason, we have the "assignees" and the "reserved" status in the projects page.

    @yenniejunvu, you can see those statuses in the top right corner of the issue. Not all projects and organisations use these or use them the way JabRef maintainers do. Somebody who is not familiar with these statuses may fail to notice them, so I can understand your initial focus on simply "good first issues". At least, now you know.

  3. Students are expected to choose issues that have the "free to take" status...

    ... but it is not that easy, as the vast majority of students refrain from unassigning themselves from the issue, once they decide to not work on it. From an administrative perspective it is hard to determine when students have given up or simply need a little bit of time. We maintainers usually give them some time and after some weeks or months without activity, we unassign them. In some cases, newcomers communicate they want to take over the issue and if the prior assigned students do not respond to the takeover, we assign the issue to the newcomers.

@dkokkotas
Copy link
Contributor

dkokkotas commented Apr 18, 2023

@ThiloteE
Thank you for the detailed and explanatory approach. Really appreciate your time. As you correctly stated, grading and it's criteria differ from student to student.

@dkokkotas
Copy link
Contributor

dkokkotas commented Apr 22, 2023

Taking into account both suggestions, I think that the merge of recorded search queries, while being more reasonal approach, it could return incorrect results.

Regarding the attachment of a listener on the focus property, we shall consider the following. The user may interrupt the process of typing in the search box and the focus will be lost, but that doesn't necessarily mean that he won't "come back" and complete the search process.

Using the StringSimilarity IsSimilar method, we check if the edit distance between the new search recorded and the older one is less than 4 (based on the METRIC_THRESHOLD variable).

CASE A : UNWANTED MERGE
What if we type "Dimitriou" as an author surname, we modify the search query and type "Dimitriadi". The two aforementioned are different surnames and don't refer to the same author.

There will be one recording for "Dimitriou" and another for "Dimitriadi". So the algorithm checks if there is a similarity between the two of them. Incorrectly,

  • IsSimilar will return true, as if we replace 'o' with 'a', 'u' with 'd' and then add 'i',
  • the METRIC_DISTANCE.distance() will be less than 4.
  • that ends up to an unwanted merge

CASE B : WE NEED MERGE

Another example, is if we want to type "Dimitriou", and by mistake we type "Dimt". What we do then is delete of the 't' character and keep going with the rest of the surname.

  • "Dimt" and "Dimitriou" will be recorded.
  • The METRIC_DISTANCE.distance() will be greater than 4 and so
  • IsSimilar method returns false.

So, calculating/checking the number of insertions, deletions, and substitutions for transforming a string to another, may be efficient for "h", "he", "hel" etc, but aren't there occasions that will not provide the desired results?

ABOUT CASE A
Here it comes the concept of close timestamp. Setting a time period limit and checking whether the two recordings were close enough to be merged could optimize the whole process. But how we tell if the time limit is suitable?
Suppose we have already set the limit at 2s and the recording of the two queries doesn't exceed it. Consequently, both string similarity and close timestamp will wrongly merge.

ABOUT CASE B
A combination of both solutions (change listener on focus + merge queries) could come in handy for the case B. If the focus is active/not lost on the search box, and the user modifies "Dimt" to "Dimitriou", there we be only one recording, and that would be the right one. Even in case the focus is lost and the search process still needs modification, string similarity and close timestamp concepts, will be there to prevent undesirable outcome.

My understanding of the improvement of search history storing, is that applying both suggestions could be a way to optimize the recording procedure as there could be many cases that logic errors emerge.

@HoussemNasri
Regarding your suggestion "The algorithm notices that 'he' and 'hel' were recorded 1.5s apart and they are similar enough so it would remove 'he' and insert 'hel'.",

  • what is a proper time period limit between the two recordings (1.5 or 2 seconds)?
  • I think that applying java.lang.System.currentTimeMillis() method could be helpful.

@dkokkotas
Copy link
Contributor

dkokkotas commented Apr 22, 2023

A. Following the merge recording search queries suggestion,
a basic structure of code could be like the following.

import org.jabref.logic.util.strings.StringSimilarity;

////               ////               ////

private final StringSimilarity match;

////               ////               ////

public StateManager() {
        this.match = new StringSimilarity();
    }

////               ////               ////

public void addSearchHistory(String search) {
        searchHistory.remove(search);
        searchHistory.add(search);
        if (searchHistory.size() > 1) {
            outer:
                for (int i = searchHistory.size() - 2; i >= 0; i--) {
                    if (match.isSimilar(search, searchHistory.get(i))) {
                        searchHistory.remove(searchHistory.get(i));
                        break outer;
                   }
              }
       }
}

B. Following the focus listener attachment search queries suggestion,
a basic structure of code could be like the following.

private BooleanProperty searchBoxFocused = new SimpleBooleanProperty(false);

////               ////               ////

public void initialize() {
    searchBoxFocused.bind(textField.focusedProperty());
    searchBoxFocused.addListener((observable, oldPropertyValue, newPropertyValue) -> {
        if (!newPropertyValue) {
           /* haven't already figured out what to do here,
           *  as we just use the listener as the source for the
           *  searchBoxFocused target, as they are
           *  unidirectionally bound
           */ 
        }
    });
}

////               ////               ////

public void addSearchHistory(String search) {
    if (!searchBoxFocused.get()) {
        searchHistory.remove(search);
        searchHistory.add(search);
    }
}

About testing what I considered to be mirroring properly the issue's resolving is stated below

@Test
    void testRedundantPrefixesRecording() {
        StateManager stateManager = new StateManager();
        stateManager.addSearchHistory("N");
        stateManager.addSearchHistory("Ni");
        stateManager.addSearchHistory("Nik");
        stateManager.addSearchHistory("Niko");
        stateManager.addSearchHistory("Nikol");
        stateManager.addSearchHistory("Nikola");
        stateManager.addSearchHistory("Nikolao");
        stateManager.addSearchHistory("Nikolaou");
        List<String> lastSearchHistory = stateManager.getWholeSearchHistory();
        List<String> expected = List.of("Nikolaou");

        Assertions.assertEquals(expected, lastSearchHistory);
    }

Any suggestions & directions for my previous comments, appreciated!

@HoussemNasri
Copy link
Member

HoussemNasri commented Apr 24, 2023

what is a proper time period limit between the two recordings (1.5 or 2 seconds)?

It's a heuristic problem so there are no perfect values, you can play around with different ones and choose the most suitable. Then, you derive a function that given the similarity score and the period between two recordings it can decide whether to replace or add the recording.

Also, the similarity threshold doesn't have to be 4, you can use StringSimilarity#editDistanceIgnoreCase and compare the returned score to a custom threshold.

@dkokkotas
Copy link
Contributor

dkokkotas commented Apr 24, 2023

So regarding the improvement of recording don't you think that implementing both solutions ( focus lost + (string similarity + close timestamp) would be more appropriate. In case the user, for some reason, leave the search box, edit distance check and timestamp could be another mechanism, ensuring avoidance of not proper recordings.
We type "N", "Ni", "Nik", "Niko","Nikol", "Nikola", then we accidentally leave the search bar and finally we come back and complete by entering "Nikolaou". While attaching a change listener will end up in two recordings (i.e. "Nikola", "Nikolaou"), the StringSimilarity class usage, will loop those two and merge them correctly.

@HoussemNasri
Copy link
Member

In my opinion, if I type Nikola then leave the search box and come back and write Nikolaou, I expect both entries to be recorded. As a general rule, having a missing record is worst than having an extra, just not too much extras as the current behavior.

@calixtus
Copy link
Member

devcall:
We examined the challenges associated with this issue and identified the factors contributing to the confusion: the absence of the "First time contribution" label, no apparent activity within three weeks, and a PR submitted without acknowledging the prior assignee. Typically, we encourage assignees to open a draft pull request to receive early feedback, as we have noticed that some assignees may become unresponsive. To prevent confusion for those interested, we will incorporate this step into our future processes. Please maintain your interest in JabRef, as we are always eager to review any submitted PRs. Thanks!

@ThiloteE ThiloteE added the FirstTimeCodeContribution Triggers GitHub Greeter Workflow label Apr 24, 2023
@github-actions
Copy link
Contributor

As a general advice for newcomers: check out Contributing for a start. Also, guidelines for setting up a local workspace is worth having a look at.

Feel free to ask here at GitHub, if you have any issue related questions. If you have questions about how to setup your workspace use JabRef's Gitter chat. Try to open a (draft) pull-request early on, so that people can see you are working on the issue and so that they can see the direction the pull request is heading towards. This way, you will likely receive valuable feedback.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
FirstTimeCodeContribution Triggers GitHub Greeter Workflow good first issue An issue intended for project-newcomers. Varies in difficulty. search
Projects
Archived in project
Archived in project
Development

Successfully merging a pull request may close this issue.

7 participants