Fix expansion of bracketed expressions in RegExpBasedFileFinder #7338

Merged
merged 18 commits into from
Jan 24, 2021
Changes from 5 commits
@@ -187,7 +187,7 @@ public static String expandBrackets(String pattern, Character keywordDelimiter,
* @param database The {@link BibDatabase} for field resolving. May be null.
* @return a function accepting a bracketed expression and returning the result of expanding it
*/
private static Function<String, String> expandBracketContent(Character keywordDelimiter, BibEntry entry, BibDatabase database) {
public static Function<String, String> expandBracketContent(Character keywordDelimiter, BibEntry entry, BibDatabase database) {
return (String bracket) -> {
String expandedPattern;
List<String> fieldParts = parseFieldAndModifiers(bracket);
62 changes: 41 additions & 21 deletions src/main/java/org/jabref/logic/util/io/RegExpBasedFileFinder.java
@@ -8,9 +8,9 @@
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.BiPredicate;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;
@@ -22,8 +22,13 @@
import org.jabref.model.entry.BibEntry;
import org.jabref.model.strings.StringUtil;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class RegExpBasedFileFinder implements FileFinder {

private static final Logger LOGGER = LoggerFactory.getLogger(RegExpBasedFileFinder.class);

private static final String EXT_MARKER = "__EXTENSION__";

private static final Pattern ESCAPE_PATTERN = Pattern.compile("([^\\\\])\\\\([^\\\\])");
@@ -58,6 +63,35 @@ public static String expandBrackets(String bracketString, BibEntry entry, BibDat
return expandedStringBuffer.toString();
}

private Pattern createFileNamePattern(String[] fileParts, String extensionRegExp, BibEntry entry) throws IOException {
// Last step: check if the given file can be found in this directory
// Protect the extension marker so that it isn't treated as a bracketed pattern
String filePart = fileParts[fileParts.length - 1].replace("[extension]", EXT_MARKER);

// expandBracketContent is the default function for expanding the content of a bracketed expression [field:modifier]
Function<String, String> expandBracket = BracketedPattern.expandBracketContent(keywordDelimiter, entry, null);
// we want to post-process the content so that it can be used as a regex for finding a file name
Function<String, String> bracketToFileNameRegex = expandBracket.andThen(RegExpBasedFileFinder::toFileNameRegex);

String expandedBracketAsRegexpLiterals = BracketedPattern.expandBrackets(filePart, bracketToFileNameRegex);

String fileNamePattern = expandedBracketAsRegexpLiterals
.replaceAll(EXT_MARKER, extensionRegExp) // Replace the extension marker
.replaceAll("\\\\\\\\", "\\\\");
try {
return Pattern.compile('^' + fileNamePattern + '$', Pattern.CASE_INSENSITIVE);
} catch (PatternSyntaxException e) {
LOGGER.warn("There is a syntax error in the regular expression \"{}\" used to search for a file", fileNamePattern, e);
throw new IOException("There is a syntax error in the regular expression used to search for files", e);
}
}

private static String toFileNameRegex(String content) {
var cleanedContent = FileNameCleaner.cleanFileName(content);
return content.equals(cleanedContent) ? Pattern.quote(content) :
"(" + Pattern.quote(content) + ")|(" + Pattern.quote(cleanedContent) + ")";
}
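The composition used in `createFileNamePattern` (expand the bracket, then turn the result into a regex literal) can be illustrated with a minimal, self-contained sketch. Everything here is an assumption for illustration: `cleanFileName` is a hypothetical stand-in for `FileNameCleaner.cleanFileName` and does not reproduce JabRef's real cleaning rules, and the `expand` lambda is a fake expander that always returns a fixed title.

```java
import java.util.function.Function;
import java.util.regex.Pattern;

public class BracketToRegexSketch {

    // Hypothetical stand-in for FileNameCleaner.cleanFileName: replaces a few
    // characters that are commonly illegal in file names. Not JabRef's real rules.
    static String cleanFileName(String name) {
        return name.replaceAll("[\\\\/:*?\"<>|]", "_");
    }

    // Same shape as toFileNameRegex in the diff: quote the raw content and,
    // if cleaning changes it, also accept the cleaned variant.
    static String toFileNameRegex(String content) {
        String cleaned = cleanFileName(content);
        return content.equals(cleaned)
                ? Pattern.quote(content)
                : "(" + Pattern.quote(content) + ")|(" + Pattern.quote(cleaned) + ")";
    }

    public static void main(String[] args) {
        // Hypothetical expander: pretends every bracket resolves to this title.
        Function<String, String> expand = bracket -> "What is this?";
        // Post-process the expanded content so it is usable as a regex.
        Function<String, String> bracketToRegex =
                expand.andThen(BracketToRegexSketch::toFileNameRegex);

        String regex = bracketToRegex.apply("[title]");
        // The non-capturing group keeps the alternation from spilling into the anchors.
        Pattern pattern = Pattern.compile("^(?:" + regex + ")$", Pattern.CASE_INSENSITIVE);

        System.out.println(pattern.matcher("What is this?").matches()); // true
        System.out.println(pattern.matcher("what is this_").matches()); // true
        System.out.println(pattern.matcher("What is thisX").matches()); // false
    }
}
```

The sketch wraps the generated alternation in `(?:...)` before adding anchors. That is a choice made here, not something taken from the diff: without a group, an alternation that is concatenated with further pattern text binds more loosely than the text around it.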

/**
* Method for searching for files using regexp. A list of extensions and directories can be
* given.
@@ -184,28 +218,14 @@ private List<Path> findFile(final BibEntry entry, final Path directory, final St
} // End process directory information
}

// Last step: check if the given file can be found in this directory
String filePart = fileParts[fileParts.length - 1].replace("[extension]", EXT_MARKER);
String filenameToLookFor = expandBrackets(filePart, entry, null, keywordDelimiter).replaceAll(EXT_MARKER, extensionRegExp);

try {
final Pattern toMatch = Pattern.compile('^' + filenameToLookFor.replaceAll("\\\\\\\\", "\\\\") + '$',
Pattern.CASE_INSENSITIVE);
BiPredicate<Path, BasicFileAttributes> matcher = (path, attributes) -> toMatch.matcher(path.getFileName().toString()).matches();
resultFiles.addAll(collectFilesWithMatcher(actualDirectory, matcher));
} catch (UncheckedIOException | PatternSyntaxException e) {
throw new IOException("Could not look for " + filenameToLookFor, e);
}

return resultFiles;
}

private List<Path> collectFilesWithMatcher(Path actualDirectory, BiPredicate<Path, BasicFileAttributes> matcher) {
Pattern toMatch = createFileNamePattern(fileParts, extensionRegExp, entry);
BiPredicate<Path, BasicFileAttributes> matcher = (path, attributes) -> toMatch.matcher(path.getFileName().toString()).matches();
try (Stream<Path> pathStream = Files.find(actualDirectory, 1, matcher, FileVisitOption.FOLLOW_LINKS)) {
return pathStream.collect(Collectors.toList());
} catch (UncheckedIOException | IOException ioe) {
return Collections.emptyList();
resultFiles.addAll(pathStream.collect(Collectors.toList()));
} catch (UncheckedIOException uncheckedIOException) {
Member

Why is this even an UncheckedIOException?
If needed, the UncheckedIOException is just a wrapper for the IOException, so you could call `throw uncheckedIOException.getCause()`.

Member Author

@k3KAW8Pnf7mkmdSMPHz27 k3KAW8Pnf7mkmdSMPHz27 Jan 16, 2021

Frankly, I am not sure why an UncheckedIOException is caught in this part of the code. I don't have much experience with nio.*, but my reading of the API is that the UncheckedIOException must be caught in the parts of the code that make use of the Path references.
I don't know if there are any other potential issues with a lazily loaded file system walk. Based on the documentation for DirectoryStream and Files.walk, I'd guess it could be thrown if the depth is greater than 1 and there is a cycle, so not in this part of the code unless it is changed.
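For reference, a minimal sketch of the unwrapping suggested above. The class and method names (`FindFilesSketch`, `findMatching`) are made up for this example and are not part of the PR. It relies on two documented facts: the stream returned by `Files.find` is lazy and may throw an `UncheckedIOException` while it is being consumed, and `UncheckedIOException.getCause()` is declared to return an `IOException`, so it can be rethrown directly.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class FindFilesSketch {

    // Hypothetical helper: lists entries directly inside 'directory' whose
    // file name matches 'fileName'. Depth 1 means no recursion into subfolders.
    static List<Path> findMatching(Path directory, Pattern fileName) throws IOException {
        try (Stream<Path> paths = Files.find(directory, 1,
                (path, attributes) -> path.getFileName() != null
                        && fileName.matcher(path.getFileName().toString()).matches())) {
            // The stream is lazy, so I/O errors can surface here, while collecting.
            return paths.collect(Collectors.toList());
        } catch (UncheckedIOException e) {
            // getCause() returns the wrapped IOException; rethrow it as a checked exception.
            throw e.getCause();
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("find-files-sketch");
        Files.createFile(dir.resolve("paper.pdf"));
        Files.createFile(dir.resolve("notes.txt"));

        List<Path> result = findMatching(dir, Pattern.compile(".*\\.pdf", Pattern.CASE_INSENSITIVE));
        System.out.println(result.size()); // 1
    }
}
```

Rethrowing the cause keeps the original exception type and message. Wrapping it in a new `IOException`, as the diff does, also works but adds one more level to the stack trace.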

throw new IOException(uncheckedIOException);
}
return resultFiles;
}

private boolean isSubDirectory(Path rootDirectory, Path path) {
@@ -80,6 +80,19 @@ void testYearAuthFirstPageFindFiles() throws Exception {
result);
}

@Test
void findAssociatedFilesContainingRegexpFromBracketedExpression() throws Exception {
var bibEntry = new BibEntry().withField(StandardField.TITLE, "Regexp from [A-Z]");

var extension = Collections.singletonList("pdf");
var directory = Collections.singletonList(Path.of(FILES_DIRECTORY));
RegExpBasedFileFinder fileFinder = new RegExpBasedFileFinder("[TITLE]\\\\.[extension]", ',');

List<Path> result = fileFinder.findAssociatedFiles(bibEntry, directory, extension);
assertEquals(Collections.singletonList(Path.of("src/test/resources/org/jabref/logic/importer/unlinkedFilesTestFolder/Regexp from [A-Z].pdf")),
result);
}
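Why the test above needs the expanded title to be quoted can be seen in a small standalone sketch. It uses only `java.util.regex` and does not call any JabRef code. The helper `matchesFileName` is hypothetical, and the `\\.pdf` suffix stands in for the expanded `[extension]` part.

```java
import java.util.regex.Pattern;

public class QuotedTitleSketch {

    // Hypothetical helper: anchors the regex and matches it case-insensitively,
    // similar in spirit to how the finder compiles its file name pattern.
    static boolean matchesFileName(String regex, String fileName) {
        return Pattern.compile("^" + regex + "$", Pattern.CASE_INSENSITIVE)
                .matcher(fileName)
                .matches();
    }

    public static void main(String[] args) {
        String title = "Regexp from [A-Z]";
        String fileName = "Regexp from [A-Z].pdf";

        // Unquoted: "[A-Z]" is read as a character class that matches one letter,
        // so the literal file name is not found ...
        System.out.println(matchesFileName(title + "\\.pdf", fileName)); // false
        // ... while an unrelated file name is.
        System.out.println(matchesFileName(title + "\\.pdf", "Regexp from Q.pdf")); // true

        // Quoted: the title is matched character by character.
        System.out.println(matchesFileName(Pattern.quote(title) + "\\.pdf", fileName)); // true
    }
}
```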

@Test
void testAuthorWithDiacritics() throws Exception {
// given