Shell escape reference image files [VS-796] [WX-910] #6989

Merged on Jan 24, 2023 (3 commits). Changes shown below: 2 commits.
@@ -1,4 +1,5 @@
{
"wf_reference_disk_test.broad_reference_file_input": "gs://gcp-public-data--broad-references/hg19/v0/Homo_sapiens_assembly19.fasta.fai",
- "wf_reference_disk_test.nirvana_reference_file_input": "gs://broad-public-datasets/gvs/vat-annotations/Nirvana/3.18.1/SupplementaryAnnotation/GRCh38/phyloP_hg38.npd.idx"
+ "wf_reference_disk_test.nirvana_reference_file_input": "gs://broad-public-datasets/gvs/vat-annotations/Nirvana/3.18.1/SupplementaryAnnotation/GRCh38/phyloP_hg38.npd.idx",
+ "wf_reference_disk_test.nirvana_reference_file_metachar_input": "gs://broad-public-datasets/gvs/vat-annotations/Nirvana/3.18.1/SupplementaryAnnotation/GRCh38/1000_Genomes_Project_(SV)_Phase_3_v5a.nsi"
}
@@ -4,17 +4,23 @@ task check_if_localized_as_symlink {
input {
File broad_reference_file_input
File nirvana_reference_file_input
+ File nirvana_reference_file_metachar_input
}
String broad_input_symlink = "broad_input_symlink.txt"
String nirvana_input_symlink = "nirvana_input_symlink.txt"
+ String nirvana_metachar_input_symlink = "nirvana_metachar_input_symlink.txt"
command {
# Print true if input is a symlink, otherwise print false.
if test -h ~{broad_reference_file_input}; then echo true; else echo false; fi > ~{broad_input_symlink}
if test -h ~{nirvana_reference_file_input}; then echo true; else echo false; fi > ~{nirvana_input_symlink}
+
+ # Quotes added here due to the metachar in the filename.
+ if test -h "~{nirvana_reference_file_metachar_input}"; then echo true; else echo false; fi > ~{nirvana_metachar_input_symlink}
}
output {
Boolean is_broad_input_symlink = read_boolean("~{broad_input_symlink}")
Boolean is_nirvana_input_symlink = read_boolean("~{nirvana_input_symlink}")
+ Boolean is_nirvana_metachar_input_symlink = read_boolean("~{nirvana_metachar_input_symlink}")
}
runtime {
docker: "ubuntu:latest"
@@ -26,14 +32,17 @@ workflow wf_reference_disk_test {
input {
File broad_reference_file_input
File nirvana_reference_file_input
+ File nirvana_reference_file_metachar_input
}
call check_if_localized_as_symlink {
input:
broad_reference_file_input = broad_reference_file_input,
- nirvana_reference_file_input = nirvana_reference_file_input
+ nirvana_reference_file_input = nirvana_reference_file_input,
+ nirvana_reference_file_metachar_input = nirvana_reference_file_metachar_input
}
output {
Boolean is_broad_input_file_a_symlink = check_if_localized_as_symlink.is_broad_input_symlink
Boolean is_nirvana_input_file_a_symlink = check_if_localized_as_symlink.is_nirvana_input_symlink
+ Boolean is_nirvana_metachar_input_file_a_symlink = check_if_localized_as_symlink.is_nirvana_metachar_input_symlink
}
}
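Why the quotes in the updated task matter: ( and ) are shell metacharacters, so an unquoted path containing them is a syntax error before test -h ever runs. The Scala sketch below is an illustration, not part of this PR. It feeds the task's if test -h ... line to sh -c with the Nirvana filename unquoted and then double-quoted. It assumes a POSIX sh on the PATH; the object name MetacharQuotingDemo is hypothetical.

```scala
import scala.sys.process._

object MetacharQuotingDemo {
  val fileName = "1000_Genomes_Project_(SV)_Phase_3_v5a.nsi"

  // Runs cmd with `sh -c`, discarding stdout/stderr, and returns the exit status.
  def exitStatus(cmd: String): Int =
    Process(Seq("sh", "-c", cmd)).!(ProcessLogger((_: String) => (), (_: String) => ()))

  // Unquoted, as before the fix: sh rejects "(" as a syntax error (nonzero status).
  def unquoted: Int =
    exitStatus(s"if test -h $fileName; then echo true; else echo false; fi")

  // Double-quoted, as in the updated task: the path is one literal word.
  def quoted: Int =
    exitStatus(s"""if test -h "$fileName"; then echo true; else echo false; fi""")

  def main(args: Array[String]): Unit =
    println(s"unquoted exit status: $unquoted, quoted exit status: $quoted")
}
```

The unquoted form should exit nonzero because sh cannot parse it, while the quoted form exits 0 whether or not such a file exists, since the if/else always ends with a successful echo.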
@@ -33,6 +33,11 @@ object ActionCommands {
def escape: String = StringEscapeUtils.escapeXSI(path.pathAsString)
}

+ implicit class ShellString(val string: String) extends AnyVal {
+ // The command String runs in Bourne shell so shell metacharacters in filenames must be escaped
+ def escape: String = StringEscapeUtils.escapeXSI(string)
+ }
+
private def makeContentTypeFlag(contentType: Option[ContentType]) = contentType.map(ct => s"""-h "Content-Type: $ct"""").getOrElse("")

def makeContainerDirectory(containerPath: Path) = s"mkdir -p ${containerPath.escape}"
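The new ShellString class follows the same implicit-class pattern as the existing ShellPath: wrapping String in a value class makes .escape callable as if it were a method on String. The dependency-free sketch below shows that pattern. Its backslashEscape is a hypothetical, simplified stand-in for StringEscapeUtils.escapeXSI that prefixes each character in a fixed metacharacter set with a backslash; the real function's handling of some inputs (newlines, for example) may differ.

```scala
object ShellEscapeSketch {
  // Hypothetical, simplified stand-in for StringEscapeUtils.escapeXSI: these
  // characters get a backslash prefix; everything else passes through unchanged.
  private val metachars: Set[Char] = "|&;<>()$`\\\"' \t*?[#~=%".toSet

  def backslashEscape(s: String): String =
    s.map(c => if (metachars.contains(c)) s"\\$c" else c.toString).mkString

  // Same shape as ActionCommands.ShellString in the diff: a value class that
  // makes .escape callable on any String once this object's members are imported.
  implicit class ShellString(val string: String) extends AnyVal {
    def escape: String = backslashEscape(string)
  }

  def main(args: Array[String]): Unit = {
    val name = "1000_Genomes_Project_(SV)_Phase_3_v5a.nsi"
    println(name.escape)
  }
}
```

With this sketch the Nirvana filename becomes 1000_Genomes_Project_\(SV\)_Phase_3_v5a.nsi, which sh reads as a single literal word. Because .escape only delegates to one function, the method-style and function-call spellings debated in this PR's review share a single escaping path.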
@@ -179,11 +179,12 @@ class PipelinesApiAsyncBackendJobExecutionActor(standardParams: StandardAsyncExe
(implicit gcsTransferConfiguration: GcsTransferConfiguration): String = {
// Generate a mapping of reference inputs to their mounted paths and a section of the localization script to
// "faux localize" these reference inputs with symlinks to their locations on mounted reference disks.
+ import cromwell.backend.google.pipelines.common.action.ActionCommands._
val referenceFilesLocalizationScript = {
val symlinkCreationCommandsOpt = referenceInputsToMountedPathsOpt map { referenceInputsToMountedPaths =>
referenceInputsToMountedPaths map {
case (input, absolutePathOnRefDisk) =>
- s"mkdir -p ${input.containerPath.parent.pathAsString} && ln -s $absolutePathOnRefDisk ${input.containerPath.pathAsString}"
+ s"mkdir -p ${input.containerPath.parent.pathAsString.escape} && ln -s ${absolutePathOnRefDisk.escape} ${input.containerPath.pathAsString.escape}"

Collaborator:

Are there other places we might be vulnerable to this, or is there something specific about the way reference disk paths are handled that bypasses other escaping?

Contributor Author:

I believe Cromwell is appropriately robust to these potential filename issues in the "normal" localization code paths thanks to liberal use of the shellEscaped function. We were able to localize this particular file just fine (apart from the slowness) when we weren't using the reference disk.

Collaborator:

Ah, nice! Is there a reason we can't use that function here?

mcovarr (Contributor Author), Jan 24, 2023:

We could definitely use that function here if that's the preference. I modeled these changes on the existing implicit class ShellPath; the new ShellString likewise allows invoking .escape as if it were a method on String. Using shellEscaped instead would (arguably) look a little noisier:

Suggested change
- s"mkdir -p ${input.containerPath.parent.pathAsString.escape} && ln -s ${absolutePathOnRefDisk.escape} ${input.containerPath.pathAsString.escape}"
+ s"mkdir -p ${shellEscaped(input.containerPath.parent.pathAsString)} && ln -s ${shellEscaped(absolutePathOnRefDisk)} ${shellEscaped(input.containerPath.pathAsString)}"

Collaborator:

I prefer minimizing the number of paths for string escaping... but also acknowledge that this is a total nitpick. :)

}
}

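An end-to-end check of the "faux localization" command: the sketch below builds a mkdir -p ... && ln -s ... string shaped like the one in this hunk for a file whose name contains parentheses, runs it with sh -c in a temporary directory, and reports whether a symlink appeared. To stay dependency-free it swaps in POSIX single-quote quoting instead of escapeXSI. The mnt_refdisk and cromwell_root/inputs layout, and all object and method names, are hypothetical; it assumes sh, mkdir and ln are available.

```scala
import java.nio.file.{Files, Path}
import scala.sys.process._

object SymlinkCommandDemo {
  // POSIX single-quote quoting: wrap in '...', turning any embedded ' into '\''.
  def singleQuote(s: String): String = "'" + s.replace("'", "'\\''") + "'"

  // Same shape as the diff's command: mkdir -p <parent> && ln -s <target> <link>.
  def symlinkCommand(target: Path, link: Path, quote: String => String): String =
    s"mkdir -p ${quote(link.getParent.toString)} && " +
      s"ln -s ${quote(target.toString)} ${quote(link.toString)}"

  // Creates a fake reference file in a fresh temp dir, runs the command with sh,
  // and returns true only if sh succeeded and left a symlink at the link path.
  def run(quote: String => String): Boolean = {
    val root = Files.createTempDirectory("refdisk-demo")
    val name = "1000_Genomes_Project_(SV)_Phase_3_v5a.nsi"
    val refDiskDir = Files.createDirectories(root.resolve("mnt_refdisk"))
    val target = Files.createFile(refDiskDir.resolve(name))
    val link = root.resolve("cromwell_root").resolve("inputs").resolve(name)
    val cmd = symlinkCommand(target, link, quote)
    val exitCode = Process(Seq("sh", "-c", cmd)).!(ProcessLogger((_: String) => (), (_: String) => ()))
    exitCode == 0 && Files.isSymbolicLink(link)
  }

  def main(args: Array[String]): Unit = {
    println(s"quoted:   ${run(singleQuote)}")
    println(s"unquoted: ${run(identity)}")
  }
}
```

run(singleQuote) should return true. run(identity) reproduces the pre-fix, unescaped command, which sh rejects with a syntax error, so neither mkdir nor ln runs and it returns false. The temporary directories are left in place for inspection.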
@@ -192,11 +192,12 @@ class PipelinesApiAsyncBackendJobExecutionActor(standardParams: StandardAsyncExe
(implicit gcsTransferConfiguration: GcsTransferConfiguration): String = {
// Generate a mapping of reference inputs to their mounted paths and a section of the localization script to
// "faux localize" these reference inputs with symlinks to their locations on mounted reference disks.
+ import cromwell.backend.google.pipelines.common.action.ActionCommands._
val referenceFilesLocalizationScript = {
val symlinkCreationCommandsOpt = referenceInputsToMountedPathsOpt map { referenceInputsToMountedPaths =>
referenceInputsToMountedPaths map {
case (input, absolutePathOnRefDisk) =>
- s"mkdir -p ${input.containerPath.parent.pathAsString} && ln -s $absolutePathOnRefDisk ${input.containerPath.pathAsString}"
+ s"mkdir -p ${input.containerPath.parent.pathAsString.escape} && ln -s ${absolutePathOnRefDisk.escape} ${input.containerPath.pathAsString.escape}"
}
}
