[Bug] Rich Text Files are not inspected with 'Extract Indicators From File - Generic v2' (#3822)

* fixed bugs

* added description to #10 task

* enhanced test to check problematic file

* the script now tries decode('unicode_escape') first and skips that step only if it fails

* added https://mock.com to white list
DeanArbel authored Jul 9, 2019
1 parent 45910b4 commit 96f8254
Showing 6 changed files with 149 additions and 90 deletions.
108 changes: 55 additions & 53 deletions Playbooks/playbook-Extract_Indicators_From_File_-_Generic_v2.yml
@@ -13,13 +13,13 @@ starttaskid: "0"
tasks:
"0":
id: "0"
taskid: 7c475fe8-ef6f-4203-8649-88c8ee1355a4
taskid: 7820586b-f2e6-4673-8dd8-08caab0d1173
type: start
task:
id: 7c475fe8-ef6f-4203-8649-88c8ee1355a4
id: 7820586b-f2e6-4673-8dd8-08caab0d1173
version: -1
description: Start
name: ""
description: ""
iscommand: false
brand: ""
nexttasks:
@@ -38,10 +38,10 @@ tasks:
ignoreworker: false
"1":
id: "1"
taskid: 7d301230-7301-4a9c-879f-2e3b4366a3d7
taskid: 7872ce6f-5e06-4c00-846b-6de0d54307b1
type: condition
task:
id: 7d301230-7301-4a9c-879f-2e3b4366a3d7
id: 7872ce6f-5e06-4c00-846b-6de0d54307b1
version: -1
name: Is there a file?
description: |
@@ -77,10 +77,10 @@ tasks:
ignoreworker: false
"2":
id: "2"
taskid: 864c6f5e-7214-4a36-8c37-80f7057bd36e
taskid: e06408f8-c0cb-4516-87c0-014108497f20
type: regular
task:
id: 864c6f5e-7214-4a36-8c37-80f7057bd36e
id: e06408f8-c0cb-4516-87c0-014108497f20
version: -1
name: Set file to local context
description: Set the input file into local context.
@@ -112,13 +112,13 @@ tasks:
ignoreworker: false
"3":
id: "3"
taskid: cc0c7ac6-f764-47df-8f1e-cfbdcbd25c4e
taskid: 832242ed-942e-4094-8cf4-003921e7ef7a
type: title
task:
id: cc0c7ac6-f764-47df-8f1e-cfbdcbd25c4e
id: 832242ed-942e-4094-8cf4-003921e7ef7a
version: -1
name: Done
description: ""
description: Done
type: title
iscommand: false
brand: ""
@@ -135,13 +135,13 @@ tasks:
ignoreworker: false
"4":
id: "4"
taskid: 94a70634-4521-42ed-8830-9673d9b25e1b
taskid: ab59c7dd-3e82-4efe-8e36-d3065d88a590
type: title
task:
id: 94a70634-4521-42ed-8830-9673d9b25e1b
id: ab59c7dd-3e82-4efe-8e36-d3065d88a590
version: -1
name: Extract Indicators From Files
description: ""
description: Extracts indicators from files
type: title
iscommand: false
brand: ""
@@ -163,10 +163,10 @@ tasks:
ignoreworker: false
"5":
id: "5"
taskid: 5e2997db-65a5-49a7-8f9a-87892701e8fe
taskid: 6bc85e4c-1f70-41b7-8bec-3c6410c9ad53
type: condition
task:
id: 5e2997db-65a5-49a7-8f9a-87892701e8fe
id: 6bc85e4c-1f70-41b7-8bec-3c6410c9ad53
version: -1
name: Is there a text-based file?
description: Checks if there is a text-based file in context. Skips MSG and
@@ -198,6 +198,14 @@ tasks:
value:
simple: ASCII text
ignorecase: true
- operator: containsString
left:
value:
simple: File.Type
iscontext: true
right:
value:
simple: Rich Text Format
- - operator: notContainsString
left:
value:
@@ -254,10 +262,10 @@ tasks:
ignoreworker: false
"6":
id: "6"
taskid: ded29993-f01d-4b33-8a4a-9d2453eaab31
taskid: d2c83c31-2092-4c86-81ad-359fcf0ee1e6
type: regular
task:
id: ded29993-f01d-4b33-8a4a-9d2453eaab31
id: d2c83c31-2092-4c86-81ad-359fcf0ee1e6
version: -1
name: Extract indicators from text-based file
description: Extracts indicators from text-based files.
@@ -282,6 +290,14 @@ tasks:
value:
simple: ASCII text
ignorecase: true
- operator: containsString
left:
value:
simple: File.Type
iscontext: true
right:
value:
simple: Rich Text Format
- - operator: notContainsString
left:
value:
@@ -331,7 +347,7 @@ tasks:
view: |-
{
"position": {
"x": 92.5,
"x": 82.5,
"y": 950
}
}
@@ -340,10 +356,10 @@ tasks:
ignoreworker: false
"7":
id: "7"
taskid: 542f7e7a-6978-480e-8356-cc7cda51a6d7
taskid: 79feacf1-749d-4213-85ce-9b87809c59cf
type: condition
task:
id: 542f7e7a-6978-480e-8356-cc7cda51a6d7
id: 79feacf1-749d-4213-85ce-9b87809c59cf
version: -1
name: Is there a PDF file?
description: Checks if there is a PDF file in context.
@@ -399,10 +415,10 @@ tasks:
ignoreworker: false
"8":
id: "8"
taskid: dbdad3fa-957f-490b-8fb5-3a827687360b
taskid: f954fb51-0ed0-44d2-837a-64f852f3dd9d
type: regular
task:
id: dbdad3fa-957f-490b-8fb5-3a827687360b
id: f954fb51-0ed0-44d2-837a-64f852f3dd9d
version: -1
name: Extract indicators from PDF file
description: Load a PDF file's content and metadata into context.
@@ -454,10 +470,10 @@ tasks:
ignoreworker: false
"9":
id: "9"
taskid: 6a791d6e-4678-44c7-8783-dad7ff0afdd3
taskid: 596a82ed-b1a5-4a82-8eca-c351ae07e8c3
type: condition
task:
id: 6a791d6e-4678-44c7-8783-dad7ff0afdd3
id: 596a82ed-b1a5-4a82-8eca-c351ae07e8c3
version: -1
name: Is there a Word file?
description: Checks if there is a Word file (DOC, DOCX) in context.
@@ -571,13 +587,13 @@ tasks:
ignoreworker: false
"10":
id: "10"
taskid: c552720b-f6fc-49be-8c2a-1beb7178c737
taskid: 371fb092-fd79-4550-8b2e-d3c945d2cc10
type: title
task:
id: c552720b-f6fc-49be-8c2a-1beb7178c737
id: 371fb092-fd79-4550-8b2e-d3c945d2cc10
version: -1
name: No File To Parse
description: ""
description: No File To Parse
type: title
iscommand: false
brand: ""
@@ -597,10 +613,10 @@ tasks:
ignoreworker: false
"11":
id: "11"
taskid: 9fe7f253-7529-438f-871b-5eba2873f43d
taskid: e08be4df-5efa-4f1a-834d-05a15f8fd63f
type: regular
task:
id: 9fe7f253-7529-438f-871b-5eba2873f43d
id: e08be4df-5efa-4f1a-834d-05a15f8fd63f
version: -1
name: Extract indicators from Word file
description: Extracts indicators from word files (DOC, DOCX).
@@ -632,22 +648,6 @@ tasks:
right:
value:
simple: Microsoft Word
- operator: isEqualString
left:
value:
simple: File.Info
iscontext: true
right:
value:
simple: doc
- operator: isEqualString
left:
value:
simple: File.Info
iscontext: true
right:
value:
simple: docx
- - operator: isNotEqualString
left:
value:
@@ -725,10 +725,10 @@ tasks:
ignoreworker: false
"12":
id: "12"
taskid: 89a330ac-fde1-4324-8802-a6bd17d9375d
taskid: 01853d00-5297-4e59-83fa-1aec87f7b3aa
type: condition
task:
id: 89a330ac-fde1-4324-8802-a6bd17d9375d
id: 01853d00-5297-4e59-83fa-1aec87f7b3aa
version: -1
name: Were images extracted?
description: Checks whether images were extracted from PDF files.
@@ -801,13 +801,14 @@ tasks:
ignoreworker: false
"13":
id: "13"
taskid: 2d1ba725-7a79-4ea0-8c84-94ce9007608b
taskid: 60827e91-7450-4f79-8e82-863488b15371
type: condition
task:
id: 2d1ba725-7a79-4ea0-8c84-94ce9007608b
id: 60827e91-7450-4f79-8e82-863488b15371
version: -1
name: Is Image OCR enabled?
description: Checks whether there is an active instance of the Image OCR integration enabled.
description: Checks whether there is an active instance of the Image OCR integration
enabled.
type: condition
iscommand: false
brand: ""
@@ -847,13 +848,14 @@ tasks:
ignoreworker: false
"14":
id: "14"
taskid: 9a4d578f-2723-4b9e-82c0-97bb5f7afccf
taskid: 19fe4f88-c55b-4f13-814c-1ad0aae31cae
type: regular
task:
id: 9a4d578f-2723-4b9e-82c0-97bb5f7afccf
id: 19fe4f88-c55b-4f13-814c-1ad0aae31cae
version: -1
name: Extract text from images
description: Extracts text from PNG, JPEG, and GIF image files, and uses auto-extract to get reputation for indicators.
description: Extracts text from PNG, JPEG, and GIF image files, and uses auto-extract
to get reputation for indicators.
script: '|||image-ocr-extract-text'
type: regular
iscommand: true
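The condition added to tasks 5 and 6 above is one more OR branch next to the existing "ASCII text" check: a file is now also treated as text-based when `File.Type` contains "Rich Text Format". The following is a minimal Python sketch of just those two visible checks. The helper name `is_text_based` is hypothetical and not part of the playbook, the sample type strings are illustrative, and the `notContainsString` exclusion groups that follow in the playbook are elided in this diff and therefore not modeled.

```python
def is_text_based(file_type: str) -> bool:
    """Hypothetical helper mirroring the two OR-ed checks visible in the diff.

    - "ASCII text" is matched case-insensitively (ignorecase: true).
    - "Rich Text Format" is matched as written, since no ignorecase flag
      is shown for the newly added check.
    """
    matches_ascii = "ascii text" in file_type.lower()
    matches_rtf = "Rich Text Format" in file_type
    return matches_ascii or matches_rtf


if __name__ == "__main__":
    # Illustrative type strings, assumed for this sketch only.
    for sample in (
        "ASCII text, with CRLF line terminators",
        "Rich Text Format data, version 1, ANSI",
        "PDF document, version 1.4",
    ):
        print(sample, "->", is_text_based(sample))
```

Before this change, only the first check existed, so an RTF file would fall through the text-based branch of the playbook.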
2 changes: 1 addition & 1 deletion Releases/LatestRelease/ExtractIndicatorsFromTextFile.md
@@ -1 +1 @@
--
+- Fixed a bug where the script would throw an error for certain texts with "\\" character

@@ -0,0 +1 @@
+- Fixed an issue where certain RTF files were not handled correctly.

@@ -17,7 +17,12 @@
     return_error("File was not found")

 with open(filePath, mode='r') as f:
-    data = f.read(maxFileSize).decode('unicode_escape').encode('utf-8')
+    data = f.read(maxFileSize)
+    try:
+        data = data.decode('unicode_escape').encode('utf-8')
+    # unicode_escape might throw UnicodeDecodeError for strings that contain \ char followed by ascii characters
+    except UnicodeDecodeError:
+        data = data.encode('utf-8')

 # Extract indicators (omitting context output, letting auto-extract work)
 indicators_hr = demisto.executeCommand("extractIndicators", {
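
The hunk above wraps the `unicode_escape` decode in a try/except so that text containing a backslash followed by characters that do not form a valid escape no longer aborts the script. The committed code is Python 2, where `str` has a `decode` method. The following is a rough Python 3 sketch of the same try-then-fall-back idea, not the script's actual code: `decode_text` is a hypothetical name, and falling back to a UTF-8 decode with `errors='replace'` is an assumption made for this sketch.

```python
def decode_text(raw: bytes) -> str:
    """Interpret backslash escapes when possible, otherwise keep the text as is.

    Hypothetical helper: a Python 3 analogue of the try/except in the diff.
    """
    try:
        # Turns escape sequences such as \n or \t into the real characters.
        return raw.decode('unicode_escape')
    except UnicodeDecodeError:
        # Raised, for example, when \u is not followed by four hex digits,
        # as in an RTF control word like \uc1 or a path like C:\users.
        # Assumption for this sketch: fall back to a plain UTF-8 decode.
        return raw.decode('utf-8', errors='replace')


if __name__ == "__main__":
    print(repr(decode_text(b'line1\\nline2')))        # escape is interpreted
    print(repr(decode_text(b'{\\rtf1\\uc1 hello}')))  # falls back, text kept
```

The fallback keeps the backslashes literally, which is enough for indicator extraction because URLs, hashes, and addresses in the text are unaffected.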