[WIP] Fix false-negatives and false-positives #99

KevinHock · 2018-03-29T02:59:14Z

So after my last vulnerabilities PR and my 2 re-organization PRs I think I'll try my hand at finding the false-negatives of our last evaluation. Fixing bugs and false-positives is easier than false-negatives so I'm looking forward to this :D

(The crashes in the last evaluation were caused by us analyzing invalid python and there was also an infinite loop in get_first_node that I now fixed 104cf80.)

…aluation

…-p's f-n's and the infinite loop in get_first_node

KevinHock · 2018-04-04T01:48:45Z

example/november_2017_evaluation/false_positives/cheapskate_get_image.py

+  if not url:
+    abort(400)
+
+  if is_image(url):


This one seems bypass-able, so not a false-positive, but it easily could have been, in order to do this well we would need to be able to solve predicates, via like Z3str3 or something. I feel these are acceptable for the foreseeable future.

…(rhs_vars), Move add_blackbox_or_builtin_call to expr_visitor where it should be, and CALL_IDENTIFIER to expr_visitor_helper, some flake8 on vulnerabilities_test

KevinHock · 2018-04-04T02:48:04Z

pyt/expr_visitor.py

+        # .args is only used in get_sink_args
+        # We make a new list because right_hand_side_variables is extended in assignment_call_node
+        call_node.right_hand_side_variables = rhs_vars
+        call_node.args = list(rhs_vars)


If we did not make a new list, things like this would happen:

> User input at line 8, trigger word "request.args.get(": ¤call_1 = ret_request.args.get('image_name') Reassigned in: File: example/vulnerable_code/path_traversal_sanitised.py > Line 8: image_name = ¤call_1 File: example/vulnerable_code/path_traversal_sanitised.py > reaches line 10, trigger word "replace(": ¤call_2 = ret_image_name.replace('..', '')

though the problem on master was that we didn't include ~call_N in .args, and thus in the return value of get_sink_args and so we had a false-negative that's now fixed. However, or_in_sink.py remains unsolved, for the time being.

The reason or_in_sink.py doesn't pass yet is because we don't if isinstance for BoolOp in the 'loop through args' code in visit_Call, should be easy enough.

…ython-3-6-project-structure-leads-to-runtimewarning

KevinHock · 2018-04-26T02:16:59Z

So I was half way done with expr_star_visitor when I decided to re-org things again and make things more user-friendly. 😮

… vulnerabilities_test.py

…ALL_IDENTIFIER back to what it is on master

KevinHock · 2018-07-04T01:47:36Z

WHAT IS LEFT

Write more tests, especially having many BoolOps -- like in if_else_in_sink.py e.g. return redirect(request.args.get('Laundry') and 'crazy' or request.args.get('French') and 'foo' or request.args.get('The'))

Make existing tests pass

ControlFlowExpr inherit nicely from namedtuple

Clean up ControlFlowNode.connect and ControlFlowExpr.connect

random:
Improve visit_keyword
to make foo= make foo tainted
to make it work with **kwargs, etc.
Kill BUILTINS tuple, make everything blackbox
Replace isinstance(expr, AssignmentNode) with CallAssignment node
Fix if if with IfExp

An aside:
Inherit from named_tuple, etc.

AFTER WORK
New usage picture in README (PR to the readthedocs repository)

Test out redirect(request.args.get('next')) on current branch
Test out FlaskBB BoolOp redirect on current branch

Root README: There are 2 things you will need to do to configure PyT
    Taint entry points
    Look over the file of sources and sinks (maybe blackbox functions too)

At home:
    See how to parse routing.py
    Do Django CBVs too
    See how hard it is to do object.function() inlining

…up StrNode to be cleaner

KevinHock · 2018-07-07T19:33:12Z

The current problem with

    return redirect(request.args.get('The') if 'hey' or 'you' else request.args.get('French') if 'foo' else request.args.get('Aces') and 'c')

is that expr_star_handler return the tainted aces call as a last expression, because StrNodes never get added to cfg_expressions, get_last_expressions can't see that aces is not the last expression. Hmmm, I fixed it with this:

--- a/pyt/cfg/expr_visitor.py
+++ b/pyt/cfg/expr_visitor.py
@@ -86,6 +86,7 @@ class ExprVisitor(StmtVisitor):
         variables = list()
         visual_variables = list()
         first_node = None
+        last_node_should_not_be_ignored = False
         node_not_to_step_past = self.nodes[-1]
 
         for expr in exprs:
@@ -112,7 +113,8 @@ class ExprVisitor(StmtVisitor):
                 label.visit(expr)
                 visual_variables.append(label.result)
 
-            if not isinstance(node, StrNode):
+            last_node_should_not_be_ignored = not isinstance(node, StrNode)
+            if last_node_should_not_be_ignored:
                 cfg_expressions.append(node)
                 if not first_node:
                     if isinstance(node, ControlFlowExpr):
@@ -130,7 +132,7 @@ class ExprVisitor(StmtVisitor):
         return ConnectExpressions(
             first_expression=first_node,
             all_expressions=cfg_expressions,
-            last_expressions=get_last_expressions(cfg_expressions) if cfg_expressions else [],
+            last_expressions=get_last_expressions(cfg_expressions) if last_node_should_not_be_ignored else [],
             variables=variables,
             visual_variables=visual_variables
         )

Now it only reports a vuln, if aces was the last expression!

…a StrNode the last expression, instead of nothing. e.g. return redirect(request.args.get('The') if 'hey' or 'you' else request.args.get('French') if 'foo' else request.args.get('Aces') and 'c')

…ast_var_in_and_is_not_tainted with test

… match. Did boolop.connect to each boolop last expression.

KevinHock · 2018-07-26T19:20:15Z

inner_most_call notes

HOW WE CURRENTLY SET THE INNER MOST CALL ATTRIBUTE FOR BLACKBOX CALLS:

When looping through args and keyword args in add_blackbox_call
If the arg is a call
We do things, explained below.

This code essentially does 3 things:

  The last arg that is a function call gets connected to the outer call node
  Relevant code:
      # connect other_inner to outer in e.g.
      `scrypt.outer(scrypt.inner(image_name), scrypt.other_inner(image_name))`
  I think this may be not needed if the code is
  e.g.
     `scrypt.outer(scrypt.inner(image_name), scrypt.other_inner(image_name), some_var)`
  but maybe it is, currently we do it anyway.


  The first arg that is a function call gets set as_ the inner_most_call
  Calling it the inner_most_call is inaccurate, it should be called first_call
  We obtain the inner_most_call by looping through them in _get_inner_most_function_call
  Relevant code:
     # I should only set this once per loop, inner in e.g.
     # `scrypt.outer(scrypt.inner(image_name), scrypt.other_inner(image_name))`
     # (inner_most_call is used when predecessor is a ControlFlowNode in connect_control_flow_node)
     call_node.inner_most_call = return_value_of_nested_call
  Should we still do this if the first arg is not a function call?
  e.g.   
     `scrypt.outer(foo, scrypt.inner(image_name), scrypt.other_inner(image_name))`
  I think we should because control flow really does go into the inner function first.


  We connect one arg that is a function call, to the next arg that is a function call
  Relevant code:
     # connect inner to other_inner in e.g.
     # `scrypt.outer(scrypt.inner(image_name), scrypt.other_inner(image_name))`
     # I should probably loop to the inner most call of other_inner here.
     try:
         last_return_value_of_nested_call.connect(return_value_of_nested_call.first_node)
     except AttributeError:
         last_return_value_of_nested_call.connect(return_value_of_nested_call)
  Like the comment says, I am not sure if I should connect inner to other_inner or some_thing_else in
  e.g.
     `scrypt.outer(scrypt.inner(image_name), scrypt.other_inner(some_thing_else(image_name)))`

In save_def_args_in_temp of process_function we do the same thing pretty much.

HOW WE CURRENTLY USE THE INNER MOST CALL ATTRIBUTE:

When connecting a ControlFlowNode to an AssignmentCallNode
We loop through the last_nodes of the ControlFlowNode and
connect them with the inner_most_call_node of the AssignmentCallNode

_get_inner_most_function_call:
Loops through calls and
For blackbox calls, this is the inner_most_call
For user-defined calls, this is the .first_node

So blackbox calls are done well, and user-defined calls are done lazily.

Code more or less:
stmt_star_handler ->
connect_nodes ->
if node isinstance ControlFlowNode ->
for last in control_flow_node.last_nodes:
if isinstance(next_node, AssignmentCallNode):
inner_most_call_node = _get_inner_most_function_call(call_node)
last.connect(inner_most_call_node)

KevinHock · 2018-07-26T19:26:50Z

This line seems dead

pyt/pyt/cfg/expr_visitor.py

Line 308 in fbca4d2

first_node.inner_most_call = return_value_of_nested_call.first_node

b/c of the above. (i.e. we only use first_node of user-defined functions.

OPEN QUESTIONS

Question

Why do we only do this when connecting a ControlFlowNode to an AssignmentCallNode?
e.g.
If we have
foo = 5
outer(inner(foo))
I still want to connect foo = 5 to inner() and not outer, right?

Answer

We do connect_to_allowed -1 stuff that we are trying to get away from

Question

What tests do we currently have for the above?

Answer

???

Removed the if CALL_IDENTIFIER in garbage Removed the node_not_to_step_past.connect(first_node) Fixed up _get_inner_most_expression, connect_nodes and wrote the top 4 expr_cfg_tests # This is used in connect_nodes call_node.first_expression = connected_expressions.first_expression Fixed remaining expr_cfg_tests Left off BinOp for another day

Make a false_negatives directory with the issues from the 11/16/17 ev…

3b4bf1b

…aluation

KevinHock added the cool label Mar 29, 2018

KevinHock added 3 commits March 29, 2018 18:56

Made a folder example/november_2017_evaluation with a directory for f…

153501f

…-p's f-n's and the infinite loop in get_first_node

Fix the foddy infinite loop

104cf80

Delete the infinite_loop folder since it is fixed

4156984

KevinHock changed the title ~~Fix false-negatives!~~ Fix false-negatives and false-positives Mar 31, 2018

KevinHock changed the title ~~Fix false-negatives and false-positives~~ [WIP] Fix false-negatives and false-positives Apr 3, 2018

KevinHock commented Apr 4, 2018

View reviewed changes

Make source_nested_in_sink.py test and pass it by making .args = list…

97074f3

…(rhs_vars), Move add_blackbox_or_builtin_call to expr_visitor where it should be, and CALL_IDENTIFIER to expr_visitor_helper, some flake8 on vulnerabilities_test

KevinHock commented Apr 4, 2018

View reviewed changes

KevinHock added 4 commits April 4, 2018 18:13

Make a test for if_else_in_sink

98b49b6

Fix double import trap https://stackoverflow.com/questions/43393764/p…

ff1bc89

…ython-3-6-project-structure-leads-to-runtimewarning

[False-negs] Problem illustration commit

6c7a8d2

[comments] w/r/t ConnectStatements

e9dd5e4

KevinHock mentioned this pull request Apr 17, 2018

Fix the foddy infinite loop #118

Merged

KevinHock added 8 commits May 10, 2018 16:21

save current work emergency

63c3f96

save current work emergency

8367f68

Merge branch 'master' into fix_oakie_false_negative

2578c29

Added changes back to file that git merge missed

6226a77

Fix whitespace

62e7420

Fixed some imports that got broken by the merge, indented EXPECTED in…

8b72e52

… vulnerabilities_test.py

Delete tests/vulnerabilities_test.py that I accidentally added, Fix C…

d42243c

…ALL_IDENTIFIER back to what it is on master

Mostly finished BoolOp, IfExp and expr_star_handler

695bbde

KevinHock added 2 commits July 5, 2018 14:32

Handle BoolOp inside of BoolOp, Make ConnectExpressions a class, Fix …

7ecf462

…up StrNode to be cleaner

Finished IfExp, found bug in expr_star_handler

1fa04e8

KevinHock added 2 commits July 7, 2018 12:35

Fix case where false-positives could occur by making the node before …

4081900

…a StrNode the last expression, instead of nothing. e.g. return redirect(request.args.get('The') if 'hey' or 'you' else request.args.get('French') if 'foo' else request.args.get('Aces') and 'c')

Add vulnerable_code_with_expressions/last_var_in_and_is_tainted and l…

c6d9e62

…ast_var_in_and_is_not_tainted with test

KevinHock added 2 commits July 9, 2018 20:58

Change visit_IfExp to handle .test better, change expr_cfg_test.py to…

4d2ad46

… match. Did boolop.connect to each boolop last expression.

Add example of connecting_control_flow_exprs_in_connect_expressions.py

5b1f262

KevinHock mentioned this pull request Aug 23, 2018

Add a logger and verbosity options #168

Closed

KevinHock closed this Apr 10, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[WIP] Fix false-negatives and false-positives #99

[WIP] Fix false-negatives and false-positives #99

KevinHock commented Mar 29, 2018 •

edited

Loading

KevinHock Apr 4, 2018 •

edited

Loading

KevinHock Apr 4, 2018 •

edited

Loading

KevinHock Apr 6, 2018

KevinHock commented Apr 26, 2018

KevinHock commented Jul 4, 2018 •

edited

Loading

KevinHock commented Jul 7, 2018

KevinHock commented Jul 26, 2018

KevinHock commented Jul 26, 2018

[WIP] Fix false-negatives and false-positives #99

[WIP] Fix false-negatives and false-positives #99

Conversation

KevinHock commented Mar 29, 2018 • edited Loading

KevinHock Apr 4, 2018 • edited Loading

Choose a reason for hiding this comment

KevinHock Apr 4, 2018 • edited Loading

Choose a reason for hiding this comment

KevinHock Apr 6, 2018

Choose a reason for hiding this comment

KevinHock commented Apr 26, 2018

KevinHock commented Jul 4, 2018 • edited Loading

WHAT IS LEFT

KevinHock commented Jul 7, 2018

KevinHock commented Jul 26, 2018

HOW WE CURRENTLY SET THE INNER MOST CALL ATTRIBUTE FOR BLACKBOX CALLS:

HOW WE CURRENTLY USE THE INNER MOST CALL ATTRIBUTE:

KevinHock commented Jul 26, 2018

OPEN QUESTIONS

Question

Answer

Question

Answer

KevinHock commented Mar 29, 2018 •

edited

Loading

KevinHock Apr 4, 2018 •

edited

Loading

KevinHock Apr 4, 2018 •

edited

Loading

KevinHock commented Jul 4, 2018 •

edited

Loading