Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Fix false-negatives and false-positives #99

Closed
wants to merge 24 commits into from

Conversation

KevinHock
Copy link
Collaborator

@KevinHock KevinHock commented Mar 29, 2018

So after my last vulnerabilities PR and my 2 re-organization PRs I think I'll try my hand at finding the false-negatives of our last evaluation. Fixing bugs and false-positives is easier than false-negatives so I'm looking forward to this :D

(The crashes in the last evaluation were caused by us analyzing invalid python and there was also an infinite loop in get_first_node that I now fixed 104cf80.)

@KevinHock KevinHock added the cool label Mar 29, 2018
@KevinHock KevinHock changed the title Fix false-negatives! Fix false-negatives and false-positives Mar 31, 2018
@KevinHock KevinHock changed the title Fix false-negatives and false-positives [WIP] Fix false-negatives and false-positives Apr 3, 2018
if not url:
abort(400)

if is_image(url):
Copy link
Collaborator Author

@KevinHock KevinHock Apr 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one seems bypass-able, so not a false-positive, but it easily could have been, in order to do this well we would need to be able to solve predicates, via like Z3str3 or something. I feel these are acceptable for the foreseeable future.

…(rhs_vars), Move add_blackbox_or_builtin_call to expr_visitor where it should be, and CALL_IDENTIFIER to expr_visitor_helper, some flake8 on vulnerabilities_test
# .args is only used in get_sink_args
# We make a new list because right_hand_side_variables is extended in assignment_call_node
call_node.right_hand_side_variables = rhs_vars
call_node.args = list(rhs_vars)
Copy link
Collaborator Author

@KevinHock KevinHock Apr 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we did not make a new list, things like this would happen:

 > User input at line 8, trigger word "request.args.get(":
     ¤call_1 = ret_request.args.get('image_name')
Reassigned in:
    File: example/vulnerable_code/path_traversal_sanitised.py
     > Line 8: image_name = ¤call_1
File: example/vulnerable_code/path_traversal_sanitised.py
 > reaches line 10, trigger word "replace(":
    ¤call_2 = ret_image_name.replace('..', '')

though the problem on master was that we didn't include ~call_N in .args, and thus in the return value of get_sink_args and so we had a false-negative that's now fixed. However, or_in_sink.py remains unsolved, for the time being.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason or_in_sink.py doesn't pass yet is because we don't if isinstance for BoolOp in the 'loop through args' code in visit_Call, should be easy enough.

@KevinHock
Copy link
Collaborator Author

So I was half way done with expr_star_visitor when I decided to re-org things again and make things more user-friendly. 😮

@KevinHock
Copy link
Collaborator Author

KevinHock commented Jul 4, 2018

WHAT IS LEFT

Write more tests, especially having many BoolOps -- like in if_else_in_sink.py e.g. return redirect(request.args.get('Laundry') and 'crazy' or request.args.get('French') and 'foo' or request.args.get('The'))

Make existing tests pass

ControlFlowExpr inherit nicely from namedtuple

Clean up ControlFlowNode.connect and ControlFlowExpr.connect

random:
Improve visit_keyword
to make foo= make foo tainted
to make it work with **kwargs, etc.
Kill BUILTINS tuple, make everything blackbox
Replace isinstance(expr, AssignmentNode) with CallAssignment node
Fix if if with IfExp

An aside:
Inherit from named_tuple, etc.

AFTER WORK
New usage picture in README (PR to the readthedocs repository)

Test out redirect(request.args.get('next')) on current branch
Test out FlaskBB BoolOp redirect on current branch

Root README: There are 2 things you will need to do to configure PyT
    Taint entry points
    Look over the file of sources and sinks (maybe blackbox functions too)

At home:
    See how to parse routing.py
    Do Django CBVs too
    See how hard it is to do object.function() inlining

@KevinHock
Copy link
Collaborator Author

The current problem with

    return redirect(request.args.get('The') if 'hey' or 'you' else request.args.get('French') if 'foo' else request.args.get('Aces') and 'c')

is that expr_star_handler return the tainted aces call as a last expression, because StrNodes never get added to cfg_expressions, get_last_expressions can't see that aces is not the last expression. Hmmm, I fixed it with this:

--- a/pyt/cfg/expr_visitor.py
+++ b/pyt/cfg/expr_visitor.py
@@ -86,6 +86,7 @@ class ExprVisitor(StmtVisitor):
         variables = list()
         visual_variables = list()
         first_node = None
+        last_node_should_not_be_ignored = False
         node_not_to_step_past = self.nodes[-1]
 
         for expr in exprs:
@@ -112,7 +113,8 @@ class ExprVisitor(StmtVisitor):
                 label.visit(expr)
                 visual_variables.append(label.result)
 
-            if not isinstance(node, StrNode):
+            last_node_should_not_be_ignored = not isinstance(node, StrNode)
+            if last_node_should_not_be_ignored:
                 cfg_expressions.append(node)
                 if not first_node:
                     if isinstance(node, ControlFlowExpr):
@@ -130,7 +132,7 @@ class ExprVisitor(StmtVisitor):
         return ConnectExpressions(
             first_expression=first_node,
             all_expressions=cfg_expressions,
-            last_expressions=get_last_expressions(cfg_expressions) if cfg_expressions else [],
+            last_expressions=get_last_expressions(cfg_expressions) if last_node_should_not_be_ignored else [],
             variables=variables,
             visual_variables=visual_variables
         )

Now it only reports a vuln, if aces was the last expression!

…a StrNode the last expression, instead of nothing. e.g. return redirect(request.args.get('The') if 'hey' or 'you' else request.args.get('French') if 'foo' else request.args.get('Aces') and 'c')
@KevinHock
Copy link
Collaborator Author

inner_most_call notes

HOW WE CURRENTLY SET THE INNER MOST CALL ATTRIBUTE FOR BLACKBOX CALLS:

When looping through args and keyword args in add_blackbox_call
If the arg is a call
We do things, explained below.

This code essentially does 3 things:

  The last arg that is a function call gets connected to the outer call node
  Relevant code:
      # connect other_inner to outer in e.g.
      `scrypt.outer(scrypt.inner(image_name), scrypt.other_inner(image_name))`
  I think this may be not needed if the code is
  e.g.
     `scrypt.outer(scrypt.inner(image_name), scrypt.other_inner(image_name), some_var)`
  but maybe it is, currently we do it anyway.


  The first arg that is a function call gets set as_ the inner_most_call
  Calling it the inner_most_call is inaccurate, it should be called first_call
  We obtain the inner_most_call by looping through them in _get_inner_most_function_call
  Relevant code:
     # I should only set this once per loop, inner in e.g.
     # `scrypt.outer(scrypt.inner(image_name), scrypt.other_inner(image_name))`
     # (inner_most_call is used when predecessor is a ControlFlowNode in connect_control_flow_node)
     call_node.inner_most_call = return_value_of_nested_call
  Should we still do this if the first arg is not a function call?
  e.g.   
     `scrypt.outer(foo, scrypt.inner(image_name), scrypt.other_inner(image_name))`
  I think we should because control flow really does go into the inner function first.


  We connect one arg that is a function call, to the next arg that is a function call
  Relevant code:
     # connect inner to other_inner in e.g.
     # `scrypt.outer(scrypt.inner(image_name), scrypt.other_inner(image_name))`
     # I should probably loop to the inner most call of other_inner here.
     try:
         last_return_value_of_nested_call.connect(return_value_of_nested_call.first_node)
     except AttributeError:
         last_return_value_of_nested_call.connect(return_value_of_nested_call)
  Like the comment says, I am not sure if I should connect inner to other_inner or some_thing_else in
  e.g.
     `scrypt.outer(scrypt.inner(image_name), scrypt.other_inner(some_thing_else(image_name)))`

In save_def_args_in_temp of process_function we do the same thing pretty much.

HOW WE CURRENTLY USE THE INNER MOST CALL ATTRIBUTE:

When connecting a ControlFlowNode to an AssignmentCallNode
We loop through the last_nodes of the ControlFlowNode and
connect them with the inner_most_call_node of the AssignmentCallNode

_get_inner_most_function_call:
Loops through calls and
For blackbox calls, this is the inner_most_call
For user-defined calls, this is the .first_node

So blackbox calls are done well, and user-defined calls are done lazily.

Code more or less:
stmt_star_handler ->
connect_nodes ->
if node isinstance ControlFlowNode ->
for last in control_flow_node.last_nodes:
if isinstance(next_node, AssignmentCallNode):
inner_most_call_node = _get_inner_most_function_call(call_node)
last.connect(inner_most_call_node)

@KevinHock
Copy link
Collaborator Author

This line seems dead

first_node.inner_most_call = return_value_of_nested_call.first_node

b/c of the above. (i.e. we only use first_node of user-defined functions.

OPEN QUESTIONS

Question

Why do we only do this when connecting a ControlFlowNode to an AssignmentCallNode?
e.g.
If we have
foo = 5
outer(inner(foo))
I still want to connect foo = 5 to inner() and not outer, right?

Answer

We do connect_to_allowed -1 stuff that we are trying to get away from

Question

What tests do we currently have for the above?

Answer

???

   Removed the if CALL_IDENTIFIER in garbage
   Removed the node_not_to_step_past.connect(first_node)
   Fixed up _get_inner_most_expression, connect_nodes and wrote the top 4 expr_cfg_tests
        # This is used in connect_nodes
        call_node.first_expression = connected_expressions.first_expression
   Fixed remaining expr_cfg_tests
   Left off BinOp for another day
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant