Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fuzz Shrinking feature #351

Open
wants to merge 17 commits into
base: main
Choose a base branch
from
Open

Fuzz Shrinking feature #351

wants to merge 17 commits into from

Conversation

MeditationDuck
Copy link
Member

@MeditationDuck MeditationDuck commented Oct 10, 2024

Fuzz Shrinking

1. Fuzz Test

To generate a crash log when the fuzz test is failing:

wake test tests/fuzz_test.py

2. Shrinking

To shrink the fuzz using the latest failure in the fuzz test:

wake test -SH

you can specify the test path. It verifies the testing target is the same:

wake test tests/fuzz_test.py -SH

Or specify a crash log directly:

wake test -SH .wake/logs/crashes/20241010_035704.txt

3. Reproduce the Error by Shrunk File

To reproduce the shrunk test:

wake test -SR

You can also specify the test here as well:

wake test tests/test_fuzz.py -SR

Alternatively, specify a shrunk data file:

wake test -SR .wake/logs/shrank/20241010_042322.bin

Shrinking phase

Shrinking tries to remove flows using two algorithms.
First, remove multiple flows thus it is faster, we print the progress of the removed flow.

Second, try to remove flow one by one.

✅ remove flow by flow kind (takes O(n))

✅ remove flow by brute force (takes O(n^2)

✅ flow fail because of removed flow dependency and it is not the target fail flow, can be removed.(checked precondition and un-executed flow will be removed in brute force shrink)

  • I clicked on "Allow edits from maintainers"

@jaczkal
Copy link
Member

jaczkal commented Oct 11, 2024

Does it also produce a crash log for failed invariants on assertions?

@jaczkal jaczkal requested a review from michprev October 11, 2024 14:04
@MeditationDuck
Copy link
Member Author

yes

@MeditationDuck
Copy link
Member Author

Now, shrinking work with this

wake test -SH

@michprev michprev force-pushed the feat/fuzz-crash-shrink branch 2 times, most recently from 81df7ae to 6256be2 Compare December 5, 2024 15:22
wake/cli/test.py Outdated
type=str,
help="Path to the shrink log file.",
is_flag=False,
flag_value=0,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why 0?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, true. we uses flag as well, but it better be string I will change to "-1"

wake/cli/test.py Outdated
type=str,
help="Path of shrank file.",
is_flag=False,
flag_value=0,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same question, why 0?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above

@@ -1462,6 +1462,9 @@ class Chain(ABC):

tx_callback: Optional[Callable[[TransactionAbc], None]]

def __deepcopy__(self, memo):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this may be tricky

on one side I understand why it's needed (FuzzTest members will contain Chain transitively), on the other snapshot + revert won't completely restore the state, just a subset

for example, default accounts won't be restored

we should either "backup" more attributes or find a better way

Copy link
Member Author

@MeditationDuck MeditationDuck Dec 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than the data stored in the FuzzTest class would not stored,
we store random state already in data collecting phase.
chain and account information couls store in chain.snapshot().

Even tester defined outside of FuzztestClass, chain state would stored.
since those chains are added in wake.testing.core.connected_chains as global variable and Shrinking use this array.

And we can add missing member in Chain() in snapshot function.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, I think there are a few attributes missing to be stored in snapshot, then it should be good

@@ -43,6 +43,7 @@ class JsonRpcCommunicator:
_protocol: ProtocolAbc
_request_id: int
_connected: bool
_interrupt_received: bool
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can be removed, I think

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, true

Comment on lines +57 to +61
# from wake.development.transactions import Error
if type(e1) == Error and type(e2) == Error:
# If it was the Error(TransactionRevertedError), compare message content.
if e1.message != e2.message:
return False
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why must there be an extra check for Error? What about other exceptions of the same type but different values/members?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a transaction error.

If the transaction reverts with Error().

like, revert("Switch Pushed");

In this case, it is reverted without a custom error. It is usually with string. We compare the message.

But if there is no message with just require() and also multiple places, it is unable to distinguish.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes but what if we are trying to reproduce NotOwner(0x742d35Cc6634C0532925a3b844Bc454e4438f44e) but encounter NotOwner(0x19E7E376E7C213B7E7e7e46cc70A5dD086DAff2A)?

shouldn't we just compare with ==?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, I see the next comment

Comment on lines 66 to 81
frame1 = None
for frame1 in tb1:
if is_relative_to(
Path(frame1.filename), Path.cwd()
) and not is_relative_to(
Path(frame1.filename), Path().cwd() / "pytypes"
):
break
frame2 = None
for frame2 in tb2:
if is_relative_to(
Path(frame2.filename), Path.cwd()
) and not is_relative_to(
Path(frame2.filename), Path().cwd() / "pytypes"
):
break
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we are searching here for a frame with an exception that happened in our cwd but not in pytypes, correct?

so the comparison does not only take into account the exception data but also the location where it happened?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, exactly.
If it was a transaction error, rely on the Error type also if transaction. Error then relies on message. And does not care about another error argument.

If it was an error in Python, rely on the file and line number in Python. so we do not care about the actual value.

There was a lot of consideration.
Since it could depend on the definition of the "same error".

The purpose of shrinking is to create a minimum flow sequence to reproduce the same error.

The same error could be different.

  1. the same error is emitted.
  2. the same assertion in Python fails. (fails at the same line in the Python test)

or

  1. the same error with the same argument is emitted. (and also in exactly the same flow)
  2. the same assertion in Python fails with the same value. (and also in exactly the same flow)

I decided to implement 1. and 2. since these conditions could significantly shorten the test.
For example, If it was 3. 4. and Error was emitted with TransferError(nft_id=10), then at least the test required to emit 10 NFT and make an error. This would be redundant.

However, one possible issue is that when checking the balance for each account, the shrinken result shows an unbalance for different accounts.

   @invariant(period=30)
    def invariant_erc20_balances(self):
        for contract in self.erc20_balances:
            for acc in self.erc20_balances[contract]: 
                assert contract.balanceOf(acc) == self.erc20_balances[contract][acc] # <- error in same file and same line.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. Then we should at least implement the same logic for Panic error as it behaves the same as Error.

Still I think we should implement strict shrinking feature where we compare the errors exactly with ==.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's true.

I already have an exact match for the flow number. I can extend this also Error matching.

IGNORE_FLOW_INDEX = True # True if you accept it could reproduce same error earlier.

Comment on lines 685 to 689
set_sequence_initial_internal_state(
pickle.dumps(
random.getstate()
)
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this needed when we're not in the shrinking mode?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is for the crash log file as well. We accept not only seeds but also random states in the test argument.

Screenshot 2024-12-06 at 16 06 22

Also, the crash log file stores only the flow number and random state at the beginning of the sequence. So, shrinking or re-fuzzing using a crash log file does not need to be repeated.

Comment on lines 122 to 132
def revert(self, python_instance: FuzzTest, chains: Tuple[Chain, ...]):
assert self.chain_states != [], "Chain snapshot is missing"
assert self._python_state is not None, "Python state snapshot is missing "
assert self.flow_number is not None, "Flow number is missing"

python_instance.__dict__.update(copy.deepcopy(self._python_state.__dict__))

self._python_state = None
for temp_chain, chain in zip(self.chain_states, chains):
chain.revert(temp_chain)
self.chain_states = []
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I understand it, we always do revert just to create another snapshot just after the revert.

In the case of chain snapshot I'm afraid it's necessary to call it again but I don't think we need to create deepcopy snapshot again. Could it be optimized? Also, I don't understand why is deepcopy used in this revert function.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I remember, Anvil creates a snapshot and returns the ID, but once the snapshot is used, it is removed, and reverting to that ID will fail.

But, the Python instance seems to be working with direct assignment.

Comment on lines 252 to 257
with print_ignore():
test_instance._flow_num = 0
test_instance.pre_sequence()
exception_content = None
try:
with redirect_stdout(open(os.devnull, 'w')), redirect_stderr(open(os.devnull, 'w')):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't the redirection applied twice? once in print_ignore and for the second time directly here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was redundant. removed.

@michprev
Copy link
Member

michprev commented Dec 6, 2024

Based on my measurements, most of the time is spent in invariants - can be even 70%. Do you think it would be possible to skip the execution of invariants when trying to reproduce the exception when deciding whether to keep a flow or not?
Of course, the last invariant for the erroring flow wouldn't be skipped.

Of course, it brings up some problems to handle:

  • Python state (incl. random generator state) may be altered in invariants
    • but we can handle it in the same way as skipped flows, I believe
  • chain state may be altered
    • this can be a big issue as we would never reproduce the bug
    • but it's not recommended to send txs / change chain state in invariants
    • we might even implement auto snapshot & revert for invariants (with optional override) - in this case, we would know what invariants must not be skipped
  • we might miss the opportunity to encounter the bug in one of earlier invariant executions
    • but I think the proposed optimization would be on average more efficient

What do you think?

@MeditationDuck
Copy link
Member Author

It could depend on the project and situation.
I like having invariant and detecting the same error because if it is found there, the shrinking phase is significantly faster.
Also, if those invariants find another error, they will take the flow required and proceed to the next flow.
But also, I agree that the shrinking is used after a lot of fuzz was run. In this case, we know only one invariant would fail at specific flow.

@MeditationDuck
Copy link
Member Author

MeditationDuck commented Dec 6, 2024

Having invariant
pros

  • find shortcut error,
  • find early in the fuzz test that the error is unrelated to the target.
    (but most of the time, it will succeed and overrun)

cons

  • slower around 70%

It can select specific invariants. (target error occurred in erc20 balance, then do only this test)

It can change check or not depending on the shrinking removal

Brute force removal usually does not find significant shortcuts and differs from removing by flow kinds.

I think a bit more thinking is required.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants