
cycle in requirements.txt causes infinite recursion #354

Closed
spencerschrock opened this issue Apr 19, 2023 · 7 comments · Fixed by #366
Labels: bug (Something isn't working)

Comments

@spencerschrock
Contributor

spencerschrock commented Apr 19, 2023

The logic that handles references to other requirement files can recurse infinitely (or at least until the stack hits the 1 GB goroutine stack limit).

    func ParseRequirementsTxt(pathToLockfile string) ([]PackageDetails, error) {
        packages := map[string]PackageDetails{}

        file, err := os.Open(pathToLockfile)
        if err != nil {
            return []PackageDetails{}, fmt.Errorf("could not open %s: %w", pathToLockfile, err)
        }
        defer file.Close()

        scanner := bufio.NewScanner(file)
        for scanner.Scan() {
            line := removeComments(scanner.Text())
            if strings.HasPrefix(line, "-r ") {
                // No cycle detection here: a file that includes itself
                // (directly or indirectly) recurses until the stack limit.
                details, err := ParseRequirementsTxt(
                    filepath.Join(filepath.Dir(pathToLockfile), strings.TrimPrefix(line, "-r ")),
                )
                // ...

Consider the following requirements.txt:

    -r requirements.txt
    iterpipes3==0.4
    numpy==1.16.4

Running osv-scanner on this file will crash the program (and crashed my VM while I was trying to replicate this).

    runtime: goroutine stack exceeds 1000000000-byte limit
    fatal error: stack overflow

For what it's worth, it's a problematic requirements.txt for pip too:

    pip install -r requirements.txt
    ERROR: Exception:
    Traceback (most recent call last):
    ...
    RecursionError: maximum recursion depth exceeded while calling a Python object

Would some sort of cycle detection in osv-scanner make sense? Or is this too much work to handle what is essentially bad input?
The alternative on my end is just removing the offending repo from my dataset, which I'm fine doing.

@G-Rath
Collaborator

G-Rath commented Apr 19, 2023

I think an easy way to handle this could be to track which requirement files have already been included (including the one about to be included), because that would also optimize for more complex tree shapes where the same file is included multiple times without forming a cycle, e.g.

    # base.txt
    # (other packages)

    # test.txt
    -r base.txt
    # (other packages)

    # dev.txt
    -r base.txt
    -r test.txt
    # (other packages)

I would also be interested to know if you have any occurrences of that in your dataset - they'd not be noticeable in the final output since packages are deduplicated.

@spencerschrock
Contributor Author

> I think an easy way to handle this could be to just track what requirement files have been previously included (and include the one that is about to be included) because that should also optimize for more complex tree shapes that result in the same file being included multiple times in a way that is not a cycle

+1, sounds like a straightforward way to avoid processing the same file twice.

> I would also be interested to know if you have any occurrences of that in your dataset - they'd not be noticeable in the final output since packages are deduplicated.

Hmm, I'm not sure we have the data to identify these.
Cycles were easy to spot via crash logs, but since we're just interested in the list of vulns produced by DoScan, that's all that's currently stored. I assume these get de-duped somewhere, but if that's not the case I could write a query to try to find an occurrence.

@G-Rath
Collaborator

G-Rath commented Apr 19, 2023

> Hmm, I'm not sure we have the data to identify these.

It's hard to comment without knowing the details of your setup, but if you have the ability to run a modified version of the scanner, you could easily have it record every `-r` occurrence and output that to stdout/stderr, or append it to a debug file.

oliverchang added the bug label Apr 19, 2023
@spencerschrock
Contributor Author

> It's hard to comment without knowing the details of your setup, but if you have the ability to run a modified version of the scanner

Across the whole 1.2M repos, no. But locally, across a smaller subset of projects with python in the name? I can let it run overnight and see if it finds anything interesting.

@spencerschrock
Contributor Author

spencerschrock commented Apr 20, 2023

I think this is the situation you were talking about:
https://github.com/caizhengxin/python-libpcap

  • requirements.txt to requirements/dev.txt
    • requirements/dev.txt to requirements/publish.txt
      • requirements/publish.txt to requirements/base.txt
    • requirements/dev.txt to requirements/test.txt
      • requirements/test.txt to requirements/base.txt

I don't think it creates any duplicate packages though, as you use a map here so it gets de-duped:

    for _, detail := range details {
        packages[detail.Name+"@"+detail.Version] = detail
    }

    Scanned /tmp/python-libpcap/requirements.txt file and found 12 packages
    2023/04/19 23:03:48 package {bumpversion 0.0.0  PyPI PyPI}
    2023/04/19 23:03:48 package {coverage 0.0.0  PyPI PyPI}
    2023/04/19 23:03:48 package {cython 0.29.23  PyPI PyPI}
    2023/04/19 23:03:48 package {flake8 0.0.0  PyPI PyPI}
    2023/04/19 23:03:48 package {mock 0.0.0  PyPI PyPI}
    2023/04/19 23:03:48 package {pre-commit 1.18.3  PyPI PyPI}
    2023/04/19 23:03:48 package {pytest 0.0.0  PyPI PyPI}
    2023/04/19 23:03:48 package {pytest-cov 0.0.0  PyPI PyPI}
    2023/04/19 23:03:48 package {sphinx 0.0.0  PyPI PyPI}
    2023/04/19 23:03:48 package {sphinx-rtd-theme 0.0.0  PyPI PyPI}
    2023/04/19 23:03:48 package {sphinxcontrib-napoleon 0.0.0  PyPI PyPI}
    2023/04/19 23:03:48 package {tox 0.0.0  PyPI PyPI}

Finally, there's https://github.com/luxonis/depthai-python, which is a slightly different scenario. As DoScan walks the directory it will do the following:

  • start at docs/readthedocs/requirements.txt -> docs/requirements.txt -> docs/requirements_mkdoc.txt
  • start at docs/requirements.txt -> docs/requirements_mkdoc.txt

So some of the packages will be duplicated across queries, since the scans start from different lockfiles. I think that's the correct behavior to see in the results, but it does cause some extra queries to OSV.

@oliverchang
Collaborator

@another-rex can you take a look here?

@G-Rath
Collaborator

G-Rath commented May 4, 2023

@oliverchang this is also one I'm working on over in the detector. I've not got a fix yet, but will probably have one in the next couple of days, so feel free to assign it to me if you want.

@another-rex another-rex assigned G-Rath and unassigned another-rex May 4, 2023
robotdana added a commit to robotdana/osv-scanner that referenced this issue May 4, 2023
Ported from G-Rath/osv-detector#191

Note that this is not actually supported by `pip` itself, but doing so
actually optimizes the parser a bit anyway by only reading each file
exactly once regardless of how often it is required

fixes: google#354

Co-authored-by: Gareth Jones <[email protected]>
another-rex pushed a commit that referenced this issue May 5, 2023
Ported from G-Rath/osv-detector#191

Note that this is not actually supported by `pip` itself, but doing so
actually optimizes the parser a bit anyway by only reading each file
exactly once regardless of how often it is required

fixes: #354

Co-authored-by: Gareth Jones <[email protected]>