
cycle in requirements.txt causes infinite recursion #354

Closed
spencerschrock opened this issue Apr 19, 2023 · 7 comments · Fixed by #366
Labels: bug (Something isn't working)

Comments

@spencerschrock
Contributor

spencerschrock commented Apr 19, 2023

The logic that handles references to other requirement files can recurse infinitely (or at least until the stack hits the 1 GB goroutine stack limit).

    func ParseRequirementsTxt(pathToLockfile string) ([]PackageDetails, error) {
        packages := map[string]PackageDetails{}

        file, err := os.Open(pathToLockfile)
        if err != nil {
            return []PackageDetails{}, fmt.Errorf("could not open %s: %w", pathToLockfile, err)
        }
        defer file.Close()

        scanner := bufio.NewScanner(file)
        for scanner.Scan() {
            line := removeComments(scanner.Text())
            if strings.HasPrefix(line, "-r ") {
                // No cycle detection here: a file that includes itself
                // (directly or indirectly) recurses until the stack limit.
                details, err := ParseRequirementsTxt(
                    filepath.Join(filepath.Dir(pathToLockfile), strings.TrimPrefix(line, "-r ")),
                )
                // ...

Consider the following requirements.txt:

    -r requirements.txt
    iterpipes3==0.4
    numpy==1.16.4

Running osv-scanner on this file will crash the program (and crashed my VM while I was trying to replicate this).

    runtime: goroutine stack exceeds 1000000000-byte limit
    fatal error: stack overflow

For what it's worth, it's a problematic requirements.txt for pip too:

    pip install -r requirements.txt
    ERROR: Exception:
    Traceback (most recent call last):
    ...
    RecursionError: maximum recursion depth exceeded while calling a Python object

Would some sort of cycle detection in osv-scanner make sense? Or is this too much work to handle what is essentially bad input?
The alternative on my end is just removing the offending repo from my dataset, which I'm fine doing.

@G-Rath
Collaborator

G-Rath commented Apr 19, 2023

I think an easy way to handle this could be to track which requirement files have already been included (including the one about to be included), because that would also optimize for more complex tree shapes where the same file is included multiple times without forming a cycle, e.g.

    # base.txt
    # (other packages)

    # test.txt
    -r base.txt
    # (other packages)

    # dev.txt
    -r base.txt
    -r test.txt
    # (other packages)

I would also be interested to know if you have any occurrences of that in your dataset - they'd not be noticeable in the final output since packages are deduplicated.

@spencerschrock
Contributor Author

> I think an easy way to handle this could be to just track what requirement files have been previously included (and include the one that is about to be included) because that should also optimize for more complex tree shapes that result in the same file being included multiple times in a way that is not a cycle

+1, sounds like a straightforward way to avoid processing the same file twice.

> I would also be interested to know if you have any occurrences of that in your dataset - they'd not be noticeable in the final output since packages are deduplicated.

Hmm, I'm not sure we have the data to identify these.
Cycles were easy to spot via crash logs, but since we're just interested in the list of vulns produced by DoScan, that's all that's currently stored. I assume these get de-duped somewhere, but if that's not the case I could write a query to try to find an occurrence.

@G-Rath
Collaborator

G-Rath commented Apr 19, 2023

> Hmm, I'm not sure we have the data to identify these.

It's hard to comment without knowing the details of your setup, but if you have the ability to run a modified version of the scanner, you could easily have it record every `-r` occurrence and output that to stdout/stderr, or append it to a debug file.

oliverchang added the bug label Apr 19, 2023
@spencerschrock
Contributor Author

> It's hard to comment without knowing the details of your setup, but if you have the ability to run a modified version of the scanner

Across the whole 1.2M repos, no. But locally, across a smaller subset of projects with python in the name? I can let it run overnight and see if it finds anything interesting.

@spencerschrock
Contributor Author

spencerschrock commented Apr 20, 2023

I think this is the situation you were talking about:
https://github.com/caizhengxin/python-libpcap

  • requirements.txt to requirements/dev.txt
    • requirements/dev.txt to requirements/publish.txt
      • requirements/publish.txt to requirements/base.txt
    • requirements/dev.txt to requirements/test.txt
      • requirements/test.txt to requirements/base.txt

I don't think it creates any duplicate packages though, as you use a map here so it gets de-duped:

    for _, detail := range details {
        packages[detail.Name+"@"+detail.Version] = detail
    }

    Scanned /tmp/python-libpcap/requirements.txt file and found 12 packages
    2023/04/19 23:03:48 package {bumpversion 0.0.0  PyPI PyPI}
    2023/04/19 23:03:48 package {coverage 0.0.0  PyPI PyPI}
    2023/04/19 23:03:48 package {cython 0.29.23  PyPI PyPI}
    2023/04/19 23:03:48 package {flake8 0.0.0  PyPI PyPI}
    2023/04/19 23:03:48 package {mock 0.0.0  PyPI PyPI}
    2023/04/19 23:03:48 package {pre-commit 1.18.3  PyPI PyPI}
    2023/04/19 23:03:48 package {pytest 0.0.0  PyPI PyPI}
    2023/04/19 23:03:48 package {pytest-cov 0.0.0  PyPI PyPI}
    2023/04/19 23:03:48 package {sphinx 0.0.0  PyPI PyPI}
    2023/04/19 23:03:48 package {sphinx-rtd-theme 0.0.0  PyPI PyPI}
    2023/04/19 23:03:48 package {sphinxcontrib-napoleon 0.0.0  PyPI PyPI}
    2023/04/19 23:03:48 package {tox 0.0.0  PyPI PyPI}

Finally, there's https://github.com/luxonis/depthai-python, which is a slightly different scenario. As DoScan walks the directory it will do the following:

  • start at docs/readthedocs/requirements.txt -> docs/requirements.txt -> docs/requirements_mkdoc.txt
  • start at docs/requirements.txt -> docs/requirements_mkdoc.txt

So some of the packages will be duplicated across queries, since the scans start from different lockfiles. I think that's the correct behavior to see in the results, but it does cause some extra queries to OSV.

@oliverchang
Collaborator

@another-rex can you take a look here?

@G-Rath
Collaborator

G-Rath commented May 4, 2023

@oliverchang this is also one I'm working on over in the detector. I've not got a fix yet, but will probably have one in the next couple of days, so feel free to assign it to me if you want.

@another-rex another-rex assigned G-Rath and unassigned another-rex May 4, 2023
robotdana added a commit to robotdana/osv-scanner that referenced this issue May 4, 2023
Ported from G-Rath/osv-detector#191

Note that this is not actually supported by `pip` itself, but doing so
actually optimizes the parser a bit anyway by only reading each file
exactly once regardless of how often it is required

fixes: google#354

Co-authored-by: Gareth Jones <[email protected]>
another-rex pushed a commit that referenced this issue May 5, 2023
Ported from G-Rath/osv-detector#191

Note that this is not actually supported by `pip` itself, but doing so
actually optimizes the parser a bit anyway by only reading each file
exactly once regardless of how often it is required

fixes: #354

Co-authored-by: Gareth Jones <[email protected]>