Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix swig ownership when assigning to struct pointer field #3139

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

mdouze
Copy link
Contributor

@mdouze mdouze commented Nov 22, 2023

Summary: When faiss is called from SWIG, it is usually assumed that Python will do all the memory management, therefore assigning to a pointer in C++ should not transfer ownerhsip (default SWIG behavior). This diff fixed that.

Differential Revision: D51529085

Summary: When faiss is called from SWIG, it is usually assumed that Python will do all the memory management, therefore assigning to a pointer in C++ should not transfer ownerhsip (default SWIG behavior). This diff fixed that.

Differential Revision: D51529085
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D51529085

@mdouze
Copy link
Contributor Author

mdouze commented Nov 22, 2023

hopefully this will finally fix #2996

@junjieqi
Copy link
Contributor

@mdouze do we still need this one?

@junjieqi junjieqi requested a review from mnorris11 July 31, 2024 18:08
@junjieqi junjieqi removed their assignment Jul 31, 2024
@mnorris11
Copy link

Even though the final assert fails since the ownership disappears, it seems like the memory does not increase. This typemap looks right to me when I test it. We can discuss on next steps.

@mnorris11
Copy link

Current status after discussion:

ASAN does not seem to detect these Python memory leaks. There are no ASAN errors in a simple repro unit test like

def test_ownership_2(self):
        d = 32
        quantizer = faiss.IndexFlatL2(d)
        quantizer.this.own(False)

The destructor of IndexFlatL2 does not get called when quantizer.this.own(False). Removing this line will call the destructor.

We should introduce some Python unit test to catch these memory leaks.

mnorris11 pushed a commit to mnorris11/faiss that referenced this pull request Aug 29, 2024
…h#3704)

Summary:
Pull Request resolved: facebookresearch#3704

Background
--
Issue: facebookresearch#2996

Prior attempted fix: https://github.com/facebookresearch/faiss/pull/3007/files#diff-d704f33c46a4ef2936c8cf2a66b05f1993e25e79ee5c19d4b63c4e0cf46b0a42

Partially copied from facebookresearch#3139 with an additional unit test.

It is a confirmed and reproducible memory leak every time. There is a workaround. See the comments on facebookresearch#2996.

Relevant SWIG docs: https://www.swig.org/Doc4.1/SWIGDocumentation.html#Typemaps_nn2

Current status
--

`buck test faiss/tests:test_search_params -- test_ownership_2`

Test prints:

without SWIG fix:
`[0.49, 0.82, 1.15, 1.39, 1.67, 1.88, 2.16, 2.36, 2.65, 2.85, 3.13, 3.34, 3.62, 3.82, 4.11, 4.31, 4.6, 4.8, 5.08, 5.28]`

with SWIG fix:
`[0.52, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.71, 0.71, 0.71, 0.71, 0.71, 0.71, 0.71]`

Note: This test is not ideal. Ideally we could enable ASAN to catch these memory leaks. But ASAN does not seem to catch these Python memory leaks. Example:
```
def test_ownership_3(self):
        d = 32
        quantizer = faiss.IndexFlatL2(d)
        quantizer.this.own(False)
```
The above test produces no ASAN error, even though the `quantizer` object leaks.

Differential Revision: D60151215
mnorris11 pushed a commit to mnorris11/faiss that referenced this pull request Aug 29, 2024
…h#3704)

Summary:
Pull Request resolved: facebookresearch#3704

Background
--
Issue: facebookresearch#2996

Prior attempted fix: https://github.com/facebookresearch/faiss/pull/3007/files#diff-d704f33c46a4ef2936c8cf2a66b05f1993e25e79ee5c19d4b63c4e0cf46b0a42

Partially copied from facebookresearch#3139 with an additional unit test.

It is a confirmed and reproducible memory leak every time. There is a workaround. See the comments on facebookresearch#2996.

Relevant SWIG docs: https://www.swig.org/Doc4.1/SWIGDocumentation.html#Typemaps_nn2

Current status
--

`buck test faiss/tests:test_search_params -- test_ownership_2`

Test prints:

without SWIG fix:
`[0.49, 0.82, 1.15, 1.39, 1.67, 1.88, 2.16, 2.36, 2.65, 2.85, 3.13, 3.34, 3.62, 3.82, 4.11, 4.31, 4.6, 4.8, 5.08, 5.28]`

with SWIG fix:
`[0.52, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.71, 0.71, 0.71, 0.71, 0.71, 0.71, 0.71]`

Note: This test is not ideal. Ideally we could enable ASAN to catch these memory leaks. But ASAN does not seem to catch these Python memory leaks. Example:
```
def test_ownership_3(self):
        d = 32
        quantizer = faiss.IndexFlatL2(d)
        quantizer.this.own(False)
```
The above test produces no ASAN error, even though the `quantizer` object leaks.

Differential Revision: D61992599
mnorris11 pushed a commit to mnorris11/faiss that referenced this pull request Aug 29, 2024
…h#3810)

Summary:
Pull Request resolved: facebookresearch#3810

Pull Request resolved: facebookresearch#3704

Background
--
Issue: facebookresearch#2996

Prior attempted fix: https://github.com/facebookresearch/faiss/pull/3007/files#diff-d704f33c46a4ef2936c8cf2a66b05f1993e25e79ee5c19d4b63c4e0cf46b0a42

Partially copied from facebookresearch#3139 with an additional unit test.

It is a confirmed and reproducible memory leak every time. There is a workaround. See the comments on facebookresearch#2996.

Relevant SWIG docs: https://www.swig.org/Doc4.1/SWIGDocumentation.html#Typemaps_nn2

Current status
--

`buck test faiss/tests:test_search_params -- test_ownership_2`

Test prints:

without SWIG fix:
`[0.49, 0.82, 1.15, 1.39, 1.67, 1.88, 2.16, 2.36, 2.65, 2.85, 3.13, 3.34, 3.62, 3.82, 4.11, 4.31, 4.6, 4.8, 5.08, 5.28]`

with SWIG fix:
`[0.52, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.71, 0.71, 0.71, 0.71, 0.71, 0.71, 0.71]`

Note: This test is not ideal. Ideally we could enable ASAN to catch these memory leaks. But ASAN does not seem to catch these Python memory leaks. Example:
```
def test_ownership_3(self):
        d = 32
        quantizer = faiss.IndexFlatL2(d)
        quantizer.this.own(False)
```
The above test produces no ASAN error, even though the `quantizer` object leaks.

Differential Revision: D61992599
mnorris11 pushed a commit to mnorris11/faiss that referenced this pull request Sep 12, 2024
…h#3810)

Summary:
Pull Request resolved: facebookresearch#3810

Pull Request resolved: facebookresearch#3704

Background
--
Issue: facebookresearch#2996

Prior attempted fix: https://github.com/facebookresearch/faiss/pull/3007/files#diff-d704f33c46a4ef2936c8cf2a66b05f1993e25e79ee5c19d4b63c4e0cf46b0a42

Partially copied from facebookresearch#3139 with an additional unit test.

It is a confirmed and reproducible memory leak every time. There is a workaround. See the comments on facebookresearch#2996.

Relevant SWIG docs: https://www.swig.org/Doc4.1/SWIGDocumentation.html#Typemaps_nn2

Current status
--

`buck test faiss/tests:test_search_params -- test_ownership_2`

Test prints:

without SWIG fix:
`[0.49, 0.82, 1.15, 1.39, 1.67, 1.88, 2.16, 2.36, 2.65, 2.85, 3.13, 3.34, 3.62, 3.82, 4.11, 4.31, 4.6, 4.8, 5.08, 5.28]`

with SWIG fix:
`[0.52, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.71, 0.71, 0.71, 0.71, 0.71, 0.71, 0.71]`

Note: This test is not ideal. Ideally we could enable ASAN to catch these memory leaks. But ASAN does not seem to catch these Python memory leaks. Example:
```
def test_ownership_3(self):
        d = 32
        quantizer = faiss.IndexFlatL2(d)
        quantizer.this.own(False)
```
The above test produces no ASAN error, even though the `quantizer` object leaks.

Why change HNSW test?
--
This fix causes the HNSW test to fail with heap-use-after-free. This is because the index.storage.get_distance_computer() somehow gets freed during clone_index, but only when reassigning to the same variable like `index.storage = clone(index.storage)`. I checked in https://fburl.com/code/qw6fznjt, and it is non-null before returning on the CPP side.

After adding the temp variable, I also had to set `index.own_fields = False`, otherwise we get a heap-use-after-free again due to it being deleted already.

Differential Revision: D61992599
mnorris11 pushed a commit to mnorris11/faiss that referenced this pull request Sep 12, 2024
…h#3810)

Summary:
Pull Request resolved: facebookresearch#3810

Pull Request resolved: facebookresearch#3704

Background
--
Issue: facebookresearch#2996

Prior attempted fix: https://github.com/facebookresearch/faiss/pull/3007/files#diff-d704f33c46a4ef2936c8cf2a66b05f1993e25e79ee5c19d4b63c4e0cf46b0a42

Partially copied from facebookresearch#3139 with an additional unit test.

It is a confirmed and reproducible memory leak every time. There is a workaround. See the comments on facebookresearch#2996.

Relevant SWIG docs: https://www.swig.org/Doc4.1/SWIGDocumentation.html#Typemaps_nn2

Current status
--

`buck test faiss/tests:test_search_params -- test_ownership_2`

Test prints:

without SWIG fix:
`[0.49, 0.82, 1.15, 1.39, 1.67, 1.88, 2.16, 2.36, 2.65, 2.85, 3.13, 3.34, 3.62, 3.82, 4.11, 4.31, 4.6, 4.8, 5.08, 5.28]`

with SWIG fix:
`[0.52, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.71, 0.71, 0.71, 0.71, 0.71, 0.71, 0.71]`

Note: This test is not ideal. Ideally we could enable ASAN to catch these memory leaks. But ASAN does not seem to catch these Python memory leaks. Example:
```
def test_ownership_3(self):
        d = 32
        quantizer = faiss.IndexFlatL2(d)
        quantizer.this.own(False)
```
The above test produces no ASAN error, even though the `quantizer` object leaks.

Why change HNSW test?
--
This fix causes the HNSW test to fail with heap-use-after-free. This is because the index.storage.get_distance_computer() somehow gets freed during clone_index, but only when reassigning to the same variable like `index.storage = clone(index.storage)`. I checked in https://fburl.com/code/qw6fznjt, and it is non-null before returning on the CPP side.

After adding the temp variable, I also had to set `index.own_fields = False`, otherwise we get a heap-use-after-free again due to it being deleted already.

Differential Revision: D61992599
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Development

Successfully merging this pull request may close these issues.

5 participants