
Is this a memory leak? #6133

Closed
RitChan opened this issue Sep 22, 2022 · 4 comments
Labels: potential bug (something that looks like a bug but is not yet confirmed)

Comments

RitChan commented Sep 22, 2022

Describe the bug
A data-oriented class that holds Python objects may cause a memory leak.

To Reproduce

import gc
import os
import psutil
import taichi as ti


@ti.data_oriented
class X:
    def __init__(self):
        self.py_l = [0] * 5242880  # a list containing 5M integers (5 * 2^20)

    @ti.kernel
    def run(self):
        # No-op kernel; launching it once is enough to trigger the growth.
        for i in range(1):
            pass


def get_process_memory():
    process = psutil.Process(os.getpid())
    mem_info = process.memory_info()
    return mem_info.rss


def main():
    ti.init(ti.cpu)
    for i in range(20):
        X().run()
        gc.collect()
        print(f"Iteration {i}, memory usage: {get_process_memory() / 1e6} MB")


if __name__ == '__main__':
    main()

Log/Screenshots

[Taichi] version 1.1.2, llvm 10.0.0, commit f25cf4a2, win, python 3.7.9
[Taichi] Starting on arch=x64
Iteration 0, memory usage: 272.740352 MB
Iteration 1, memory usage: 314.687488 MB
Iteration 2, memory usage: 356.634624 MB
Iteration 3, memory usage: 398.58176 MB
Iteration 4, memory usage: 440.528896 MB
Iteration 5, memory usage: 482.476032 MB
Iteration 6, memory usage: 524.423168 MB
Iteration 7, memory usage: 566.370304 MB
Iteration 8, memory usage: 608.31744 MB
Iteration 9, memory usage: 650.264576 MB
Iteration 10, memory usage: 692.211712 MB
Iteration 11, memory usage: 734.158848 MB
Iteration 12, memory usage: 776.105984 MB
Iteration 13, memory usage: 818.05312 MB
Iteration 14, memory usage: 860.000256 MB
Iteration 15, memory usage: 901.947392 MB
Iteration 16, memory usage: 943.894528 MB
Iteration 17, memory usage: 985.841664 MB
Iteration 18, memory usage: 1027.7888 MB
Iteration 19, memory usage: 1069.735936 MB

Additional comments
A data-oriented class that holds only NumPy objects is fine.

import gc
import os

import numpy as np
import psutil
import taichi as ti


@ti.data_oriented
class X:
    def __init__(self):
        self.np_l = np.zeros(shape=5242880 * 20, dtype="f4")  # ~100M float32 elements (~420 MB if fully committed)
        # self.py_l = [0] * 5242880

    @ti.kernel
    def run(self):
        for i in range(1):
            pass


def get_process_memory():
    process = psutil.Process(os.getpid())
    mem_info = process.memory_info()
    return mem_info.rss


def main():
    ti.init(ti.cpu)
    for i in range(20):
        X().run()
        gc.collect()
        print(f"Iteration {i}, memory usage: {get_process_memory() / 1e6} MB")


if __name__ == '__main__':
    main()

Output:

[Taichi] version 1.1.2, llvm 10.0.0, commit f25cf4a2, win, python 3.7.9
[Taichi] Starting on arch=x64
Iteration 0, memory usage: 230.866944 MB
Iteration 1, memory usage: 230.875136 MB
Iteration 2, memory usage: 230.883328 MB
Iteration 3, memory usage: 230.887424 MB
Iteration 4, memory usage: 230.89152 MB
Iteration 5, memory usage: 230.895616 MB
Iteration 6, memory usage: 230.907904 MB
Iteration 7, memory usage: 230.912 MB
Iteration 8, memory usage: 230.916096 MB
Iteration 9, memory usage: 230.920192 MB
Iteration 10, memory usage: 230.924288 MB
Iteration 11, memory usage: 230.93248 MB
Iteration 12, memory usage: 230.936576 MB
Iteration 13, memory usage: 230.940672 MB
Iteration 14, memory usage: 230.944768 MB
Iteration 15, memory usage: 230.948864 MB
Iteration 16, memory usage: 230.957056 MB
Iteration 17, memory usage: 230.961152 MB
Iteration 18, memory usage: 230.965248 MB
Iteration 19, memory usage: 230.969344 MB
RitChan added the potential bug label on Sep 22, 2022
taichi-gardener moved this to Untriaged in Taichi Lang on Sep 22, 2022
bobcao3 (Collaborator) commented Sep 22, 2022

What happens if you add a ti.sync() after each run?
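
For reference, that suggestion corresponds to changing the loop in the reproduction script to something like the following; ti.sync() blocks until all launched kernels have finished before the memory measurement is taken:

def main():
    ti.init(ti.cpu)
    for i in range(20):
        X().run()
        ti.sync()  # wait for the launched kernel to finish
        gc.collect()
        print(f"Iteration {i}, memory usage: {get_process_memory() / 1e6} MB")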

turbo0628 (Member) commented
Reproduced the same result on my machine; ti.sync() does not help.

It's likely to be a memory leak.

jim19930609 self-assigned this and unassigned ailzhang on Sep 23, 2022
turbo0628 moved this from Untriaged to Todo in Taichi Lang on Sep 23, 2022
jim19930609 (Contributor) commented

This memory leak is also observed if we switch the list to a NumPy array, provided we use np.random.random([5242880]) instead of np.zeros(...).

The problem with np.zeros(...) is that its zero-filled pages are not committed to physical memory until they are written, so the resident memory for np.zeros(5242880) stays fairly small and the leak is not visible in RSS.
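
For reference, the NumPy variant that does reproduce the leak would look like this (the reproduction script above, with only the array initialization changed):

@ti.data_oriented
class X:
    def __init__(self):
        # Random data forces every page of the array to be written,
        # so the ~40 MB per instance actually shows up in RSS.
        self.np_l = np.random.random([5242880])

    @ti.kernel
    def run(self):
        for i in range(1):
            pass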

jim19930609 (Contributor) commented Sep 30, 2022

1. Diagnosis

The root cause of this memory leak lies in the following dependency chain:

program->compiled_kernels[...] = run_k
run_k->mapper->mapping = X()

In simple terms, the program holds the kernel, and the kernel holds the X() instance, so X shares the program's lifetime and survives until the end of the Python process.
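
The same lifetime issue can be demonstrated with plain Python, independent of Taichi: caching a bound method keeps a strong reference to its instance through the method's __self__ attribute. A minimal sketch of that general mechanism (the cache and names below are made up for illustration; they are not the actual Taichi internals):

import gc

_compiled_kernels = {}  # stands in for program->compiled_kernels[...]


class X:
    def __init__(self):
        self.py_l = [0] * 1000  # payload that should be freed together with the instance

    def run(self):
        pass


def launch(obj):
    # Caching the bound method stores a strong reference to obj
    # (obj.run.__self__ is obj), so obj can never be collected.
    _compiled_kernels[id(obj)] = obj.run
    obj.run()


for i in range(5):
    launch(X())
    gc.collect()
    alive = sum(isinstance(o, X) for o in gc.get_objects())
    print(f"Iteration {i}, live X instances: {alive}")

Each iteration reports one more live X instance, mirroring the roughly 42 MB of growth per iteration in the log above.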

2. Possible fixes

(1) Remove the cache (self.mapping) for the parsed arguments (see the sketch after this list):

[screenshot: 2022-09-30 13-20-33]

(2) Add scope support for cached kernels, so that the program removes the created kernels from compiled_functions once a given scope is exited.
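
For (1), one way to drop the strong reference without giving up caching entirely would be to hold the data-oriented instance through a weak reference. This is a rough sketch of that idea only; the KernelArgumentMapper name and its API are hypothetical, not the actual Taichi code:

import weakref


class KernelArgumentMapper:
    """Hypothetical stand-in for the mapper whose self.mapping currently pins X()."""

    def __init__(self):
        self._instance_ref = None  # weak reference instead of a strong one

    def bind(self, instance):
        # A weakref does not extend the instance's lifetime, so the
        # data_oriented object can be collected once the user drops it.
        self._instance_ref = weakref.ref(instance)

    def instance(self):
        obj = self._instance_ref() if self._instance_ref is not None else None
        if obj is None:
            raise RuntimeError("the bound data_oriented instance has been garbage-collected")
        return obj

With either fix, each X() instance could be collected as soon as the user releases it, instead of living as long as the program.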
