
How to do initialization for multiprocessing? #5944

Open
sconlyshootery opened this issue Sep 1, 2022 · 9 comments
Assignees
Labels
question Question on using Taichi

Comments

@sconlyshootery

For a program A that uses multiprocessing to run program B, it seems that I can only put ti.init() in B rather than in A, which wastes a lot of time on repeated initialization. Any suggestions?

@sconlyshootery sconlyshootery added the question Question on using Taichi label Sep 1, 2022
@taichi-gardener taichi-gardener moved this to Untriaged in Taichi Lang Sep 1, 2022
@strongoier strongoier moved this from Untriaged to Todo in Taichi Lang Sep 2, 2022
@jim19930609
Contributor

jim19930609 commented Sep 2, 2022

Hi sconlyshootery,
In terms of "multiprocessing", I can think of several different uses, and each of them has different semantics regarding whether the environment should be re-initialized. May I ask for a simple code example demonstrating how you're using multiprocessing?

The other question is: how costly is it to re-initialize in B? Do you have any numbers for the initialization latency?
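For reference, one way to collect such numbers is to time the call directly with time.perf_counter. A minimal sketch of a generic timing helper; the ti.init usage in the trailing comment is illustrative and assumes Taichi is installed:

```python
import time

def time_call(fn, *args, **kwargs):
    """Call fn once and return (result, elapsed_seconds)."""
    t0 = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - t0

# In the real program this could wrap the initialization, e.g.:
#   _, init_s = time_call(ti.init, arch=ti.cpu)
#   print(f"ti.init took {init_s:.3f} s")
```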

@sconlyshootery
Author

sconlyshootery commented Sep 5, 2022

Hi, thank you for your kind reply.
I aim to produce depth maps from point clouds, and I found Taichi really useful for this task: it is about 2x faster than Numba. With it, producing a depth map from 1,000,000+ points takes about 0.2 seconds, of which initialization costs 0.1 seconds. So if initialization could be done only once, the program would run about 2x faster. A simple example:

import numpy as np
import taichi as ti
from functools import partial
from multiprocessing import Pool

def main(args):
    pool = Pool(processes=args.mt_num)
    # pool.map passes a single argument, so bind the shared ones first
    worker = partial(projectPoints_ti, intrinsics=intrinsics, output_size=output_size)
    pool.map(worker, [pc1, pc2, pc3, ...])

def projectPoints_ti(pc, intrinsics, output_size):
    """
    pc: 3D points in world coordinates, 3*n
    intrinsics: 3 * 3
    output_size: depth image size (h, w)
    """
    # project to image coordinates
    pc = intrinsics @ pc  # 3*n
    pc = pc.T  # n*3
    pc[:, :2] = pc[:, :2] / pc[:, 2][..., np.newaxis]

    h, w = output_size

    ti.init(arch=ti.cpu)
    depth = ti.field(dtype=ti.f64, shape=(h, w))

    @ti.kernel
    def pcd2depth(pc: ti.types.ndarray()):
        # get depth; note that Taichi auto-parallelizes this top-level loop,
        # so concurrent writes to the same pixel can race
        for i in range(pc.shape[0]):
            # use minus 1 to get the exact same value as the KITTI MATLAB code
            x = int(ti.round(pc[i, 0]) - 1)
            y = int(ti.round(pc[i, 1]) - 1)
            z = pc[i, 2]
            # check if in bounds
            if x < 0 or x >= w or y < 0 or y >= h or z <= 0.1:
                continue
            if depth[y, x] > 0:
                depth[y, x] = min(z, depth[y, x])
            else:
                depth[y, x] = z

    pcd2depth(pc)
    return depth.to_numpy()

I am new to Taichi, so I am not sure whether this is the best way to use it. Any tips are welcome.

@jim19930609
Contributor

Hi sconlyshootery,
Thanks for providing the example code!

For this use case, it looks like each process uses the same kernel pcd2depth(), but with different pc (ndarray) and depth (field) types. In that case, Taichi compiles one kernel for each pc + depth combination, similar to how template functions are handled in C++, and then executes them. Since ti.init() preallocates memory, and Taichi's compilation and kernel execution are not thread-safe, we would likely get data conflicts in the "init once, compile and execute in multiple processes" case.
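One workaround that keeps each process's Taichi runtime fully independent, while still paying the initialization cost only once per worker rather than once per task, is multiprocessing.Pool's initializer hook: each worker process runs its own ti.init() exactly once, and every task dispatched to that worker reuses it. A minimal sketch of the pattern, with a hypothetical stand-in for ti.init() so it runs without Taichi installed:

```python
from multiprocessing import Pool

_initialized = False  # per-process flag set by the pool initializer

def _init_worker():
    # In the real program this would be: import taichi as ti; ti.init(arch=ti.cpu)
    global _initialized
    _initialized = True

def _square(x):
    # Every task runs in a worker that has already been initialized once.
    assert _initialized, "pool initializer did not run"
    return x * x

def run(values, workers=2):
    with Pool(processes=workers, initializer=_init_worker) as pool:
        return pool.map(_square, values)

if __name__ == "__main__":
    print(run([1, 2, 3]))  # [1, 4, 9]
```

With this shape, the per-call ti.init() moves out of the task function into the initializer, so an N-task job pays the initialization latency only `workers` times.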

However, Taichi does have a way to parallelize the above-mentioned compilation and execution of multiple kernels, by taking advantage of our Async Executor. For example, pseudocode for the same example with the Async Executor might look like:

def prepare_pc_and_hw(...):
    pc = intrinsics @ pc  # 3*n
    pc = pc.T  # n*3
    pc[:, :2] = pc[:, :2] / pc[:, 2][..., np.newaxis]
    h, w = output_size
    return pc, (h, w)

def main(args):
    pool = Pool(processes=args.mt_num)
    inputs = pool.map(prepare_pc_and_hw, [pc1, pc2, pc3, ...])

    @ti.kernel
    def pcd2depth(pc: ti.types.ndarray(), depth: ti.template()):
        ...

    # Start of async execution
    async_engine = ti.AsyncExecutor
    for pc, (h, w) in inputs:
        depth = ti.field(dtype=ti.f64, shape=(h, w))
        async_engine.submit(pcd2depth(pc, depth))
    async_engine.wait()
    ...

Basically, the idea is to put the preparation parts (preparing pc and the h, w used to create depth) in Python's multiprocessing. After all the preparation is done, we switch to Taichi's AsyncEngine to accelerate Taichi's compilation and kernel execution.

Let me know whether this approach fits your needs. In addition, since the Async Executor isn't officially released yet, the above code is seriously "pseudo" code. However, we can try to arrange something working if you are interested in trying it out.

@sconlyshootery
Author

Hi, Jim. Thank you for your kind reply.
My main concern is that the preparation step will produce too much data and overload the machine.
I am very glad to try it out.

@jim19930609
Contributor

Thanks! Let me also cc @ailzhang and @lin-hitonami since this has something to do with the Async Engine; I guess we'll need some internal discussion first.

@oliver-batchelor
Contributor

AsyncExecutor does not seem to exist anymore - did it change its name to something else? I'm trying to figure out how I'd use Taichi from multiple threads.

@jim19930609
Contributor

Hi oliver,
We did deprecate the AsyncExecutor for now since it was not being actively maintained. In some previous offline discussions we did plan to add it back, but there are few valid use cases for now.

Can you describe a little bit more about your task, and why multi-threading is important? Thanks in advance!

@oliver-batchelor
Contributor

oliver-batchelor commented Feb 7, 2023 via email

@jim19930609
Contributor

Hi Oliver,
Thanks so much for providing these use cases. It looks like this calls for the AsyncEngine plus heterogeneous support (the ability to execute kernels on different backends in the same run). Let me bring this topic to our Issue Triage Meeting this Friday. Thanks!

@strongoier strongoier moved this from Todo to Backlog in Taichi Lang Feb 17, 2023