
How to do initialization for multiprocessing? #5944

Open
sconlyshootery opened this issue Sep 1, 2022 · 9 comments
Assignees
Labels
question Question on using Taichi

Comments

@sconlyshootery

For a program A that uses multiprocessing to run program B, it seems that I can only put ti.init() in B rather than in A, which wastes a lot of time on repeated initialization. Any suggestions?

@sconlyshootery sconlyshootery added the question Question on using Taichi label Sep 1, 2022
@taichi-gardener taichi-gardener moved this to Untriaged in Taichi Lang Sep 1, 2022
@strongoier strongoier moved this from Untriaged to Todo in Taichi Lang Sep 2, 2022
@jim19930609
Contributor

jim19930609 commented Sep 2, 2022

Hi sconlyshootery,
In terms of "multiprocessing", I can think of several different uses, and each of them has different semantics regarding whether the environment should be re-initialized. May I ask for a simple code example demonstrating how you're using multiprocessing?

The other question is: how costly is it to re-initialize in B? Do you have any numbers for the initialization latency?
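For reference, one way to collect such numbers is to time the call directly with time.perf_counter. A minimal sketch of a generic timing helper; the ti.init usage in the trailing comment is illustrative and assumes Taichi is installed:

```python
import time

def time_call(fn, *args, **kwargs):
    """Call fn once and return (result, elapsed_seconds)."""
    t0 = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - t0

# In the real program this could wrap the initialization, e.g.:
#   _, init_s = time_call(ti.init, arch=ti.cpu)
#   print(f"ti.init took {init_s:.3f} s")
```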

@sconlyshootery
Author

sconlyshootery commented Sep 5, 2022

Hi, thank you for your kind reply.
I aim to produce depth maps from point clouds, and I found Taichi really useful for this task: it is about 2x faster than Numba. With it, producing a depth map from 1,000,000+ points takes about 0.2 seconds, of which initialization costs 0.1 seconds. So if initialization could be done only once, the program would run about 2x faster. A simple example:

import numpy as np
import taichi as ti
from functools import partial
from multiprocessing import Pool

def main(args):
    pool = Pool(processes=args.mt_num)
    # pool.map passes a single argument, so bind the shared ones first
    worker = partial(projectPoints_ti, intrinsics=intrinsics, output_size=output_size)
    pool.map(worker, [pc1, pc2, pc3, ...])

def projectPoints_ti(pc, intrinsics, output_size):
    """
    pc: 3D points in world coordinates, 3*n
    intrinsics: 3 * 3
    output_size: depth image size (h, w)
    """
    # project to image coordinates
    pc = intrinsics @ pc  # 3*n
    pc = pc.T  # n*3
    pc[:, :2] = pc[:, :2] / pc[:, 2][..., np.newaxis]

    h, w = output_size

    ti.init(arch=ti.cpu)
    depth = ti.field(dtype=ti.f64, shape=(h, w))

    @ti.kernel
    def pcd2depth(pc: ti.types.ndarray()):
        # get depth; note that Taichi auto-parallelizes this top-level loop,
        # so concurrent writes to the same pixel can race
        for i in range(pc.shape[0]):
            # use minus 1 to get the exact same value as the KITTI MATLAB code
            x = int(ti.round(pc[i, 0]) - 1)
            y = int(ti.round(pc[i, 1]) - 1)
            z = pc[i, 2]
            # check if in bounds
            if x < 0 or x >= w or y < 0 or y >= h or z <= 0.1:
                continue
            if depth[y, x] > 0:
                depth[y, x] = min(z, depth[y, x])
            else:
                depth[y, x] = z

    pcd2depth(pc)
    return depth.to_numpy()

I am new to Taichi, so I am not sure whether this is the best way to use it. Any tips are welcome.

@jim19930609
Contributor

Hi sconlyshootery,
Thanks for providing the example code!

For this use case, it looks like each process uses the same kernel pcd2depth(), but with different pc (ndarray) and depth (field) types. In that case, Taichi compiles one kernel for each pc + depth combination, similar to how template functions are handled in C++, and then executes them. Since ti.init() preallocates memory, and Taichi's compilation and kernel execution are not thread-safe, we would likely get data conflicts in the "init once, compile and execute in multiple processes" case.
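One workaround that keeps each process's Taichi runtime fully independent, while still paying the initialization cost only once per worker rather than once per task, is multiprocessing.Pool's initializer hook: each worker process runs its own ti.init() exactly once, and every task dispatched to that worker reuses it. A minimal sketch of the pattern, with a hypothetical stand-in for ti.init() so it runs without Taichi installed:

```python
from multiprocessing import Pool

_initialized = False  # per-process flag set by the pool initializer

def _init_worker():
    # In the real program this would be: import taichi as ti; ti.init(arch=ti.cpu)
    global _initialized
    _initialized = True

def _square(x):
    # Every task runs in a worker that has already been initialized once.
    assert _initialized, "pool initializer did not run"
    return x * x

def run(values, workers=2):
    with Pool(processes=workers, initializer=_init_worker) as pool:
        return pool.map(_square, values)

if __name__ == "__main__":
    print(run([1, 2, 3]))  # [1, 4, 9]
```

With this shape, the per-call ti.init() moves out of the task function into the initializer, so an N-task job pays the initialization latency only `workers` times.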

However, Taichi does have a way to parallelize the above-mentioned compilation and execution of multiple kernels, by taking advantage of our Async Executor. For example, pseudocode for the same example with the Async Executor might look like:

def prepare_pc_and_hw(...):
    pc = intrinsics @ pc  # 3*n
    pc = pc.T  # n*3
    pc[:, :2] = pc[:, :2] / pc[:, 2][..., np.newaxis]
    h, w = output_size
    return pc, (h, w)

def main(args):
    pool = Pool(processes=args.mt_num)
    inputs = pool.map(prepare_pc_and_hw, [pc1, pc2, pc3, ...])

    @ti.kernel
    def pcd2depth(pc: ti.types.ndarray(), depth: ti.template()):
        ...

    # Start of async execution
    async_engine = ti.AsyncExecutor
    for pc, (h, w) in inputs:
        depth = ti.field(dtype=ti.f64, shape=(h, w))
        async_engine.submit(pcd2depth(pc, depth))
    async_engine.wait()
    ...

Basically, the idea is to put the preparation parts (preparing pc and the h, w used to create depth) in Python's multiprocessing. After all the preparation is done, we switch to Taichi's AsyncEngine to accelerate Taichi's compilation and kernel execution.

Let me know whether this approach fits your needs. In addition, since the Async Executor isn't officially released yet, the above code is seriously "pseudo" code. However, we can try to arrange something working if you are interested in trying it out.

@sconlyshootery
Author

Hi, Jim. Thank you for your kind reply.
My main concern is that the preparation step will produce too much data and overload the machine.
I am very glad to try it out.

@jim19930609
Contributor

Thanks! Let me also cc @ailzhang and @lin-hitonami since this has something to do with the Async Engine; I guess we'll need some internal discussion first.

@oliver-batchelor
Contributor

AsyncExecutor does not seem to exist anymore - did it change its name to something else? I'm trying to figure out how I'd use Taichi from multiple threads.

@jim19930609
Contributor

Hi oliver,
We did deprecate the AsyncExecutor for now since it was not being actively maintained. In some previous offline discussions we did plan to add it back, but there are few valid use cases for now.

Can you describe a little bit more about your task, and why multi-threading is important? Thanks in advance!

@oliver-batchelor
Contributor

oliver-batchelor commented Feb 7, 2023 via email

@jim19930609
Contributor

Hi Oliver,
Thanks so much for providing these use cases. It looks like this calls for the AsyncEngine plus heterogeneous support (the ability to execute kernels on different backends in the same run). Let me bring this topic to our Issue Triage Meeting this Friday. Thanks!

@strongoier strongoier moved this from Todo to Backlog in Taichi Lang Feb 17, 2023