
'ValueError: only one element tensors can be converted to Python scalars' failure of TensorRT 8.2 when running NVIDIA DALI on GPU V100 #3597

Closed
juinshell opened this issue Jan 13, 2024 · 2 comments
Labels: triaged (Issue has been triaged by maintainers)

Comments


juinshell commented Jan 13, 2024

Description

Hi Nvidia Team,

I'm testing a DNN workflow (SSD + ResNet-50) with TensorRT and NVIDIA DALI (for data preprocessing). I convert a pretrained PyTorch SSD model to ONNX format and load it to build a TensorRT engine.

However, when I run SSD inference, I do not know how to adapt another framework's data to TensorRT's input. For example, how do I convert NVIDIA DALI's nvidia.dali.backend_impl.TensorGPU into an input for tensorrt.tensorrt.IExecutionContext.execute()? If that is difficult, is it possible to pass a 'torch.Tensor' as input data to TensorRT?

Environment

TensorRT Version: 8.2.5.1

NVIDIA GPU: V100

NVIDIA Driver Version: 520.61.05

CUDA Version: 11.8

CUDNN Version: 8401 (cuDNN 8.4.1, as reported by torch.backends.cudnn.version())

Operating System: Ubuntu (kernel 4.15.0-45-generic #48-Ubuntu)

Python Version (if applicable): 3.8.13

Tensorflow Version (if applicable): N/A

PyTorch Version (if applicable): 1.13.0

Baremetal or Container (if so, version): nvcr.io/nvidia/pytorch:22.06-py3

Relevant Files

Model link:

Steps To Reproduce

Commands or scripts:

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
import torch
import tensorrt as trt
import numpy as np
import time

import pycuda.driver as cuda  

import pycuda.autoinit

print("PyCUDA Version:", cuda.get_version())
print("CUDNN Version:", torch.backends.cudnn.version())
print("Torch Version:", torch.__version__)
print("TensorRT Version:", trt.__version__)

batch = 4
# Step 1: Load pretrained model from PyTorch Hub
ssd_model = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_ssd')
ssd_model.eval()
ssd_model = ssd_model.cuda()

# Step 2: Export PyTorch model to ONNX
ssd_dummy_input = torch.randn(batch, 3, 300, 300).cuda()
ssd_onnx_file_path = 'ssd.onnx'
torch.onnx.export(ssd_model, ssd_dummy_input, ssd_onnx_file_path)

# Step 3: Create a TensorRT engine from the ONNX model
TRT_LOGGER = trt.Logger(trt.Logger.ERROR)
EXPLICIT_BATCH = 1 << (int)(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)

def make_rt_engine(onnx_file_path):
    with trt.Builder(TRT_LOGGER) as builder, builder.create_network(EXPLICIT_BATCH) as network, trt.OnnxParser(network, TRT_LOGGER) as parser:
        config = builder.create_builder_config()
        config.max_workspace_size = 1 << 34
        builder.max_batch_size = batch
        
        if builder.platform_has_fast_fp16:
            config.set_flag(trt.BuilderFlag.FP16)

        # Parse the ONNX model
        print("Parsing onnx file {}...".format(onnx_file_path))
        with open(onnx_file_path, 'rb') as model_file:
            if not parser.parse(model_file.read()):
                for error in range(parser.num_errors):
                    print(parser.get_error(error))

        # Build and serialize the TensorRT engine
        profile = builder.create_optimization_profile()
        config.add_optimization_profile(profile)

        print("Building an engine...")
        engine = builder.build_engine(network, config)
    return engine

def execute_ssd_engine(engine, input_data):
    with engine.create_execution_context() as context:
        max_nboxes = 8732
        h_ploc = np.zeros((batch, 4, max_nboxes), dtype=np.float32)
        h_plabel = np.zeros((batch, 81, max_nboxes), dtype=np.float32)

        d_ploc = cuda.mem_alloc(h_ploc.nbytes) 
        d_plabel = cuda.mem_alloc(h_plabel.nbytes)

        print("d_ploc: ", d_ploc)
        print("d_plabel: ", d_plabel)
        print("context type: ", type(context))
        input("Press Enter to continue...")
        
        # Execute inference
        # warmup
        print("[SSD]warmup...")
        for i in range(50):
            context.execute(batch, [int(input_data), int(d_ploc), int(d_plabel)])
        
        print("[SSD]start inference...")
        times = []
        for i in range(100):
            T1=time.perf_counter()   
            context.execute(batch, [int(input_data), int(d_ploc), int(d_plabel)])
            T2=time.perf_counter()
            times.append(T2-T1)
            if (i + 1) % 10 == 0:
                print('[SSD]TensorRT Inference {:d}/{:d}: {:.3f}ms'.format(i + 1, 100, np.mean(times) * 1000))

        # Transfer output data to host
        cuda.memcpy_dtoh(h_ploc, d_ploc)
        cuda.memcpy_dtoh(h_plabel, d_plabel)
        print("[SSD]finish!")
    return h_ploc, h_plabel

ssd_engine = make_rt_engine(ssd_onnx_file_path)

# DALI
import nvidia.dali.fn as fn
from nvidia.dali.pipeline.experimental import pipeline_def
import nvidia.dali.types as types
from nvidia.dali.plugin.pytorch import feed_ndarray

@pipeline_def()
def simple_pipeline(resize_flag=True, resize_x=300, resize_y=300):
    jpegs, labels = fn.readers.file(file_root='./images',
                                    random_shuffle=True,
                                    name="Reader")
    images = fn.decoders.image(jpegs, device="mixed", output_type=types.RGB)

    # Resize images
    if resize_flag:
        resized_images = fn.resize(images, resize_x=resize_x, resize_y=resize_y, device="gpu")

    # Normalize images
    mean = [0.485 * 255, 0.456 * 255, 0.406 * 255]
    std = [0.229 * 255, 0.224 * 255, 0.225 * 255]

    normalized_images = fn.crop_mirror_normalize(resized_images,
                                    mean=mean,
                                    std=std,
                                    output_dtype=types.FLOAT,
                                    device="gpu")

    return normalized_images, resized_images

pipe = simple_pipeline(batch_size=4, num_threads=3, device_id=0)
pipe.build()

_images, resized_images = pipe.run()
print("-----_images type: ", type(_images)) # nvidia.dali.backend_impl.TensorListGPU
print("-----_images.dtype: ", _images.dtype) # DALIDataType.FLOAT
print("-----_images.shape: ", _images.shape()) # [(3, 300, 300), (3, 300, 300), (3, 300, 300), (3, 300, 300)]
print("-----_images.layout:", _images.layout()) # CHW

input("Press Enter to continue...")

_images_dali_tensor = _images.as_tensor()
print("-----_images_dali_tensor type:", type(_images_dali_tensor))  # <class 'nvidia.dali.backend_impl.TensorGPU'>
print("-----_images_dali_tensor dtype:", _images_dali_tensor.dtype())  # =f
print("-----_images_dali_tensor shape:", _images_dali_tensor.shape())  # (4, 3, 300, 300)
print("-----_images_dali_tensor layout:", _images_dali_tensor.layout())  # NCHW

_images_dali_tensor_data_ptr = _images_dali_tensor.data_ptr()
print("-----_images_dali_tensor_data_ptr:", _images_dali_tensor_data_ptr) # 4398129168384
...

Have you tried the latest release?: no

Can this model run on other frameworks? For example run ONNX model with ONNXRuntime (polygraphy run <model.onnx> --onnxrt): not tested

juinshell changed the title from "Segmentation fault (core dumped) failure of TensorRT 8.2 output when running SSD on GPU V100" to "'ValueError: only one element tensors can be converted to Python scalars' failure of TensorRT 8.2 when running NVIDIA DALI on GPU V100" on Jan 13, 2024
zerollzeng (Collaborator) commented:

> is it possible to pass a 'torch.Tensor' as input data to TensorRT?

Check #2506
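
For reference, a minimal sketch of that approach, assuming engine is the already-built SSD engine from the script above and that the binding order is input, ploc, plabel (both are assumptions, not something this thread confirms):

import torch

# Bind CUDA torch.Tensors to TensorRT by raw device pointer.
batch = 4
d_input = torch.randn(batch, 3, 300, 300, dtype=torch.float32, device='cuda')
d_ploc = torch.empty(batch, 4, 8732, dtype=torch.float32, device='cuda')
d_plabel = torch.empty(batch, 81, 8732, dtype=torch.float32, device='cuda')

with engine.create_execution_context() as context:
    # .data_ptr() returns the device address as a plain Python int, which is
    # what TensorRT expects in its bindings list; no host/device copy is needed.
    bindings = [d_input.data_ptr(), d_ploc.data_ptr(), d_plabel.data_ptr()]
    context.execute_v2(bindings)

# d_ploc and d_plabel now hold the network outputs as ordinary CUDA tensors.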

zerollzeng self-assigned this on Jan 15, 2024
zerollzeng added the triaged (Issue has been triaged by maintainers) label on Jan 15, 2024
juinshell (Author) commented:

Hi @zerollzeng,
Thanks for your help! I used .data_ptr() and it works! In my test, TensorRT can take an NVIDIA DALI TensorListGPU's data_ptr as the input.
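
For completeness, a sketch of the working combination, reusing names from the script above (ssd_engine, pipe, batch; the pycuda output buffers d_ploc and d_plabel are assumed to be allocated as in execute_ssd_engine). The ValueError in the title is what int() raises on a multi-element torch.Tensor; data_ptr() avoids it because it already returns the device address as a plain Python int:

_images, _ = pipe.run()
_images_dali_tensor = _images.as_tensor()  # nvidia.dali.backend_impl.TensorGPU, NCHW

# data_ptr() exposes the raw GPU address of the DALI batch as a Python int,
# which TensorRT consumes directly as a binding; no intermediate copy.
input_ptr = _images_dali_tensor.data_ptr()

with ssd_engine.create_execution_context() as context:
    context.execute(batch, [input_ptr, int(d_ploc), int(d_plabel)])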
