diff --git a/docs/source/savant_101/27_working_with_models.rst b/docs/source/savant_101/27_working_with_models.rst
index 2d37dbe9f..c740a0295 100644
--- a/docs/source/savant_101/27_working_with_models.rst
+++ b/docs/source/savant_101/27_working_with_models.rst
@@ -8,18 +8,17 @@ The listing below represents a typical Savant inference node:
 .. code-block:: yaml
 
     - element: nvinfer@detector
-      name: DetectionModel
+      name: detection_model
       model:
         format: etlt
         remote:
           url: "https://127.0.0.1/models/detection_model.zip"
         local_path: /opt/aispp/models/detection_model
         model_file: resnet18_dashcamnet_pruned.etlt
-        engine_file: resnet18.etlt_b1_gpu0_int8.engine
+        engine_file: resnet18_dashcamnet_pruned.etlt_b1_gpu0_int8.engine
         batch_size: 1
         precision: int8
         int8_calib_file: dashcamnet_int8.txt
-        mean_file: mean.ppm
         input:
           layer_name: input_1
           shape: [3, 544, 960]
@@ -27,35 +26,33 @@ The listing below represents a typical Savant inference node:
         output:
           layer_names: [output_cov/Sigmoid, output_bbox/BiasAdd]
 
-The ``element`` section specifies the type of a pipeline unit. There are 4 types of units for defining models: detector, classifier, attribute_model, instance_segmentation, and complex_model.
+The ``element`` section specifies the type of a pipeline unit. There are 5 types of units for defining models: :doc:`detector `, :doc:`classifier `, :doc:`attribute_model `, instance_segmentation, and :doc:`complex_model `.
 
 The ``name`` parameter defines the name of the unit. The ``name`` is used by the downstream pipeline units to refer to the objects that the unit produces. This parameter is also used to construct the path to the model files, see the ``local_path`` parameter.
 
-The ``format`` parameter specifies the format in which the model is provided. The supported formats and the peculiarities of specifying certain parameters depending on the model format are described below.
+The ``format`` parameter specifies the format in which the model is provided. The parameter is used to build the TensorRT engine and can be omitted if a pre-built engine file is provided. The supported formats and the peculiarities of specifying certain parameters depending on the model format are described below.
 
 The ``model_file`` parameter defines the name of the file with the model. The name is specified as a base name, not a full path.
 
 The ``engine_file`` parameter defines the name for the TensorRT-generated engine file. If this parameter is set, then when the pipeline is launched, the presence of this file is checked first, and if it is present, the model will be loaded from it.
-If the prepared model file does not exist, then the pipeline will generate the engine for the model with the name. If you are not using a specially generated, pre-created TensorRT engine file, it is recommended not to set this field: the name will be generated automatically.
+If the prepared model engine file does not exist, then the pipeline will generate the engine for the model. If you are not using a specially generated, pre-created TensorRT engine file, it is recommended not to set this field: the name will be generated automatically.
 
 The ``remote`` section specifies a URL and credentials for accessing a remote model storage. Full description below. Savant supports downloading the models from remote locations so you can easily update them without rebuilding docker images.
 
-The ``local_path`` parameter specifies the path to the model files. It can be omitted, then the path will be automatically generated according to the following rule ``/``, where ```` is a global parameter specifying the location of all model files, set in the parameters section (description), and ```` is the name of the element.
+The ``local_path`` parameter specifies the path to the model files. It can be omitted, in which case the path is generated automatically according to the rule ``<model_path>/<name>``, where ``<model_path>`` is a global parameter specifying the location of all model files, set in the :ref:`parameters ` section, and ``<name>`` is the name of the element.
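+
+For example, if the global model location is ``/cache/models`` and the element is named ``detection_model``, the model files are looked up in ``/cache/models/detection_model``. A minimal sketch (the paths and the file name are illustrative):
+
+.. code-block:: yaml
+
+    - element: nvinfer@detector
+      name: detection_model
+      model:
+        # local_path is omitted and resolves to /cache/models/detection_model
+        model_file: detection_model.onnx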
 
 The ``batch_size`` parameter defines a batch size dimension for processing frames by the model (by default 1).
 
-The ``precision`` parameter defines a precision of the model weights. Possible values are ``fp32``, ``fp16``, ``int8``. This parameter is set according to the precision chosen when creating the model.
+The ``precision`` parameter defines the data format used for model inference. Possible values are ``fp32``, ``fp16``, ``int8``. The parameter is important for TensorRT engine creation and affects inference speed. ``int8`` inference is faster than ``fp16``, but requires a calibration file. ``fp16`` is faster than ``fp32``, and TensorRT can perform the conversion automatically with little or no degradation in model accuracy, so ``fp16`` is the default.
 
-The ``int8_calib_file`` defines the name of the calibration file in case the model has ``int8`` precision.
-
-The ``mean_file`` parameter defines the name of the file with the mean values for data preprocessing. The file must be in PPM format. It makes sense to use this file if you already have it, in general, it is easier to specify the necessary mean values and scaling factor for preprocessing in the input section.
+The ``int8_calib_file`` parameter defines the name of the calibration file when the model ``precision`` is set to ``int8``.
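+
+For instance, a model that keeps the default ``fp16`` precision needs no calibration file, in contrast to the ``int8`` example at the top of this page. A minimal sketch (names are illustrative):
+
+.. code-block:: yaml
+
+    - element: nvinfer@detector
+      name: detection_model
+      model:
+        format: onnx
+        model_file: detection_model.onnx
+        # precision defaults to fp16, so no int8_calib_file is required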
 
 The ``input`` section describes the model input: names of input layers, dimensionality, etc. The mandatory or optional nature of the parameters in this section depends on the model format, as well as on the type of model. This section will be covered in more detail later, when describing model formats or types of models.
 
 The ``output`` section describes the model output: names of output layers, converters, selectors, etc. The mandatory or optional nature of the parameters in this section depends on the model format, as well as on the type of model. This section will be covered in more detail later, when describing model formats.
 
-To accelerate inference in the framework, Nvidia TensorRT is used. To use a model in a pipeline, it must be presented in one of the formats supported by TensorRT:
+To accelerate inference in the framework, NVIDIA TensorRT is used. To use a model in a pipeline, it must be presented in one of the formats supported by TensorRT:
 
 ONNX
 ----
 
@@ -108,7 +105,7 @@ UFF is an intermediate format for representing a model between TensorFlow and Te
         output:
           layer_names: [output_cov/Sigmoid, output_bbox/BiasAdd]
 
-This format will no longer be supported by future releases of TensorRT (`Tensor RT release notes `_).
+This format will no longer be supported by future releases of TensorRT (`TensorRT release notes `_).
 
 Caffe
 -----
@@ -127,9 +124,9 @@ If you have a model trained using the Caffe framework, then you can save your mo
           layer_names: [output_cov/Sigmoid, output_bbox/BiasAdd]
 
-This format will no longer be supported by future releases of TensorRT (`Tensor RT release notes `_).
+This format will no longer be supported by future releases of TensorRT (`TensorRT release notes `_).
 
-Nvidia TAO Toolkit
+NVIDIA TAO Toolkit
 ------------------
 
 The NVIDIA TAO Toolkit is a set of training tools that requires minimal effort to create computer vision neural models based on user's own data. Using the TAO toolkit, users can perform transfer learning from pre-trained NVIDIA models to create their own model.
@@ -152,7 +149,7 @@ After training the model, you can download it in the ``etlt`` format and use thi
 Custom CUDA Engine
 ------------------
 
-For all the above-mentioned variants of specifying the model, during the first launch, an engine file will be generated using TensorRT with automatic parsing of the model. When the model is very complex or requires some custom plugins or layers, you can generate the engine file yourself using the TensorRT API and specifying the library and the name of the function that generates the engine (`Using custom model with deepstream `_).
+For all the above-mentioned variants of specifying the model, during the first launch, an engine file will be generated using TensorRT with automatic parsing of the model. When the model is very complex or requires some custom plugins or layers, you can generate the engine file yourself using the TensorRT API and specifying the library and the name of the function that generates the engine (`Using custom model with DeepStream `_).
 
 .. code-block:: yaml
 
@@ -164,6 +161,47 @@ For all the above-mentioned variants of specifying the model, during the first l
         custom_lib_path: libnvdsinfer_custom_impl_Yolo.so
         engine_create_func_name: NvDsInferYoloCudaEngineGet
 
+Build Model Engine
+------------------
+
+Savant uses the DeepStream element ``nvinfer`` to perform model inference. Under the hood, nvinfer uses TensorRT to facilitate high-performance machine learning inference. Any of the supported model types (ONNX, UFF, TAO) must be converted to a TensorRT engine for use in the pipeline.
+
+The TensorRT engine, unlike the model file (ONNX, UFF, TAO), is not a universal model representation, but a device-specific optimized representation. That is, it cannot be transferred between different devices. This justifies the practice of generating the TensorRT engine when initializing the nvinfer element. When the pipeline with a model is started and the engine is missing, it will be generated based on the provided config (with a given batch size, etc.) from the model source file (ONNX, UFF, TAO). This process can take more than 10 minutes for complex models like YOLO. Savant makes it easy to cache model files, including those generated by TensorRT. If the engine file is present and matches the configuration, the engine generation step is skipped and the pipeline starts immediately.
+
+Savant supports explicit engine generation as a separate, preliminary step of running the Savant module pipeline. The generation is done by running a simplified pipeline that contains a model element (nvinfer). You can use the :py:func:`savant.deepstream.nvinfer.build_engine.build_engine` function in your code for this purpose, or you can run the generation step for all the module engines via the main module entry point with the ``--build-engines`` option:
+
+.. code-block:: bash
+
+    python -m savant.entrypoint --build-engines path/to/module/config.yml
+
+For example, you can build the model engines used in the `Nvidia car classification `_ example with the following command (run from the Savant/ directory):
+
+.. code-block:: bash
+
+    ./scripts/run_module.py --build-engines samples/nvidia_car_classification/module.yml
+
+You can also use the ``trtexec`` tool from the TensorRT package to generate an engine file. However, you need to know exactly which parameters to use to generate an engine file suitable for ``nvinfer``, and you cannot use ``trtexec`` if a custom engine generator is required.
+
+Example of using ``trtexec`` to build an engine for an ONNX model:
+
+.. code-block:: bash
+
+    /usr/src/tensorrt/bin/trtexec --onnx=/cache/models/custom_module/model_name/model_name.onnx --saveEngine=/cache/models/custom_module/model_name/model_name.onnx_b16_gpu0_fp16.engine --minShapes='images':1x3x224x224 --optShapes='images':16x3x224x224 --maxShapes='images':16x3x224x224 --fp16 --workspace=6144 --verbose
+
+Using Pre-built Model Engine
+----------------------------
+
+If you have a pre-built engine file, you can use it in the pipeline without having to include the original model file and set some of the parameters required to generate the engine from the model file (e.g. input and output layer names for a UFF model). The engine file must be placed in the model file directory. The name of the engine file must be specified in the ``engine_file`` parameter of the model configuration.
+
+.. code-block:: yaml
+
+    - element: nvinfer@detector
+      name: detection_model
+      model:
+        engine_file: detection_model.onnx_b1_gpu0_fp16.engine
+
+We recommend using the ``nvinfer`` name format for the engine file: ``{model_name}_b{batch_size}_gpu{gpu_id}_{precision}.engine``. This allows you to easily understand the configuration of the model engine and saves you from having to set ``batch_size`` and ``precision`` separately in the model config.
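+
+For example, the engine used by the ``Secondary_CarColor`` classifier in the Nvidia car classification sample follows this convention; the name encodes the batch size (16), the GPU id (0) and the precision (``int8``):
+
+.. code-block:: yaml
+
+    - element: nvinfer@classifier
+      name: Secondary_CarColor
+      model:
+        # batch size, GPU id and precision are taken from the engine file name
+        engine_file: resnet18.caffemodel_b16_gpu0_int8.engine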
+
 Working With Remote Models
 --------------------------
 
@@ -175,7 +213,6 @@ Currently, there are three data transfer protocols supported: S3, HTTP(S), and F
       name: Primary_Detector
       model:
         format: caffe
-        config_file: ${oc.env:APP_PATH}/samples/nvidia_car_classification/dstest2_pgie_config.txt
         remote:
           url: s3://savant-data/models/Primary_Detector/Primary_Detector.zip
           checksum_url: s3://savant-data/models/Primary_Detector/Primary_Detector.md5
@@ -191,39 +228,12 @@ In this example, in the remote section, we specify:
 * HTTP(S) protocol parameters: ``username``, ``password``;
 * FTP protocol parameters: ``username``, ``password``.
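+
+For example, a model hosted on a plain HTTP(S) server is referenced in the same way as the S3 example above; credentials, when required, are passed via the protocol parameters listed above (the host and paths are illustrative):
+
+.. code-block:: yaml
+
+    - element: nvinfer@detector
+      name: detection_model
+      model:
+        remote:
+          url: "https://example.com/models/detection_model.zip"
+          checksum_url: "https://example.com/models/detection_model.md5"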
 
-All necessary files (model file in one of the formats described above, configuration, calibration, and other files that you specify when configuring the model) must be archived using one of the archivers (``gzip``, ``bzip2``, ``xz``, ``zip``). The must should contain all required model files.
+All necessary files (model file in one of the formats described above, configuration, calibration, and other files that you specify when configuring the model) must be archived using one of the archivers (``gzip``, ``bzip2``, ``xz``, ``zip``). The archive must contain all necessary model files.
 
-The archive should contain a set of files. You can download an example model archive used in the `Nvidia car classification `_ example with the following command:
+You can download an example model archive used in the `Nvidia car classification `_ example with the following command:
 
 .. code-block:: bash
 
     aws --endpoint-url=https://eu-central-1.linodeobjects.com s3 cp s3://savant-data/models/Primary_Detector/Primary_Detector.zip .
 
 You can find an example of using this model archive at the following `link `_.
-
-Build Model Engine
-------------------
-
-Savant uses the DeepStream element ``nvinfer`` to perform model inferencing. Under the hood, nvinfer uses TensorRT to facilitate high-performance machine learning inference. Any of the supported model types (ONNX, UFF, TAO) must be converted to the TensorRT engine for use in the pipeline.
-
-The TensorRT engine, unlike the model file (ONNX, UFF, TAO), is not a universal model representation, but a device-specific optimized representation. That is, it cannot be transferred between different devices. This justifies the practice of generating the TensorRT engine when initializing the nvinfer element. When the pipeline with a model is started, if the engine is missing, it will be generated based on the provided config (with a given batch size, etc.) from the model source file (ONNX, UFF, TAO). This process can take more than 10 minutes for complex models like YOLO. Savant makes it easy to cache model files, including those generated by the TensorRT. If the engine is submitted and matches the configuration, the model engine generation step will be skipped and pipelines will start immediately.
-
-Savant supports explicit engine generation as a separate, preliminary step of running the Savant module pipeline. The generation is done by running a simplified pipline that contains a model element (nvinfer). You can use the :py:func:`savant.deepstream.nvinfer.build_engine.build_engine` function in your code for this purpose, or you can run the generation step of all the module engines via the main module entry point specifying the option ``--build-engines``
-
-.. code-block:: bash
-
-    python -m savant.entrypoint --build-engines path/to/module/config.yml
-
-For example, you can build the model engines used in the `Nvidia car classification `_ example with the following command (you are expected to be in Savant/ directory):
-
-.. code-block:: bash
-
-    docker run --rm -it --gpus=all \
-      -e ZMQ_SRC_ENDPOINT=sub+bind:ipc:///tmp/zmq-sockets/input-video.ipc \
-      -e ZMQ_SINK_ENDPOINT=pub+bind:ipc:///tmp/zmq-sockets/output-video.ipc \
-      -v /tmp/zmq_sockets:/tmp/zmq-sockets \
-      -v ./downloads/nvidia_car_classification:/downloads \
-      -v ./models/nvidia_car_classification:/models \
-      -v ./samples/:/opt/savant/samples \
-      ghcr.io/insight-platform/savant-deepstream:latest \
-      --build-engines samples/nvidia_car_classification/module.yml
diff --git a/samples/nvidia_car_classification/flavors/module-engines-config.yml b/samples/nvidia_car_classification/flavors/module-engines-config.yml
index 9ae1dad44..18d8ecfcf 100644
--- a/samples/nvidia_car_classification/flavors/module-engines-config.yml
+++ b/samples/nvidia_car_classification/flavors/module-engines-config.yml
@@ -13,13 +13,11 @@ pipeline:
     - element: nvinfer@detector
       name: Primary_Detector
       model:
-        format: caffe
         engine_file: resnet10.caffemodel_b1_gpu0_int8.engine
         input:
           scale_factor: 0.0039215697906911373
         output:
           num_detected_classes: 4
-          layer_names: [conv2d_bbox, conv2d_cov/Sigmoid]
           objects:
             - class_id: 0
               label: Car
@@ -43,9 +41,7 @@ pipeline:
     - element: nvinfer@classifier
       name: Secondary_CarColor
       model:
-        format: caffe
         engine_file: resnet18.caffemodel_b16_gpu0_int8.engine
-        mean_file: mean.ppm
         label_file: labels.txt
         input:
           object: Primary_Detector.Car
@@ -53,7 +49,6 @@
           object_min_height: 64
           color_format: bgr
         output:
-          layer_names: [predictions/Softmax]
           attributes:
             - name: car_color
               threshold: 0.51
@@ -62,9 +57,7 @@ pipeline:
     - element: nvinfer@classifier
       name: Secondary_CarMake
       model:
-        format: caffe
         engine_file: resnet18.caffemodel_b16_gpu0_int8.engine
-        mean_file: mean.ppm
         label_file: labels.txt
         input:
           object: Primary_Detector.Car
@@ -72,7 +65,6 @@
           object_min_height: 64
           color_format: bgr
         output:
-          layer_names: [predictions/Softmax]
           attributes:
             - name: car_make
               threshold: 0.51
@@ -81,9 +73,7 @@ pipeline:
     - element: nvinfer@classifier
       name: Secondary_VehicleTypes
       model:
-        format: caffe
         engine_file: resnet18.caffemodel_b16_gpu0_int8.engine
-        mean_file: mean.ppm
         label_file: labels.txt
         input:
           object: Primary_Detector.Car
@@ -91,7 +81,6 @@
           object_min_height: 64
           color_format: bgr
         output:
-          layer_names: [predictions/Softmax]
           attributes:
             - name: car_type
               threshold: 0.51
diff --git a/savant/deepstream/nvinfer/element_config.py b/savant/deepstream/nvinfer/element_config.py
index 8d2e957b9..d0b291f8a 100644
--- a/savant/deepstream/nvinfer/element_config.py
+++ b/savant/deepstream/nvinfer/element_config.py
@@ -43,6 +43,19 @@ class NvInferConfigException(Exception):
     """NvInfer config exception class."""
 
 
+def recognize_format_by_file_name(model_file: str):
+    """Recognize model format by model file name."""
+    if model_file.endswith('.onnx'):
+        return NvInferModelFormat.ONNX
+    if model_file.endswith('.uff'):
+        return NvInferModelFormat.UFF
+    if model_file.endswith('.etlt'):
+        return NvInferModelFormat.ETLT
+    if model_file.endswith('.caffemodel'):
+        return NvInferModelFormat.CAFFE
+    return NvInferModelFormat.CUSTOM
+
+
 def nvinfer_element_configurator(
     element_config: DictConfig, module_config: DictConfig
 ) -> DictConfig:
@@ -80,43 +93,28 @@ def process(self, msg, kwargs):
     if not model_config:
         raise NvInferConfigException('Model specification required.')
 
-    # check model format
-    model_format = model_config.get('format')
-    if not model_format:
-        raise NvInferConfigException('Model format (model.format) required.')
-    try:
-        model_config.format = NvInferModelFormat[model_format.upper()]
-    except KeyError as exc:
-        raise NvInferConfigException(
-            f'Invalid model format (model.format) value "{model_config.format}", '
-            f'expected one of {[m_format.name for m_format in NvInferModelFormat]}.'
-        ) from exc
-
     # prepare parameters with case-insensitive values (enums)
-    if (
-        'precision' in model_config
-        and model_config.precision
-        and isinstance(model_config.precision, str)
-    ):
-        new_val = ModelPrecision[model_config.precision.upper()]
-        logger.debug(
-            'Preparing model.precision: %s -> %s', model_config.precision, new_val
-        )
-        model_config.precision = new_val
-    if (
-        'input' in model_config
-        and model_config.input
-        and 'color_format' in model_config.input
-        and model_config.input.color_format
-        and isinstance(model_config.input.color_format, str)
-    ):
-        new_val = ModelColorFormat[model_config.input.color_format.upper()]
-        logger.debug(
-            'Preparing model.input.color_format: %s -> %s',
-            model_config.input.color_format,
-            new_val,
-        )
-        model_config.input.color_format = new_val
+    enum_params = {
+        'format': NvInferModelFormat,
+        'precision': ModelPrecision,
+        'input.color_format': ModelColorFormat,
+    }
+    for param_name, enum in enum_params.items():
+        cfg = model_config
+        prm_name = param_name
+        while '.' in prm_name and cfg is not None:
+            section, prm_name = prm_name.split('.', 1)
+            cfg = cfg.get(section)
+        if cfg is None:
+            continue
+        if prm_name in cfg and cfg[prm_name]:
+            try:
+                cfg[prm_name] = enum[str(cfg[prm_name]).upper()]
+            except KeyError as exc:
+                raise NvInferConfigException(
+                    f'Invalid value "{cfg[prm_name]}" for "model.{param_name}", '
+                    f'expected one of {[value.name for value in enum]}.'
+                ) from exc
 
     # setup path for the model files
     if not model_config.get('local_path'):
@@ -248,8 +246,11 @@ def process(self, msg, kwargs):
         )
     logger.info('Model engine file has been set to "%s".', model_config.engine_file)
 
-    # check model format-specific parameters
+    # check model format-specific parameters required to build the engine
     if model_file_required:
+        if not model_config.format:
+            model_config.format = recognize_format_by_file_name(model_config.model_file)
+
         if model_config.format == NvInferModelFormat.CAFFE:
             if not model_config.proto_file:
                 model_config.proto_file = Path(model_config.model_file).with_suffix(
@@ -310,46 +311,59 @@ def process(self, msg, kwargs):
                 model_config.tlt_model_key,
             )
 
-    # model_file_required is True when the engine file is not built
-    # calibration file is required to build model in INT8
-    if model_config.precision == ModelPrecision.INT8 and model_file_required:
-        if not model_config.int8_calib_file:
-            raise NvInferConfigException(
-                'INT8 calibration file (model.int8_calib_file) required.'
-            )
-        int8_calib_file_path = model_path / model_config.int8_calib_file
-        if not int8_calib_file_path.is_file():
-            raise NvInferConfigException(
-                f'INT8 calibration file "{int8_calib_file_path}" not found.'
-            )
+        # UFF model requirements (some ETLT models are UFF originally, e.g. peoplenet)
+        if model_config.format in (NvInferModelFormat.UFF, NvInferModelFormat.ETLT):
+            if not model_config.input.layer_name:
+                raise NvInferConfigException(
+                    'Model input layer name (model.input.layer_name) required.'
+                )
+            if not model_config.input.shape:
+                raise NvInferConfigException(
+                    'Model input shape (model.input.shape) required.'
+                )
 
-    # UFF model requirements
-    if model_config.format in (NvInferModelFormat.UFF, NvInferModelFormat.ETLT):
-        if not model_config.input.layer_name:
-            raise NvInferConfigException(
-                'Model input layer name (model.input.layer_name) required.'
-            )
+        if model_config.format in (
+            NvInferModelFormat.CAFFE,
+            NvInferModelFormat.UFF,
+            NvInferModelFormat.ETLT,
+        ):
+            if not model_config.output.layer_names:
+                raise NvInferConfigException(
+                    'Model output layer names (model.output.layer_names) required.'
+                )
+
+        # calibration file is required to build model in INT8
+        if model_config.precision == ModelPrecision.INT8:
+            if not model_config.int8_calib_file:
+                raise NvInferConfigException(
+                    'INT8 calibration file (model.int8_calib_file) required.'
+                )
+            int8_calib_file_path = model_path / model_config.int8_calib_file
+            if not int8_calib_file_path.is_file():
+                raise NvInferConfigException(
+                    f'INT8 calibration file "{int8_calib_file_path}" not found.'
+                )
+
+    if model_config.output.converter:
+        logger.info('Model output converter will be used.')
+
+        # input shape is used in some converters,
+        # e.g. to scale the output of yolo detector
         if not model_config.input.shape:
             raise NvInferConfigException(
                 'Model input shape (model.input.shape) required.'
             )
-    # check model output layers specification
-    if (
-        model_config.format
-        in (NvInferModelFormat.CAFFE, NvInferModelFormat.UFF, NvInferModelFormat.ETLT)
-        or model_config.output.converter
-    ) and not model_config.output.layer_names:
-        raise NvInferConfigException(
-            'Model output layer names (model.output.layer_names) required.'
-        )
-
-    if nvinfer_config and model_config.output.converter:
-        logger.info('Model output converter will be used.')
+        # output layer names are required to properly order the output tensors
+        # for passing to the converter
+        if not model_config.output.layer_names:
+            raise NvInferConfigException(
+                'Model output layer names (model.output.layer_names) required.'
+            )
 
     # model type-specific parameters
     if issubclass(model_type, ObjectModel):
-        # model_config.output.objects is mandatory for object models
+        # model_config.output.objects is mandatory for object models,
         # but it may be autogenerated based on labelfile or num_detected_classes
         label_file = model_config.get(
diff --git a/savant/deepstream/nvinfer/model.py b/savant/deepstream/nvinfer/model.py
index 82c0c417d..b535b3da6 100644
--- a/savant/deepstream/nvinfer/model.py
+++ b/savant/deepstream/nvinfer/model.py
@@ -99,7 +99,7 @@ class NvInferModel(Model):
         for a model. If not set, then input will default to entire frame.
     """
 
-    format: NvInferModelFormat = MISSING
+    format: Optional[NvInferModelFormat] = None
     """Model file format.
 
     Example