6.3. Quick Start

This chapter walks you through the basic workflow of the quantization toolchain's PTQ solution so that you can get started quickly. The yolov5s model is used as the running example. For full details on the PTQ solution, see the PTQ Principles and Detailed Steps chapter.

6.3.1. Development Environment Setup

If the development environment is not yet set up, follow the Environment Installation chapter to install it.

6.3.2. Model Preparation

Once the development environment is ready, enter the model conversion environment on the development machine with either of the commands: source activate horizon_bpu or conda activate horizon_bpu.

Run the following command to check that the yolov5s floating-point model is present:

    ls -l horizon_model_convert_sample/01_common/model_zoo/mapper/detection/yolov5_onnx_optimized

If the command prints a listing like the following, the models are in place:

    -rwxr-xr-x 1 regular-engineer 191610716 Jul 27 12:07 YOLOv5l.onnx
    -rwxr-xr-x 1 regular-engineer  87439805 Jul 27 12:07 YOLOv5m.onnx
    -rwxr-xr-x 1 regular-engineer  29999538 Jul 27 12:07 YOLOv5s.onnx
    -rwxr-xr-x 1 regular-engineer 356336025 Jul 27 12:07 YOLOv5x.onnx

If the listing above does not appear, see the Environment Installation chapter and download the model sample package.

6.3.3. Model Verification

With the sample floating-point model in place, run the hb_mapper checker tool to verify that the model complies with the operator constraints of the Horizon X3 chip.

  • Enter the yolov5s directory of the floating-point model conversion samples

    cd horizon_model_convert_sample/04_detection/03_yolov5s/mapper
  • Check the model

    # Confirm the model structure and operators are supported, and report the hardware (BPU/CPU) each operator runs on
    bash 01_check.sh

If the command ends with log lines like the following, the model check succeeded:

    2022-12-21 22:29:51,153 INFO [Wed Dec 21 22:29:51 2022] End to Horizon NN Model Convert.
    2022-12-21 22:29:51,181 INFO ONNX model output num : 3
    2022-12-21 22:29:51,219 INFO End model checking....

6.3.4. Model Conversion

Once the model check passes, use the hb_mapper makertbin tool to convert the model.

  • Preprocess the calibration data

    bash 02_preprocess.sh

If the command finishes with log lines like the following and no errors, the data preprocessing succeeded:

    write:./calibration_data_rgb_f32/COCO_val2014_000000181677.rgb
    write:./calibration_data_rgb_f32/COCO_val2014_000000181714.rgb
    write:./calibration_data_rgb_f32/COCO_val2014_000000181739.rgb
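Judging by the output directory name calibration_data_rgb_f32, the .rgb files written above are calibration images resized to the model input and stored as raw float32 RGB. The following is only a rough, hypothetical sketch of that kind of step (the actual transforms in 02_preprocess.sh are defined in the sample package and may include normalization and a different resize method):

```python
import array

def resize_nn(pixels, src_h, src_w, dst_h=672, dst_w=672):
    # Nearest-neighbor resize of a flat RGB buffer (row-major, 3 values per pixel).
    out = []
    for y in range(dst_h):
        sy = y * src_h // dst_h
        for x in range(dst_w):
            sx = x * src_w // dst_w
            i = (sy * src_w + sx) * 3
            out.extend(pixels[i:i + 3])
    return out

def to_f32_bytes(pixels):
    # Pack values as little-endian float32, the layout implied by "rgb_f32".
    return array.array('f', [float(v) for v in pixels]).tobytes()

# Dummy 2x2 red image blown up to the model's 672x672 input.
src = [255, 0, 0] * 4
resized = resize_nn(src, 2, 2)
data = to_f32_bytes(resized)
print(len(resized), len(data))  # 672*672*3 values, 4 bytes each
```
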
  • Convert the model

    # The configuration file required for conversion, yolov5s_config.yaml, is in the same directory as the 03_build.sh script
    bash 03_build.sh

If the command finishes with log lines like the following and no errors, the model conversion succeeded:

    2022-12-21 22:36:48,087 INFO Convert to runtime bin file sucessfully!
    2022-12-21 22:36:48,087 INFO End Model Convert

After conversion, the model files and static performance estimation files are saved in the model_output directory.

  • torch-jit-export_subgraph_0.html # static performance estimate (more readable)

  • torch-jit-export_subgraph_0.json # static performance estimate

  • yolov5s_672x672_nv12.bin # model to be loaded and run on the Horizon AI chip

  • yolov5s_672x672_nv12_optimized_float_model.onnx # intermediate model file, usable for later accuracy verification

  • yolov5s_672x672_nv12_original_float_model.onnx # intermediate model file, usable for later accuracy verification

  • yolov5s_672x672_nv12_quantized_model.onnx # intermediate model file, usable for later accuracy verification

6.3.5. Model Performance Verification

If all of the quantization steps above completed successfully, the sample yolov5s model has been quantized and the fixed-point model file yolov5s_672x672_nv12.bin, runnable on the Horizon X3 chip, has been generated. To learn about the inference performance of this fixed-point model, read on. Horizon supports estimating static performance on the development machine, as well as quickly benchmarking dynamic performance with tools on the development board. For details on performance verification and tuning advice, see the Model Performance Analysis and Tuning chapter.

6.3.5.1. Static Performance Estimation

1. Inspect the hb_mapper_makertbin.log file to see which hardware each operator runs on and how CPU operators split the model into segments.

2. Open the torch-jit-export_subgraph_0.html file for the estimated performance of the model's BPU portion and the model's overall bandwidth usage.

6.3.5.2. Dynamic Performance Evaluation

1. Make sure the board-side environment has been set up as described in the Install System chapter, then copy the model to the board (the development machine and the board must be reachable over the network):

    scp model_output/yolov5s_672x672_nv12.bin root@{board_ip}:/userdata

2. Log in to the board and use the hrt_model_exec perf tool to quickly measure the model's latency and frame rate (if the command is not found, see the System Update chapter and update the board's system).

  • For board login methods, see the Board Login chapter; this example uses SSH over Ethernet:

    ssh root@{board_ip}
    cd /userdata
  • Before benchmarking, read the CPU Frequency Scaling chapter and set the board's CPUs to performance mode

  • Measure latency with a single thread running serially on a single BPU core

    hrt_model_exec perf --model_file yolov5s_672x672_nv12.bin --core_id 1 --thread_num 1 --frame_count 1000

The command produces output like the following:

    hrt_model_exec perf --model_file yolov5s_672x672_nv12.bin --core_id 1 --thread_num 1 --frame_count 1000
    I0000 00:00:00.000000  2041 vlog_is_on.cc:197] RAW: Set VLOG level for "*" to 3
    [BPU_PLAT]BPU Platform Version(1.3.1)!
    [HBRT] set log level as 0. version = 3.14.5
    [DNN] Runtime version = 1.9.7_(3.14.5 HBRT)
    Load model to DDR cost 166.729ms.
    I0101 08:08:44.469233  2041 main.cpp:1045] get model handle success
    I0101 08:08:44.469348  2041 main.cpp:1666] get model input count success
    I0101 08:08:44.469470  2041 main.cpp:1673] prepare input tensor success!
    I0101 08:08:44.469492  2041 main.cpp:1679] get model output count success
    Frame count: 200,  Thread Average: 106.773048 ms,  thread max latency: 108.484001 ms,  thread min latency: 106.436996 ms,  FPS: 9.362576
    Frame count: 400,  Thread Average: 106.718391 ms,  thread max latency: 108.484001 ms,  thread min latency: 106.383003 ms,  FPS: 9.367456
    Frame count: 600,  Thread Average: 106.686966 ms,  thread max latency: 108.484001 ms,  thread min latency: 106.365997 ms,  FPS: 9.370262
    Frame count: 800,  Thread Average: 106.686150 ms,  thread max latency: 108.484001 ms,  thread min latency: 106.300003 ms,  FPS: 9.370339
    Frame count: 1000,  Thread Average: 106.679321 ms,  thread max latency: 108.484001 ms,  thread min latency: 106.300003 ms,  FPS: 9.370942

    Running condition:
    Thread number is: 1
    Frame count   is: 1000
    Program run time: 106712.991000 ms
    Perf result:
    Frame totally latency is: 106679.320312 ms
    Average    latency    is: 106.679321 ms
    Frame      rate       is: 9.370930 FPS
  • Measure latency with multiple threads running concurrently across both BPU cores

    hrt_model_exec perf --model_file yolov5s_672x672_nv12.bin --core_id 0 --thread_num 8 --frame_count 1000

The command produces output like the following:

    hrt_model_exec perf --model_file yolov5s_672x672_nv12.bin --core_id 0 --thread_num 8 --frame_count 1000
    I0000 00:00:00.000000  2320 vlog_is_on.cc:197] RAW: Set VLOG level for "*" to 3
    [BPU_PLAT]BPU Platform Version(1.3.1)!
    [HBRT] set log level as 0. version = 3.14.5
    [DNN] Runtime version = 1.9.7_(3.14.5 HBRT)
    Load model to DDR cost 167.149ms.
    I0101 08:13:14.876650  2320 main.cpp:1045] get model handle success
    I0101 08:13:14.876773  2320 main.cpp:1666] get model input count success
    I0101 08:13:14.876892  2320 main.cpp:1673] prepare input tensor success!
    I0101 08:13:14.876913  2320 main.cpp:1679] get model output count success
    Frame count: 200,  Thread Average: 241.177429 ms,  thread max latency: 345.130005 ms,  thread min latency: 190.535995 ms,  FPS: 32.715076
    Frame count: 400,  Thread Average: 241.110901 ms,  thread max latency: 345.130005 ms,  thread min latency: 190.535995 ms,  FPS: 32.885406
    Frame count: 600,  Thread Average: 241.131088 ms,  thread max latency: 355.122986 ms,  thread min latency: 176.201996 ms,  FPS: 32.950832
    Frame count: 800,  Thread Average: 241.146606 ms,  thread max latency: 355.122986 ms,  thread min latency: 176.197006 ms,  FPS: 33.006718
    Frame count: 1000,  Thread Average: 240.983292 ms,  thread max latency: 355.122986 ms,  thread min latency: 176.197006 ms,  FPS: 33.124897

    Running condition:
    Thread number is: 8
    Frame count   is: 1000
    Program run time: 30188.990000 ms
    Perf result:
    Frame totally latency is: 240983.296875 ms
    Average    latency    is: 240.983292 ms
    Frame      rate       is: 33.124659 FPS
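The FPS figures in the two runs above follow directly from the logs: hrt_model_exec reports frame rate as frame count divided by wall-clock program run time, so the single-thread FPS tracks the reciprocal of per-frame latency, while the 8-thread run trades higher per-frame latency (~241 ms vs. ~107 ms) for roughly 3.5x the throughput. A quick arithmetic check:

```python
def fps(frame_count, run_time_ms):
    # Frame rate as reported by hrt_model_exec: frames / wall-clock run time.
    return frame_count / (run_time_ms / 1000.0)

# Single-thread run from the log above: 1000 frames in 106712.991 ms.
print(round(fps(1000, 106712.991), 4))  # -> 9.3709

# 8-thread run from the log above: 1000 frames in 30188.990 ms.
print(round(fps(1000, 30188.990), 4))   # -> 33.1247
```
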

6.3.6. Model Accuracy Verification

If all of the steps above completed successfully, you have measured the performance of the yolov5s_672x672_nv12.bin model. To evaluate the inference accuracy of this fixed-point model, read on. Horizon supports evaluating model inference accuracy on the development machine. For details on accuracy verification and optimization advice, see the Model Accuracy Analysis and Tuning chapter.

6.3.6.1. Verification in the Development Machine's Python Environment

In the development machine's Python environment, evaluate the quantized accuracy of the yolov5s_672x672_nv12_quantized_model.onnx model; its inference results are consistent with those of the yolov5s_672x672_nv12.bin model. Example:

  • Run the quantized model on a single test image

    bash 04_inference.sh

If the command finishes with log lines like the following and no errors, inference completed:

    2022-12-29 16:11:30,028 INFO detected item num: 14
    2022-12-29 16:11:30,029 INFO person is in the picture with confidence:0.8104
    2022-12-29 16:11:30,046 INFO person is in the picture with confidence:0.7805
    2022-12-29 16:11:30,063 INFO person is in the picture with confidence:0.6903
    2022-12-29 16:11:30,108 INFO person is in the picture with confidence:0.5512
    2022-12-29 16:11:30,157 INFO person is in the picture with confidence:0.5352
    2022-12-29 16:11:30,195 INFO person is in the picture with confidence:0.5012
    2022-12-29 16:11:30,241 INFO person is in the picture with confidence:0.4950
    2022-12-29 16:11:30,287 INFO person is in the picture with confidence:0.4617
    2022-12-29 16:11:30,336 INFO person is in the picture with confidence:0.4152
    2022-12-29 16:11:30,384 INFO kite is in the picture with confidence:0.8432
    2022-12-29 16:11:30,433 INFO kite is in the picture with confidence:0.8064
    2022-12-29 16:11:30,481 INFO kite is in the picture with confidence:0.6995
    2022-12-29 16:11:30,530 INFO kite is in the picture with confidence:0.6601
    2022-12-29 16:11:30,576 INFO kite is in the picture with confidence:0.6002
  • Run the floating-point model on a single test image (optional)

    bash 04_inference.sh origin

If the command finishes with log lines like the following and no errors, inference completed:

    2022-12-21 23:25:31,325 INFO detected item num: 15
    2022-12-21 23:25:31,327 INFO person is in the picture with confidence:0.8617
    2022-12-21 23:25:31,361 INFO person is in the picture with confidence:0.8189
    2022-12-21 23:25:31,403 INFO person is in the picture with confidence:0.7264
    2022-12-21 23:25:31,441 INFO person is in the picture with confidence:0.6687
    2022-12-21 23:25:31,483 INFO person is in the picture with confidence:0.6271
    2022-12-21 23:25:31,517 INFO person is in the picture with confidence:0.6222
    2022-12-21 23:25:31,560 INFO person is in the picture with confidence:0.5141
    2022-12-21 23:25:31,598 INFO person is in the picture with confidence:0.5085
    2022-12-21 23:25:31,636 INFO person is in the picture with confidence:0.4223
    2022-12-21 23:25:31,671 INFO kite is in the picture with confidence:0.8950
    2022-12-21 23:25:31,703 INFO kite is in the picture with confidence:0.8420
    2022-12-21 23:25:31,743 INFO kite is in the picture with confidence:0.7510
    2022-12-21 23:25:31,778 INFO kite is in the picture with confidence:0.6881
    2022-12-21 23:25:31,827 INFO kite is in the picture with confidence:0.6405
    2022-12-21 23:25:31,864 INFO kite is in the picture with confidence:0.4280
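A quick way to read the two logs above is to pair the quantized and float confidences within a class and look at the drop. This is only a rough sanity check, not a substitute for a proper accuracy evaluation over a full dataset (pairing detections by rank is an assumption, since the logs do not say which boxes correspond):

```python
# "person" confidences copied from the two logs above, in printed order.
quantized = [0.8104, 0.7805, 0.6903, 0.5512, 0.5352, 0.5012, 0.4950, 0.4617, 0.4152]
floating  = [0.8617, 0.8189, 0.7264, 0.6687, 0.6271, 0.6222, 0.5141, 0.5085, 0.4223]

# Average confidence drop after quantization, pairing detections by rank.
drops = [f - q for q, f in zip(quantized, floating)]
mean_drop = sum(drops) / len(drops)
print(round(mean_drop, 3))
```

A mean drop of a few percent in per-box confidence is in the range one would expect from PTQ; the authoritative measure is dataset-level mAP, as covered in the Model Accuracy Analysis and Tuning chapter.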

6.3.7. Running the Model on the Board

Note: before running the model on the board, make sure the board-side environment has been set up as described in the Install System chapter. Copy the fixed-point model yolov5s_672x672_nv12.bin to the board's /app/ai_inference/models directory, replacing the existing file, then run:

    cd /app/ai_inference/07_yolov5_sample/
    sudo python3 ./test_yolov5.py

On success, the script prints the image's detection results and dumps a rendered detection image: result.jpg

[Figure: yolov5s detection result (result.jpg)]

For common API examples, see the yolov5 Object Detection chapter.

For more on the model inference APIs, see the Python Development Guide - AI Algorithm Inference API Usage chapter and the C/C++ Development Guide - BPU (Algorithm Inference Module) API chapter.