6.3. Quick Start

This chapter walks you through the basic workflow of the quantization toolchain's PTQ solution so that you can get started quickly. The yolov5s model is used as the running example. For full details on the PTQ solution, see the PTQ Principles and Detailed Steps chapter.

6.3.1. Development Environment Setup

If the development environment is not yet set up, follow the Environment Installation chapter to install it.

6.3.2. Model Preparation

Once the development environment is ready, enter the model conversion environment on the development machine with either of the commands: source activate horizon_bpu or conda activate horizon_bpu.

Run the following command to check that the yolov5s floating-point model is present:

    ls -l horizon_model_convert_sample/01_common/model_zoo/mapper/detection/yolov5_onnx_optimized

If the command prints a listing like the following, the models are in place:

    -rwxr-xr-x 1 regular-engineer 191610716 Jul 27 12:07 YOLOv5l.onnx
    -rwxr-xr-x 1 regular-engineer  87439805 Jul 27 12:07 YOLOv5m.onnx
    -rwxr-xr-x 1 regular-engineer  29999538 Jul 27 12:07 YOLOv5s.onnx
    -rwxr-xr-x 1 regular-engineer 356336025 Jul 27 12:07 YOLOv5x.onnx

If the listing above does not appear, see the Environment Installation chapter and download the model sample package.

6.3.3. Model Verification

With the sample floating-point model in place, run the hb_mapper checker tool to verify that the model complies with the operator constraints of the Horizon X3 chip.

  • Enter the yolov5s directory of the floating-point model conversion samples

    cd horizon_model_convert_sample/04_detection/03_yolov5s/mapper
  • Check the model

    # Confirm the model structure and operators are supported, and report the hardware (BPU/CPU) each operator runs on
    bash 01_check.sh

If the command ends with log lines like the following, the model check succeeded:

    2022-12-21 22:29:51,153 INFO [Wed Dec 21 22:29:51 2022] End to Horizon NN Model Convert.
    2022-12-21 22:29:51,181 INFO ONNX model output num : 3
    2022-12-21 22:29:51,219 INFO End model checking....

6.3.4. Model Conversion

Once the model check passes, use the hb_mapper makertbin tool to convert the model.

  • Preprocess the calibration data

    bash 02_preprocess.sh

If the command finishes with log lines like the following and no errors, the data preprocessing succeeded:

    write:./calibration_data_rgb_f32/COCO_val2014_000000181677.rgb
    write:./calibration_data_rgb_f32/COCO_val2014_000000181714.rgb
    write:./calibration_data_rgb_f32/COCO_val2014_000000181739.rgb
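Judging by the output directory name calibration_data_rgb_f32, the .rgb files written above are calibration images resized to the model input and stored as raw float32 RGB. The following is only a rough, hypothetical sketch of that kind of step (the actual transforms in 02_preprocess.sh are defined in the sample package and may include normalization and a different resize method):

```python
import array

def resize_nn(pixels, src_h, src_w, dst_h=672, dst_w=672):
    # Nearest-neighbor resize of a flat RGB buffer (row-major, 3 values per pixel).
    out = []
    for y in range(dst_h):
        sy = y * src_h // dst_h
        for x in range(dst_w):
            sx = x * src_w // dst_w
            i = (sy * src_w + sx) * 3
            out.extend(pixels[i:i + 3])
    return out

def to_f32_bytes(pixels):
    # Pack values as little-endian float32, the layout implied by "rgb_f32".
    return array.array('f', [float(v) for v in pixels]).tobytes()

# Dummy 2x2 red image blown up to the model's 672x672 input.
src = [255, 0, 0] * 4
resized = resize_nn(src, 2, 2)
data = to_f32_bytes(resized)
print(len(resized), len(data))  # 672*672*3 values, 4 bytes each
```
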
  • Convert the model

    # The configuration file required for conversion, yolov5s_config.yaml, is in the same directory as the 03_build.sh script
    bash 03_build.sh

If the command finishes with log lines like the following and no errors, the model conversion succeeded:

    2022-12-21 22:36:48,087 INFO Convert to runtime bin file sucessfully!
    2022-12-21 22:36:48,087 INFO End Model Convert

After conversion, the model files and static performance estimation files are saved in the model_output directory.

  • torch-jit-export_subgraph_0.html # static performance estimate (more readable)

  • torch-jit-export_subgraph_0.json # static performance estimate

  • yolov5s_672x672_nv12.bin # model to be loaded and run on the Horizon AI chip

  • yolov5s_672x672_nv12_optimized_float_model.onnx # intermediate model file, usable for later accuracy verification

  • yolov5s_672x672_nv12_original_float_model.onnx # intermediate model file, usable for later accuracy verification

  • yolov5s_672x672_nv12_quantized_model.onnx # intermediate model file, usable for later accuracy verification

6.3.5. Model Performance Verification

If all of the quantization steps above completed successfully, the sample yolov5s model has been quantized and the fixed-point model file yolov5s_672x672_nv12.bin, runnable on the Horizon X3 chip, has been generated. To learn about the inference performance of this fixed-point model, read on. Horizon supports estimating static performance on the development machine, as well as quickly benchmarking dynamic performance with tools on the development board. For details on performance verification and tuning advice, see the Model Performance Analysis and Tuning chapter.

6.3.5.1. Static Performance Estimation

1. Inspect the hb_mapper_makertbin.log file to see which hardware each operator runs on and how CPU operators split the model into segments.

2. Open the torch-jit-export_subgraph_0.html file for the estimated performance of the model's BPU portion and the model's overall bandwidth usage.

6.3.5.2. Dynamic Performance Evaluation

1. Make sure the board-side environment has been set up as described in the Install System chapter, then copy the model to the board (the development machine and the board must be reachable over the network):

    scp model_output/yolov5s_672x672_nv12.bin root@{board_ip}:/userdata

2. Log in to the board and use the hrt_model_exec perf tool to quickly measure the model's latency and frame rate (if the command is not found, see the System Update chapter and update the board's system).

  • For board login methods, see the Board Login chapter; this example uses SSH over Ethernet:

    ssh root@{board_ip}
    cd /userdata
  • Before benchmarking, read the CPU Frequency Scaling chapter and set the board's CPUs to performance mode

  • Measure latency with a single thread running serially on a single BPU core

    hrt_model_exec perf --model_file yolov5s_672x672_nv12.bin --core_id 1 --thread_num 1 --frame_count 1000

The command produces output like the following:

    hrt_model_exec perf --model_file yolov5s_672x672_nv12.bin --core_id 1 --thread_num 1 --frame_count 1000
    I0000 00:00:00.000000  2041 vlog_is_on.cc:197] RAW: Set VLOG level for "*" to 3
    [BPU_PLAT]BPU Platform Version(1.3.1)!
    [HBRT] set log level as 0. version = 3.14.5
    [DNN] Runtime version = 1.9.7_(3.14.5 HBRT)
    Load model to DDR cost 166.729ms.
    I0101 08:08:44.469233  2041 main.cpp:1045] get model handle success
    I0101 08:08:44.469348  2041 main.cpp:1666] get model input count success
    I0101 08:08:44.469470  2041 main.cpp:1673] prepare input tensor success!
    I0101 08:08:44.469492  2041 main.cpp:1679] get model output count success
    Frame count: 200,  Thread Average: 106.773048 ms,  thread max latency: 108.484001 ms,  thread min latency: 106.436996 ms,  FPS: 9.362576
    Frame count: 400,  Thread Average: 106.718391 ms,  thread max latency: 108.484001 ms,  thread min latency: 106.383003 ms,  FPS: 9.367456
    Frame count: 600,  Thread Average: 106.686966 ms,  thread max latency: 108.484001 ms,  thread min latency: 106.365997 ms,  FPS: 9.370262
    Frame count: 800,  Thread Average: 106.686150 ms,  thread max latency: 108.484001 ms,  thread min latency: 106.300003 ms,  FPS: 9.370339
    Frame count: 1000,  Thread Average: 106.679321 ms,  thread max latency: 108.484001 ms,  thread min latency: 106.300003 ms,  FPS: 9.370942

    Running condition:
    Thread number is: 1
    Frame count   is: 1000
    Program run time: 106712.991000 ms
    Perf result:
    Frame totally latency is: 106679.320312 ms
    Average    latency    is: 106.679321 ms
    Frame      rate       is: 9.370930 FPS
  • Measure latency with multiple threads running concurrently across both BPU cores

    hrt_model_exec perf --model_file yolov5s_672x672_nv12.bin --core_id 0 --thread_num 8 --frame_count 1000

The command produces output like the following:

    hrt_model_exec perf --model_file yolov5s_672x672_nv12.bin --core_id 0 --thread_num 8 --frame_count 1000
    I0000 00:00:00.000000  2320 vlog_is_on.cc:197] RAW: Set VLOG level for "*" to 3
    [BPU_PLAT]BPU Platform Version(1.3.1)!
    [HBRT] set log level as 0. version = 3.14.5
    [DNN] Runtime version = 1.9.7_(3.14.5 HBRT)
    Load model to DDR cost 167.149ms.
    I0101 08:13:14.876650  2320 main.cpp:1045] get model handle success
    I0101 08:13:14.876773  2320 main.cpp:1666] get model input count success
    I0101 08:13:14.876892  2320 main.cpp:1673] prepare input tensor success!
    I0101 08:13:14.876913  2320 main.cpp:1679] get model output count success
    Frame count: 200,  Thread Average: 241.177429 ms,  thread max latency: 345.130005 ms,  thread min latency: 190.535995 ms,  FPS: 32.715076
    Frame count: 400,  Thread Average: 241.110901 ms,  thread max latency: 345.130005 ms,  thread min latency: 190.535995 ms,  FPS: 32.885406
    Frame count: 600,  Thread Average: 241.131088 ms,  thread max latency: 355.122986 ms,  thread min latency: 176.201996 ms,  FPS: 32.950832
    Frame count: 800,  Thread Average: 241.146606 ms,  thread max latency: 355.122986 ms,  thread min latency: 176.197006 ms,  FPS: 33.006718
    Frame count: 1000,  Thread Average: 240.983292 ms,  thread max latency: 355.122986 ms,  thread min latency: 176.197006 ms,  FPS: 33.124897

    Running condition:
    Thread number is: 8
    Frame count   is: 1000
    Program run time: 30188.990000 ms
    Perf result:
    Frame totally latency is: 240983.296875 ms
    Average    latency    is: 240.983292 ms
    Frame      rate       is: 33.124659 FPS
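The FPS figures in the two runs above follow directly from the logs: hrt_model_exec reports frame rate as frame count divided by wall-clock program run time, so the single-thread FPS tracks the reciprocal of per-frame latency, while the 8-thread run trades higher per-frame latency (~241 ms vs. ~107 ms) for roughly 3.5x the throughput. A quick arithmetic check:

```python
def fps(frame_count, run_time_ms):
    # Frame rate as reported by hrt_model_exec: frames / wall-clock run time.
    return frame_count / (run_time_ms / 1000.0)

# Single-thread run from the log above: 1000 frames in 106712.991 ms.
print(round(fps(1000, 106712.991), 4))  # -> 9.3709

# 8-thread run from the log above: 1000 frames in 30188.990 ms.
print(round(fps(1000, 30188.990), 4))   # -> 33.1247
```
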

6.3.6. Model Accuracy Verification

If all of the steps above completed successfully, you have measured the performance of the yolov5s_672x672_nv12.bin model. To evaluate the inference accuracy of this fixed-point model, read on. Horizon supports evaluating model inference accuracy on the development machine. For details on accuracy verification and optimization advice, see the Model Accuracy Analysis and Tuning chapter.

6.3.6.1. Verification in the Development Machine's Python Environment

In the development machine's Python environment, evaluate the quantized accuracy of the yolov5s_672x672_nv12_quantized_model.onnx model; its inference results are consistent with those of the yolov5s_672x672_nv12.bin model. Example:

  • Run the quantized model on a single test image

    bash 04_inference.sh

If the command finishes with log lines like the following and no errors, inference completed:

    2022-12-29 16:11:30,028 INFO detected item num: 14
    2022-12-29 16:11:30,029 INFO person is in the picture with confidence:0.8104
    2022-12-29 16:11:30,046 INFO person is in the picture with confidence:0.7805
    2022-12-29 16:11:30,063 INFO person is in the picture with confidence:0.6903
    2022-12-29 16:11:30,108 INFO person is in the picture with confidence:0.5512
    2022-12-29 16:11:30,157 INFO person is in the picture with confidence:0.5352
    2022-12-29 16:11:30,195 INFO person is in the picture with confidence:0.5012
    2022-12-29 16:11:30,241 INFO person is in the picture with confidence:0.4950
    2022-12-29 16:11:30,287 INFO person is in the picture with confidence:0.4617
    2022-12-29 16:11:30,336 INFO person is in the picture with confidence:0.4152
    2022-12-29 16:11:30,384 INFO kite is in the picture with confidence:0.8432
    2022-12-29 16:11:30,433 INFO kite is in the picture with confidence:0.8064
    2022-12-29 16:11:30,481 INFO kite is in the picture with confidence:0.6995
    2022-12-29 16:11:30,530 INFO kite is in the picture with confidence:0.6601
    2022-12-29 16:11:30,576 INFO kite is in the picture with confidence:0.6002
  • Run the floating-point model on a single test image (optional)

    bash 04_inference.sh origin

If the command finishes with log lines like the following and no errors, inference completed:

    2022-12-21 23:25:31,325 INFO detected item num: 15
    2022-12-21 23:25:31,327 INFO person is in the picture with confidence:0.8617
    2022-12-21 23:25:31,361 INFO person is in the picture with confidence:0.8189
    2022-12-21 23:25:31,403 INFO person is in the picture with confidence:0.7264
    2022-12-21 23:25:31,441 INFO person is in the picture with confidence:0.6687
    2022-12-21 23:25:31,483 INFO person is in the picture with confidence:0.6271
    2022-12-21 23:25:31,517 INFO person is in the picture with confidence:0.6222
    2022-12-21 23:25:31,560 INFO person is in the picture with confidence:0.5141
    2022-12-21 23:25:31,598 INFO person is in the picture with confidence:0.5085
    2022-12-21 23:25:31,636 INFO person is in the picture with confidence:0.4223
    2022-12-21 23:25:31,671 INFO kite is in the picture with confidence:0.8950
    2022-12-21 23:25:31,703 INFO kite is in the picture with confidence:0.8420
    2022-12-21 23:25:31,743 INFO kite is in the picture with confidence:0.7510
    2022-12-21 23:25:31,778 INFO kite is in the picture with confidence:0.6881
    2022-12-21 23:25:31,827 INFO kite is in the picture with confidence:0.6405
    2022-12-21 23:25:31,864 INFO kite is in the picture with confidence:0.4280
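A quick way to read the two logs above is to pair the quantized and float confidences within a class and look at the drop. This is only a rough sanity check, not a substitute for a proper accuracy evaluation over a full dataset (pairing detections by rank is an assumption, since the logs do not say which boxes correspond):

```python
# "person" confidences copied from the two logs above, in printed order.
quantized = [0.8104, 0.7805, 0.6903, 0.5512, 0.5352, 0.5012, 0.4950, 0.4617, 0.4152]
floating  = [0.8617, 0.8189, 0.7264, 0.6687, 0.6271, 0.6222, 0.5141, 0.5085, 0.4223]

# Average confidence drop after quantization, pairing detections by rank.
drops = [f - q for q, f in zip(quantized, floating)]
mean_drop = sum(drops) / len(drops)
print(round(mean_drop, 3))
```

A mean drop of a few percent in per-box confidence is in the range one would expect from PTQ; the authoritative measure is dataset-level mAP, as covered in the Model Accuracy Analysis and Tuning chapter.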

6.3.7. Running the Model on the Board

Note: before running the model on the board, make sure the board-side environment has been set up as described in the Install System chapter. Copy the fixed-point model yolov5s_672x672_nv12.bin to the board's /app/ai_inference/models directory, replacing the existing file, then run:

    cd /app/ai_inference/07_yolov5_sample/
    sudo python3 ./test_yolov5.py

On success, the script prints the image's detection results and dumps a rendered detection image: result.jpg

[Figure: yolov5s detection result (result.jpg)]

For common API examples, see the yolov5 Object Detection chapter.

For more on the model inference APIs, see the Python Development Guide - AI Algorithm Inference API Usage chapter and the C/C++ Development Guide - BPU (Algorithm Inference Module) API chapter.