零拷贝
功能介绍
TogetheROS.Bot提供了灵活、高效的零拷贝功能,可以显著降低大尺寸数据的通信延时和CPU占用。tros.b通过集成performance_test工具,可以方便的评测开启零拷贝前后的性能差异。performance_test工具能够实现sub数量、消息大小、QoS等参数配置,方便评估不同场景下的通信性能,主要性能指标如下:
时延(latency):对应消息从pub到sub的传输时间
CPU使用率(CPU usage):通信活动所占CPU使用百分比
驻留内存(resident memory):包括堆分配内存、共享内存以及用于系统内部的栈内存
样本统计(sample statistics):包括每次实验发送、接收以及丢失的消息数量
代码仓库:https://c-gitlab.horizon.ai/HHP/tros/rclcpp,https://c-gitlab.horizon.ai/HHP/tros/rcl_interfaces
支持平台
平台 | 运行方式 | 示例功能 |
---|---|---|
旭日X3派 | Ubuntu 20.04 | 展示零拷贝性能指标测试结果 |
准备工作
旭日X3派
开始测试前,需要将旭日X3派调整为性能模型,以保证测试结果准确性,命令如下:
echo performance > /sys/class/devfreq/devfreq0/governor echo performance > /sys/devices/system/cpu/cpufreq/policy0/scaling_governor
旭日X3派已成功安装performance_test工具包,安装命令:
apt update; apt install tros-performance-test
。
使用介绍
旭日X3派
不开启零拷贝功能的4M数据传输测试,命令如下:
ros2 run performance_test perf_test --reliable --keep-last --history-depth 10 -s 1 -m Array4m -r 100 --max-runtime 30
测试结果如下:
run time +--------------+-----------+--------+----------+ | T_experiment | 30.982817 | T_loop | 1.000126 | +--------------+-----------+--------+----------+ samples latency +------+------+------+-----------+---------------+ +----------+----------+----------+----------+ | recv | sent | lost | data_recv | relative_loss | | min | max | mean | variance | +------+------+------+-----------+---------------+ +----------+----------+----------+----------+ | 99 | 100 | 0 | 418505326 | 0.000000 | | 0.004327 | 0.005605 | 0.004546 | 0.000000 | +------+------+------+-----------+---------------+ +----------+----------+----------+----------+ publisher loop subscriber loop +----------+----------+----------+----------+ +----------+----------+----------+----------+ | min | max | mean | variance | | min | max | mean | variance | +----------+----------+----------+----------+ +----------+----------+----------+----------+ | 0.007260 | 0.008229 | 0.008057 | 0.000000 | | 0.000000 | 0.000000 | 0.000000 | 0.000000 | +----------+----------+----------+----------+ +----------+----------+----------+----------+ system usage +-------------+-----------+---------+--------+--------+----------+--------+--------+ | utime | stime | maxrss | ixrss | idrss | isrss | minflt | majflt | +-------------+-----------+---------+--------+--------+----------+--------+--------+ | 23120954000 | 121597000 | 65092 | 0 | 0 | 0 | 11578 | 2 | +-------------+-----------+---------+--------+--------+----------+--------+--------+ | nswap | inblock | oublock | msgsnd | msgrcv | nsignals | nvcsw | nivcsw | +-------------+-----------+---------+--------+--------+----------+--------+--------+ | 0 | 0 | 0 | 0 | 0 | 0 | 9885 | 7193 | +-------------+-----------+---------+--------+--------+----------+--------+--------+ Maximum runtime reached. Exiting.
开启零拷贝功能(加入–zero-copy参数)的4M数据传输测试,命令如下:
ros2 run performance_test perf_test --zero-copy --reliable --keep-last --history-depth 10 -s 1 -m Array4m -r 100 --max-runtime 30
测试结果如下:
run time +--------------+-----------+--------+----------+ | T_experiment | 30.554773 | T_loop | 1.000084 | +--------------+-----------+--------+----------+ samples latency +------+------+------+-----------+---------------+ +----------+----------+----------+----------+ | recv | sent | lost | data_recv | relative_loss | | min | max | mean | variance | +------+------+------+-----------+---------------+ +----------+----------+----------+----------+ | 99 | 99 | 0 | 418701472 | 0.000000 | | 0.000146 | 0.000381 | 0.000195 | 0.000000 | +------+------+------+-----------+---------------+ +----------+----------+----------+----------+ publisher loop subscriber loop +----------+----------+----------+----------+ +----------+----------+----------+----------+ | min | max | mean | variance | | min | max | mean | variance | +----------+----------+----------+----------+ +----------+----------+----------+----------+ | 0.009812 | 0.009895 | 0.009877 | 0.000000 | | 0.000000 | 0.000000 | 0.000000 | 0.000000 | +----------+----------+----------+----------+ +----------+----------+----------+----------+ system usage +------------+-----------+---------+--------+--------+----------+--------+--------+ | utime | stime | maxrss | ixrss | idrss | isrss | minflt | majflt | +------------+-----------+---------+--------+--------+----------+--------+--------+ | 8727113000 | 307920000 | 46224 | 0 | 0 | 0 | 6440 | 0 | +------------+-----------+---------+--------+--------+----------+--------+--------+ | nswap | inblock | oublock | msgsnd | msgrcv | nsignals | nvcsw | nivcsw | +------------+-----------+---------+--------+--------+----------+--------+--------+ | 0 | 0 | 0 | 0 | 0 | 0 | 9734 | 2544 | +------------+-----------+---------+--------+--------+----------+--------+--------+ Maximum runtime reached. Exiting.
结果分析
performance_test工具可输出多种类型的统计结果,下面主要对比延时、系统占用的差异:
latency 对比关闭和打开“zero-copy”功能的通信时延均值分别为4.546ms和0.195ms,可以看出“zero-copy”功能显著降低通信时延。
system usage
+------------------+---------------+-------------------+--------+--------+----------+------------------+---------------------+
| utime | stime | maxrss | ixrss | idrss | isrss | minflt | majflt |
+------------------+---------------+-------------------+--------+--------+----------+------------------+---------------------+
| userspace耗时(Hz)| system耗时(Hz)| 驻留内存大小(Byte) | 0 | 0 | 0 | 次缺页错误次数 | 主缺页错误次数 |
+------------------+---------------+-------------------+--------+--------+----------+------------------+---------------------+
| nswap | inblock | oublock | msgsnd | msgrcv | nsignals | nvcsw | nivcsw |
+------------------+---------------+-------------------+--------+--------+----------+------------------+---------------------+
| 0 | 0 | 0 | 0 | 0 | 0 | 自愿上下文切换次数| 非自愿上下文切换次数|
+------------------+---------------+-------------------+--------+--------+----------+------------------+---------------------+
通信方式 | latency | utime+stime | maxrss | minflt | majflt | nvcsw | nivcsw |
---|---|---|---|---|---|---|---|
非“zero-copy” | 0.004546 | 23242551000 | 65092 | 11578 | 2 | 9885 | 7193 |
“zero-copy” | 0.000381 | 9035033000 | 46224 | 6440 | 0 | 9734 | 2544 |
对比可知
“zero-copy” utime和stime之和明显低于非“zero-copy”,表明“zero-copy”消耗的CPU资源更少
“zero-copy” maxrss少于非“zero-copy”,表明“zero-copy”占用的内存少
“zero-copy” minflt、majflt明显少于非“zero-copy”,表明“zero-copy”通信抖动更小
“zero-copy” nvcsw、nivcsw明显少于非“zero-copy”,表明“zero-copy”通信抖动更小
总的来说对于大数据通信,“zero-copy”在CPU消耗、内存占用以及通信延迟抖动方便均明显优于非“zero-copy”