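HarmonyHu: Better to cultivate resolve than to overthink, better to keep still than to talk too much, better to build virtue than to amass talent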

quantization int8

2020-06-01
AI

Algorithms

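Converting between 8-bit and float

The TensorFlow Lite specification relates the two representations by real_value = (int8_value - zero_point) * scale.

For int8_value, weights lie in [-127, 127] with zero_point fixed at 0; activations/inputs lie in [-128, 127], and their zero_point lies in [-128, 127].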
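The conversion in both directions can be sketched as below. This is a minimal sketch, assuming the TensorFlow Lite relation real_value = (int8_value - zero_point) * scale, round-to-nearest, and saturation to [-128, 127]; the helper names Quantize and Dequantize are hypothetical and not part of any library.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Hypothetical helper: float -> int8, assuming
// real_value = (int8_value - zero_point) * scale.
inline std::int8_t Quantize(float real_value, float scale, int zero_point) {
  // Round to the nearest integer step, then shift by zero_point.
  float q = std::round(real_value / scale) + static_cast<float>(zero_point);
  // Saturate to the int8 range used for activations/inputs.
  q = std::min(127.0f, std::max(-128.0f, q));
  return static_cast<std::int8_t>(q);
}

// Hypothetical helper: int8 -> float, the inverse mapping.
inline float Dequantize(std::int8_t int8_value, float scale, int zero_point) {
  return static_cast<float>(static_cast<int>(int8_value) - zero_point) * scale;
}
```

For weights, the same functions apply with zero_point set to 0; restricting the result to [-127, 127] would need a tighter clamp than the one shown here.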

threshold

threshold can be understood as the maximum element value of a tensor; then:
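One common way to turn a threshold into a scale, assumed here rather than taken from the text, is symmetric quantization: take threshold as the largest absolute value in the tensor and map it to 127, so scale = threshold / 127 and zero_point = 0. The function names Threshold and ScaleFromThreshold are hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Assumption: threshold is the largest absolute value in the tensor.
inline float Threshold(const std::vector<float>& tensor) {
  if (tensor.empty()) return 0.0f;
  // Compare by magnitude so large negative values are counted too.
  const auto it = std::max_element(
      tensor.begin(), tensor.end(),
      [](float a, float b) { return std::fabs(a) < std::fabs(b); });
  return std::fabs(*it);
}

// Assumption: symmetric mapping with zero_point 0, threshold -> 127.
inline float ScaleFromThreshold(float threshold) {
  return threshold / 127.0f;
}
```

Other conventions exist (for example dividing by 128, or picking the threshold by calibration instead of the raw maximum), so the divisor should be treated as a design choice.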

per-axis and per-tensor

  • per-axis: every slice along one dimension has its own scale and zero_point. For example, per-channel means each channel has its own scale and zero_point.
  • per-tensor: the whole tensor shares a single scale and zero_point.
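The difference between the two granularities can be sketched as follows. This is only an illustration, assuming symmetric quantization (zero_point 0, scale = max|x| / 127) and a flat row-major buffer whose first dimension is the channel; MaxAbs, PerTensorScale, and PerChannelScales are hypothetical names.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Largest absolute value in [first, last); 0 for an empty range.
inline float MaxAbs(const float* first, const float* last) {
  float m = 0.0f;
  for (const float* p = first; p != last; ++p) m = std::max(m, std::fabs(*p));
  return m;
}

// per-tensor: one scale shared by every element.
inline float PerTensorScale(const std::vector<float>& w) {
  return MaxAbs(w.data(), w.data() + w.size()) / 127.0f;
}

// per-channel: one scale per slice along axis 0.
// Assumption: w is row-major with shape [channels, w.size() / channels].
inline std::vector<float> PerChannelScales(const std::vector<float>& w,
                                           std::size_t channels) {
  std::vector<float> scales(channels);
  const std::size_t per_channel = channels ? w.size() / channels : 0;
  for (std::size_t c = 0; c < channels; ++c) {
    const float* first = w.data() + c * per_channel;
    scales[c] = MaxAbs(first, first + per_channel) / 127.0f;
  }
  return scales;
}
```

With per-channel scales, a channel with small values keeps its resolution even when another channel contains a large outlier, which is the usual motivation for the finer granularity.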

Scale conversion

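Let y = x * M, where y and x are integers and M is a floating-point number. The formula above turns this into integer-only arithmetic. When the multiplier is stored as an int32, it carries at least 30 bits of precision, because the normalized multiplier is at least 0.5 and its 31-bit fixed-point value is therefore at least 2^30.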

For example:
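A minimal sketch of this conversion, assuming 0 < M < 1 and writing M = M0 * 2^shift with M0 in [0.5, 1) stored as a 31-bit fixed-point int32. The struct and function names (FixedPointMultiplier, ToFixedPoint, ApplyFixedPoint) are hypothetical; the decomposition follows the approach in the referenced paper but is not copied from any library.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical representation: M = mantissa * 2^(shift - 31).
struct FixedPointMultiplier {
  std::int32_t mantissa;  // M0 scaled by 2^31, so it lies in [2^30, 2^31)
  int shift;              // binary exponent returned by std::frexp
};

inline FixedPointMultiplier ToFixedPoint(double m) {
  assert(m > 0.0 && m < 1.0);  // assumption of this sketch
  int shift = 0;
  const double m0 = std::frexp(m, &shift);  // m = m0 * 2^shift, m0 in [0.5, 1)
  std::int64_t q = static_cast<std::int64_t>(std::round(m0 * 2147483648.0));
  if (q == (std::int64_t{1} << 31)) {  // m0 rounded up to 1.0: renormalize
    q /= 2;
    ++shift;
  }
  return {static_cast<std::int32_t>(q), shift};
}

// Computes y = x * M using integer operations only.
// Rounds to nearest, with ties going toward +infinity.
inline std::int32_t ApplyFixedPoint(std::int32_t x, FixedPointMultiplier m) {
  const int total_shift = 31 - m.shift;
  assert(total_shift >= 1 && total_shift <= 62);
  const std::int64_t prod = static_cast<std::int64_t>(x) * m.mantissa;
  const std::int64_t half = std::int64_t{1} << (total_shift - 1);
  // Assumes >> on a negative int64 is an arithmetic shift, which holds on
  // mainstream compilers and is guaranteed from C++20.
  return static_cast<std::int32_t>((prod + half) >> total_shift);
}
```

For M = 0.25, std::frexp gives M0 = 0.5 and shift = -1, so the mantissa is exactly 2^30 and ApplyFixedPoint(100, ...) yields 25.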

Add derivation
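One way to write the derivation, assuming each tensor follows r_i = S_i (q_i - Z_i), where S_i is the scale, Z_i the zero_point, and q_i the int8 value (the symbols S, Z, and q are introduced here for illustration):

```latex
\begin{aligned}
r_3 &= r_1 + r_2 \\
S_3 (q_3 - Z_3) &= S_1 (q_1 - Z_1) + S_2 (q_2 - Z_2) \\
q_3 &= Z_3 + \frac{S_1}{S_3}\,(q_1 - Z_1) + \frac{S_2}{S_3}\,(q_2 - Z_2)
\end{aligned}
```

The two ratios S_1 / S_3 and S_2 / S_3 are floating-point constants, so each can be replaced by an integer multiplier and a shift as described under Scale conversion.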

Matrix multiplication derivation

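Take two N x N matrices r1 and r2, with r3 = r1 x r2. To keep things simple, set every zero_point to 0; the derivation from floating-point to integer arithmetic then goes as follows: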
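With every zero_point equal to 0, each real value is r = S * q, so S3 * q3[i][k] = S1 * S2 * sum_j q1[i][j] * q2[j][k], which gives q3[i][k] = M * sum_j q1[i][j] * q2[j][k] with M = S1 * S2 / S3. The sketch below follows that result under stated assumptions: row-major N x N buffers, int32 accumulation, and saturation to int8. The function name QuantizedMatMul and the symbols S1, S2, S3 are hypothetical, and the final multiplication by M is done in floating point for brevity.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch: q3 = clamp(round(M * (q1 x q2))), M = s1 * s2 / s3.
// q1, q2 and the result are row-major n x n matrices; all zero_points are 0.
inline std::vector<std::int8_t> QuantizedMatMul(
    const std::vector<std::int8_t>& q1, const std::vector<std::int8_t>& q2,
    std::size_t n, float s1, float s2, float s3) {
  const double m = static_cast<double>(s1) * s2 / s3;
  std::vector<std::int8_t> q3(n * n);
  for (std::size_t i = 0; i < n; ++i) {
    for (std::size_t k = 0; k < n; ++k) {
      // Accumulate the int8 products in int32 to avoid overflow.
      std::int32_t acc = 0;
      for (std::size_t j = 0; j < n; ++j) {
        acc += static_cast<std::int32_t>(q1[i * n + j]) *
               static_cast<std::int32_t>(q2[j * n + k]);
      }
      // Rescale into the output's quantized domain and saturate.
      double scaled = std::round(acc * m);
      scaled = std::min(127.0, std::max(-128.0, scaled));
      q3[i * n + k] = static_cast<std::int8_t>(scaled);
    }
  }
  return q3;
}
```

A fully integer version would replace the single `acc * m` step with the integer multiplier and shift described under Scale conversion; the accumulation loop stays the same.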

Related functions

cmath

std::round

double round(double x)

Rounds to the nearest integer value, with halfway cases rounded away from zero. For example: std::round(7.479) = 7, std::round(7.579) = 8

std::floor

double floor(double x)

Rounds down to the largest integer value that is <= x. For example: std::floor(7.579) = 7

std::frexp

double frexp(double x, int *y)

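Splits a floating-point value into a binary fraction and an exponent. If w = std::frexp(x, &y), then x = w * (2^y); for nonzero finite x, w lies in (-1.0, -0.5] U [0.5, 1.0), and for x = 0 both w and y are 0.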
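A short sketch of the decomposition, wrapping std::frexp so the fraction and exponent come back together; std::ldexp performs the inverse. The wrapper names Decompose and Recompose are hypothetical.

```cpp
#include <cmath>
#include <utility>

// Returns {w, y} such that x == w * 2^y.
inline std::pair<double, int> Decompose(double x) {
  int y = 0;
  const double w = std::frexp(x, &y);
  return {w, y};
}

// Rebuilds the original value from the fraction and exponent.
inline double Recompose(double w, int y) {
  return std::ldexp(w, y);
}
```

For instance, Decompose(8.0) gives {0.5, 4} because 8 = 0.5 * 2^4, and Decompose(0.25) gives {0.5, -1}.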

algorithm

std::min_element / std::max_element

template< class ForwardIt > 
ForwardIt min_element( ForwardIt first, ForwardIt last );
template< class ForwardIt, class Compare >
ForwardIt min_element( ForwardIt first, ForwardIt last, Compare comp );

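Finds the smallest/largest element in the range [first, last) and returns an iterator to it (last if the range is empty).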
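In a quantization setting these two functions are a natural way to obtain the value range of a tensor. A minimal sketch, assuming a non-empty std::vector<float>; the function name MinMax is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Returns {smallest, largest} element of a non-empty vector.
inline std::pair<float, float> MinMax(const std::vector<float>& v) {
  assert(!v.empty());  // dereferencing end() would be undefined behavior
  const auto lo = std::min_element(v.begin(), v.end());
  const auto hi = std::max_element(v.begin(), v.end());
  return {*lo, *hi};
}
```

When both extremes are needed, std::minmax_element does the same work in a single pass.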

References

TensorFlow Lite 8-bit quantization specification

Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference

