Hanguang-800 NPU

The Hanguang-800 NPU, fabricated on a TSMC 12nm process, is a high-performance AI inference chip targeting Alibaba data centers and standalone edge servers. It will be deployed to power various Alibaba AI applications and made available to the public via Alibaba Cloud services.

Architecture Features

  • Designed to accelerate convolutions (including Transposed CONV, Dilated CONV, and CONV3D), matrix multiplication, interpolation, and ROIs.

  • Highly efficient storage and computing technologies.

  • Supports INT8/INT16 matrix computation, as well as FP16/BFP16 vector processing.

  • Key activation functions are natively supported. Customized activation functions can be supported via math instructions.
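As an illustration of one of the convolution variants listed above, the following is a minimal reference sketch of a 1D dilated convolution in plain Python. It only shows the arithmetic that such an accelerator speeds up; it is not the Hanguang-800 instruction set or SDK, and the function name `dilated_conv1d` is hypothetical.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Reference 1D dilated convolution (cross-correlation form).

    Assumptions for this sketch: stride 1, no padding, and kernel taps
    spaced `dilation` samples apart. dilation=1 is an ordinary convolution.
    """
    if dilation < 1:
        raise ValueError("dilation must be >= 1")
    # Effective receptive field of the dilated kernel.
    span = (len(kernel) - 1) * dilation + 1
    out_len = len(signal) - span + 1
    if out_len <= 0:
        return []
    return [
        sum(kernel[k] * signal[i + k * dilation] for k in range(len(kernel)))
        for i in range(out_len)
    ]


if __name__ == "__main__":
    print(dilated_conv1d([1, 2, 3, 4, 5], [1, 1], dilation=1))
    print(dilated_conv1d([1, 2, 3, 4, 5], [1, 1], dilation=2))
```

With dilation 2, the two-tap kernel covers three input samples instead of two, which is how dilation widens the receptive field without adding weights.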

Featured Technology

  • Optimized for CNN-based algorithms and vision tasks. Also supports general-purpose DNN acceleration.

  • Supports model compression and quantization.

  • High power efficiency with low latency.

  • Programmable and easy to use.

  • Adaptive software stack, supporting frameworks such as TensorFlow, MXNet, Caffe, and ONNX.
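To make the quantization item above concrete, here is a minimal sketch of symmetric per-tensor INT8 quantization followed by an integer dot product, written in plain Python. The scheme (a single scale per tensor, clamping to [-127, 127]) and the helper names `quantize_int8` and `int8_dot` are assumptions chosen for illustration; the source does not describe the quantization method the Hanguang-800 toolchain actually uses.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization (assumed scheme, for illustration).

    Returns the integer codes and the scale such that value ~= code * scale.
    """
    max_abs = max(abs(v) for v in values) or 1.0  # avoid a zero scale
    scale = max_abs / 127.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def int8_dot(codes_a, scale_a, codes_b, scale_b):
    """Dot product accumulated in integers, rescaled to a float at the end."""
    acc = sum(x * y for x, y in zip(codes_a, codes_b))
    return acc * scale_a * scale_b


if __name__ == "__main__":
    a = [0.5, -1.0, 0.25]
    b = [1.0, 0.5, -0.75]
    qa, sa = quantize_int8(a)
    qb, sb = quantize_int8(b)
    exact = sum(x * y for x, y in zip(a, b))
    print("exact:", exact, "quantized:", int8_dot(qa, sa, qb, sb))
```

Keeping the multiply-accumulate in integers and applying the scales once at the end is the usual reason INT8 inference is cheaper than floating-point inference, at the cost of a small rounding error.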

Performance Comparison

ResNet-50 v1 Performance (Images/Second):

ResNet-50 v1 Efficiency (Images/Second/Watt):