With AI-enabled technologies now widespread, applications that run inference on images are everywhere, from analyzers of surveillance-camera footage in buildings to apps running on smartphones. Alongside software that performs inference on CPUs, there are systems that offload the computation to a GPU or FPGA, and we now even see small terminals incorporating a dedicated inference chip.
This article shares the results of verifying efficient inference of image data on two systems: one that uses only a CPU ("the CPU system"), and another that pairs a CPU with an FPGA that offloads part of the computation ("the FPGA-offload system"). Both were given the same workload so that the differences in speed, accuracy, and power consumption could be compared.
Methods and Conditions of Evaluation
This evaluation used Zebra V2020.02.1, Mipsology's application software, in the following environment:
Server: Dell PowerEdge R7515
CPU: AMD EPYC 7302P 16-Core Processor
Memory: 64 GB x2 DIMM (128 GB total)
FPGA: Alveo U250 (running at 550 MHz)
OS: Ubuntu 18.04.1 LTS x86_64
Kernel: 4.15.0-101-generic
The Measurement Conditions
- Model: ResNet50
- One computation set consists of a batch of 60 images processed 50 times; the time for one batch operation is defined as one period
- Inference images: obtained from ImageNet
- Image size: 224 x 224 pixels, three channels
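The batch-and-period loop described above can be sketched as a simple timing harness. This is a minimal illustration, not Zebra's actual API: `dummy_infer` and the zero-filled batches are hypothetical stand-ins for the real inference backend and the ImageNet data.

```python
import time

BATCH_SIZE = 60   # images per batch (one batch operation = one period)
NUM_BATCHES = 50  # batches per computation set

def run_computation_set(infer, batches):
    """Time each batch (period) and return the list of period durations."""
    periods = []
    for batch in batches:
        start = time.perf_counter()
        infer(batch)
        periods.append(time.perf_counter() - start)
    return periods

# Hypothetical stand-in for the real backend (CPU-only or FPGA-offload).
def dummy_infer(batch):
    return [sum(image) for image in batch]

# Hypothetical stand-in data: NUM_BATCHES batches of BATCH_SIZE flat images.
batches = [[[0.0] * 8 for _ in range(BATCH_SIZE)] for _ in range(NUM_BATCHES)]
periods = run_computation_set(dummy_infer, batches)
total = sum(periods)  # total time for one computation set
```

In the evaluation, this loop was repeated for three computation sets per system and the results averaged.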
This measurement setup is depicted in Figure 1.
Figure 1. The Mechanism of Each System
This performance measurement records the time each system spent calculating the batches. Each system ran three computation sets, and the results were averaged to obtain the final figures shown in Figure 2 and Figure 3.
Figure 2. Result of the CPU System
Figure 3. Result of the FPGA-offload System (CPU + FPGA)
This performance test reveals that the FPGA-offload system completed the computation 46 times faster than the CPU-only system.
The difference in accuracy between the systems was also verified. Here, accuracy is defined in two ways: 1) the ratio of images for which the correct answer is the class with the highest probability (Top 1), and 2) the ratio for which the correct answer is among the five highest-ranked classes (Top 5). The two systems are compared on both. Figure 4 depicts the approach.
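The Top 1 / Top 5 definition above can be expressed as a short helper. This is a generic sketch; the scores and labels below are hypothetical and are not the ImageNet data used in the evaluation.

```python
def topk_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest scores."""
    hits = 0
    for row, label in zip(scores, labels):
        topk = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += label in topk
    return hits / len(labels)

# Hypothetical per-image class scores (3 images, 6 classes) and true labels.
scores = [
    [0.10, 0.70, 0.05, 0.05, 0.05, 0.05],  # highest score: class 1
    [0.30, 0.20, 0.25, 0.10, 0.10, 0.05],  # highest score: class 0
    [0.05, 0.10, 0.10, 0.20, 0.15, 0.40],  # highest score: class 5
]
labels = [1, 2, 5]

top1 = topk_accuracy(scores, labels, 1)  # image 1 misses at Top 1
top5 = topk_accuracy(scores, labels, 5)  # image 1's class 2 is within Top 5
```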
Figure 4. How to Identify the Results
Figure 5 lists the accuracy percentages for the CPU system and the FPGA-offload system.
Figure 5. Result of Accuracy Measurements
Figure 5 shows that FPGA offloading reduced the Top 1 accuracy by only one percentage point and the Top 5 accuracy by 0.6 percentage points.
Measurement of Power Consumption
Figure 6 shows the values used to calculate the power consumed, and Figure 7 compares the consumption for the same computational volume. An industrial-use rate plan from TEPCO Energy Partner, Inc. was used to calculate the energy costs shown in Figure 8.
Figure 6. How to Determine Power Consumption
The CPU system and the FPGA-offload system each ran the inference continuously for two hours to measure power consumption. Note that some light processes, such as rearranging data between periods, caused deviations, which are indicated by the mark '*'. The medians of the power-consumption values are therefore used as the reference.
One period completes in milliseconds but, for clarity, the results have been converted into seconds (sec). Based on each system's consumption, the energy consumed for 1,000 periods is 101.95 Wh for the CPU system and 1.99 Wh for the FPGA-offload system.
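As a sketch of the median-based approach, the energy for N periods can be estimated from sampled power draws, taking the median so that the inter-period deviations (the values marked '*') do not skew the result. The sample values and period length below are hypothetical, for illustration only; they are not the measured figures.

```python
from statistics import median

def energy_wh(power_samples_w, period_sec, num_periods):
    """Energy (Wh) for num_periods, using the median of sampled power
    draws to discount transient deviations between periods."""
    p = median(power_samples_w)  # representative power in watts
    return p * period_sec * num_periods / 3600.0

# Hypothetical power samples (W) with one inter-period outlier, and a
# hypothetical period length of 2 seconds.
samples = [250.0, 252.0, 248.0, 400.0, 251.0]
e = energy_wh(samples, period_sec=2.0, num_periods=1000)
```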
Figure 7. Power Consumption Comparison
The costs were determined based on the values in Figure 6. Because electricity is billed per kWh, the 1,000-period results were multiplied by 1,000 to form the computation quantities for comparison. Although Figure 7 above therefore shows the comparison for 10⁶ periods, the amount computed by each system is equal.
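Because cost scales linearly with energy, the 98-percent figure can be checked directly from the reported Wh values without knowing the TEPCO rate itself:

```python
# Energy per 1,000 periods, as reported: 101.95 Wh (CPU) vs 1.99 Wh (FPGA).
cpu_wh = 101.95
fpga_wh = 1.99

# Scaling both to 10**6 periods (x1,000) leaves the ratio unchanged, and
# cost is proportional to kWh, so the relative saving is rate-independent.
reduction = 1.0 - fpga_wh / cpu_wh  # ~0.98, i.e. about a 98% cost reduction
```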
Figure 8. Cost Performance Comparison
Figure 8 can be summarized as follows: for the same amount of calculation, the FPGA-offload system's electricity cost is 98 percent lower than the CPU system's, and it also finished the calculation 46 times faster.
The results of this study can be summarized as follows:
The FPGA offloading could:
- compute 46 times faster
- reduce the electricity cost for the same amount of calculation by 98 percent
- limit the reduction in Top 1 accuracy to less than one percentage point
- limit the reduction in Top 5 accuracy to 0.6 percentage points
These results depend on the version of Zebra and the type of neural network in use (here, the convolutional neural network ResNet50), and will not necessarily be reproduced in every environment. There may also be room for improvement through refinement of the FPGA algorithms (IP cores). Mipsology's customized FPGA IP cores for this evaluation were instrumental in achieving the results described above.
The evaluation shows that FPGA offloading has the potential to raise efficiency when paired with fine-tuned IP cores; conversely, the expected improvement is unlikely to be obtained without them. FPGA devices also offer the advantage that dimensions and performance can be customized to the application and location. The Alveo series from Xilinx ranges from small boards to ones addressing higher workloads. For example, a data center can use large-capacity boards, while a site or factory favors ones with a small footprint.