Neural Networks: Closing the Gap Between Lab Theory and Real-World Deployment
Ideally, once your data science team has done the hard work of developing and training a neural network (NN), you can deploy it in the field and get the best results. But those laboring in the lab might not have a window into all the environmental and system-dependent constraints the NN will encounter in the real world. In this post, we share details about our solution, which eliminates the need to constantly go back and change the NN.
Mipsology’s Zebra AI acceleration software makes it possible to run inference under varying real-world constraints in more places, using NN models as is, with no need to go back and retune them. Zebra is also “plug and play,” meaning you can deploy it today without a team of hardware experts or months of effort.
Training in a lab vs deployment
There is a big difference between training a neural network and successfully deploying it in the field. You begin by hiring a team of AI specialists who understand how to use the data to make the NN work. Let’s say their goal is to create a model that can correctly identify a picture of cute cats 87% of the time. The scientists tune it, and after months of trials, training, and modification, they get something that works. It does what it’s supposed to do with the expected accuracy.
However, these specialists may not be experts at making the same system work in real-world environments. NNs are trained in a lab without knowledge of potential deployment constraints or issues. Some neural networks aren’t even designed to be deployed; they exist mainly as ‘showcase’ demos for bragging rights, but people may not know that and still try to deploy them. All the potential problems are left for the implementation team to solve, and that can lead to disaster.
For example, GPUs generally operate fine in a climate-controlled environment, but they are not intended for use in the field. When faced with excessive heat, wind, rain, and other climate factors, GPUs can suffer excessive latency or hit power constraints. The former prevents the system from reacting quickly enough, and the latter makes cooling impossible, rendering the GPU unusable. Furthermore, GPUs have limited lifespans (some as short as two years) and are prone to failing without warning, which makes large-scale deployment for applications like autonomous cars economically challenging.
To avoid issues like these, the NN developers need to ask the following questions:
- Is the network practical for use in the field?
- How much compute resource will it take?
- Does the network fit your deployment constraints?
- How much will it cost and how long will it take to build and deploy the network?
This last question may seem easy to answer, but there is more to consider than just hardware and electricity costs. For example, if an NN is too large, it may not fit in memory or may require excessive computation, demanding more time on the accelerator and driving up the price.
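As a rough back-of-the-envelope check (this is not part of any Mipsology tooling, and the layer shapes below are purely hypothetical), you can estimate a model's weight memory and per-inference compute before committing to deployment hardware:

```python
# Back-of-the-envelope sizing for a hypothetical convolutional NN.
# Weights assumed FP32 (4 bytes each); FLOPs counted as 2 * multiply-accumulates.

def conv_params(c_in, c_out, k):
    """Parameter count of a k x k convolution layer (plus bias)."""
    return c_in * c_out * k * k + c_out

def conv_flops(c_in, c_out, k, h, w):
    """FLOPs for one forward pass producing an h x w output feature map."""
    return 2 * c_in * c_out * k * k * h * w

# Three illustrative layers: (c_in, c_out, kernel, out_h, out_w)
layers = [(3, 64, 3, 224, 224), (64, 128, 3, 112, 112), (128, 256, 3, 56, 56)]

params = sum(conv_params(ci, co, k) for ci, co, k, _, _ in layers)
flops = sum(conv_flops(ci, co, k, h, w) for ci, co, k, h, w in layers)

print(f"weights: {params * 4 / 1e6:.1f} MB (FP32)")
print(f"compute: {flops / 1e9:.2f} GFLOPs per image")
```

Comparing numbers like these against the target device's memory and sustained throughput is a quick way to catch a model that will never fit the deployment budget.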
If these questions are overlooked or answered incorrectly, issues will arise, and changes will be needed.
Hardware Limitations Drive Changes, Adding Cost and Delays
Far too often, there is a disconnect between the trained neural network and the real-world constraints of deploying it for inference. Limitations in the field can drive the need for changes to the NN. Or, if you bump up against limits when adding new features, you will need to go back to square one and modify or retrain the network.
There are other reasons you may need to make changes in the field. You could be getting the desired accuracy, but computation is taking too long. Perhaps the hardware does not accelerate a new layer, hurting throughput. Or maybe you want to deploy at the edge, where much of the hardware has limited NN support.
If edge constraints or data center costs are not considered during NN training, significant changes may be required once training is complete. In the cloud, the equation is usually simpler: it all comes down to price. If the NN is too large and requires a lot of computation, it drives up the price, especially at high traffic. Even one additional cent per inference can mean a huge cost increase when multiplied across billions of users, making the deployed network impractical at the targeted scale.
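To make the scale argument concrete (the prices and traffic figures below are hypothetical, chosen only to illustrate the arithmetic), even a one-cent difference per inference compounds quickly:

```python
# Hypothetical cloud cost comparison: a lean model vs. a heavier one
# that needs one extra cent of compute per inference.

def annual_cost(cents_per_inference, inferences_per_day, days=365):
    """Annual cost in dollars for a given per-inference price in cents."""
    return cents_per_inference * inferences_per_day * days / 100

lean = annual_cost(2, 1_000_000)   # 2 cents/inference, 1M inferences/day
heavy = annual_cost(3, 1_000_000)  # same traffic, 1 cent more per inference

print(f"lean:  ${lean:,.0f}/year")
print(f"heavy: ${heavy:,.0f}/year")
print(f"extra: ${heavy - lean:,.0f}/year")
```

At a million inferences a day, that single extra cent adds millions of dollars per year; at billions of users the gap widens accordingly.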
The constraints are numerous, but the most important factor is performance: you need to be sure you can compute the NN rapidly enough within the other limitations. If you cannot, you will be forced to modify and retrain it.
In all these cases, you’ll need to change the network to compensate, then call the AI team back in to retrain it. But what if they’re not available, or you used a third party? Either way, you’re looking at sub-optimal performance, potentially critical delays, increased costs, or, likely, all of the above. That means missed targets and slower time-to-market.
Mipsology’s Zebra Software Can Help
FPGAs have a unique capability that other chips lack: they are “field-programmable.” An FPGA can be programmed for any function or application, then reprogrammed. But this is far from the only advantage FPGAs have over GPUs and ASICs.
FPGAs function reliably in the harsh, real-world environments demanded by many AI applications, including robotics, outdoor image recognition, space flight, autonomous vehicles, edge computing, and the data center. They also support application lifespans of as long as 10 years, some 5x the expected lifespan of GPUs, making cost-sensitive commercial applications far more realistic.
Unfortunately, because of the complexity of programming them, FPGAs have been used for AI and deep learning projects almost exclusively by FPGA specialists. This is where Zebra comes into play. Zebra’s Zero Effort IP eliminates the need for advanced expertise and makes FPGAs accessible to all AI practitioners, so you don’t have to worry about whether your training and deployment teams are aligned. Even if they aren’t, Zebra allows you to deploy and manage your trained network using the tools you already have.
The flexibility of a Zebra-powered FPGA allows your network to be computed “as is,” with a large choice of FPGA hardware that fits the needs of most AI use cases. Zebra’s plug-and-play architecture enables you to deploy without retraining the NN or making the other modifications normally required when moving to a different type of chip.
Companies that use Zebra are guaranteed top-of-the-line inference acceleration without any of the drawbacks that come with ASICs, CPUs, or GPUs. This is a big part of what makes Zebra the “cuDNN of FPGAs”: it lets users achieve accelerated inference and reap all the benefits with zero added effort, right out of the box instead of months later.
Zebra can be deployed for many critical applications, including robotics, broadcast video, automotive, healthcare, smart cities, and more. Zebra runs on over a dozen FPGA boards, and Mipsology is working with a growing ecosystem of silicon, system, and distribution partners, including Xilinx, Avnet, Western Digital, and Advantech.
If you’ve been struggling with getting your neural network deployment up and running, or are experiencing setbacks and changes, reach out to us! Zebra is here to make your life easier.