Mythic has developed a unique AI compute platform that enables smart camera systems, intelligent appliances, advanced robotics, and more. Thanks to unmatched performance and energy efficiency, our platform lets AI designers deploy in form factors that were previously out of reach. Key features include:

  • Lowest latency: single-frame delay by running at a batch size of 1
  • Highest performance per watt: >4 TOPS/Watt
  • Hyper-scalability: low-power single-chip to high-performance rack systems
  • Ease of use: major platform support (e.g., TensorFlow) and topology-agnostic performance

Mythic products are based on a unique tile-based AI compute architecture built on three fundamental hardware technologies: Compute-in-Memory, Dataflow Architecture, and Analog Computing. For AI developers, the Mythic SDK streamlines the preparation of trained neural networks for edge and low-latency datacenter deployments, and performs automatic optimization and compilation of dataflow graphs for our unique architecture.

The Mythic IPU features an array of tiles that blend familiar concepts with breakthrough new technology to deliver unmatched performance, power, and flexibility. Each tile in a Mythic IPU has a large analog compute array that stores bulky neural network weights, local SRAM for data passed between neural network nodes, a single-instruction multiple-data (SIMD) unit for operations the analog compute array does not handle, and a nano-processor that controls the sequencing and operation of the tile. The tiles are interconnected by an efficient on-chip router network, which carries the dataflow from one tile to the next. At the edge of the IPU, off-chip connections link to other Mythic chips or to the host system.
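
As a rough mental model (a minimal sketch using our own illustrative names and sizes, not Mythic’s internal representation), each tile bundles these four components, and a mesh of router links connects neighboring tiles:

```python
from dataclasses import dataclass

@dataclass
class Tile:
    """One IPU tile (illustrative fields, not Mythic's actual spec)."""
    analog_weight_capacity: int     # weights stored in the analog compute array
    sram_bytes: int                 # local SRAM for inter-node activation data
    has_simd: bool = True           # SIMD unit for ops the analog array can't run
    has_nanoprocessor: bool = True  # sequences and controls the tile

def router_neighbors(row: int, col: int, rows: int, cols: int):
    """The on-chip router network moves data between adjacent tiles;
    tiles on the edge of the array connect off-chip instead."""
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = row + dr, col + dc
        if 0 <= r < rows and 0 <= c < cols:
            yield r, c

# A small illustrative 4x4 grid of identical tiles.
grid = [[Tile(analog_weight_capacity=100_000, sram_bytes=64 * 1024)
         for _ in range(4)] for _ in range(4)]
```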

The Mythic SDK includes the tools needed to bridge the gap between popular training frameworks like TensorFlow and PyTorch and highly constrained edge and datacenter deployments on the Mythic IPU, which must optimize for cost, power, and performance. The Mythic Optimization Suite solves a significant customer pain point by automatically converting the neural network to an 8-bit representation while preserving accuracy in the analog compute domain. The Mythic Graph Compiler automatically generates machine code for our IPU, so developers do not have to worry about low-level implementation or optimization. Mythic host drivers are lightweight and pain-free, with support planned for the most popular embedded and server operating systems.

For a more in-depth look into Mythic technology, check out our blogs:

  • Mythic’s Chip Architecture
  • Software Engineering at Mythic

COMPUTE IN MEMORY

BOOSTING MEMORY CAPACITY AND PROCESSING SPEED

Today’s most common computing architectures are built on assumptions about how memory is accessed and used. These systems assume that the full memory space is too large to fit on-chip near the processor, and that we do not know which memory will be needed at what time. To address both the space issue and the uncertainty issue, these architectures build a hierarchy of memory. The levels near the CPU are small, fast, and support frequent access, while DRAM and SSDs are large enough to store the bulkier, less time-sensitive data.

A Standard Computing Architecture

Compute-in-memory is built on different assumptions: we have a large amount of data that we need to access, but we know exactly when we will need it. These assumptions hold for AI inference because the execution flow of the neural network is deterministic; unlike many other applications, it does not depend on the input data. Using that knowledge, we can strategically control the location of data in memory, instead of building a cache hierarchy to cover for our lack of knowledge. Compute-in-memory also adds local compute to each memory array, allowing data to be processed right beside the memory that holds it. With compute next to every memory array, we can have an enormous memory with the same performance and efficiency as L1 cache (or even register files).
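
The placement idea can be sketched in a few lines, using hypothetical layer and array names: because the execution order is known at compile time, each layer’s weights are pinned to a specific memory array up front instead of being pulled through a cache on demand.

```python
def place_weights(layers, arrays):
    """Greedy first-fit, compile-time assignment of layer weights to
    memory arrays (a toy stand-in for a real placement algorithm)."""
    placement = {}
    free = dict(arrays)                      # remaining capacity per array
    for layer, weight_count in layers:
        for array, capacity in free.items():
            if weight_count <= capacity:
                placement[layer] = array
                free[array] -= weight_count
                break
        else:
            raise ValueError(f"{layer} does not fit in any array")
    return placement

layers = [("conv1", 9_000), ("conv2", 36_000), ("fc", 64_000)]  # hypothetical sizes
arrays = {"tile0": 50_000, "tile1": 50_000, "tile2": 80_000}    # hypothetical capacities
print(place_weights(layers, arrays))
# {'conv1': 'tile0', 'conv2': 'tile0', 'fc': 'tile2'}
```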

DATAFLOW ARCHITECTURE

MAXIMIZING INFERENCE PERFORMANCE THROUGH CAREFUL ARCHITECTURE DESIGN

Standard compute architectures are designed to tackle sequential algorithms. They excel at these algorithms by using a massively powerful and power-intensive CPU core, surrounding it with a memory architecture that matches the memory profile of those applications. AI inference is not a typical sequential application; it is a graph-based application where the output of one graph node flows to the input of other graph nodes.

Graph applications provide opportunities to extract parallelism by assigning a different compute element to each node of the graph. When the results from one graph node are complete, they flow to the next graph node to start the next operation, which is ideal for a dataflow architecture. In our dataflow architecture, we assign a graph node to each compute-in-memory array and place that node’s weight data into the array. When the input data for a graph node is ready, it flows to the location adjacent to that memory array and is then processed by the local compute against the stored weights. Many inference applications use operations like convolution, which process small portions of the image frame at a time instead of the whole frame at once, so data can start flowing through the graph long before a full frame has arrived.
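
A minimal sketch of this firing rule, with nothing Mythic-specific about it: each node buffers its incoming values and executes as soon as all of its inputs have arrived, and its result then flows onward to its consumers.

```python
from collections import defaultdict, deque

def run_dataflow(graph, compute, sources):
    """graph: {node: [downstream nodes]}; compute: {node: fn(list) -> value};
    sources: {source node: initial value}. Returns every node's output."""
    indegree = defaultdict(int)
    for node, dests in graph.items():
        for dest in dests:
            indegree[dest] += 1
    inbox = defaultdict(list)                       # inputs buffered per node
    for node, value in sources.items():
        inbox[node].append(value)
    ready = deque(n for n in graph if indegree[n] == 0)
    results = {}
    while ready:
        node = ready.popleft()
        results[node] = compute[node](inbox[node])  # node fires
        for dest in graph[node]:                    # result flows onward
            inbox[dest].append(results[node])
            if len(inbox[dest]) == indegree[dest]:  # all inputs arrived
                ready.append(dest)
    return results

graph = {"input": ["conv"], "conv": ["relu"], "relu": []}
compute = {"input": lambda xs: xs[0],
           "conv": lambda xs: 2 * xs[0],
           "relu": lambda xs: max(0, xs[0])}
print(run_dataflow(graph, compute, {"input": 3}))  # {'input': 3, 'conv': 6, 'relu': 6}
```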

Our dataflow architecture also maximizes inference performance by having many compute-in-memory elements operate in parallel, pipelining the image processing by running neural network nodes (or “layers”) in parallel on different parts of the frame. By being built from the ground up as a dataflow architecture, the Mythic architecture minimizes the memory and computational overhead required to manage the dependency graphs needed for dataflow computing, and keeps the application operating at maximum performance.

ANALOG COMPUTING

ACHIEVING UNMATCHED EFFICIENCY AND PERFORMANCE

Analog computing provides the ultimate compute-in-memory processing element. The term compute-in-memory is used very broadly and can mean many things. Our analog compute takes compute-in-memory to an extreme: we compute directly inside the memory array itself. This is possible by using the memory elements as tunable resistors, supplying the inputs as voltages, and collecting the outputs as currents. We use analog computing for our core neural network matrix operations, where we multiply an input vector by a weight matrix.
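
The physics reduces to an ordinary matrix product, which is easy to check numerically. In this sketch (plain NumPy, with made-up dimensions), the weight matrix plays the role of a grid of conductances G, the input vector is a set of voltages V, and each output is the Kirchhoff sum of currents I_j = Σ_i V_i · G[i, j]:

```python
import numpy as np

rng = np.random.default_rng(0)
G = rng.uniform(0.0, 1.0, size=(256, 128))  # conductances = stored weights
V = rng.uniform(0.0, 1.0, size=256)         # input voltages = activations

# One "operation" in the analog array: every element of G contributes a
# current V_i * G[i, j], and the currents summing down each output line
# perform 256 * 128 multiply-accumulates in parallel.
I = V @ G

# A real array digitizes the output currents with ADCs; here we just
# confirm the current sum matches an explicit multiply-accumulate.
assert np.allclose(I, np.einsum("i,ij->j", V, G))
```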

Analog computing does two things for us. First, it is amazingly efficient: it eliminates memory movement for the neural network weights, since they are used in place as resistors. Second, it is high performance: hundreds of thousands of multiply-accumulate operations occur in parallel each time we perform one of these vector operations. Given these two properties, analog computing is the core of our high-performance yet highly efficient system.

Mythic SDK

Effortlessly deploy your models to the Mythic IPU

Mythic’s platform delivers category-leading performance, power, and on-chip model capacity in a low-cost form factor. To leverage this compute, Mythic’s software stack optimizes and compiles trained neural networks using a flow that is familiar and easy to use for developers. We build on existing ecosystems like ONNX and TensorFlow for the front-end to ensure frictionless integration with standard training flows.

The software then runs through two stages: optimization and compilation. The Mythic Optimization Suite transforms the neural network into a form compatible with analog compute-in-memory, including quantization from floating-point values to 8-bit integers. The Mythic Graph Compiler performs automatic mapping, packing, and code generation. The final result is a packaged binary containing everything the host driver needs to program the accelerator chip and run neural networks in a real-time environment.
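
The two stages chain together in a simple way. The sketch below only mirrors the structure described above; every function name in it is our own illustrative stand-in, not Mythic’s published API.

```python
def optimization_suite(trained_graph):
    """Stand-in for the Mythic Optimization Suite (float32 -> int8)."""
    return {"graph": trained_graph, "dtype": "int8"}

def graph_compiler(int8_graph):
    """Stand-in for the Mythic Graph Compiler (map, pack, generate code)."""
    return {"binary": b"...", "metadata": int8_graph}

def prepare_for_ipu(trained_graph):
    """Optimization, then compilation; the result is the packaged binary
    the host driver uses to program the accelerator."""
    return graph_compiler(optimization_suite(trained_graph))

package = prepare_for_ipu("resnet50.onnx")  # hypothetical input model
```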

Mythic Optimization Suite

The first stage of the Mythic SDK is optimization of a trained neural network. The quantization flow converts 32-bit floating-point weights and activations (the standard numerical format in training) to 8-bit integers, which is essential for effective deployment at the edge and in the datacenter. Quantization is a major pain point for customers with high accuracy requirements. Our simplest flow runs after training, converts the network to 8-bit, and verifies that the analog compute does not degrade accuracy below acceptable thresholds. The resulting accuracy is typical of digital 8-bit quantization and is often sufficient for power-constrained edge deployments.
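
A minimal sketch of the post-training conversion (per-tensor, symmetric scaling; real flows are more sophisticated, e.g. per-channel scales and calibration data):

```python
import numpy as np

def quantize_int8(x):
    """Map float32 values onto the int8 range [-127, 127] with one
    per-tensor scale; returns the int8 tensor and its scale."""
    scale = np.max(np.abs(x)) / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(64, 64).astype(np.float32)  # trained float32 weights
q, s = quantize_int8(w)
err = np.abs(dequantize(q, s) - w).max()        # check the accuracy impact
print(f"max abs quantization error: {err:.5f}")
```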

We also provide retraining flows for applications with strict accuracy requirements and/or more aggressive performance and power targets. Quantization-aware and analog-aware retraining builds resiliency into layers that are more sensitive to the lower bit-depths of quantization and to analog noise. For aggressive performance and power targets, certain layers can even be pushed to 4 bits and below without a significant drop in accuracy.
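
One common way to build in that resiliency is “fake quantization”: simulating the low-bit representation inside the forward pass during retraining, so the weights learn to survive it. The sketch below shows only the rounding step (a full flow would also use a straight-through estimator so gradients pass through the non-differentiable round); none of this is Mythic’s exact recipe.

```python
import numpy as np

def fake_quant(x, bits=8):
    """Quantize and immediately dequantize, so downstream layers see
    exactly the values they would see after real quantization."""
    qmax = 2 ** (bits - 1) - 1             # 127 at 8 bits, 7 at 4 bits
    scale = np.max(np.abs(x)) / qmax
    return np.round(x / scale).clip(-qmax, qmax) * scale

w = np.random.randn(128).astype(np.float32)
w8 = fake_quant(w, bits=8)   # sensitive layers stay at 8 bits
w4 = fake_quant(w, bits=4)   # robust layers can push to 4 bits and below
```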

Mythic Graph Compiler

The second stage of the Mythic SDK generates and compiles the binary image that runs on our IPU. The conversion from a neural network compute graph to machine code is handled in an automated series of steps, including mapping, optimization, and code generation. Powerful hardware architecture elements, including many-core processing, SIMD vector engines, and dataflow schedulers, are all leveraged automatically by the graph compiler. Even the host driver is simple and pain-free, with input/output and memory transfers handled behind the scenes. We also provide compiler support for compute-intensive operations before, after, and between neural network layers with our array of processors and SIMD vector engines.

Mythic Software Vision

Mythic is a long-term partner for deploying powerful AI

Over the long term, Mythic envisions an SDK with a suite of powerful tools that help developers evaluate tradeoffs, along with excellent compatibility with the fast-moving world of deep neural networks. Mythic’s platform and analog compute technology deliver fully deterministic execution and tremendous flexibility in making tradeoffs compared to other platforms. We believe the best software platform provides automatic tools for exploring the design space of accuracy, model size and pruning, performance, and power. This lets developers quickly identify the best solution within their power and cost constraints.

We also believe the best software platform should be modular and integrate easily with popular tools such as TensorFlow, ONNX, or whatever comes along next. As new layer types and network topologies are invented, software and hardware support should be straightforward. Our platform ensures this through the modular design of the software SDK, which leverages generic matrix compute capabilities rather than architecture-specific accelerators. Altogether, our vision and roadmap for the software SDK ensure that selecting Mythic as a platform will be an effective choice for many years to come.