Mythic has developed a unique AI compute platform that enables smart surveillance camera systems, smart appliances, powerful machine vision systems, commercial drones, intelligent robotics, and more. With unmatched performance and power efficiency, the Mythic platform enables AI designers to deploy in form factors that were previously out of reach. Key features include:

  • Lowest latency: Single-frame delay by running at a batch size of 1
  • Efficient compute: Best-in-class performance-to-power efficiency
  • Hyper scalability: Scales from low-power single chips to high-performance rack systems
  • Ease of use: Supports major platforms and frameworks, with topology-agnostic performance

Mythic products are based on a unique tile-based AI compute architecture that features three fundamental hardware technologies – Compute-in-Memory, Dataflow Architecture, and Analog Computing. For AI developers, Mythic delivers the hardware, software toolkit, and trained neural networks to ease deployment in edge devices.

The Mythic AMP features an array of tiles that blend familiar concepts and breakthrough new technology to deliver unmatched performance, power, and flexibility. Each tile has a large Analog Compute Engine (Mythic ACE™) to store bulky neural network weights, local SRAM memory for data being passed between the neural network nodes, a single-instruction multiple-data (SIMD) unit for processing operations not handled by the ACE, and a nano-processor for controlling the sequencing and operation of the tile. The tiles are interconnected with an efficient on-chip router network, which facilitates the dataflow from one tile to the next. On the edge of the AMP, off-chip connections provide an interface to the host system.
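
As a rough mental model of the tile just described, the four elements and the tile-to-tile dataflow can be pictured as below. This is a sketch only; the component behavior, sizes, and names in code are invented for illustration, not Mythic's implementation.

```python
import numpy as np

class Tile:
    """Toy model of one AMP tile; names follow the text, behavior is invented."""
    def __init__(self, weights):
        self.ace_weights = weights   # ACE: bulky NN weights stored in place
        self.sram = None             # local SRAM: activations passing through

    def run_ace(self, x):
        # The ACE performs this tile's neural-network matrix operation.
        return self.ace_weights @ x

    def run_simd(self, x):
        # The SIMD unit handles ops the ACE does not, e.g. a ReLU activation.
        return np.maximum(x, 0.0)

    def step(self, x):
        # The nano-processor sequences the tile: ACE, then SIMD, then route on.
        self.sram = self.run_ace(x)
        return self.run_simd(self.sram)

# The router network passes each tile's output to the next tile's input:
tiles = [Tile(np.random.randn(16, 16)) for _ in range(3)]
x = np.random.randn(16)
for tile in tiles:   # dataflow: activations stream from tile to tile
    x = tile.step(x)
```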

Mythic software provides the necessary tools to bridge the gap between popular training frameworks like PyTorch and highly constrained edge and server deployments on the Mythic AMP that must optimize for cost, power, and performance. The Mythic Optimization Suite solves a significant customer pain point by converting the neural network to an 8-bit representation while preserving accuracy in the analog compute domain. The Mythic Graph Compiler automatically generates machine code for our AMP, so developers do not have to worry about low-level implementation or optimization. Mythic host drivers are pain-free and lightweight, with support planned for the most popular embedded and server operating systems.

For a more in-depth look into Mythic technology, check out our blogs:

  • Mythic’s Chip Architecture
  • Software Engineering at Mythic

COMPUTE IN MEMORY

BOOSTING MEMORY CAPACITY AND PROCESSING SPEED

Today’s most common computing architectures are built on assumptions about how memory is accessed and used. These systems assume that the full memory space is too large to fit on-chip near the processor, and that we do not know which memory will be needed at what time. To address the space issue and the uncertainty issue, these architectures build a hierarchy of memory. The memory near the CPU is small and fast and supports frequent access, while DRAM and SSD are large enough to store the bulkier, less time-sensitive data.

A Standard Computing Architecture

Compute-in-memory is built on different assumptions: we have a large amount of data to access, but we know exactly when we will need it. These assumptions hold for AI inference because the execution flow of a neural network is deterministic – unlike many other applications, it does not depend on the input data. Using that knowledge, we can strategically control the location of data in memory, instead of building a cache hierarchy to cover for our lack of knowledge. Compute-in-memory also adds local compute to each memory array, allowing data to be processed directly next to the memory that holds it. With compute next to every memory array, we can build an enormous memory that has the same performance and efficiency as L1 cache (or even register files).
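
To make the contrast concrete, here is a minimal sketch of the idea for a toy three-layer network: because the graph is known ahead of time, every layer's weights can be placed into a specific memory array at compile time, and each array then computes on its resident weights. The shapes and placement scheme are invented for illustration.

```python
import numpy as np

# Because the inference graph is deterministic, every layer's weights can be
# assigned to a specific memory array before deployment; at run time each
# array computes on its resident weights and no weight traffic occurs.

layer_shapes = [(64, 128), (64, 64), (10, 64)]          # known at compile time
arrays = {i: np.random.randn(*s) for i, s in enumerate(layer_shapes)}

def infer(x):
    for i in range(len(arrays)):
        x = np.tanh(arrays[i] @ x)   # local compute, right next to the memory
    return x

print(infer(np.random.randn(128)).shape)   # (10,)
```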

DATAFLOW ARCHITECTURE

MAXIMIZING INFERENCE PERFORMANCE THROUGH CAREFUL ARCHITECTURE DESIGN

Standard compute architectures are designed to tackle sequential algorithms. They excel at these algorithms by using a massively powerful and power-intensive CPU core, surrounding it with a memory architecture that matches the memory profile of those applications. AI inference is not a typical sequential application; it is a graph-based application where the output of one graph node flows to the input of other graph nodes.

Graph applications provide opportunities to extract parallelism by assigning a different compute element to each node of the graph. When the results from one graph node are complete, they flow to the next graph node to start the next operation, which is ideal for a dataflow architecture. In our dataflow architecture, we assign a graph node to each compute-in-memory array and place the weight data for that graph node in that memory array. When the input data for that graph node is ready, it flows to the correct location adjacent to the memory array, and is then executed upon by the local compute and memory. Many inference applications use operations like convolution, which process small portions of the image frame at a time instead of the whole frame at once.

Our dataflow architecture also maximizes inference performance by having many of the compute-in-memory elements operate in parallel, pipelining the image processing by running neural network nodes (or “layers”) in parallel on different parts of the frame. Because it is built from the ground up as a dataflow architecture, the Mythic architecture minimizes the memory and computational overhead required to manage the dependency graphs needed for dataflow computing, and keeps the application operating at maximum performance.
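
The pipelining described above can be sketched in a few lines. The stage and slice counts below are arbitrary, and the real scheduler is far more sophisticated; this only illustrates how different layers work on different parts of the frame at the same time.

```python
# Minimal pipeline illustration (a sketch, not Mythic's scheduler): three
# layer stages operate on different slices of a frame simultaneously.

num_stages, num_slices = 3, 5         # 3 NN layers, frame split into 5 slices

for step in range(num_slices + num_stages - 1):
    # At each step, stage i processes slice (step - i): classic pipelining.
    active = [(stage, step - stage) for stage in range(num_stages)
              if 0 <= step - stage < num_slices]
    print(f"step {step}: " +
          ", ".join(f"layer{s} on slice{t}" for s, t in active))
```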

ANALOG COMPUTING

ACHIEVING UNMATCHED EFFICIENCY AND PERFORMANCE

Analog computing provides the ultimate compute-in-memory processing element. The term compute-in-memory is used very broadly and can mean many things. Our analog compute takes compute-in-memory to an extreme, where we compute directly inside the memory array itself. This is possible by using the memory elements as tunable resistors, supplying the inputs as voltages and collecting the outputs as currents. We use analog computing for our core neural network matrix operations, where we are multiplying an input vector by a weight matrix.

Analog computing does a couple of things for us. First, it is amazingly efficient: it eliminates memory movement for the neural network weights, since they are used in place as resistors. Second, it is high performance: hundreds of thousands of multiply-accumulate operations occur in parallel each time we perform one of these vector operations. Given these two properties, analog computing is the core of our high-performance yet highly efficient system.
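
As a rough numerical model of this idea (the physics in equation form, not a circuit simulation): weights act as conductances G, inputs as voltages V, and by Ohm's law and Kirchhoff's current law the output currents are simply I = G·V, with every multiply-accumulate happening at once inside the array. The sizes and units below are illustrative.

```python
import numpy as np

# Weights act as conductances G (siemens), inputs as voltages V (volts).
# Ohm's law gives each cell's current I = G * V; Kirchhoff's current law sums
# the currents along each output line, so the array computes I_out = G @ V in
# one shot: a huge number of MACs in parallel for a large array.

G = np.abs(np.random.randn(256, 256)) * 1e-6   # conductance matrix (weights)
V = np.random.randn(256) * 0.1                 # input voltage vector

I_out = G @ V          # the entire matrix-vector multiply, inside the memory
print(I_out.shape)     # (256,) output currents, then digitized by ADCs
```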

Mythic AI Workflow

Effortlessly deploy your models to the Mythic AMP

Mythic’s platform delivers category-leading performance, power, and on-chip model capacity in a low-cost form factor. To leverage this compute, Mythic’s software stack optimizes and compiles trained neural networks using a flow that is familiar and easy to use for developers. We build on existing ecosystems like ONNX for the front-end to ensure frictionless integration with standard training flows.

The software then runs through two stages: optimization and compilation. The Mythic Optimization Suite transforms the neural network into a form that is compatible with analog compute-in-memory, including quantization from floating-point values to 8-bit integers. The Mythic Graph Compiler performs automatic mapping, packing, and code generation. The final result is a packaged binary containing everything that the host driver needs to program the AMP and run neural networks in a real-time environment.
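
A hypothetical sketch of how this two-stage flow might look from a developer's perspective follows. The function names and the model file are invented stand-ins, not Mythic's published API; only the ONNX front-end is taken from the text.

```python
import onnx  # front-end: the flow builds on the ONNX ecosystem

# optimize_for_ana8 and compile_graph are invented stand-ins for the Mythic
# Optimization Suite and Graph Compiler; they are not a real published API.

def optimize_for_ana8(model):
    """Stage 1: quantize float32 weights/activations to analog 8-bit (ANA8)."""
    return model          # placeholder: the real suite rewrites the graph

def compile_graph(model):
    """Stage 2: map nodes to tiles, pack weights, and generate machine code."""
    return b"..."         # placeholder: the real compiler emits a packaged binary

model = onnx.load("detector.onnx")   # placeholder file: any trained, exported net
binary = compile_graph(optimize_for_ana8(model))
# The host driver programs the AMP with this binary and streams I/O at runtime.
```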

Mythic Optimization Suite

The first stage of the Mythic AI workflow is optimization of a trained neural network. The quantization flow converts 32-bit floating-point weights and activations, the standard numerical format in training, to 8-bit integers, which is essential for effective deployment at the edge. Quantization represents a major pain point for customers with high accuracy requirements. Our simple flow runs after training and performs conversion to an analog 8-bit representation (ANA8). The resulting accuracy is comparable to that of digital 8-bit quantization, which is commonly deployed in power-constrained edge applications.
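
For intuition, here is a generic post-training quantization sketch in the standard digital int8 style; Mythic's ANA8 conversion targets the analog domain and differs in detail.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric int8 quantization: map the observed weight range onto int8."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -128, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(64, 64).astype(np.float32)
q, s = quantize_int8(w)
print("max abs error:", np.abs(w - dequantize(q, s)).max())  # small vs. range
```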

We also provide retraining flows for applications with strict accuracy requirements and/or more aggressive performance and power targets. Quantization-aware and analog-aware retraining builds resiliency into layers that are more sensitive to the lower bit-depths of quantization and to analog effects. For aggressive performance and power targets, certain layers can even be pushed to 4 bits and below without a significant drop in accuracy.
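
A minimal sketch of the "fake quantization" technique commonly used for quantization-aware retraining is shown below; Mythic's analog-aware flow additionally models analog effects, which this sketch omits. The shapes are arbitrary.

```python
import torch

def fake_quant(x, bits=8):
    """Quantize in the forward pass; pass gradients straight through (STE)."""
    qmax = 2 ** (bits - 1) - 1
    scale = x.detach().abs().max() / qmax
    q = torch.clamp(torch.round(x / scale), -qmax - 1, qmax)
    return x + (q * scale - x).detach()   # straight-through estimator

w = torch.randn(64, 64, requires_grad=True)
x = torch.randn(16, 64)
y = x @ fake_quant(w).T    # train as usual; sensitive layers learn to
y.sum().backward()         # tolerate the reduced bit depth
```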

Mythic Graph Compiler

The second stage of the Mythic AI workflow generates the binary image that runs on our AMP. The conversion from a neural network compute graph to machine code is handled in an automated series of steps, including mapping, optimization, and code generation. Powerful hardware architecture elements, including many-core processing, SIMD vector engines, and dataflow schedulers, are all leveraged automatically by the graph compiler. Even the host driver is simple and pain-free, with input/output and memory transfers handled behind the scenes. We also provide compiler support for compute-intensive operations before, after, and in between neural network layers with our array of processors and SIMD vector engines.
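
As a toy illustration of the mapping step alone, one can picture topologically ordering the graph and assigning each node to a tile with enough free weight capacity. The heuristic, graph, and capacities below are invented; the real compiler's mapping, packing, and code generation are far more involved.

```python
# Invented mapping heuristic for illustration only.
from graphlib import TopologicalSorter

graph = {"conv1": [], "conv2": ["conv1"], "pool": ["conv2"], "fc": ["pool"]}
weight_cost = {"conv1": 30, "conv2": 60, "pool": 0, "fc": 40}
tiles = [{"free": 80, "nodes": []} for _ in range(3)]  # capacity, arbitrary units

for node in TopologicalSorter(graph).static_order():
    # First-fit: place each node on the first tile with enough free capacity.
    tile = next(t for t in tiles if t["free"] >= weight_cost[node])
    tile["nodes"].append(node)
    tile["free"] -= weight_cost[node]

for i, t in enumerate(tiles):
    print(f"tile {i}: {t['nodes']}")
```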

Mythic Software Vision

Mythic is a long-term partner for deploying powerful AI

In the long term, Mythic envisions an SDK with a suite of powerful tools to help developers evaluate tradeoffs, along with excellent compatibility with the fast-moving world of deep neural networks. Mythic’s platform and analog compute technology deliver fully deterministic execution and a tremendous amount of flexibility for making tradeoffs compared to other platforms. We believe the best software platform provides automatic tools for exploring the design space of accuracy, model size and pruning, performance, and power. This lets developers quickly identify the best solution within the constraints of power and cost.

We also believe the best software platform should be modular and integrate easily with popular tools such as TensorFlow, ONNX, or whatever comes along next. As new layer types and network topologies are invented, software and hardware support should be straightforward. Our platform ensures this through the modular design of the software SDK, which leverages a large amount of generic matrix compute capability rather than architecture-specific accelerators. Altogether, our vision and roadmap for the software SDK ensure that selecting Mythic as a platform will be an effective choice for many years to come.