MEEP, implementing a scale-down version of the ACME architecture

22 April 2021

ACME is expected to be a supercomputer chiplet that will provide a self-hosted framework for a variety of accelerators, including a vector unit. The MEEP system emulates a portion of an ACME chiplet design and – some number of processing tiles. In addition, MEEP also provides a set of IPs in the shape of an FPGA shell, which contains all the necessary mechanisms to communicate with the host , and also with other neighboring FPGAs. . MEEP serves not only as a pre-prototype for ACME, but as a general research testbed for all aspects of the architecture of high-performance computer systems.

In order for MEEP to achieve the system performance goals for a supercomputer built from ACME chiplets, the project requirement is about 5 double-precision TFLOPS per vector-oriented chiplet. This requires 2048 FPMACs in aggregate, which would need to be clocked at a conservative 1.33 GHz to achieve this goal.

The ACME-MEEP architecture is 64-bit RISC-V architecture with Vector Extensions and further vector extensions that will be added to further improve attainable vector performance.
The major ACME components are:

Memory tile, with the Memory Controller CPU (MCPU), Memory Controller, and last-level caches.
Network on Chip (NoC).
Processing tile with the scalar core, vector coprocessor, and other accelerators.
I/O peripherals and main memory.

Various example codes are currently being used to demonstrate the ACME architecture operation for the disaggregated memory system. First, the utility of long virtual vectors will be demonstrated by using a simple SAXPY case with software-managed coherence, if required. Next, the focus will be on more complex use cases, like sparse code examples with vector index registers, where the vector index registers are in the MCPU.

In order to reason about memory operations in the ACME architecture, memory operations have been classified into 3 categories: Cached, Scratchpad, and Bypass. These categories map memory operations into the cache subsystem, the scratchpad, or no on-chip memory at all (see Figure 1).

This is all work in progress in the project that will be further developed in the following months.

Learn more about ACME (Accelerator Architecture) on the platform ecosystem page.