MEEP provides a foundation for building European-based chips and infrastructure. It enables rapid prototyping through a library of IPs and, via the FPGA shell, a standard set of interfaces to the host CPU and to other FPGAs in the system. In addition to improving the RISC-V architecture and hardware ecosystem, MEEP is also improving the RISC-V software ecosystem with an extended software toolchain and software stack, including a suite of HPC and HPDA (High-Performance Data Analytics) applications.

MEEP platform layers

From a high-level perspective, MEEP is defined as a layered structure, as depicted below. Direct dependencies exist between consecutive layers in the stack, whereas the vertical layer (profiling and performance monitoring) is transversal to all the rest.

 

Layered structure of MEEP platform

 

The above structure contextualizes the scope of each layer and the close relationship among contiguous layers at the same time. HPC applications (first layer on top) need software tools (second layer) to transform programs into something readable and executable on the emulation platform (third layer) according to the characterization of the accelerator architecture (fourth layer).

 

Each layer has its own responsibility, and all of them can be analyzed using profiling and performance monitoring tools. In particular:

  • The HPC Applications layer studies and analyzes different kinds of applications, considering their workloads, data distribution and dependencies, level of parallelism, programming languages, etc.
  • The Software toolchain layer provides the required software ecosystem to exploit hardware capabilities. It includes the software stack, from the operating system (at the bottom) to several HPC runtimes (at the top).
  • The Emulation Platform layer makes it possible to execute HPC applications on top of the Accelerator Architecture, using the information received from the Software toolchain layer.
  • The Accelerator Architecture layer defines the accelerator components, their connectivity and functionality.
  • The Profiling and performance monitoring layer provides feedback on the applications' memory access patterns.
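
The layering described above can be sketched as a small data model. This is only an illustration, not part of MEEP: the names `Layer`, `STACK`, `TRANSVERSAL`, `direct_dependency` and `observed_by_transversal` are hypothetical, and the one-line responsibilities are abbreviated from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    name: str
    responsibility: str


# Stacked layers, ordered top to bottom as in the description above.
STACK = [
    Layer("HPC Applications", "workloads, data distribution, parallelism"),
    Layer("Software Toolchain", "software stack from the OS up to HPC runtimes"),
    Layer("Emulation Platform", "executes applications on the accelerator architecture"),
    Layer("Accelerator Architecture", "accelerator components, connectivity, functionality"),
]

# The vertical layer is transversal: it can observe every layer in the stack.
TRANSVERSAL = Layer("Profiling and Performance Monitoring",
                    "feedback on memory access patterns")


def direct_dependency(layer_name):
    """Return the layer directly below the named one, or None at the bottom."""
    names = [layer.name for layer in STACK]
    index = names.index(layer_name)  # raises ValueError for an unknown name
    return STACK[index + 1] if index + 1 < len(STACK) else None


def observed_by_transversal():
    """Names of all layers the transversal layer can analyze."""
    return [layer.name for layer in STACK]


if __name__ == "__main__":
    for layer in STACK:
        below = direct_dependency(layer.name)
        print(layer.name, "->", below.name if below else "(bottom of the stack)")
```

In this sketch each layer depends only on the one directly beneath it, which mirrors the "direct dependencies among consecutive layers" idea, while the profiling layer is kept outside the stack because it cuts across all of it.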

Read more details on the MEEP platform layers here.

 

MEEP development phases

 

To manage MEEP's complexity and scale, its development was carried out in two main phases:

 

Phase 1: Work focused on a portion of the architecture on a single FPGA, that is, a single instance of the emulation platform. This phase used Xilinx Alveo U280 data center accelerator cards. The idea behind that experimental work was to demonstrate the viability of MEEP as an FPGA-emulation platform.

MEEP teams made progress on the different layers involved in the project, from software to hardware.

Regarding the Software Stack, most of the efforts were focused on the Operating System, as this was the foundation for all the software layers built on top. The same happened from the hardware perspective, where the Operating System opened the door to the execution of HPC applications on the hardware architecture. Thus, the final goal of the emulation platform and the emulated accelerator was to support the requirements imposed by the OS.

The efforts made from the hardware stack perspective were focused on three different areas:

Emulation Platform: The goal was to create an FPGA Shell that enabled communication between the host and the emulated accelerator. More information can be found on Emulation Platform.

Modelling ACME accelerator architecture: The goal was to refine the ACME accelerator specifications by following a data-driven decision procedure. Coyote served as the main performance modelling simulator: it modelled different components, and the corresponding simulation results were used to check initial assumptions and to refine and validate features and behaviours. More information can be found on Accelerator Architecture.

ACME as an emulated accelerator: Given that the ACME accelerator SoC would be further developed after the life cycle of MEEP, the hardware stack work proceeded incrementally, starting from a first approximation of the architecture and then refining the design to be as faithful as possible to the proposed ACME architecture specifications. In the end, the emulated accelerator became a proof of concept for the MEEP project, allowing HPC applications to be executed on top of it. More information can be found on Accelerator Architecture/corresponding subsection.

Phase 2: A larger and more complex system was developed, in which up to 8 cards were placed in a system, or node. Multiple systems were connected via a high-speed switch. The FPGAs could communicate within or between nodes, which facilitated the development of FPGA-to-FPGA communication in two ways: locally (within a node) and remotely (between nodes). The idea was to scale the system using a denser form factor beyond what could be done at that time with normal off-the-shelf platforms. Therefore, the prototype platform enabled mappings within an FPGA, across multiple FPGAs in a node, and across multiple FPGAs in multiple nodes.
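
The three mapping scopes of Phase 2 can be illustrated with a short sketch. It is a hypothetical model, not MEEP code: `FpgaId`, `communication_scope` and the scope labels are names invented for this example, and the only figure taken from the text is the limit of up to 8 cards per node.

```python
from dataclasses import dataclass

CARDS_PER_NODE = 8  # "up to 8 cards" per node, as stated in the text


@dataclass(frozen=True)
class FpgaId:
    """Hypothetical identifier: which node a card sits in, and its slot."""
    node: int
    card: int

    def __post_init__(self):
        if self.node < 0:
            raise ValueError("node index must be non-negative")
        if not 0 <= self.card < CARDS_PER_NODE:
            raise ValueError(f"card index must be in 0..{CARDS_PER_NODE - 1}")


def communication_scope(src, dst):
    """Classify where traffic between two endpoints would travel."""
    if src == dst:
        return "intra-fpga"   # mapping within a single FPGA
    if src.node == dst.node:
        return "local"        # FPGA-to-FPGA inside one node
    return "remote"           # FPGA-to-FPGA across nodes, via the switch


if __name__ == "__main__":
    a, b, c = FpgaId(0, 0), FpgaId(0, 7), FpgaId(1, 0)
    print(communication_scope(a, a))
    print(communication_scope(a, b))
    print(communication_scope(a, c))
```

Keeping the classification in a single function makes the local/remote distinction explicit, which is the property the Phase 2 prototype was meant to exercise.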