In the previous article "Getting Started with Xilinx Zynq, All Programmable System-On-Chip (SoC)", we had a first look at the Xilinx Zynq All Programmable SoC, the Xilinx Vivado Design Suite and the Xilinx Software Development Kit (SDK). The Digilent ZYBO was used for the design implementation.
In this article, we will learn how to use Xilinx SDSoC, the latest development environment, to develop embedded C/C++/OpenCL applications and implement the software design directly on the FPGA device. The Digilent Arty Z7-20, a Xilinx All Programmable SoC Zynq-7000 (Z-7020) development platform for embedded vision, will be used for the hardware implementation. We will follow the tutorial below, written by Adam Taylor.
Xilinx SDSoC for hardware & software co-design
What makes the Zynq device so flexible for many applications is the combination of ARM processing cores and programmable logic. This means we can partition the design between elements which are ideal for implementing on the ARM cores, e.g. high-level decision making, and elements which are ideal for implementing in programmable logic, such as image processing pipelines.
Of course, the traditional Zynq development flow would split the development between Vivado and SDK. With this approach, it is difficult to move functions between the programmable logic (PL) and processing system (PS) to obtain the optimal system performance.
This is where SDSoC comes into play. SDSoC is a system-optimising compiler which enables software-defined development of the entire device, both PS and PL. The standard SDSoC development flow is:
- Develop the application in a high-level language
- Analyse the design using provided performance monitors to determine the bottlenecks in the performance
- Accelerate the bottleneck functions into the programmable logic using the SDSoC project overview
- Re-validate the performance, and if necessary select other functions to be accelerated as well.
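The analyse and re-validate steps above come down to timing candidate functions before and after acceleration. The following is a minimal sketch of such a measurement, assuming portable `std::chrono` timing so that it also runs on a desktop; on the target, SDSoC's `sds_lib.h` offers `sds_clock_counter()` for the same purpose. The names `filter_stage`, `control_stage` and `time_us` are hypothetical and not part of any SDSoC example.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical compute-heavy stage: a likely candidate for the PL.
long long filter_stage(const std::vector<int>& in) {
    long long acc = 0;
    for (std::size_t i = 0; i < in.size(); ++i) {
        acc += static_cast<long long>(in[i]) * in[i];
    }
    return acc;
}

// Hypothetical light-weight decision stage: better left on the ARM cores.
bool control_stage(long long energy, long long threshold) {
    return energy > threshold;
}

// Run a callable once and return the elapsed wall-clock time in microseconds.
template <typename F>
double time_us(F&& fn) {
    auto start = std::chrono::steady_clock::now();
    fn();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::micro>(stop - start).count();
}
```

Wrapping each stage in `time_us` and comparing the figures shows which function dominates the run time and is therefore worth moving first.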
Thanks to a combination of high level synthesis (HLS) and a connectivity framework, functions can be transferred between the PS and PL with ease.
Base platform for SDSoC
To obtain the best performance in the programmable logic, we need to add optimization pragmas to the function being accelerated. These pragmas tell the HLS tool which optimizations to perform.
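As an illustration of such a pragma, the sketch below places `#pragma HLS PIPELINE II=1` in the inner loop of an element-wise matrix addition, asking the HLS tool to start a new loop iteration every clock cycle. The function `madd` and the dimension `N` are assumptions chosen for this example; whether the requested initiation interval is actually met depends on the tool and the memory interfaces.

```cpp
const int N = 16;  // hypothetical matrix dimension, fixed at compile time

// Element-wise matrix addition written as a candidate hardware function.
// A standard C++ compiler ignores the HLS pragma, so the same source
// still runs in software for testing.
void madd(const float A[N * N], const float B[N * N], float C[N * N]) {
    for (int row = 0; row < N; ++row) {
        for (int col = 0; col < N; ++col) {
#pragma HLS PIPELINE II=1
            C[row * N + col] = A[row * N + col] + B[row * N + col];
        }
    }
}
```

Keeping the pragma inside the function body means the optimization travels with the code, whether the function ends up in the PS or the PL.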
SDSoC therefore allows us to develop our Zynq-based designs using high-level languages such as C, C++ and OpenCL. To support SDSoC, a base platform is required which defines the underlying hardware and the software environment. For the Arty Z7, an SDSoC platform is available on Adam Taylor's GitHub, enabling us to develop for the Arty using SDSoC.
Once we have downloaded the SDSoC platform, we can use it to develop applications for the Arty Z7.
Getting started with SDSoC
In the remainder of this tutorial, we will examine how we can use this platform to accelerate the performance of a matrix multiplication example.
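For orientation, a matrix multiplication kernel of the kind being accelerated might look like the sketch below. This is not the source of the SDSoC example itself: the function name `mmult`, the row-major layout and the dimension `DIM` are assumptions made for illustration.

```cpp
const int DIM = 32;  // assumed matrix dimension, fixed at compile time

// Plain software matrix multiply, C = A * B, on row-major DIM x DIM matrices.
// The triple nested loop is the kind of compute-bound code that benefits
// from being moved into the programmable logic.
void mmult(const float A[DIM * DIM], const float B[DIM * DIM], float C[DIM * DIM]) {
    for (int row = 0; row < DIM; ++row) {
        for (int col = 0; col < DIM; ++col) {
            float sum = 0.0f;
            for (int k = 0; k < DIM; ++k) {
                sum += A[row * DIM + k] * B[k * DIM + col];
            }
            C[row * DIM + col] = sum;
        }
    }
}
```

A quick sanity check is to multiply by the identity matrix and confirm that the output equals the other operand.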
Step 1: The first step within SDSoC is to create a new SDSoC project. This can be done via:
File -> New -> Xilinx SDx Project
This will open a new project dialog which enables the platform specification, OS selection and example application selection.
Create new project
Select Arty Z7-20 as hardware platform
Select Operating System (OS) & Target CPU
Select Example Application
Step 2: To select a new platform, click the Add Custom Platform option on the second page of the new project dialog. This enables navigation to the location of the Arty Z7-20 platform downloaded from GitHub. Once added, the platform will be visible in the list of built-in platforms and we can select it for our project.
Step 3: Once the project has been created, an SDx Project Settings page will be visible within the SDSoC environment. We use this project settings tab to move functions between the PS and the PL. This is achieved by clicking the Add HW Function button and selecting the required function to move into the PL.
If we wish to accelerate a function from running on the PS to being implemented within the PL, there are a few rules which we need to follow:
- The function cannot contain any system calls to the operating system.
- The function must contain the entire functionality.
- C Constructs need to be bounded and of a fixed size.
- Implementation of the constructs needs to be unambiguous.
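The sketch below shows one way a function can be shaped to respect these rules. It is an illustrative assumption rather than code from the tutorial: the name `sum_samples` and the bound `MAX_SAMPLES` are hypothetical. The function makes no operating system calls, takes a fixed-size array, and uses a loop whose trip count is known at compile time.

```cpp
const int MAX_SAMPLES = 64;  // hypothetical compile-time upper bound

// Sums the first `count` entries of a fixed-size buffer.
// - No system calls (no printf, malloc or file access).
// - The array argument has a fixed size.
// - The loop always runs MAX_SAMPLES times; `count` only gates the add,
//   so the hardware size does not depend on run-time data.
int sum_samples(const int samples[MAX_SAMPLES], int count) {
    int total = 0;
    for (int i = 0; i < MAX_SAMPLES; ++i) {
        if (i < count) {
            total += samples[i];
        }
    }
    return total;
}
```

Any printing or dynamic allocation stays in the calling code on the PS, outside the function selected for hardware.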
Within the project settings panel, we can also control the operating frequency of the accelerated module and of the data motion network which moves data between the PL and PS.
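Clock frequencies are chosen in the project settings, but the way data moves between PS and PL can also be guided from the source using SDS pragmas placed directly before the hardware function. The following is a hedged sketch: the function `scale`, the length `LEN` and the helper `run_scale_demo` are hypothetical, and the chosen data mover is only one possible option. The `#else` branch is an assumption added so the code also builds with an ordinary desktop compiler, where the pragmas are ignored.

```cpp
#include <cstdlib>

#ifdef __SDSCC__
#include "sds_lib.h"  // SDSoC runtime: sds_alloc() / sds_free()
#else
// Host-only fallback (an assumption for desktop testing): ordinary heap memory.
static void* sds_alloc(std::size_t bytes) { return std::malloc(bytes); }
static void sds_free(void* ptr) { std::free(ptr); }
#endif

const int LEN = 256;  // hypothetical buffer length

// Hints for the data motion network: both arrays are streamed in order,
// copied in full, and moved by a simple AXI DMA.
#pragma SDS data access_pattern(in:SEQUENTIAL, out:SEQUENTIAL)
#pragma SDS data copy(in[0:LEN], out[0:LEN])
#pragma SDS data data_mover(in:AXIDMA_SIMPLE, out:AXIDMA_SIMPLE)
void scale(const int in[LEN], int out[LEN], int gain) {
    for (int i = 0; i < LEN; ++i) {
        out[i] = in[i] * gain;
    }
}

// Allocates buffers, runs the function and returns the number of mismatches
// (or -1 if allocation failed).
int run_scale_demo() {
    int* in = static_cast<int*>(sds_alloc(LEN * sizeof(int)));
    int* out = static_cast<int*>(sds_alloc(LEN * sizeof(int)));
    if (!in || !out) {
        if (in) sds_free(in);
        if (out) sds_free(out);
        return -1;
    }
    for (int i = 0; i < LEN; ++i) {
        in[i] = i;
    }
    scale(in, out, 3);
    int mismatches = 0;
    for (int i = 0; i < LEN; ++i) {
        if (out[i] != 3 * i) {
            ++mismatches;
        }
    }
    sds_free(in);
    sds_free(out);
    return mismatches;
}
```

On the target, `sds_alloc` returns physically contiguous memory, which is what a simple DMA transfer needs; on a desktop the fallback just uses the heap.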
SDSoC Project Settings – This is the main control function in SDSoC
Functions Available to be accelerated
Step 4: When SDSoC builds the project, it creates the necessary bin files to run the design from an SD card. Alternatively, we can use the debugger to download the example into the Zynq. A full build can take a while, so when selecting functions to accelerate it is often worth running an estimation build first, which reports the total resources required and the estimated acceleration. We do this by ticking the Estimate Performance option in the project settings tab.
When the estimation build completes, a report is generated.
Estimation Run Results
Once we have built the design without the Estimate Performance option ticked, we can download and run the example as we would a normal application using the debugger. To launch the debugger, right-click on your project and select Debug As -> Launch on Hardware (SDSoC Debugger).
Launching a Debug Session
This will download the application to the Arty Z7, pausing the Zynq at the program entry point as shown below. Connecting the SDSoC terminal to the Arty Z7 UART will enable us to display the results of the example.
As we move into developing our own SDSoC applications, we need to be aware of the libraries and acceleration stacks provided with SDSoC.
To enable faster development of the end application, SDSoC comes with several HLS libraries which developers can use in their applications. These include:
- reVision Stack – Provides a three-element development stack which enables the use of OpenCV and Caffe, along with a range of commonly used open-source neural network frameworks, for embedded vision applications. reVision includes multiple acceleration-capable OpenCV functions.
- Math Library – Provides synthesisable implementations of the standard math libraries.
- IP Library – Provides IP libraries for implementing FFT, FIR, and Shift Register LUT functions.
- Linear Algebra Library – Provides a library of commonly used Linear Algebra functions.
- Arbitrary Precision Data Types Library – Provides support for non-power-of-2, arbitrary-length data using signed and unsigned integers. This library allows developers to use the FPGA’s resources more efficiently.
As we have worked through this example, we have seen that it is easy to move software functions between the PS and PL. The move to the PL also brings a significant increase in performance.
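To illustrate what an arbitrary-width integer means in practice, the sketch below emulates an unsigned value that is exactly W bits wide and wraps around on overflow. The `UIntN` template is a hypothetical stand-in written only so the idea can be tried on a desktop. In an HLS design one would use the library's own types instead, for example `ap_uint<12>` from `ap_int.h`, which map to exactly 12 bits of logic rather than a full 16-bit or 32-bit word.

```cpp
#include <cstdint>

// Hypothetical host-side model of an unsigned integer that is W bits wide.
template <unsigned W>
struct UIntN {
    static_assert(W >= 1 && W <= 32, "width must be between 1 and 32 bits");

    // Mask with the lowest W bits set.
    static constexpr std::uint64_t MASK = (std::uint64_t(1) << W) - 1;

    std::uint32_t value;

    // Values wider than W bits are truncated to the lowest W bits.
    explicit UIntN(std::uint64_t v = 0)
        : value(static_cast<std::uint32_t>(v & MASK)) {}

    // Addition wraps modulo 2^W, as a W-bit hardware adder would.
    UIntN operator+(UIntN other) const {
        return UIntN(std::uint64_t(value) + other.value);
    }
};
```

With a 12-bit type, adding 1 to 4095 wraps to 0, which matches the behaviour of a 12-bit adder in the fabric.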