
Basic Tips and Principles of FPGA Design Explained

Introduction

FPGA design is less about breakthroughs in the chip itself than about using the FPGA as a platform for designing products in other industries. As a device, an FPGA is a typical semi-custom integrated circuit that contains digital management modules, embedded units, and input and output units. On this basis, designers focus on optimizing the overall chip design and on adding new functions by improving the current design, which simplifies the overall structure and improves performance. To get the most out of an FPGA, you therefore need to understand its design principles.

FPGA Chip

FPGA Design Tips

1) Serial to Parallel Conversion

Serial-to-parallel conversion is an important FPGA design skill, a common method for processing data streams, and a direct expression of the idea of trading area for speed. It can be implemented in several ways. Depending on the ordering and quantity of the data, you can use registers for small designs, RAM for large data volumes, state machines for complex conversions, or a ready-made function module.
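The register-based variant can be sketched as a behavioral model. This is a Python illustration rather than synthesizable HDL; the class name, the 8-bit default width and the MSB-first bit order are assumptions made for the example.

```python
class SerialToParallel:
    """Behavioral model of a shift register that collects bits MSB-first."""

    def __init__(self, width=8):
        self.width = width
        self.shift_reg = 0  # current contents of the shift register
        self.count = 0      # number of bits collected for the current word

    def clock(self, bit):
        """One clock edge: shift in one bit; return a full word or None."""
        mask = (1 << self.width) - 1
        self.shift_reg = ((self.shift_reg << 1) | (bit & 1)) & mask
        self.count += 1
        if self.count == self.width:
            self.count = 0
            return self.shift_reg
        return None


def deserialize(bits, width=8):
    """Feed a bit sequence through the model and collect the complete words."""
    s2p = SerialToParallel(width)
    words = []
    for bit in bits:
        word = s2p.clock(bit)
        if word is not None:
            words.append(word)
    return words
```

Feeding the eight bits 1, 0, 1, 0, 0, 1, 0, 1 yields the single word 0xA5. In hardware, the parallel word can then be processed at one eighth of the bit rate, which is the area-for-speed trade mentioned above.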

2) Pipelining

Pipelining is a frequently used method in high-speed design. If the processing of some data can be divided into several steps, and the whole flow is "single direction", that is, there is no feedback or iteration and the output of each step is the input of the next, then pipelining can be used to raise the operating frequency of the system.

  • During pipeline design, arrange the timing sensibly, divide the operation into well-chosen steps, and consider carefully how data flows between the steps.
  • If the pre-stage operation time exactly equals the post-stage operation time, the design is simplest: connect the pre-stage output directly to the post-stage input.
  • If the previous stage takes longer than the next stage, the next stage will often sit idle; the output of the previous stage can be buffered appropriately and then fed to the input of the next stage.
  • If the previous stage takes less time than the next stage, the data stream must be split and processed in parallel by replicated logic, or the data must be stored in the previous stage and processed afterwards; otherwise the next stage will overflow. How to balance the processing time of each module therefore deserves careful consideration in the design.
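As a rough worked example of why pipelining raises the operating frequency: without pipeline registers the clock period must cover the sum of the step delays, while with a register after each step it only needs to cover the slowest step. The function name and the delays used below are made up for illustration, and register overhead (setup time, clock-to-output delay) is ignored.

```python
def max_clock_mhz(stage_delays_ns, pipelined):
    """Estimate the maximum clock frequency in MHz from per-step delays in ns.

    Unpipelined: one clock period must cover all steps in series.
    Pipelined:   one clock period must cover only the slowest step.
    Register setup and clock-to-output times are ignored (simplification).
    """
    if pipelined:
        critical_path_ns = max(stage_delays_ns)
    else:
        critical_path_ns = sum(stage_delays_ns)
    return 1000.0 / critical_path_ns
```

With step delays of 4, 3 and 5 ns, this gives about 83.3 MHz without pipelining and 200 MHz with it. The slowest step sets the limit, which is why balancing the processing time of the stages matters.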

3) Data Interface Synchronization

  • If the input data changes at the same frequency as the processing clock of the system, the input data register can be sampled directly with the system master clock to synchronize the input data.
  • If the input data and the processing clock of the system are asynchronous, pass the input data through two (or more) registers clocked by the processing clock to synchronize it. Double (or multiple) sampling suppresses the propagation of metastability and is suitable for a small number of functional units that are insensitive to errors.
  • To avoid sampling wrong levels across asynchronous clock domains, data is generally transferred through RAM or FIFO buffers, with the input port written using the upstream clock.
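The double-sampling approach can be sketched as two registers in series, both clocked by the processing clock. The model below is a behavioral Python illustration with an assumed class name; it shows only the two-cycle delay of the structure and cannot reproduce metastability itself.

```python
class TwoFlopSynchronizer:
    """Two cascaded registers clocked by the destination (processing) clock."""

    def __init__(self):
        self.ff1 = 0  # first stage: samples the asynchronous input
        self.ff2 = 0  # second stage: the output used by downstream logic

    def clock(self, async_in):
        """One clock edge: both registers update together.

        ff2 takes the OLD value of ff1, so a change on the input reaches
        the output only after two clock edges.
        """
        self.ff2, self.ff1 = self.ff1, async_in & 1
        return self.ff2
```

Clocking the input sequence 1, 1, 1, 0, 0 through the model returns 0, 1, 1, 1, 0: each input value appears at the output on the clock edge after the one that sampled it.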

4) Efficient Coding: Case versus If-Then-Else

Both If-Then-Else and case statements may be used in combinational processes and state processes. Both synthesize to combinational logic, but the results differ: a case statement is implemented as a multiplexer, whereas an If-Then-Else chain is implemented as a decoder with priority encoding.
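The difference between the two selection styles can be illustrated with a small behavioral sketch. The function names are hypothetical, and the code models only which input wins, not the logic a particular synthesis tool would generate.

```python
def if_then_else_select(conditions, values, default):
    """Priority chain: conditions are tested in order and the first true one wins."""
    for cond, value in zip(conditions, values):
        if cond:
            return value
    return default


def case_select(selector, values, default):
    """Parallel selection: the selector picks exactly one input, like a multiplexer."""
    if 0 <= selector < len(values):
        return values[selector]
    return default
```

For example, `if_then_else_select([False, True, True], ["a", "b", "c"], "d")` returns `"b"` even though the third condition is also true, because the earlier branch has priority. `case_select(2, ["a", "b", "c"], "d")` returns `"c"` directly, with no ordering among the branches.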

5) Latches and Registers

A latch is level-sensitive: while its enable is at the active level, the input passes straight through to the output. A register, by contrast, is edge-triggered and updates its output only on the active clock edge. Besides the lookup table, each logic unit also contains registers, so it is recommended that you use edge-triggered registers and avoid latches built from additional combinational logic.
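The behavioral difference can be sketched with two small models driven by the same waveform. The class names are made up for this illustration, and an active-high enable and a rising-edge clock are assumed.

```python
class Latch:
    """Level-sensitive: transparent while enable is high, holds otherwise."""

    def __init__(self):
        self.q = 0

    def update(self, enable, d):
        if enable:
            self.q = d  # input passes straight to the output
        return self.q


class Register:
    """Edge-triggered: captures d only on a rising edge of clk."""

    def __init__(self):
        self.q = 0
        self._prev_clk = 0

    def update(self, clk, d):
        if clk == 1 and self._prev_clk == 0:  # rising edge detected
            self.q = d
        self._prev_clk = clk
        return self.q


# Drive both with the same (clk, d) sequence; d changes while clk is high.
waveform = [(0, 1), (1, 1), (1, 0), (0, 0), (1, 0)]
latch, reg = Latch(), Register()
latch_out = [latch.update(clk, d) for clk, d in waveform]
reg_out = [reg.update(clk, d) for clk, d in waveform]
```

Here `latch_out` is `[0, 1, 0, 0, 0]` and `reg_out` is `[0, 1, 1, 1, 0]`. The latch follows the change of `d` while the clock is still high, whereas the register holds its value until the next rising edge.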

6) Clock Enable

The way the VHDL code is written determines whether a clock enable signal is used. Using a clock enable makes timing constraints easier to control.

7) Tri-states

The IEEE standard defines the tri-state value Z in the STD_LOGIC_1164 package. In simulation it represents a high-impedance state, and during synthesis it is converted into a tri-state buffer.

Only the I/O cells of Altera devices have tri-state buffers. Keeping tri-states out of the core has the advantage of eliminating possible bus contention and easing internal logic placement, and because no internal tri-state buffers are required, device testing is reduced and costs are saved. Internal tri-states must therefore be converted into combinational logic, and complex output-enable logic easily leads to errors and inefficient logic.

8) Bidirectional Pin

When a pin is declared with the direction INOUT, it can be used as an input or as a tri-state output. In the code, one signal is declared as the bidirectional INOUT signal. When the enable signal CE equals 1, the from_core signal is driven onto the bidirectional pin; otherwise the pin is tri-stated. The pin thus acts as a tri-state output, and an input assignment is made as well: the signal from the core goes to the tri-state buffer, and the value on the pin is fed back to the core.
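The behavior of such a pin can be sketched with a simplified resolution of two drivers. The function names are hypothetical, logic values are the strings '0', '1', 'Z' and 'X', and only this small subset of STD_LOGIC resolution is modelled.

```python
def tristate_buffer(ce, from_core):
    """Drive from_core onto the pin when CE is 1, otherwise release it ('Z')."""
    return from_core if ce else "Z"


def resolve_pin(core_drive, external_drive):
    """Resolve two drivers on one wire (simplified STD_LOGIC-style rules)."""
    if core_drive == "Z":
        return external_drive
    if external_drive == "Z":
        return core_drive
    # Both sides drive: equal values agree, unequal values conflict.
    return core_drive if core_drive == external_drive else "X"


def bidir_pin(ce, from_core, external_drive="Z"):
    """Return (pin, to_core): the resolved pin value is always fed back to the core."""
    pin = resolve_pin(tristate_buffer(ce, from_core), external_drive)
    to_core = pin
    return pin, to_core
```

With `ce=1` and no external driver, `bidir_pin(1, "1")` returns `("1", "1")`. With `ce=0`, the pin follows the external device, so `bidir_pin(0, "1", "0")` returns `("0", "0")`. If both sides drive opposite values, the result is `"X"`, which is the conflict that the enable logic must prevent.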

9) Memory Storage

Synthesis tools differ in their ability to recognize various memories. To recognize a memory, a synthesis tool is very sensitive to the specific coding style, which is usually described in the documentation of the synthesis tool. The tool may also place restrictions on how the structure is implemented, such as synchronous writes only, clock configuration restrictions, or memory size restrictions, and an array data type must be specified to hold the memory contents. Memory types include single-port memory, single-port dual-clock memory, dual-clock memory, ROM, and so on.
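To make the "array plus synchronous write" structure concrete, here is a behavioral sketch of a single-port memory. It is a Python model rather than HDL that a synthesis tool would recognize; the class name is hypothetical, and the read-before-write behavior is an assumption chosen for the example.

```python
class SinglePortRam:
    """Single-port memory with a synchronous write and a registered read."""

    def __init__(self, depth, width):
        self.mem = [0] * depth        # the array that holds the memory contents
        self.mask = (1 << width) - 1  # limits stored data to the word width
        self.q = 0                    # registered read output

    def clock(self, addr, we=0, data=0):
        """One clock edge: read the old contents, then write if we is set."""
        self.q = self.mem[addr]  # read-before-write (assumed behavior)
        if we:
            self.mem[addr] = data & self.mask
        return self.q
```

Writing 0x1AB to address 3 of a 16 x 8 memory returns the old contents (0) on that edge. Reading address 3 on the next edge returns 0xAB, because the data is truncated to the 8-bit word width.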

Programming Language

Verilog HDL models system behaviour in a hierarchical manner. The more important levels are the system level, algorithm level, register transfer level (RTL), logic level, gate level, and circuit switch level.

As for the for loop: in practice, apart from describing simulation stimulus in a testbench, for loops are rarely used in RTL-level coding. The synthesizer unrolls a for loop into separate logic for every value of the loop variable, and each copy occupies its own register resources, so hardware logic cannot be reused effectively and a great deal is wasted. Case statements are often used instead.

In addition, nested if...else... and case descriptions differ greatly, because if...else... has priority. Generally speaking, the first “if” has the highest priority, and the last “else” has the lowest. The case statement is a parallel statement with no priority. Because building a priority structure requires a lot of logic resources, do not use an if...else... statement where case can be used.

FPGA Basic Architecture


FPGA Pin Assignment Principles

FPGA verification is an important part of chip development, and pin assignment is an important issue in making effective use of FPGA resources. In principle, the better method is to let the tools assign pins automatically under timing constraints during synthesis. In practice this is often not feasible within the development schedule, because RTL verification and board design must proceed in parallel: by the time the verification code is ready, the board to be verified must be ready too, so pin assignment has to be completed before the design code is finished. Pin assignment therefore depends more on people than on tools, and various factors need to be considered.

In summary, the following aspects are mainly considered:

1) The Signal Flow of the Logic Carried by the FPGA.

The FPGAs used in IC verification generally have a very large logic capacity and a large number of external pins, so the difficulty of routing during PCB design must be considered. Unreasonable pin assignment may produce a large number of crossed signal lines on the PCB, which makes routing very difficult or even impossible. Even if routing succeeds, excessive external delay may prevent the design from meeting its timing requirements. Before assigning pins, you should therefore be quite familiar with the working environment of the FPGA and know exactly where each signal comes from and where it goes. Then, following the principle of the shortest connection, assign each signal to the bank closest to the external device it connects to.

2) Master the Allocation of BANK within the FPGA.

FPGAs are now divided internally into several areas (banks), and the number of I/O pins available in each differs. FPGAs from ALTERA and XILINX are used in IC verification. The internal bank allocation differs somewhat between the two manufacturers, but you can refer to the relevant manuals during design. The following is an example of bank allocation in an ALTERA Stratix II series FPGA.

The example shows how the banks are allocated inside the FPGA and which I/O standards each bank supports. From the distribution of the internal banks, the signal flow, and hence the orientation of the FPGA on the board, can be roughly fixed. Related signals are then assigned to the relevant banks according to the principle of proximity. This method completes the assignment of general signals.

3) Master the I/O Standards Supported by Each Bank of the Selected FPGA.

As can be seen, the banks of an FPGA do not all support the same I/O standards. When assigning pins, concentrate the pins that use the same standard in one bank, because a bank generally does not support two I/O standards at the same time. There are exceptions, of course; to find them, consult the working conditions required by the relevant I/O standards.
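A simple check of a draft pin assignment against this rule might look like the sketch below. The function name, signal names, bank numbers and I/O standard strings are all hypothetical. The check flags every bank with more than one standard, so combinations that a particular device actually permits still have to be confirmed against its manual.

```python
from collections import defaultdict


def find_bank_conflicts(assignments):
    """Return {bank: sorted standards} for banks that mix I/O standards.

    assignments: iterable of (signal, bank, io_standard) tuples.
    """
    standards_per_bank = defaultdict(set)
    for _signal, bank, io_standard in assignments:
        standards_per_bank[bank].add(io_standard)
    return {
        bank: sorted(standards)
        for bank, standards in standards_per_bank.items()
        if len(standards) > 1
    }


# Hypothetical draft assignment: bank 2 mixes two standards.
draft = [
    ("clk_in", 1, "LVDS"),
    ("data0", 2, "3.3-V LVTTL"),
    ("data1", 2, "3.3-V LVTTL"),
    ("mem_dq0", 2, "SSTL-2"),
]
conflicts = find_bank_conflicts(draft)
```

For this draft, `conflicts` is `{2: ["3.3-V LVTTL", "SSTL-2"]}`, which suggests moving `mem_dq0` to a bank that uses its standard.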

4) Pay Attention to the Assignment of Pins for Special Signals.

The special signals here are mainly the clock and reset signals, along with signals that require high drive capability. Clock signals should generally be assigned to global clock pins, which gives the smallest delay and the strongest drive. Because the reset signal needs good synchronization and strong drive, it is normally also brought in through a global clock pin. Clock assignment strategies vary greatly with the number of clocks, so take care and refer to the corresponding manual to see which region each clock belongs to. Clock pins are generally differential pairs. If the clocks used are not differential, note that the P and N terminals of a pair generally cannot be assigned to different clock signals at the same time. In XILINX FPGAs, for example, if both pins of a pair are used as clocks, the two clocks cannot both reach the same region, because only one clock line reaches that region.

Therefore, when there are only a few clocks, it is best not to use both the P and N pins of a pair. Choose only P or only N so that there is no conflict.

5) The Consideration of Signal Integrity.

Buses are often assigned as groups, and many bus lines may toggle at the same time, which brings a series of signal integrity problems.

I'm an electronics editor with an interest in semiconductors. I hope to share and pick up new ideas here. If you are interested in my electronics work, you can visit https://www.kukelec.com/.