
NG-FPGAs offer a very flexible architecture that supports a wide range of applications. However, safe and reproducible behavior can be guaranteed only if a few simple but necessary design rules are followed throughout the design steps.

In an FPGA, just as in any other electronic component, the internal logic and routing delays vary with process, voltage and temperature (PVT).

However, NG-MEDIUM and the “nxmap” synthesis and implementation tools provide a robust architecture as well as implementation procedures that ensure safe and reproducible behavior despite these delay variations, provided that some very simple design rules are followed.

To guarantee correct behavior, the clock(s) must be distributed using dedicated routing resources, which guarantee very low skew across the FPGA die. The NG-MEDIUM FPGA fabric is split into two zones or clock regions. Each zone can use up to 12 low skew signals.

Clock(s) and other low skew routing resources: “nxmap” automatically assigns low skew routing resources to the clocks of your design, to ensure that the delay at the clock input of all FFs will be controlled, and the skew (maximum delay difference at the destinations) will be low enough and predictable. The maximum clock skew across the FPGA fabric is then controlled by construction, and limited to some tens of picoseconds.

Note that the low skew network can optionally be used for other high fanout signals. This can be the case for heavily loaded signals such as RESET and LOAD_ENABLE, if they are applied to a large number of FFs across the FPGA. In this case the “low skew” feature is less important than the maximum delay allowed to reach a large number of destinations. This maximum delay can also be guaranteed by the use of the low skew network.

Applying timing constraints to your design: The user can constrain the design to support the required clock(s) frequency. The timing constraints are specified in the “nxpython” script file.

By applying a “Period” constraint to the clock, the user specifies the maximum delay allowed from any FF output to any FF input (using the same clock edge) after implementation. The “Timing Analyzer” is embedded into the synthesis and implementation tools. It interacts with each of the synthesis and implementation processes in order to find, if possible, a solution that meets the user-defined timing requirements.

The user can generate timing reports to check that the timing constraints were met during the implementation process.

Typical example of physical path between two FFs

The synthesis and implementation tools manage two different kinds of timing:

Silicon process related timing: The maximum values are fixed by the foundry silicon process (always the same for a specified device and speed-grade). Most timing parameters are documented in the datasheet.
- Clock skew: difference between the clock delays at the destination FF and source FF clock inputs. This difference can be positive or negative, but it is always limited to some tens of picoseconds by the silicon process. The timing analyzer gives the exact clock skew value for each analyzed path.

Clock skew = Clock_delay@FF_dest – Clock_delay@FF_src

- Tco: FF clock-to-output delay. Defines the delay from the active clock edge at the FF clock input until the Q output is stable and valid.
- Tsu: FF setup time. Defines the amount of time the data must be stable at the D input before the clock edge for the target FF to sample it safely.
- Tcomb: combinatorial logic delays (LUTs, carry logic and other combinatorial elements).

Implementation related timing: The routing delays on the connections (nets) between sources and destinations. Those delays are defined by the “Route” process (depending on the routing resources used). Note that the routing depends on the placement.

The “nxmap” synthesis and implementation algorithms try to find a solution that meets the timing requirements specified by the user's constraints. The user can generate timing reports to analyze the implemented results.

To guarantee a stable behavior over PVT variations, the following condition must be met:

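Period >= Tco(source) + ∑Tnets + ∑Tcomb + Tsu(destination) – Clock skew

With the clock skew defined as above (destination delay minus source delay), a positive skew gives the data more time to reach the destination FF, hence the minus sign.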
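As a purely illustrative check of this condition, consider a path with hypothetical delays (these numbers are assumptions, not datasheet values): Tco = 0.6 ns, two net delays of 0.9 ns and 1.1 ns, two combinatorial levels of 0.5 ns each, Tsu = 0.4 ns, and a clock skew taken as zero for simplicity. Assuming the slack is computed as the period minus the minimum period of the path, a 10 ns Period constraint would give:

```latex
\begin{aligned}
T_{\min} &= T_{co} + \sum T_{nets} + \sum T_{comb} + T_{su} \\
         &= 0.6 + (0.9 + 1.1) + (0.5 + 0.5) + 0.4 = 4.0\ \text{ns} \\
\text{Slack} &= \text{Period} - T_{\min} = 10.0 - 4.0 = 6.0\ \text{ns}
\end{aligned}
```

The slack is positive, so this hypothetical path would meet its constraint. The real values for a given design come from the timing analyzer reports.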

Among the analysis tools:

Timing Analyzer: allows the user to generate timing reports. The timing analyzer commands are documented in the “NanoXplore nxmap Python API” documentation. The timing reports include:
- Identification of the clock domains detected in the design
- Slack (timing margin) with respect to the specified constraint(s). If the timing constraints are met, the slack is positive. Otherwise, the slack is negative.
- Detailed delays on selected path(s)

“nxmap” GUI: graphical user interface for analyzing placement and routing results. See the NXmap User Manual for detailed information. The user can observe the location of each IO port, tile logic element, BRAM, DSP block, PLL and WaveForm Generator, and get a detailed or simplified view of the routing resources used.

Synchronous design methodology

Have you ever faced a “haunted” design? The kind of design that surprisingly works “most of the time” but fails “sometimes”, particularly during demonstrations or inaugurations?

Most of the time, if not due to PCB or power supply issues, those random problems are due to:

- Inadequate clock distribution (e.g. a local clock distributed by general routing resources), generally related to internal clock generation with logic.
- Inadequate reset methodology.
- Local asynchronous SET and/or RESET conditions in the source code.
- Lack of anti-meta-stability and resynchronization stages when crossing asynchronous clock domains.
- Missing timing constraints. “nxmap” ignores the timing on unconstrained paths, and therefore does not warn about possible timing errors on them.

Recommended clocking schemes

Single clock rising edge:

Whenever possible, this is the recommended, simplest and safest clock distribution scheme. It provides the easiest way for both user and tools to implement a safe design working at any frequency from DC up to a maximum frequency determined by the timing analyzer (period limited by the longest path from FF to FF).

The internal timing can be constrained with a single timing constraint: “Period”. However, the inputs and outputs of the design might require additional constraints to specify the timing required on the inputs (setInputDelay) and the outputs (setOutputDelay), according to the datasheets of the external components. See the timing constraints chapter for more information.

In order to reduce the FPGA internal clock propagation delay, the PLL + WFG (WaveFormGenerator) can be used to generate a zero-delay clock distribution. Cancelling the clock distribution delay reduces the FPGA clock-to-output delay at the pads, and reduces or eliminates potential hold time problems on the FPGA inputs.

Multiple synchronous clocks: Using dedicated clock management resources

NG-MEDIUM provides 4 sets of clock management resources (ClocK Generators or CKG), located at the corners of the die. Each CKG includes one PLL and 8 WaveForm Generators.

The PLL reference input can come from single ended or differential semi-dedicated clock input pins. At its outputs, the PLL can generate a wide range of clock frequencies by applying frequency multiplication and/or division factors to the incoming clock.

In order to reduce the FPGA internal clock propagation delay, the PLL + WFG can be used to generate a zero-delay clock distribution. Cancelling the clock distribution delay reduces the FPGA clock-to-output delay at the pads, and reduces or eliminates potential hold time problems on the FPGA inputs.

The WaveForm Generators (WFG) can be used as clock buffers. They provide direct routing to the low skew network. The WFG can also be used to generate clock dividers and user programmable patterns. See NG-MEDIUM datasheet for more information.

By combining PLL and WaveFormGenerators, the user can generate internally synchronous clocks, like for example:

Main_clock: same phase and frequency as the input clock pad
Higher_frequency_clock: a multiple of the input frequency (ex: Fin x 2)
Lower_frequency_clock: divided input clock frequency (ex: Fin / 2)

These three clocks are synchronous with one another. Clock domain changes between them are easily managed by the synthesis and implementation tools: there are no meta-stability or resynchronization issues as long as the timing constraints are met (see chapter 1.2.3 for more details on meta-stability issues).

Multiple asynchronous clocks: Meta-stability issues and resynchronization

When a signal synchronous to one clock is resampled by another, asynchronous clock, it must be resynchronized to avoid unstable behavior due to meta-stability.

The meta-stability is an invalid logic level at a FF output, caused by a transition on the D input of the FF during its setup/hold window. This invalid logic level can cause incorrect behavior of your design.

Possible FF meta-stability when setup/hold time violation occurs

When registering an asynchronous signal, the meta-stability phenomenon can’t be avoided, but simple design rules make it possible to cancel its negative and unreproducible effects.

Fortunately, the meta-stability doesn’t propagate from one FF to the next, provided that the connection delay to the second FF is limited to a small fraction of the clock period.

Two cases must be considered.

Case 1: Resynchronizing a single signal (one bit):

Simple anti-meta-stability method using an additional FF

In this example, the first FF will be subject to meta-stability. However, this invalid logic level will not be propagated to the next FF, particularly if the propagation delay between both FFs is short.
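The scheme described above can be sketched in VHDL as follows. This is a minimal illustration, not code taken from this document: the entity name SYNC_BIT and the signal names are hypothetical.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

entity SYNC_BIT is
  port (
    CLK      : in  std_logic;  -- destination clock
    ASYNC_IN : in  std_logic;  -- signal coming from another clock domain
    SYNC_OUT : out std_logic   -- resynchronized signal, safe to use on CLK
  );
end SYNC_BIT;

architecture RTL of SYNC_BIT is
  signal META_FF : std_logic;  -- first FF: may become metastable
  signal SYNC_FF : std_logic;  -- second FF: samples the settled value
begin
  process(CLK) begin
    if rising_edge(CLK) then
      META_FF <= ASYNC_IN;     -- no logic between the two FFs,
      SYNC_FF <= META_FF;      -- to keep the connection delay short
    end if;
  end process;

  SYNC_OUT <= SYNC_FF;
end RTL;
```

Only SYNC_OUT should be used by the rest of the design. The direct FF-to-FF connection keeps the propagation delay between both FFs short, as recommended above.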

Case 2: Resynchronizing a multibit bus (two or more bits):

Multibit buses cannot be resynchronized simply by applying the same technique to each bit of the bus: each bit may settle in a different clock cycle, so the sampled word could mix old and new bit values. Fortunately, a bus is usually qualified by an additional signal such as “DATA_VALID”, or any other signal that indicates when the bus has a stable value. The user can therefore safely resynchronize this control signal, and use the resynchronized version to sample the bus value.

Safely resampling an asynchronous bus

The user must make sure that the clock frequency is high enough to sample the bus value while its value is still stable.
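A possible VHDL sketch of this bus resampling is shown below. It assumes that DATA_IN is already stable when DATA_VALID rises and remains stable for several CLK periods afterwards. The entity name SYNC_BUS, the NEW_DATA strobe and the generic WIDTH are hypothetical names introduced for illustration only.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

entity SYNC_BUS is
  generic (WIDTH : integer := 8);
  port (
    CLK        : in  std_logic;  -- destination clock
    DATA_VALID : in  std_logic;  -- asynchronous qualifier of DATA_IN
    DATA_IN    : in  std_logic_vector(WIDTH-1 downto 0);
    DATA_OUT   : out std_logic_vector(WIDTH-1 downto 0);
    NEW_DATA   : out std_logic   -- one-cycle strobe when DATA_OUT is updated
  );
end SYNC_BUS;

architecture RTL of SYNC_BUS is
  -- bit 0: first FF (may be metastable), bit 1: resynchronized value,
  -- bit 2: previous resynchronized value, used for edge detection
  signal VALID_SYNC : std_logic_vector(2 downto 0);
begin
  process(CLK) begin
    if rising_edge(CLK) then
      VALID_SYNC <= VALID_SYNC(1 downto 0) & DATA_VALID;
      NEW_DATA   <= '0';
      -- rising edge of the resynchronized DATA_VALID
      if VALID_SYNC(1) = '1' and VALID_SYNC(2) = '0' then
        DATA_OUT <= DATA_IN;     -- bus is stable: sample it once
        NEW_DATA <= '1';
      end if;
    end if;
  end process;
end RTL;
```

Only the single-bit qualifier goes through the anti-meta-stability FFs. The bus itself is sampled once, two to three CLK periods after DATA_VALID rises, which is why it must stay stable at least that long.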

RESET methodology

The NG-MEDIUM internal Flip-Flops have a dedicated RESET input. The tile FFs can be reset synchronously or asynchronously, while the registers embedded in BRAM and DSP blocks support synchronous reset exclusively.

Global reset

The global reset can be synchronous or asynchronous for the tile FFs. However, in either case, to guarantee a safe startup, the reset signal must be properly resynchronized on the design master clock, in order to avoid any meta-stability condition during the first active clock cycle.

Remember that BRAM and DSP registers can be reset synchronously only (no asynchronous reset available). However, whether the reset is used synchronously or asynchronously, the risk of meta-stability during the first active clock cycle exists if the reset signal is not synchronous to the clock.

When using PLL for internal clock(s) generation, remember that the generated clocks are not safe during the PLL locking process. For a safe design startup, ensure your design is reset at least until the PLL locked status (RDY) is set.

A simple and efficient mechanism consists of delaying the RDY output of the PLL by a few clock periods, as in the following source code sample:

signal RESET_DELAY    : std_logic_vector(7 downto 0);  -- reset delay line
signal INTERNAL_RESET : std_logic;                     -- active high

begin

process(CLK_generated_by_PLL_and_WFG) begin
   if rising_edge(CLK_generated_by_PLL_and_WFG) then
      -- shift in '1' as long as the PLL is not locked (RDY = '0')
      RESET_DELAY <= RESET_DELAY(6 downto 0) & not(RDY);
   end if;
end process;

INTERNAL_RESET <= RESET_DELAY(7);  -- last stage of the delay line

RESET_DELAY is the delay line (8 steps of one clock period each). The last bit of the chain is used as INTERNAL_RESET. NanoXplore recommends using at least two levels of registers on the reset delay line. INTERNAL_RESET can be safely used as synchronous or asynchronous reset of the design. In any case, the timing constraints will cover all timing paths, including those from the INTERNAL_RESET source to any Flip-Flop, including BRAM, DSP blocks and IO FFs.

In NG-MEDIUM, the “nxmap” implementation tools will use the low skew network, if possible, for the internal reset routing, taking into account the high fanout of this signal.

Local reset

For a local RESET (to be applied only to a subset of the FFs), the synchronous way should be preferred. This gives the implementation tools more control over the routing and logic delays.

Remember that an asynchronous reset is glitch sensitive.

Don’t apply both Asynchronous_SET and Asynchronous_RESET to the same Flip-Flop(s). The internal Flip-Flops only have a dedicated synchronous or asynchronous RESET input. The tile FF reset can be synchronous or asynchronous, while the registers embedded in BRAM and DSP blocks support synchronous reset exclusively.

Things to avoid

Don’t use both clock rising and falling edges if not strictly necessary

Most designs can be implemented by using exclusively the clock(s) rising edges for the FPGA internal logic. This gives more flexibility and timing control to the implementation tools.

Don’t use internally generated clocks if not strictly necessary

Internally generated clocks (using combinatorial or registered logic) create race conditions that lead to unpredictable or unstable behavior.

Remember that internal clocks can be easily and safely generated synchronously with the main clock by using the NG-MEDIUM PLL and Waveform Generators.
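Where only a slower update rate is needed rather than a true second clock, another common option is to keep every FF on the main clock and gate the updates with a clock enable. This is a different technique from the PLL/WFG approach described above. The sketch below assumes a synchronous active-high RST, and all names (HALF_RATE, ENA_DIV2) are hypothetical.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

entity HALF_RATE is
  port (
    CLK  : in  std_logic;
    RST  : in  std_logic;  -- synchronous reset, active high
    DIN  : in  std_logic_vector(7 downto 0);
    DOUT : out std_logic_vector(7 downto 0)
  );
end HALF_RATE;

architecture RTL of HALF_RATE is
  signal ENA_DIV2 : std_logic;  -- high one CLK cycle out of two
begin
  process(CLK) begin
    if rising_edge(CLK) then
      if RST = '1' then
        ENA_DIV2 <= '0';
      else
        ENA_DIV2 <= not ENA_DIV2;
        if ENA_DIV2 = '1' then
          DOUT <= DIN;          -- updated every second CLK cycle
        end if;
      end if;
    end if;
  end process;
end RTL;
```

ENA_DIV2 is used as data, never as a clock, so all FFs stay on the low skew CLK network and the whole block is covered by the single Period constraint.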

Example of internally generated local clock

Illustration of possible timing diagrams

The first figure shows the schematic of a portion of a design in which an internal local clock is generated from a FF output. This creates a race condition in which the routing delays (of the data and of the local clock) affect the behavior, which becomes unstable over PVT variations. See the timing diagrams in the second figure (case 1 and case 2): the behavior is clearly routing dependent, and probably unstable over PVT.

Resynchronize the RESET signal on the clock domain

The tile Flip-Flops have a dedicated input for synchronous or asynchronous RESET.

Reset de-assertion is very critical

If the Reset signal (used as asynchronous or synchronous reset) is not synchronized on the FPGA clock, it can create setup violations on many Flip Flops during its de-assertion.

Risk of hazardous startup!!!

Timing constraints can’t help to avoid this problem

RESET distribution and associated timing diagrams

The RESET signal is propagated by using routing resources to the destination FFs. However, even if distributed by low skew lines, its de-assertion can be interpreted differently by the FFs, and can cause hazardous startup.

This issue can be easily overcome by resynchronizing the RESET input, using anti-metastability FFs.

Resynchronized RESET signal

The resynchronized RESET signal can be used as Asynchronous or Synchronous RESET.

If used as synchronous RESET, the implementation tools and the timing analyzer will control the propagation delays to ensure a predictable behavior at the specified frequency.

Avoid using asynchronous RESET if possible

The tile Flip-Flops have a dedicated input for synchronous or asynchronous RESET.

Asynchronous reset is glitch sensitive, while synchronous reset is part of your synchronous design, and then it’s covered by the period constraint – if generated synchronously to the clock domain. The implementation tools and the timing analyzer will control the propagation delays to ensure a predictable behavior at the specified frequency.

Don’t use asynchronous SET

There is no dedicated asynchronous SET input on the tile FFs. However, a synchronous SET can be easily and safely implemented by combining the LUT and the FF of the same FE (the NG-MEDIUM logic cell, which includes one 4-input LUT and one D Flip-Flop).

The “nxmap” synthesis tools can still build the behavior of an asynchronous SET, but at the cost of an extra FF, extra logic resources mapped to LUTs in another FE, and additional routing delays (more logic resources used, and poorer performance in terms of power consumption and working frequency).

Don’t use asynchronous initialization from a given value (signal or constant)

Asynchronous initialization from a signal value will prevent the synthesis and implementation tools from using dedicated flip-flops. Combinatorial loops can be generated. The resulting behavior can be unpredictable.

Example of source code to be avoided:

process(CLK, INIT) begin
   if INIT = '1' then             -- Asynchronous initialization
      DATAR <= DATA_IN;           -- Assigned value is not a constant
   elsif rising_edge(CLK) then
      if ENA = '1' then
         DATAR <= CNT;
      end if;
   end if;
end process;

Instead, the following code is preferred:

process(CLK) begin
   if rising_edge(CLK) then
      if INIT = '1' then
         DATAR <= DATA_IN;        -- Synchronous initialization
      elsif ENA = '1' then
         DATAR <= CNT;
      end if;
   end if;
end process;

Writing efficient HDL source code

The quality of the source code is the most important factor to ensure an efficient, stable and predictable design.

Whenever possible, use a simple, compact and clear writing style.

HDL synthesis is the first step of the implementation process.

If for any reason, the synthesis results are not optimized enough for your design requirements, there will be no way to change this during the subsequent mapping, place and route processes.

The HDL source code is probably your main investment for maintainability, design density, power reduction and performance optimization

Source code must be optimized for the targeted architecture
As much as possible, it must be also flexible and portable (to other architectures or synthesis tools)
Readability is another very important factor

Write a direct, simple and clear source code

The more compact, the more readable
The synthesis tools can also make a better translation to take advantage of the silicon features when the source code is compact and clear

Avoid using combinatorial processes if not necessary

Combinatorial processes can generate latches and combinatorial loops. This can lead to unpredictable or unstable behavior, and has a negative impact on logic and routing resource utilization.

Be very careful if you have to write combinatorial processes.

Avoid declaring and using unnecessary combinatorial signals

In a synchronous design, the combinatorial signals are registered with FFs.

Generally, it’s simpler, faster and more efficient to describe the whole function (combinatorial logic and registers) in a single clocked process.

Don’t declare unnecessary signals if those signals must be registered

- All NG-MEDIUM configurable elements have their own Flip-Flops (tile logic, BRAMs, DSP blocks, and IOs). Synthesis will automatically recognize that the function can be implemented in the same element (by packing the combinatorial logic and the FFs into the same logic element, such as Functional Elements, BRAMs or DSP blocks).
- Code size is reduced and readability is improved.
- Apply this method to state machines as well (you will avoid timing and implementation problems).
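For state machines, the single clocked process style could look like the following sketch. The two-state machine, its ports and all names (HANDSHAKE_FSM, IDLE, RUNNING) are hypothetical and only illustrate the writing style.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

entity HANDSHAKE_FSM is
  port (
    CLK   : in  std_logic;
    RST   : in  std_logic;  -- synchronous reset, active high
    START : in  std_logic;
    DONE  : in  std_logic;
    BUSY  : out std_logic   -- registered output
  );
end HANDSHAKE_FSM;

architecture RTL of HANDSHAKE_FSM is
  type STATE_TYPE is (IDLE, RUNNING);
  signal STATE : STATE_TYPE;
begin
  -- state register, next-state logic and outputs in one clocked process
  process(CLK) begin
    if rising_edge(CLK) then
      if RST = '1' then
        STATE <= IDLE;
        BUSY  <= '0';
      else
        case STATE is
          when IDLE =>
            if START = '1' then
              STATE <= RUNNING;
              BUSY  <= '1';
            end if;
          when RUNNING =>
            if DONE = '1' then
              STATE <= IDLE;
              BUSY  <= '0';
            end if;
        end case;
      end if;
    end if;
  end process;
end RTL;
```

No separate NEXT_STATE signal or combinatorial process is declared, so there is no sensitivity list to maintain and no risk of inferring a latch.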

Consider the following VHDL source code, which describes a pipelined adder-multiplier function.

signal A, B, C: std_logic_vector(15 downto 0); -- A, B and C inputs
signal A_REG, B_REG, C_REG: std_logic_vector(15 downto 0); -- registered inputs

signal A_PLUS_B: std_logic_vector(15 downto 0); -- combinatorial adder output
signal A_PLUS_B_REG: std_logic_vector(15 downto 0); -- registered adder output

signal MULT: std_logic_vector(31 downto 0); -- combinatorial multiplier output
signal MULT_REG: std_logic_vector(31 downto 0); -- registered multiplier output

begin

process(CLK) begin
  if rising_edge(CLK) then
    A_REG <= A;
    B_REG <= B;
    C_REG <= C;
  end if;
end process;

A_PLUS_B <= A_REG + B_REG;

process(CLK) begin
  if rising_edge(CLK) then
    A_PLUS_B_REG <= A_PLUS_B;
  end if;
end process;

MULT <= A_PLUS_B_REG * C_REG;

process(CLK) begin
  if rising_edge(CLK) then
    MULT_REG <= MULT;
  end if;
end process;

The same behavior can be written more compactly as follows, which improves readability.

signal A, B, C: std_logic_vector(15 downto 0); -- A, B and C inputs
signal A_REG, B_REG, C_REG: std_logic_vector(15 downto 0); -- registered input

signal A_PLUS_B_REG: std_logic_vector(15 downto 0); -- registered adder output

signal MULT_REG: std_logic_vector(31 downto 0); -- registered multiplier output

begin

process(CLK) begin
  if rising_edge(CLK) then
    A_REG <= A;
    B_REG <= B;
    C_REG <= C;
    A_PLUS_B_REG <= A_REG + B_REG;
    MULT_REG <= A_PLUS_B_REG * C_REG;
  end if;
end process;

At this point, we do not take into consideration the rules for signed or unsigned operations. This example only shows that there is an easy and compact way to describe the same functionality in very few lines.

For arithmetic and/or DSP functions, see the chapter DSP blocks.

Use appropriate sensitivity list

An inappropriate sensitivity list can lead to synthesis/simulation mismatches. The simulation behavior will not necessarily match the implementation results.

For simple clocked processes (with NO asynchronous reset or initialization), only the clock signal must be in the sensitivity list:

process(CLK) begin
   if rising_edge(CLK) then
      ……                 -- assignments
   end if;
end process;

For clocked processes with asynchronous re-initialization of FFs (not recommended), the clock and the reset (or any other asynchronous signal) must be in the sensitivity list:

process(CLK, RST) begin
   if RST = '1' then
      ……   -- signal assignments to '0' or (others => '0')
   elsif rising_edge(CLK) then
      ……                 -- assignments
   end if;
end process;

For combinatorial processes (not recommended) all the involved signals in the assignments must be in the sensitivity list (does not prevent generation of latches).
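If a combinatorial process cannot be avoided, the sketch below shows one way to write it with a complete sensitivity list and a default assignment, so that the output receives a value on every path and no latch is inferred. The entity and signal names are hypothetical.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

entity MUX_COMB is
  port (
    SEL  : in  std_logic_vector(1 downto 0);
    A, B : in  std_logic;
    Y    : out std_logic
  );
end MUX_COMB;

architecture RTL of MUX_COMB is
begin
  -- every signal read in the process is in the sensitivity list
  process(SEL, A, B) begin
    Y <= '0';              -- default value: covers all other SEL values
    if SEL = "01" then
      Y <= A;
    elsif SEL = "10" then
      Y <= B;
    end if;
  end process;
end RTL;
```

Without the default assignment, Y would have to keep its previous value when SEL is "00" or "11", and synthesis would infer a latch.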

Be careful when using relational operators

VHDL allows buses with different numbers of bits to be compared.
The VHDL relational operators =, <, >, <= and >= can give surprising results when the operand widths don’t match (for example, the predefined array operators compare element by element from the left rather than numerically).
In addition, synthesis and simulation can have different interpretations, depending on the context.

Conclusion: Make sure both operands have the same bus width.
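One way to follow this rule is to extend the narrower operand explicitly before the comparison, as in the sketch below. It assumes unsigned values and the std_logic_unsigned package. The entity and signal names are hypothetical.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_unsigned.all;

entity COMPARE_EXT is
  port (
    CLK       : in  std_logic;
    NIBBLE    : in  std_logic_vector(3 downto 0);
    THRESHOLD : in  std_logic_vector(7 downto 0);
    ABOVE     : out std_logic
  );
end COMPARE_EXT;

architecture RTL of COMPARE_EXT is
  signal NIBBLE_EXT : std_logic_vector(7 downto 0);
begin
  -- zero-extend NIBBLE to the width of THRESHOLD
  NIBBLE_EXT <= "0000" & NIBBLE;

  process(CLK) begin
    if rising_edge(CLK) then
      if NIBBLE_EXT > THRESHOLD then  -- both operands are 8 bits wide
        ABOVE <= '1';
      else
        ABOVE <= '0';
      end if;
    end if;
  end process;
end RTL;
```

For signed values, the sign bit would have to be replicated instead of padding with zeros.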

Avoid using unnecessary variables in the synthesizable source code

If not strictly necessary, it’s safer and better to use signals instead of variables:

- Most often, variables have no direct logical representation in the synthesized netlist or in simulation.
- Very often, using variables makes the synthesizable source code more complex (for synthesis as well as for simulation) while providing no benefit.

Conclusion: use signals instead of variables whenever possible.

Don’t declare ports and signals as integers – if not necessary

The packages std_logic_unsigned and std_logic_signed provide direct type conversion and additional functions. For example, you can simply write:

CNT <= CNT + 1; -- CNT is a std_logic_vector

Additional conversion functions are provided by the std_logic_arith package, together with std_logic_unsigned or std_logic_signed:
- conv_integer(std_logic_vector_signal)
  Converts the value of a signal declared as a std_logic_vector to its equivalent integer value (according to the std_logic_unsigned or std_logic_signed package used).
- conv_std_logic_vector(integer_value, number_of_bits)
  Converts an integer value to its equivalent std_logic_vector value. The number of bits is specified as the second argument (according to the std_logic_unsigned or std_logic_signed package used).

Conclusion: use std_logic_vector unless there is a specific reason not to.
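As a small illustration of these two conversion functions, the sketch below uses conv_integer to index a bit of a vector and conv_std_logic_vector to build a constant. The entity, the port names and the value 100 are hypothetical.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_arith.all;     -- conv_std_logic_vector
use IEEE.std_logic_unsigned.all;  -- conv_integer on std_logic_vector

entity CONV_DEMO is
  port (
    CLK       : in  std_logic;
    INDEX     : in  std_logic_vector(2 downto 0);
    DATA_BYTE : in  std_logic_vector(7 downto 0);
    BIT_SEL   : out std_logic;
    LIMIT     : out std_logic_vector(7 downto 0)
  );
end CONV_DEMO;

architecture RTL of CONV_DEMO is
begin
  process(CLK) begin
    if rising_edge(CLK) then
      -- INDEX (0 to 7) is converted to an integer to select one bit
      BIT_SEL <= DATA_BYTE(conv_integer(INDEX));
      -- integer 100 converted to an 8-bit std_logic_vector
      LIMIT   <= conv_std_logic_vector(100, 8);
    end if;
  end process;
end RTL;
```

All ports remain std_logic_vector, and integers appear only locally as array indexes or constants.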

Don’t use inactive or redundant assignments

- In VHDL, if no condition is found to assign a new value to a signal, the signal keeps its previous value.
- In a combinatorial process, adding a condition that maintains the previous value of a signal will produce latches and/or combinatorial loops.
- In a clocked process, if a condition enables some assignments, the dedicated FF load or clock_enable input will be used.
- Don’t use the null statement if not strictly necessary.
- Keep your source code as compact as possible, taking advantage of the standard packages and the synthesis rules. In the following simple example, the commented-out else branch serves no purpose. In some more complex cases, such lines could even prevent the synthesis tools from optimizing the resulting netlist.

process(CLK) begin
   if rising_edge(CLK) then
      if ENA = '1' then
         SIG_R <= SIG_IN;
 --   else
 --      SIG_R <= SIG_R;
      end if;
   end if;
end process;

Other tricks and tips for a more compact and re-usable source code

Use (others => '0') when several bits are assigned the same value.
- Example for a reset condition (works for any bus length):

REG_DATA <= (others => '0');

- Example for high impedance output buffers:

if WRITE_EXT_MEM = '1' then
   DOUT <= REG_DATA;
else
   DOUT <= (others => 'Z');
end if;

Use generic parameters whenever your module can be re-used with different parameters. This is particularly useful for memory and DSP functions. Example:

library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_unsigned.all;
use IEEE.std_logic_arith.all;

entity MY_MEMORY is
   generic (
      Number_of_address_bits : integer := 10;
      Number_of_data_bits    : integer := 16
   );
   port (
      CLK  : in  std_logic;
      DIN  : in  std_logic_vector(Number_of_data_bits-1 downto 0);
      WE   : in  std_logic;
      ADR  : in  std_logic_vector(Number_of_address_bits-1 downto 0);
      DOUT : out std_logic_vector(Number_of_data_bits-1 downto 0)
   );
end MY_MEMORY;

architecture ARCHI of MY_MEMORY is
   -- memory depth and width both follow the generics
   type MEM_TYPE is array((2**Number_of_address_bits)-1 downto 0) of std_logic_vector(DIN'range);
   signal MEM : MEM_TYPE;
begin
   process(CLK) begin
      if rising_edge(CLK) then
         if WE = '1' then
            MEM(conv_integer(ADR)) <= DIN;
         else
            DOUT <= MEM(conv_integer(ADR)); -- DOUT doesn't change during write
         end if;
      end if;
   end process;
end ARCHI;

This same module can be instantiated several times in your design with different configuration for each instance, by assigning individual parameters sets to each instance (Number_of_address_bits and Number_of_data_bits assigned by generic map).
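Such an instantiation could look like the sketch below, which creates one instance with the default parameters and one 64 x 8-bit instance. It assumes that MY_MEMORY declares CLK, DIN, WE and ADR as inputs and DOUT as an output. The wrapper entity TWO_MEMORIES and its port names are hypothetical.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

entity TWO_MEMORIES is
  port (
    CLK    : in  std_logic;
    WE_A   : in  std_logic;
    ADR_A  : in  std_logic_vector(9 downto 0);
    DIN_A  : in  std_logic_vector(15 downto 0);
    DOUT_A : out std_logic_vector(15 downto 0);
    WE_B   : in  std_logic;
    ADR_B  : in  std_logic_vector(5 downto 0);
    DIN_B  : in  std_logic_vector(7 downto 0);
    DOUT_B : out std_logic_vector(7 downto 0)
  );
end TWO_MEMORIES;

architecture RTL of TWO_MEMORIES is
  component MY_MEMORY
    generic (
      Number_of_address_bits : integer := 10;
      Number_of_data_bits    : integer := 16
    );
    port (
      CLK  : in  std_logic;
      DIN  : in  std_logic_vector(Number_of_data_bits-1 downto 0);
      WE   : in  std_logic;
      ADR  : in  std_logic_vector(Number_of_address_bits-1 downto 0);
      DOUT : out std_logic_vector(Number_of_data_bits-1 downto 0)
    );
  end component;
begin
  -- default generics: 1024 words of 16 bits
  MEM_A : MY_MEMORY
    port map (CLK => CLK, DIN => DIN_A, WE => WE_A, ADR => ADR_A, DOUT => DOUT_A);

  -- overridden generics: 64 words of 8 bits
  MEM_B : MY_MEMORY
    generic map (Number_of_address_bits => 6, Number_of_data_bits => 8)
    port map (CLK => CLK, DIN => DIN_B, WE => WE_B, ADR => ADR_B, DOUT => DOUT_B);
end RTL;
```

The widths of the signals connected to each instance must match the generics given to that instance.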

Take advantage of the “std_logic_arith” and “std_logic_unsigned” or “std_logic_signed” package for a more compact and more readable source code, thanks to the implicit and explicit conversion functions.
Note that the package “numeric_std” is also supported by “nxmap”.

Design hierarchy

- The design hierarchy must be organized in a logical way. Avoid mixing unrelated functions in the same hierarchical module. This makes it easier to assign synthesis options and directives.
- Hierarchical modules can be synthesized separately to verify the quality of results, such as:
  - Used logic resources (tile logic, BRAM, DSP blocks…)
  - Timing performance for the considered module.
- Register all the outputs of the hierarchical modules if performance is required.

Naming rules of entities, component labels and signals:

Good and clear naming rules greatly improve readability:
- Short but clear names ease recognition in the debug and verification tools (timing analyzer, NXmap GUI, simulation).
- However, names that are too short can lead to confusion or make it harder to recognize some elements (signals, entities, components…).
- Be careful with reserved words or other commonly used words (WRITE, READ, BUS, COUNT…).

During the synthesis process, the FF output signals are automatically renamed by the tools, by adding “_reg” to the original name defined in the source code. As an example, a signal called “DATA_CHANNEL” in the source code will be renamed “DATA_CHANNEL_reg” after synthesis if it’s generated by a FF or group of FFs. Be careful not to name any other, non-registered signal “DATA_CHANNEL_reg”, to avoid possible post-synthesis conflicts.

Signal names should reflect their polarity. For example:
- RST is an active high signal (it resets the FFs when it goes high), while RST_N is active low.
- LOAD is active high (loading occurs when LOAD is high), while LOAD_N is active low.

Inference vs instantiation:

Inference advantages and limitations

Inference describes the behavior of the functions to be synthesized and implemented with a standard HDL description. As a result, the source code is portable and can be used with other architectures or tools.
Synthesis and mapping options: Most high performance NG-MEDIUM cells can be inferred from very simple source code and implemented as desired. However, by using some mapping options, the user can have better control over the synthesis and mapping processes, provided that the described functionality matches the behavior of the FPGA elements. This is particularly true for RAM inference, where the memory could be implemented either with RAM blocks or with Register_files (RF). See “addMappingDirective” in the NanoXplore NXmap Python API documentation for more information.
However, some NG-MEDIUM built-in functions cannot be inferred. This is the case, for example, for the PLL, WaveformGenerators using patterns, some RAM and DSP block configurations, and high performance IOs using DDR or SERDES.

In such cases, it might be necessary to instantiate the primitives.

Instantiation of NG-MEDIUM primitives

NG-MEDIUM primitives can also be instantiated:

Register_file
RAM blocks
DSP blocks
PLL and WFG

Although memories (Register_file and BRAM) can be easily and efficiently inferred from simple and portable source code for most common functions, the user might prefer to instantiate the elements. The source code is then not portable, but the user gets access to some features that are not necessarily accessible by inference.

In addition, some primitives cannot be inferred. They can be used only by instantiation.

PLL most often combined with WaveForm Generator(s)
Dual port RAM with different bus sizes on both ports
Some DSP blocks configurations
Other primitives

See the “Library guide” for more information.

NG-MEDIUM architecture survey

Before starting your design, it’s important to acquire a proper understanding of the NG-MEDIUM architecture. From the user’s point of view, we can separate the architecture in three main blocks:

- The user’s IO ring (organized in IO banks). It includes flexible single ended and/or differential IOs, DDR registers, SpaceWire compatible interface IO blocks, and many more features (input and output calibrated delay lines, output serializers, input de-serializers…). There are also 4 clock generators, also called CKG (PLL + WaveFormGenerators, one set in each corner of the die).
- The FPGA core logic, which offers:
  - Tile logic for combinatorial or registered functions, arithmetic, register_files (64 x 16-bit synchronous simple dual port memories, with EDAC)
  - Flexible 48Kbit synchronous true dual port memory blocks (including user-selectable EDAC)
  - DSP blocks for high performance complex DSP functions
- The FPGA configuration logic and dedicated IO interface

Please, refer to the next chapters as well as NG-MEDIUM data sheet for detailed information.

NG-MEDIUM inputs and outputs (IOs)

The NG-MEDIUM user’s IOs are organized in 13 IO banks. Each IO bank has a single power supply (Vddio) for all the IOs in the bank.

The top and bottom banks are called “complex”. The complex IO banks provide more flexibility and performance than the left and right banks that are called “simple”.

All IOs can be configured as input, output or bi-directional IO.

The next figures show the IO banks location, and the numbering of the die IO blocks. For physical pin numbers on the selected package, please consult the NG-MEDIUM datasheet. Currently, NG-MEDIUM is available in three different packages:

LGA625 : Land-Grid array 625 pins
CGA625 : ceramic column-Grid array 625 pins
CQFP352 : ceramic Quad-flat package 352 pins

In addition another bank located to the left of the FPGA die is used for FPGA configuration. This document doesn’t cover the configuration process. Please, refer to the NG-MEDIUM datasheet for detailed information about the FPGA configuration modes and pins.

Available user’s IOs

LGA/CGA 625 packages:

All the 374 die user’s IOs are available

Bank	Type	Location	I/Os
0	Simple	Left	22
1	Simple	Left	22
2	Complex	Bottom	30
3	Complex	Bottom	30
4	Complex	Bottom	30
5	Complex	Bottom	30
6	Simple	Right	30
7	Simple	Right	30
8	Simple	Right	30
9	Complex	Top	30
10	Complex	Top	30
11	Complex	Top	30
12	Complex	Top	30

LGA/CGA 625 packages – available IOs

IO banks distribution and user’s IOs availability on LGA/CGA625 packages

CQFP 352 package:

Only 192 of the 374 die user’s IOs are available

Bank	Type	Location	I/Os
0	Simple	Left	14
1	Simple	Left	12
2	Complex	Bottom	30
3	Complex	Bottom	-
4	Complex	Bottom	-
5	Complex	Bottom	30
6	Simple	Right	22
7	Simple	Right	-
8	Simple	Right	24
9	Complex	Top	30
10	Complex	Top	-
11	Complex	Top	-
12	Complex	Top	30

CQFP 352 package – available IOs

IO banks distribution and user’s IOs availability on CQFP352 package

Simple and complex banks IO features

Each IO is composed of two different and complementary elements:

The IO buffers (primitives NX_IOB_x): can be used as input, output or bi-directional. Single ended I/Os have a default 10K to 40K PullUp. The IO buffers can be configured to work in a wide range of single ended and differential electrical standards (LVCMOS, SSTL, HSTL, and LVDS). The IO buffers can be instantiated or inferred. They can also be parametrized in the “nxpython” script to adapt their electrical configuration to the board design requirements. The parameters include:
- Output drive
- Slew rate (for LVCMOS outputs)
- Turbo mode (faster LVCMOS input)
- Optional 2K to 6K PullUp
- Optional adjustable termination (only for complex banks: SSTL, HSTL, LVDS)

The sequential input and output elements (input, output and tri-state control FF). They include the following elements:
- Single flip-flop on the input, output and tri-state paths
- Optional adjustable delay lines on the input, output and tri-state paths (0 to 63 x 160 ps delay)

The NG-MEDIUM IO ring is segmented into 13 IO banks. The left (B0 and B1) and the right (B6, B7 and B8) IO banks offer flexible but limited features. They are called “simple” banks.

In contrast, the top (B9, B10, B11 and B12) and bottom (B2, B3, B4 and B5) banks are called “complex” banks and offer more electrical and functional features.

The following table summarizes the main IOs features available into complex and simple IO banks.

Feature	Complex	Simple
Number of IOs	30	30/22
Power supply (Vddio)	1.8, 2.5 or 3.3 V	1.8, 2.5 or 3.3 V
Supported IO standards	LVCMOS, SSTL, HSTL, LVDS	LVCMOS, SSTL, HSTL, LVDS
Single DFF (in, out, tri-state)	Yes	Yes
Differential SSTL/HSTL	Yes	No
LVDS	Yes	Yes (no internal input termination)
Resistive input termination	Yes	No
Programmable input/output delay	Yes	Yes
CDC (Clock Domain Changer)	Yes	No
Shift Register	Yes	No
DDR mode	Yes	No
SpaceWire	Yes	No

Simple and complex banks IO features summary

Simple banks: Electrical standards and supported electrical parameters

Standard	Type	Bank supply	Drive	Speed	Special considerations
LVCMOS 3.3V	SE (*)	3.3 V	2–16 mA	100 Mb/s	See notes below

Special considerations for LVCMOS 3.3V:
- In NG-MEDIUM, single ended I/Os have an internal default 10K to 40K PullUp. In addition, the user can activate an optional lower-value (2K to 6K) PullUp.
- Slew rate SLOW/FAST for outputs
- Turbo mode for inputs (faster inputs at the cost of higher static power)
- Those electrical parameters can be set by constraints in a script file.

Cookbook