Cookbook

Table of Content

Table of figures

Typical example of physical path between two FFs

Possible FF meta-stability when setup/hold time violation occurs

Simple anti-meta-stability method using an additional FF

Safely resampling an asynchronous bus

Example of internally generated local clock

Illustration of possible timing diagrams

RESET distribution and associated timing diagrams

Resynchronized RESET signal

IO banks distribution and user’s IOs availability on LGA/CGA625 packages

IO banks distribution and user’s IOs availability on CQFP352 package

Impedance adaptation resistor connected to VTO for single ended SSTL or HSTL inputs

Impedance adjustment vs VCCIO

Input impedance adaptation for differential SSTL/HSTL inputs

SSTL/HSTL differential output buffer pair

Bidirectional SSTL/HSTL input/output buffer pair

Input impedance adaptation for LVDS inputs

Basic IO configuration (simple and complex banks)

SERDES data path simplified diagram

SERDES delay lines control block simplified diagram

FLD activation

FLG activation

Writing and reading delay registers (note that DIG = ‘1’)

Semi-dedicated clock inputs

Simplified clock distribution diagram

Simplified ClocK Generator (CKG) diagram

Simplified WaveForm Generator diagram

WFG pattern generator diagram

Diagram for WFGs synchronization

Synchronized WFGs timing diagram example

Simplified PLL block diagram

PLL divided outputs timing diagram (1)

PLL divided outputs timing diagram (2)

Multiple clocks generation with basic WFG configurations

Multiple clocks generation using WFG input inverter

Optimized multiple clock generation

NG-MEDIUM clock distribution overview

NG-MEDIUM global clock distribution (FPGA fabric & IOs)

NG-MEDIUM alternate global clock distribution (FPGA fabric) or complex banks fast IO clocks

NG-MEDIUM very fast local IO clock distribution

NG-MEDIUM central clock switch

NG-MEDIUM Input and output paths with classic clock distribution

Output timing with classic clock distribution

Input timing with classic clock distribution

NG-MEDIUM input and output paths improvement using PLL

Output timing improvement using PLL with feedback by clock tree

Input timing improvement using PLL with feedback by clock tree

Zero delay clock generation with additional clocks

Timing diagram of the Zero delay clock generation

Core logic distribution

Functional Element (FE) simplified diagram

Distribution of the logic resources available in a tile

Functional element diagram

Carry logic directly connected to 4 of the 8 neighboring FEs

X-LUT combines the output of the 4 neighboring FE’s LUTs

Register_File includes 64x16 SDP RAM array + 32 associated FEs

Register_File simplified internal diagram

RAM block simplified diagram

RAM block organization and physical/logical mapping without EDAC

RAM block organization and physical/logical mapping with EDAC

DSP simplified block diagram

Chaining DSP blocks in a same CGB row

Chaining DSP blocks in a same CGB row is from right to left on the

High performance Pipelined Multiplier 24 x 30 with rounding

Sequential FIR filter implementation with a single MAC

Sequential symmetrical FIR filter with a single Pre-add / MAC

Direct form parallel FIR filter

Direct form parallel FIR filter with adder tree

Direct form parallel FIR filter with adder chain

Transpose structure FIR filter

Transpose structure FIR filter and associated DSP blocks configuration

Systolic structure FIR filter and associated DSP blocks configuration

Symmetric systolic structure FIR filter

Symmetric transpose structure FIR filter

 

Table of tables

LGA/CGA 625 packages – available IOs

CQFP 352 package – available IOs

Simple and complex banks IO features summary

Simple IO banks electrical parameters and performance

Complex IO banks electrical parameters and performance

VTO range vs VDDIO

Recommended “termination” parameter values for 50, 75 and 100 Ohms impedance

NG-MEDIUM configuration modes

 

 

Digital design methodology

NG-FPGAs offer a very flexible architecture that allows the implementation of a wide range of applications. However, the user must understand that a safe and reproducible behavior can be guaranteed only if some simple but efficient and necessary design rules have been adopted during the design steps.

In an FPGA, as in any other electronic component, the internal logic and routing delays vary with Process, Voltage and Temperature (PVT).

However, NG-MEDIUM and the “nxmap” synthesis and implementation tools provide a robust architecture, as well as implementation procedures, that ensure a safe and reproducible behavior against those delay variations, provided that some very simple design rules are followed.

To guarantee correct behavior, the clock(s) must be distributed using dedicated routing resources in order to guarantee very low skew across the FPGA die. The NG-MEDIUM FPGA fabric is split into two zones or clock regions. Each zone can use up to 12 low skew signals.

Clock(s) and other low skew routing resources: “nxmap” automatically assigns low skew routing resources to the clocks of your design, to ensure that the delay at the clock input of all FFs will be controlled, and the skew (maximum delay difference at the destinations) will be low enough and predictable. The maximum clock skew across the FPGA fabric is then controlled by construction, and limited to some tens of picoseconds.

Note that the low skew network can optionally be used for other high fanout signals. This can be the case for heavily loaded signals such as RESET and LOAD_ENABLE, if they are applied to a large number of FFs across the FPGA. In this case, the “low skew” feature is less important than the maximum delay allowed to reach a large number of destinations. This maximum delay can also be guaranteed by using the low skew network.

Applying timing constraints to your design: The user can constrain the design to support the required clock(s) frequency. The timing constraints are specified in the “nxpython” script file.

By applying a “Period” constraint to the clock, the user specifies the maximum delay allowed from any FF output to any FF input (using the same clock edge) after implementation. The “Timing Analyzer” is embedded in the synthesis and implementation tools. It interacts with each of the synthesis and implementation processes in order to find, if possible, a solution that meets the user-defined timing requirements.

The user can generate timing reports to check that the timing constraints were met during the implementation process.

Typical example of physical path between two FFs

The synthesis and implementation tools manage two different kinds of timing:

  • Silicon process related timing: The maximum values are fixed by the foundry silicon process (always the same for a given device and speed grade). Most timing parameters are documented in the datasheet.

    • Clock skew: difference between the clock delays at the destination FF and at the source FF clock inputs. This difference can be positive or negative, but is always limited to some tens of picoseconds by the silicon process. The timing analyzer gives the exact clock skew value for each analyzed path.

Clock skew = Clock_delay@FF_dest – Clock_delay@FF_src

    • Tco: FF clock-to-output delay. The delay from the active clock edge at the FF clock input until the Q output is stable and valid.

    • Tsu: FF setup time. The time during which the data must be stable at the D input before the active clock edge, so that the destination FF samples it safely.

    • Tcomb: combinatorial logic and routing delays (LUTs, carry logic and other combinatorial elements).

  • Implementation related timing: The routing delays on the connections (nets) between sources and destinations. Those delays are determined by the “Route” process (depending on the routing resources used). Note that routing depends on placement.

The “nxmap” synthesis and implementation algorithms try to find a solution that meets the timing requirements specified by the user's constraints. The user can generate timing reports to analyze the implemented results.

To guarantee a stable behavior over PVT variations, the following condition must be met:

 

Period >= Tco(source) + ∑Tnets + ∑Tcomb + Tsu(destination) + Clock skew
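As a worked illustration of this condition, the following sketch plugs in purely hypothetical delay values (they are assumptions chosen for easy arithmetic, not NG-MEDIUM datasheet figures) and takes the clock skew term at an assumed worst-case value of 0.05 ns:

```latex
\begin{aligned}
T_{co} &= 0.5\ \text{ns}, \quad \sum T_{nets} = 2.1\ \text{ns}, \quad \sum T_{comb} = 1.8\ \text{ns}, \quad T_{su} = 0.3\ \text{ns}, \quad T_{skew} = 0.05\ \text{ns} \\
\text{Period}_{min} &= T_{co} + \sum T_{nets} + \sum T_{comb} + T_{su} + T_{skew} = 0.5 + 2.1 + 1.8 + 0.3 + 0.05 = 4.75\ \text{ns} \\
F_{max} &= \frac{1}{\text{Period}_{min}} = \frac{1}{4.75\ \text{ns}} \approx 210.5\ \text{MHz} \\
\text{Slack}_{(\text{Period} = 5\ \text{ns})} &= 5.00 - 4.75 = +0.25\ \text{ns}
\end{aligned}
```

With these assumed numbers, a 5 ns “Period” constraint would be met with a positive slack of 0.25 ns, while a 4.5 ns constraint would give a negative slack of 0.25 ns on this path.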

 

Among the analysis tools:

  • Timing Analyzer: allows the user to generate timing reports. The timing analyzer commands are documented in the “NanoXplore nxmap Python API” documentation. Among the timing report information:

    • Identification of the clock domains detected in the design

    • Slack (timing margin) to meet the specified constraint(s). If the timing constraints are met, the slack is positive. Otherwise, the slack is negative.

    • Detailed delays on selected path(s)

  • “nxmap” GUI: Graphical User Interface used to analyze placement and routing results. See the NXmap User Manual for detailed information. The user can observe the location of each IO port, tile logic element, BRAM, DSP block, PLL and WaveForm Generator, and get a detailed or simplified view of the routing resources used.

 

Synchronous design methodology

Have you ever faced a “haunted” design? The kind of design that surprisingly works “most of the time”… but fails “sometimes”, particularly during demonstrations or inaugurations?

Most of the time, if not caused by PCB or power supply issues, those random problems are due to:

  • Inadequate clock distribution (e.g. local clock distributed by general routing resources), generally related to clocks generated internally with logic.

  • Inadequate reset methodology

  • Local asynchronous SET and/or RESET conditions in the source code.

  • Lack of anti-meta-stability and resynchronization stages when crossing asynchronous clock domains.

  • Missing timing constraints. “nxmap” ignores the timing on unconstrained paths, and therefore does not warn about possible timing errors on them.

 

Recommended clocking schemes

Single clock rising edge:

Whenever possible, this is the recommended, simplest and safest clock distribution scheme. This clocking scheme provides the easiest way for both user and tools to implement a safe design working at any frequency from DC to a maximum frequency determined by the timing analyzer (period limited by the longest path from FF to FF).

The internal timing can be constrained with a single timing constraint: “Period”. However, inputs and outputs of the design might require additional constraints to specify the timing required on the inputs (setInputDelay) and the outputs (setOutputDelay), according to the datasheets of the external components. See the timing constraints chapter for more information.

In order to reduce the FPGA internal clock propagation delay, the PLL + WFG (WaveForm Generator) can be used to generate a ZERO Delay clock distribution. Cancelling the clock distribution delay reduces the FPGA clock-to-output delay at the pads, and reduces or eliminates potential hold time problems on the FPGA inputs.

 

Multiple synchronous clocks: Using dedicated clock management resources

NG-MEDIUM provides 4 sets of clock management resources (ClocK Generators or CKG), located at the corners of the die. Each CKG includes one PLL and 8 WaveForm Generators.

The PLL reference input frequency can come from single ended or differential semi-dedicated clock input pins. At its outputs, the PLL can generate a wide range of clock outputs by applying frequency multiplication and/or division factors on the incoming input.

In order to reduce the FPGA internal clock propagation delay, the PLL + WFG can be used to generate a ZERO Delay clock distribution. Cancelling the clock distribution delay reduces the FPGA clock-to-output delay at the pads, and reduces or eliminates potential hold time problems on the FPGA inputs.

The WaveForm Generators (WFG) can be used as clock buffers. They provide direct routing to the low skew network. The WFG can also be used to generate clock dividers and user programmable patterns. See NG-MEDIUM datasheet for more information.

By combining the PLL and WaveForm Generators, the user can generate internally synchronous clocks, for example:

  • Main_clock: same phase and frequency as the input clock pad

  • Higher_frequency_clock: a multiple of the input frequency (ex: Fin x 2)

  • Lower_frequency_clock: divided input clock frequency (ex: Fin / 2)

Those 3 clocks are synchronous with each other. Clock domain crossings between them are easily managed by the synthesis and implementation tools: no meta-stability or resynchronization issues arise as long as the timing constraints are met (see chapter 1.2.3 for more details on meta-stability issues).

 

Multiple asynchronous clocks: Meta-stability issues and resynchronization

When a signal synchronous to one clock is resampled by another, asynchronous clock, it must be resynchronized to avoid unstable behavior due to meta-stability.

Meta-stability is an invalid logic level at a FF output, caused by a transition on the D input of the FF during its setup/hold window. This invalid logic level can cause incorrect behavior of your design.

Possible FF meta-stability when setup/hold time violation occurs

 

When registering an asynchronous signal, the meta-stability phenomenon can't be avoided, but simple design rules make it possible to cancel its negative and unreproducible effects.

Fortunately, meta-stability doesn't propagate from one FF to the next, provided that the connection delay to the second FF is limited to a small fraction of the clock period.

Two cases must be considered.

Case 1: Resynchronizing a single signal (one bit):

Simple anti-meta-stability method using an additional FF

 

In this example, the first FF can become meta-stable. However, this invalid logic level will not be propagated to the next FF, particularly if the propagation delay between both FFs is short.
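The two-FF scheme described above can be sketched in VHDL as follows. This is a minimal illustration, not a NanoXplore reference design: the entity name SYNC_1BIT and the port and signal names are hypothetical.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

-- Hypothetical entity and port names, chosen for illustration only.
entity SYNC_1BIT is
  port (
    CLK      : in  std_logic;  -- destination clock domain
    ASYNC_IN : in  std_logic;  -- signal coming from another clock domain
    SYNC_OUT : out std_logic   -- resynchronized version of ASYNC_IN
  );
end SYNC_1BIT;

architecture RTL of SYNC_1BIT is
  signal META_FF : std_logic;  -- first FF: may become meta-stable
  signal SYNC_FF : std_logic;  -- second FF: samples META_FF one period later
begin

  process(CLK)
  begin
    if rising_edge(CLK) then
      META_FF <= ASYNC_IN;  -- the only FF that samples the asynchronous signal
      SYNC_FF <= META_FF;   -- no logic between the two FFs, to keep the delay short
    end if;
  end process;

  SYNC_OUT <= SYNC_FF;

end RTL;
```

Only SYNC_OUT should be used by the rest of the design. Keeping the path between the two FFs free of combinatorial logic follows the condition stated above on the connection delay to the second FF.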

 

Case 2: Resynchronizing a multibit bus (two or more bits):

Multibit buses cannot be resynchronized simply by applying the same technique to each of the bus bits, because the bits would not necessarily be captured on the same clock edge. Fortunately, a bus is usually qualified by an additional signal such as “DATA_VALID”, or any other signal that indicates when the bus has a stable value. Thus, the user can safely resynchronize this control signal, and use the resynchronized version to sample the bus value.

Safely resampling an asynchronous bus

 

The user must make sure that the clock frequency is high enough to sample the bus value while its value is still stable.
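A possible VHDL sketch of this bus resampling is shown below. The entity SYNC_BUS, the WIDTH generic and the NEW_DATA strobe are hypothetical names introduced for illustration. The sketch assumes that DATA_IN is stable before DATA_VALID rises and stays stable for at least three CLK periods afterwards.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

-- Hypothetical entity, generic and port names, chosen for illustration only.
entity SYNC_BUS is
  generic (
    WIDTH : integer := 8
  );
  port (
    CLK        : in  std_logic;                           -- destination clock
    DATA_VALID : in  std_logic;                           -- asynchronous qualifier
    DATA_IN    : in  std_logic_vector(WIDTH-1 downto 0);  -- asynchronous bus
    DATA_OUT   : out std_logic_vector(WIDTH-1 downto 0);  -- resampled bus
    NEW_DATA   : out std_logic                            -- one-cycle strobe
  );
end SYNC_BUS;

architecture RTL of SYNC_BUS is
  -- bit 0: first FF (may be meta-stable), bit 1: resynchronized DATA_VALID,
  -- bit 2: previous value of bit 1, used for rising edge detection
  signal VALID_SYNC : std_logic_vector(2 downto 0);
begin

  process(CLK)
  begin
    if rising_edge(CLK) then
      VALID_SYNC <= VALID_SYNC(1 downto 0) & DATA_VALID;
      NEW_DATA   <= '0';
      -- Rising edge of the resynchronized DATA_VALID: the bus is assumed stable
      if VALID_SYNC(1) = '1' and VALID_SYNC(2) = '0' then
        DATA_OUT <= DATA_IN;
        NEW_DATA <= '1';
      end if;
    end if;
  end process;

end RTL;
```

Only the single-bit DATA_VALID goes through the anti-meta-stability FFs. The bus bits themselves are sampled once, at a moment when they are assumed stable, so they are never captured during a transition.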

 

 

RESET methodology

The NG-MEDIUM internal Flip-Flops have a dedicated RESET input. The tile FFs can be reset synchronously or asynchronously, while the registers embedded in BRAM and DSP blocks support synchronous reset exclusively.

 

Global reset

 

The global reset can be synchronous or asynchronous for the tile FFs. However, in any case, to guarantee a safe startup, the reset signal must be properly resynchronized on the design master clock, in order to avoid any meta-stability condition during the first active clock cycle.

Remember that BRAM and DSP registers can be reset synchronously only (no asynchronous reset available). However, regardless of whether the reset is used synchronously or asynchronously, the risk of meta-stability during the first active clock cycle exists if the reset signal is not synchronous to the clock.

 

When using a PLL for internal clock generation, remember that the generated clocks are not safe during the PLL locking process. For a safe design startup, ensure your design is held in reset at least until the PLL locked status (RDY) is set.

A simple and efficient mechanism consists of delaying the RDY output of the PLL by a few clock periods, as in the following source code sample:

signal RESET_DELAY    : std_logic_vector(7 downto 0);
signal INTERNAL_RESET : std_logic;

begin

process(CLK_generated_by_PLL_and_WFG)
begin
  if rising_edge(CLK_generated_by_PLL_and_WFG) then
    -- Shift in the inverted PLL lock status ('1' while the PLL is not locked)
    RESET_DELAY <= RESET_DELAY(6 downto 0) & not(RDY);
  end if;
end process;

-- Released once RDY has been sampled high on 8 consecutive clock edges
INTERNAL_RESET <= RESET_DELAY(7);

 

RESET_DELAY is the delay line (8 steps of one clock period each). The last bit of the chain is used as INTERNAL_RESET. NanoXplore recommends using at least two levels of registers on the reset delay line. INTERNAL_RESET can be safely used as a synchronous or asynchronous reset of the design. In any case, the timing constraints will cover all timing paths from the INTERNAL_RESET source to any Flip-Flop, including BRAM, DSP block and IO FFs.

In NG-MEDIUM, the “nxmap” implementation tools will use the low skew network, if possible, for the internal reset routing, taking into account the high fanout of this signal.

Local reset

For a local RESET (applied only to a subset of the FFs), the synchronous approach should be preferred. This gives the implementation tools more control over the routing and logic delays.

Remember that an asynchronous reset is glitch sensitive.

Don't apply both Asynchronous_SET and Asynchronous_RESET to the same Flip-Flop(s). The internal Flip-Flops have a single dedicated input, which is a synchronous or asynchronous RESET only. The tile FFs reset can be synchronous or asynchronous, while the registers embedded in BRAM and DSP blocks support synchronous reset exclusively.

 

Things to avoid

Don’t use both clock rising and falling edges if not strictly necessary

Most designs can be implemented by using exclusively the clock(s) rising edges for the FPGA internal logic. This gives more flexibility and timing control to the implementation tools.

 

 

 

Don’t use internally generated clocks if not strictly necessary

Internally generated clocks (using combinatorial or registered logic) create race conditions that lead to unpredictable or unstable behavior.

Remember that internal clocks can be easily and safely generated synchronously with the main clock by using the NG-MEDIUM PLL and Waveform Generators.

Example of internally generated local clock

 

Illustration of possible timing diagrams

 

The first figure shows the schematic of a portion of a design in which an internal local clock is generated from a FF output. This creates a race condition: the routing delays of the data and of the local clock determine the behavior, which will therefore be unstable over PVT variations. See the timing diagrams in the second figure (case 1 and case 2): the behavior is clearly routing dependent, and probably unstable over PVT.
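Besides the PLL and WFG resources mentioned above, a common way to avoid a FF-generated local clock is to keep every FF on the main clock and use the divided signal as a clock enable instead. The following sketch illustrates that alternative; it is not taken from the NG-MEDIUM documentation, and the entity HALF_RATE_REG and its port names are hypothetical.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

-- Hypothetical entity and port names, chosen for illustration only.
entity HALF_RATE_REG is
  port (
    CLK  : in  std_logic;                     -- single main clock
    RST  : in  std_logic;                     -- synchronous reset, active high
    DIN  : in  std_logic_vector(7 downto 0);
    DOUT : out std_logic_vector(7 downto 0)   -- updated every second CLK period
  );
end HALF_RATE_REG;

architecture RTL of HALF_RATE_REG is
  signal ENA_DIV2 : std_logic;  -- toggles every CLK period, used as an enable
begin

  process(CLK)
  begin
    if rising_edge(CLK) then
      if RST = '1' then
        ENA_DIV2 <= '0';
        DOUT     <= (others => '0');
      else
        ENA_DIV2 <= not ENA_DIV2;
        -- ENA_DIV2 is never used as a clock: DOUT stays in the CLK domain
        if ENA_DIV2 = '1' then
          DOUT <= DIN;
        end if;
      end if;
    end if;
  end process;

end RTL;
```

Because ENA_DIV2 only feeds data-path logic, its routing delay is an ordinary FF to FF path covered by the “Period” constraint on CLK, so the race condition between data and local clock does not exist.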

 

 

 

Resynchronize the RESET signal on the clock domain

The tile Flip-Flops have a dedicated input for synchronous or asynchronous RESET.

Reset de-assertion is very critical

If the Reset signal (used as asynchronous or synchronous reset) is not synchronized on the FPGA clock, it can create setup violations on many Flip Flops during its de-assertion.

Risk of hazardous startup!!!

Timing constraints can’t help to avoid this problem

RESET distribution and associated timing diagrams

 

The RESET signal is propagated by using routing resources to the destination FFs. However, even if distributed by low skew lines, its de-assertion can be interpreted differently by the FFs, and can cause hazardous startup.

This issue can be easily overcome by resynchronizing the RESET input, using anti-metastability FFs.

Resynchronized RESET signal

 

The resynchronized RESET signal can be used as Asynchronous or Synchronous RESET.

If used as synchronous RESET, the implementation tools and the timing analyzer will control the propagation delays to ensure a predictable behavior at the specified frequency.
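A resynchronization stage of this kind can be sketched as follows. The entity RESET_SYNC and its port names are hypothetical, and the sketch assumes an active high external reset and a clock that is running while the reset is applied.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

-- Hypothetical entity and port names, chosen for illustration only.
entity RESET_SYNC is
  port (
    CLK       : in  std_logic;  -- clock domain that will use the reset
    RESET_IN  : in  std_logic;  -- external reset, active high, asynchronous to CLK
    RESET_OUT : out std_logic   -- reset resynchronized on CLK
  );
end RESET_SYNC;

architecture RTL of RESET_SYNC is
  -- bit 0: first FF (may become meta-stable), bit 1: second FF (safe to use)
  signal RESET_PIPE : std_logic_vector(1 downto 0);
begin

  process(CLK)
  begin
    if rising_edge(CLK) then
      RESET_PIPE <= RESET_PIPE(0) & RESET_IN;
    end if;
  end process;

  -- Both assertion and de-assertion are aligned on a CLK rising edge
  RESET_OUT <= RESET_PIPE(1);

end RTL;
```

Since RESET_OUT changes only on a CLK edge, its de-assertion reaches the destination FFs as an ordinary synchronous signal, which is what allows the timing analyzer to check it.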

 

Avoid using asynchronous RESET if possible

The tile Flip-Flops have a dedicated input for synchronous or asynchronous RESET.

An asynchronous reset is glitch sensitive, while a synchronous reset is part of your synchronous design and is therefore covered by the period constraint, provided it is generated synchronously to the clock domain. The implementation tools and the timing analyzer will control the propagation delays to ensure a predictable behavior at the specified frequency.

 

Don’t use asynchronous SET

There is no dedicated asynchronous SET input on the tile FFs. However, a synchronous SET can be easily and safely implemented by combining the LUT and the FF of the same FE (the NG-MEDIUM logic cell, which includes one 4-input LUT and one D Flip-Flop).

The “nxmap” synthesis tools can still build the behavior of an asynchronous SET, but at the cost of the FF plus extra logic resources mapped to LUTs in another FE, and of additional routing delays (more logic resources used, and poorer power consumption and working frequency).
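For reference, a synchronous SET can be described as in the following sketch. The entity FF_SYNC_SET and its port names are hypothetical; how the logic is packed into an FE is decided by the synthesis tools.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

-- Hypothetical entity and port names, chosen for illustration only.
entity FF_SYNC_SET is
  port (
    CLK : in  std_logic;
    SET : in  std_logic;  -- synchronous set, active high
    ENA : in  std_logic;  -- load enable
    D   : in  std_logic;
    Q   : out std_logic
  );
end FF_SYNC_SET;

architecture RTL of FF_SYNC_SET is
begin

  process(CLK)  -- SET is synchronous, so only CLK is in the sensitivity list
  begin
    if rising_edge(CLK) then
      if SET = '1' then
        Q <= '1';          -- set takes priority over the enable
      elsif ENA = '1' then
        Q <= D;
      end if;
    end if;
  end process;

end RTL;
```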

 

Don’t use asynchronous initialization from a given value (signal or constant)

Asynchronous initialization from a signal value will prevent the synthesis and implementation tools from using dedicated flip-flops. Combinatorial loops can be generated. The resulting behavior can be unpredictable.

Example of source code to be avoided:

process(CLK, INIT)
begin
  if INIT = '1' then             -- Asynchronous initialization
    DATAR <= DATA_IN;            -- Assigned value is not a constant
  elsif rising_edge(CLK) then
    if ENA = '1' then
      DATAR <= CNT;
    end if;
  end if;
end process;

Instead, the following code is preferred:

process(CLK)
begin
  if rising_edge(CLK) then
    if INIT = '1' then
      DATAR <= DATA_IN;          -- Synchronous initialization
    elsif ENA = '1' then
      DATAR <= CNT;
    end if;
  end if;
end process;

 

Writing efficient HDL source code

The quality of the source code is the most important factor to ensure an efficient, stable and predictable design.

Whenever possible, use a simple, compact and clear writing style.

HDL synthesis is the first step of the implementation process.

If for any reason, the synthesis results are not optimized enough for your design requirements, there will be no way to change this during the subsequent mapping, place and route processes.

The HDL source code is probably your main investment for maintainability, design density, power reduction and performance optimization

  • Source code must be optimized for the targeted architecture

  • As much as possible, it must be also flexible and portable (to other architectures or synthesis tools)

  • Readability is another very important factor

 

Write a direct, simple and clear source code

  • The more compact, the more readable

  • The synthesis tools can also make a better translation to take advantage of the silicon features when the source code is compact and clear

Avoid using combinatorial processes if not necessary

Combinatorial processes can generate latches and combinatorial loops. This can lead to unpredictable or unstable behavior, and have a negative impact on logic and routing resources utilization.

Be very careful if you have to write combinatorial processes.

 

Avoid declaring and using unnecessary combinatorial signals

In a synchronous design, the combinatorial signals are registered with FFs.

Generally, it's simpler, faster and more efficient to describe the whole function (combinatorial logic and registers) in a single clocked process.

Don't declare unnecessary signals if those signals must be registered

  • All NG-MEDIUM configurable elements have their own Flip-Flops (tile logic, BRAMs, DSP blocks, and IOs). Synthesis will automatically recognize that the function can be implemented in the same elements (by packing the combinatorial logic and the FFs into the same logic element, such as Functional Elements, BRAMs or DSP blocks)

  • Reduced code size and improved readability

  • Apply this method also for state machines (you will avoid timing and implementation problems)
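As an illustration of the last point, a state machine can be written in a single clocked process, with no separate combinatorial next-state process. The sketch below is only an example: the entity HANDSHAKE_FSM, its ports and its states are hypothetical.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

-- Hypothetical entity, port and state names, chosen for illustration only.
entity HANDSHAKE_FSM is
  port (
    CLK   : in  std_logic;
    RST   : in  std_logic;  -- synchronous reset, active high
    START : in  std_logic;
    DONE  : in  std_logic;
    BUSY  : out std_logic   -- registered output
  );
end HANDSHAKE_FSM;

architecture RTL of HANDSHAKE_FSM is
  type STATE_TYPE is (IDLE, RUN, FINISH);
  signal STATE : STATE_TYPE;
begin

  -- State register, transitions and registered output in one clocked process
  process(CLK)
  begin
    if rising_edge(CLK) then
      if RST = '1' then
        STATE <= IDLE;
        BUSY  <= '0';
      else
        case STATE is
          when IDLE =>
            if START = '1' then
              STATE <= RUN;
              BUSY  <= '1';
            end if;
          when RUN =>
            if DONE = '1' then
              STATE <= FINISH;
            end if;
          when FINISH =>
            BUSY  <= '0';
            STATE <= IDLE;
        end case;
      end if;
    end if;
  end process;

end RTL;
```

Since every assignment is inside the clocked process, no latch can be inferred, and the output BUSY is registered.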

 

Consider the following VHDL source code, which describes a pipelined adder-multiplier function.

signal A, B, C             : std_logic_vector(15 downto 0); -- A, B and C inputs
signal A_REG, B_REG, C_REG : std_logic_vector(15 downto 0); -- registered inputs
signal A_PLUS_B            : std_logic_vector(15 downto 0); -- combinatorial adder output
signal A_PLUS_B_REG        : std_logic_vector(15 downto 0); -- registered adder output
signal MULT                : std_logic_vector(31 downto 0); -- combinatorial multiplier output
signal MULT_REG            : std_logic_vector(31 downto 0); -- registered multiplier output

begin

process(CLK)
begin
  if rising_edge(CLK) then
    A_REG <= A;
    B_REG <= B;
    C_REG <= C;
  end if;
end process;

A_PLUS_B <= A_REG + B_REG;

process(CLK)
begin
  if rising_edge(CLK) then
    A_PLUS_B_REG <= A_PLUS_B;
  end if;
end process;

MULT <= A_PLUS_B_REG * C_REG;

process(CLK)
begin
  if rising_edge(CLK) then
    MULT_REG <= MULT;
  end if;
end process;

The same behavior can be written as follows. Readability is increased.

signal A, B, C             : std_logic_vector(15 downto 0); -- A, B and C inputs
signal A_REG, B_REG, C_REG : std_logic_vector(15 downto 0); -- registered inputs
signal A_PLUS_B_REG        : std_logic_vector(15 downto 0); -- registered adder output
signal MULT_REG            : std_logic_vector(31 downto 0); -- registered multiplier output

begin

process(CLK)
begin
  if rising_edge(CLK) then
    A_REG        <= A;
    B_REG        <= B;
    C_REG        <= C;
    A_PLUS_B_REG <= A_REG + B_REG;
    MULT_REG     <= A_PLUS_B_REG * C_REG;
  end if;
end process;

 

At this stage, we do not take into consideration the rules for signed or unsigned operations. This example only shows that there is an easy and compact way to describe the same functionality in very few lines.

For arithmetic and/or DSP functions, see the chapter DSP blocks.

 

Use appropriate sensitivity list

An inappropriate sensitivity list can lead to mismatches between synthesis and simulation. The simulation behavior will not necessarily match the implementation results.

For simple clocked processes (with NO asynchronous reset or initialization), only the clock signal must be in the sensitivity list:

process(CLK)
begin
  if rising_edge(CLK) then
    ……    -- assignments
  end if;
end process;

 

For clocked processes with asynchronous re-initialization of FFs (not recommended), the clock and the reset (or any other asynchronous signal) must be in the sensitivity list:

process(CLK, RST)
begin
  if RST = '1' then
    ……    -- signal assignments to '0' or (others => '0')
  elsif rising_edge(CLK) then
    ……    -- assignments
  end if;
end process;

 

For combinatorial processes (not recommended) all the involved signals in the assignments must be in the sensitivity list (does not prevent generation of latches).
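When a combinatorial process cannot be avoided, the sketch below shows one way to write it: every signal that is read appears in the sensitivity list, and a default assignment at the top of the process gives the output a value on every path, which is what prevents a latch. The entity MUX3 and its port names are hypothetical.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

-- Hypothetical entity and port names, chosen for illustration only.
entity MUX3 is
  port (
    SEL     : in  std_logic_vector(1 downto 0);
    A, B, C : in  std_logic;
    Y       : out std_logic
  );
end MUX3;

architecture RTL of MUX3 is
begin

  -- All signals read in the process are listed: SEL, A, B and C
  process(SEL, A, B, C)
  begin
    Y <= A;                 -- default value: Y is assigned on every path
    if SEL = "01" then
      Y <= B;
    elsif SEL = "10" then
      Y <= C;
    end if;
  end process;

end RTL;
```

Removing the default assignment would leave Y unassigned for SEL = "00" and "11", and a latch would then be inferred even though the sensitivity list is complete.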

 

Be careful when using relational operators

  • VHDL allows buses with different numbers of bits to be compared.

  • The VHDL relational operators =, <, >, <= and >= behave in a very surprising way when the operands do not have the same number of bits.

  • In addition, synthesis and simulation can have different interpretations, depending on the context.

Conclusion: Make sure both operands have the same bus length.
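One way to follow this rule is to extend the shorter operand explicitly before comparing, as in the sketch below. It uses the resize function of the numeric_std package with an unsigned interpretation of both buses, which is an assumption about the data; the entity CMP_RESIZE and its port names are hypothetical.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;

-- Hypothetical entity and port names, chosen for illustration only.
entity CMP_RESIZE is
  port (
    A      : in  std_logic_vector(7 downto 0);
    B      : in  std_logic_vector(3 downto 0);
    A_GT_B : out std_logic   -- '1' when A > B, both taken as unsigned values
  );
end CMP_RESIZE;

architecture RTL of CMP_RESIZE is
  signal B_EXT : std_logic_vector(A'range);  -- B extended to the length of A
begin

  -- Zero-extend B to 8 bits (unsigned interpretation)
  B_EXT <= std_logic_vector(resize(unsigned(B), A'length));

  -- Both operands now have the same length
  A_GT_B <= '1' when unsigned(A) > unsigned(B_EXT) else '0';

end RTL;
```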

 

Avoid using unnecessary variables in the synthesizable source code

If not strictly necessary, it's safer and better to use signals instead of variables:

  • Most often, variables have no direct representation in the synthesized netlist or in simulation.

  • Very often, using variables makes the synthesizable source code more complex (for synthesis as well as for simulation) while providing no benefit.

Conclusion: use signals instead of variables whenever possible.

 

Don't declare ports and signals as integers if not necessary

The packages std_logic_unsigned and std_logic_signed provide direct type conversion and additional functions. For example, you can simply write:

CNT <= CNT + 1; -- CNT is a std_logic_vector

  • Additional conversion functions are provided within the std_logic_arith package:

    • Conv_integer(std_logic_vector_signal): converts the value of the signal declared as a std_logic_vector to its equivalent integer value (according to the std_logic_unsigned or std_logic_signed package used).

    • Conv_std_logic_vector(integer_value, std_logic_vector_number_of_bits): converts an integer value to its equivalent std_logic_vector value. The number of bits is specified as the second argument (according to the std_logic_unsigned or std_logic_signed package used).

 

Conclusion: use std_logic_vector unless there is a specific reason not to.

 

Don’t use inactive or redundant assignments

  • In VHDL, if no condition is found to assign a new value to a signal, the signal will keep its previous value

  • In a combinatorial process, assigning a condition to maintain the previous value to a signal will produce latches and/or combinatorial loops.

  • In a clocked process if a condition enables some assignments, the dedicated FFs load or clock_enable will be used

  • Don’t use the null statement if not strictly necessary

  • Keep your source code as compact as possible, taking advantage of the standard packages and the synthesis rules. In the following simple example, the commented-out else branch adds nothing. In more complex cases, such lines could even prevent the synthesis tools from optimizing the resulting netlist.

process(CLK)
begin
  if rising_edge(CLK) then
    if ENA = '1' then
      SIG_R <= SIG_IN;
    -- else
    --   SIG_R <= SIG_R;
    end if;
  end if;
end process;

Other tricks and tips for a more compact and re-usable source code

  • Use (others => '0') when several bits are assigned to the same value.

    • Example for a reset condition (works for any bus length):

REG_DATA <= (others => '0');

    • Example for high impedance output buffers:

if WRITE_EXT_MEM = '1' then
  DOUT <= REG_DATA;
else
  DOUT <= (others => 'Z');
end if;

 

  • Use generic parameters whenever your module can be re-used with different parameters. This is particularly useful for memory and DSP functions. Example:

library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_unsigned.all;
use IEEE.std_logic_arith.all;

entity MY_MEMORY is
  generic (
    Number_of_address_bits : integer := 10;
    Number_of_data_bits    : integer := 16
  );
  port (
    CLK  : in  std_logic;
    DIN  : in  std_logic_vector(Number_of_data_bits-1 downto 0);
    WE   : in  std_logic;
    ADR  : in  std_logic_vector(Number_of_address_bits-1 downto 0);
    DOUT : out std_logic_vector(Number_of_data_bits-1 downto 0)
  );
end MY_MEMORY;

architecture ARCHI of MY_MEMORY is
  type MEM_TYPE is array((2**Number_of_address_bits)-1 downto 0) of std_logic_vector(DIN'range);
  signal MEM : MEM_TYPE;
begin
  process(CLK)
  begin
    if rising_edge(CLK) then
      if WE = '1' then
        MEM(conv_integer(ADR)) <= DIN;
      else
        DOUT <= MEM(conv_integer(ADR));  -- DOUT doesn't change during write
      end if;
    end if;
  end process;
end ARCHI;

 

 

This same module can be instantiated several times in your design with different configuration for each instance, by assigning individual parameters sets to each instance (Number_of_address_bits and Number_of_data_bits assigned by generic map).
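The sketch below shows two instances of MY_MEMORY with different parameter sets. The wrapper entity TWO_MEMORIES and its port names are hypothetical, and the sketch assumes that MY_MEMORY has been compiled into the work library. It uses VHDL-93 direct entity instantiation for brevity; a component declaration with the same generic map and port map is an equivalent alternative.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

-- Hypothetical wrapper entity and port names, chosen for illustration only.
entity TWO_MEMORIES is
  port (
    CLK    : in  std_logic;
    -- 1024 x 16 memory
    WE_A   : in  std_logic;
    ADR_A  : in  std_logic_vector(9 downto 0);
    DIN_A  : in  std_logic_vector(15 downto 0);
    DOUT_A : out std_logic_vector(15 downto 0);
    -- 64 x 8 memory
    WE_B   : in  std_logic;
    ADR_B  : in  std_logic_vector(5 downto 0);
    DIN_B  : in  std_logic_vector(7 downto 0);
    DOUT_B : out std_logic_vector(7 downto 0)
  );
end TWO_MEMORIES;

architecture RTL of TWO_MEMORIES is
begin

  MEM_A : entity work.MY_MEMORY
    generic map (
      Number_of_address_bits => 10,
      Number_of_data_bits    => 16
    )
    port map (
      CLK => CLK, DIN => DIN_A, WE => WE_A, ADR => ADR_A, DOUT => DOUT_A
    );

  MEM_B : entity work.MY_MEMORY
    generic map (
      Number_of_address_bits => 6,
      Number_of_data_bits    => 8
    )
    port map (
      CLK => CLK, DIN => DIN_B, WE => WE_B, ADR => ADR_B, DOUT => DOUT_B
    );

end RTL;
```

The bus widths of each instance follow from its generic map, so the same source file serves both memory sizes.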

 

  • Take advantage of the “std_logic_arith” and “std_logic_unsigned” or “std_logic_signed” package for a more compact and more readable source code, thanks to the implicit and explicit conversion functions.

  • Note that the package “numeric_std” is also supported by “nxmap”.

Design hierarchy

  • Must be organized in a logical way. Avoid mixing unrelated functions into the same hierarchical module. It will be easier to assign synthesis options and directives.

  • Hierarchical modules can be synthesized separately to verify the quality of results such as :

    • Used logic resources (tile logic, BRAM, DSP blocks…)

    • Timing performance for the considered module.

  • Register all the outputs of the hierarchical modules – if performance is required

 

Naming rules of entities, component labels and signals:

  • Good and clear naming rules greatly improves the readability

    • Short but clear names can ease the recognition on debug and verification tools (timing analyzer, NXmap GUI, simulation)

    • However, too short names can drive to confusion or additional difficulties to recognize some elements (signals, entities, components…)

    • Be careful with reserved words or other words commonly used (WRITE, READ, BUS, COUNT…).

during the synthesis process, the FF outputs signals are automatically renamed by the tools, by adding “_reg” to the original name defined in the source code. As an example, a signal called “DATA_CHANNEL” in the source code will be renamed “DATA_CHANNEL_reg” after synthesis if it’s generated by a FF or group of FFs. Be careful not to name any other non registered signal “DATA_CHANNEL_reg” to avoid possible post-synthesis conflicts.

  • Signal names should reflect their polarity. For example:

    • RST is an active high signal (resets the FFs when it goes high), while RST_N is active low.

    • LOAD is active high (loading occurs when LOAD is high), while LOAD_N is active low.
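The polarity convention can be illustrated with the short sketch below, where the entity LOAD_REG and its ports are hypothetical. The “_N” suffix tells the reader that the reset is asserted when the signal is low, which matches the test written in the process.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

-- Hypothetical example: 8-bit register with active low reset and active high load
entity LOAD_REG is
  port (
    CLK   : in  std_logic;
    RST_N : in  std_logic;                     -- active low asynchronous reset
    LOAD  : in  std_logic;                     -- active high load enable
    D     : in  std_logic_vector(7 downto 0);
    Q     : out std_logic_vector(7 downto 0)
  );
end LOAD_REG;

architecture RTL of LOAD_REG is
begin
  process(CLK, RST_N)
  begin
    if RST_N = '0' then            -- reset asserted when RST_N is low
      Q <= (others => '0');
    elsif rising_edge(CLK) then
      if LOAD = '1' then           -- loading occurs when LOAD is high
        Q <= D;
      end if;
    end if;
  end process;
end RTL;
```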

 

Inference vs instantiation:

Inference advantages and limitations

  • Inference describes the behavior of the functions to be synthesized and implemented, using a standard HDL description. As a result, the source code is portable and can be used with other architectures or tools.

  • Synthesis and mapping options: Most high performance NG-MEDIUM cells can be inferred with very simple source code and implemented as desired. However, by using some mapping options, the user can have better control over the synthesis and mapping processes, provided that the described functionality matches the behavior of the FPGA elements. This is particularly true for RAM inference, which could be implemented either with RAM blocks or with Register_file (RF). See the “addMappingDirective” in the NanoXplore NXmap Python API documentation for more information.

  • However, some NG-MEDIUM built-in functions cannot be inferred. This is the case, for example, for the PLL, WaveformGenerators using patterns, some RAM and DSP block configurations, and high performance IOs using DDR or SERDES.

In such cases, it might be necessary to instantiate the primitives.

 

Instantiation of NG-MEDIUM primitives

NG-MEDIUM primitives can also be instantiated:

  • Register_file

  • RAM blocks

  • DSP blocks

  • PLL and WFG

Although memories (Register_file and BRAM) can be easily and efficiently inferred with simple and portable source code for most common functions, the user might prefer to instantiate the elements. The source code is then no longer portable, but the user gains access to some features that are not necessarily accessible by inference.

In addition, some primitives cannot be inferred. They can be used only by instantiation.

  • PLL most often combined with WaveForm Generator(s)

  • Dual port RAM with different bus sizes on both ports

  • Some DSP blocks configurations

  • Other primitives

See the “Library guide” for more information.

 

 

NG-MEDIUM architecture survey

Before starting your design, it’s important to acquire a proper understanding of the NG-MEDIUM architecture. From the user’s point of view, the architecture can be separated into three main blocks:

  • The user’s IO ring (organized in IO banks). It includes flexible single ended and/or differential IOs, DDR registers, SpaceWire compatible interface IO blocks, and many more features (input and output calibrated delay lines, output serializers, input de-serializers…). There are also 4 clock generators, called CKG (PLL + WaveFormGenerators), with one set in each corner of the die.

  • The FPGA core logic offers

    • Tile logic for combinatorial or registered functions, arithmetic, register_files (64 x 16-bit synchronous simple dual port memories – with EDAC)

    • Flexible 48Kbit synchronous true dual port memory blocks (including user-selectable EDAC)

    • DSP blocks for high performance complex DSP functions

  • The FPGA configuration logic and dedicated IO interface

Please refer to the next chapters as well as the NG-MEDIUM datasheet for detailed information.

 

NG-MEDIUM inputs and outputs (IOs)

The NG-MEDIUM user’s IOs are organized in 13 IO banks. Each IO bank has a single power supply (Vddio) for all the IOs in that bank.

The top and bottom banks are called “complex”. The complex IO banks provide more flexibility and performance than the left and right banks that are called “simple”.

All IOs can be configured as input, output or bi-directional IO.

The next figures show the IO banks location, and the numbering of the die IO blocks. For physical pin numbers on the selected package, please consult the NG-MEDIUM datasheet. Currently, NG-MEDIUM is available in three different packages:

  • LGA625 : Land-Grid array 625 pins

  • CGA625 : ceramic column-Grid array 625 pins

  • CQFP352 : ceramic Quad-flat package 352 pins

In addition, another bank located to the left of the FPGA die is used for FPGA configuration. This document doesn’t cover the configuration process. Please refer to the NG-MEDIUM datasheet for detailed information about the FPGA configuration modes and pins.

 

Available user’s IOs

 

LGA/CGA 625 packages:

All the 374 die user’s IOs are available

 

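| Bank | Type    | I/Os | Location | Bank | Type    | I/Os |
|------|---------|------|----------|------|---------|------|
| 0    | Simple  | 22   | Left     | 1    | Simple  | 22   |
| 2    | Complex | 30   | Bottom   | 3    | Complex | 30   |
| 4    | Complex | 30   | Bottom   | 5    | Complex | 30   |
| 6    | Simple  | 30   | Right    | 7    | Simple  | 30   |
| 8    | Simple  | 30   | Right    |      |         |      |
| 9    | Complex | 30   | Top      | 10   | Complex | 30   |
| 11   | Complex | 30   | Top      | 12   | Complex | 30   |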

LGA/CGA 625 packages – available IOs

 

IO banks distribution and user’s IOs availability on LGA/CGA625 packages

 

CQFP 352 package:

Only 192 of the 374 die user’s IOs are available

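| Bank | Type    | I/Os | Location | Bank | Type    | I/Os |
|------|---------|------|----------|------|---------|------|
| 0    | Simple  | 14   | Left     | 1    | Simple  | 12   |
| 2    | Complex | 30   | Bottom   | 3    | Complex | -    |
| 4    | Complex | -    | Bottom   | 5    | Complex | 30   |
| 6    | Simple  | 22   | Right    | 7    | Simple  | -    |
| 8    | Simple  | 24   | Right    |      |         |      |
| 9    | Complex | 30   | Top      | 10   | Complex | -    |
| 11   | Complex | -    | Top      | 12   | Complex | 30   |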

CQFP 352 package – available IOs

 

IO banks distribution and user’s IOs availability on CQFP352 package

 

 

Simple and complex banks IO features

Each IO is composed of two different and complementary elements:

  • The IO buffers (primitives NX_IOB_x): can be used as input, output or bi-directional. Single ended I/Os have a default 10K to 40K PullUp. The IO buffers can be configured to work with a wide range of single ended and differential electrical standards (LVCMOS, SSTL, HSTL, and LVDS). The IO buffers can be instantiated or inferred. They can also be parametrized in the “nxpython” script to adapt their electrical configuration to the board design requirements. The parameters include:

    • Output drive

    • Slew rate (for LVCMOS outputs)

    • Turbo mode (faster LVCMOS input)

    • Optional 2K to 6K PullUp

    • Optional adjustable termination (only for complex banks – SSTL, HSTL, LVDS)

 

  • The sequential input and output elements (input, output and tri-state control FF). They include the following elements:

    • Single flip-flop on input, output and tri-state paths

    • Optional adjustable delay lines on the input, output and tri-state paths (0 to 63 x 160 ps delay)
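With 64 settings of 160 ps each, a delay line covers 0 to 63 x 160 ps = 10.08 ns. The helper package below is only a sketch for choosing a setting during design. The package name IO_DELAY_PKG and the function DELAY_TAPS are hypothetical, and the 160 ps step is taken from the text above as a nominal value. How the resulting value is applied to a given IO is not covered here.

```vhdl
-- Hypothetical helper: nearest delay line setting for a target delay
package IO_DELAY_PKG is
  constant DELAY_STEP_PS : natural := 160;  -- nominal step, as stated in the text
  constant DELAY_MAX_TAP : natural := 63;   -- highest setting (0 to 63)

  -- Returns the setting whose nominal delay is nearest to TARGET_PS,
  -- limited to DELAY_MAX_TAP.
  function DELAY_TAPS(TARGET_PS : natural) return natural;
end package IO_DELAY_PKG;

package body IO_DELAY_PKG is
  function DELAY_TAPS(TARGET_PS : natural) return natural is
    variable TAPS : natural;
  begin
    -- Integer division with half a step added rounds to the nearest setting
    TAPS := (TARGET_PS + DELAY_STEP_PS / 2) / DELAY_STEP_PS;
    if TAPS > DELAY_MAX_TAP then
      TAPS := DELAY_MAX_TAP;
    end if;
    return TAPS;
  end function DELAY_TAPS;
end package body IO_DELAY_PKG;
```

For example, a 1000 ps target gives (1000 + 80) / 160 = 6 in integer arithmetic, which corresponds to a nominal delay of 6 x 160 ps = 960 ps.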

 

The NG-MEDIUM IO ring is segmented into 13 IO banks. The left (B0 and B1) and the right (B6, B7 and B8) IO banks offer flexible but limited features. They are called “simple” banks.

In contrast, the top (B9, B10, B11 and B12) and bottom (B2, B3, B4 and B5) banks are called “complex” banks and offer more electrical and functional features.

 

The following table summarizes the main IO features available in complex and simple IO banks.

 

| Feature                         | Complex                  | Simple                              |
|---------------------------------|--------------------------|-------------------------------------|
| Number of IOs                   | 30                       | 30/22                               |
| Power supply (Vddio)            | 1.8, 2.5 or 3.3 V        | 1.8, 2.5 or 3.3 V                   |
| Supported IO standards          | LVCMOS, SSTL, HSTL, LVDS | LVCMOS, SSTL, HSTL, LVDS            |
| Single DFF (in, out, tri-state) | Yes                      | Yes                                 |
| Differential SSTL/HSTL          | Yes                      | No                                  |
| LVDS                            | Yes                      | Yes (no internal input termination) |
| Resistive input termination     | Yes                      | No                                  |
| Programmable input/output delay | Yes                      | Yes                                 |
| CDC (Clock Domain Changer)      | Yes                      | No                                  |
| Shift Register                  | Yes                      | No                                  |
| DDR mode                        | Yes                      | No                                  |
| SpaceWire                       | Yes                      | No                                  |

Simple and complex banks IO features summary

 

Simple banks: Electrical standards and supported electrical parameters

| Standard    | Type   | Bank supply | Drive   | Speed    | Special considerations |
|-------------|--------|-------------|---------|----------|------------------------|
| LVCMOS 3.3V | SE (*) | 3.3 V       | 2–16 mA | 100 Mb/s | In NG-MEDIUM, single ended I/Os have an internal default 10K to 40K PullUp. In addition, the user can activate an optional lower value (2K to 6K) PullUp. Slew rate SLOW/FAST for outputs. Turbo mode for inputs (faster inputs at the cost of higher static power). (These electrical parameters can be set by constraints in a script file.) |

© NanoXplore 2025