NxDesignSuite 23.5 Best Practice

Copyright

All contents of this document are protected by copyright law. They may not be disclosed to third parties, copied or duplicated in any form without the consent of NanoXplore.

Introduction

Aim of document

This document guides users through best practices for the Impulse software.

It aims to make Impulse easier to use and to give recommendations for implementing a project in NanoXplore components.

For any assistance, please contact the NanoXplore support team at support@nanoxplore.com.

Content

The recommendations are divided into several categories according to the product development phase.

They are mainly based on NXpython methods, but some of them can also be applied in the GUI.

Design

NanoXplore primitive instantiation

All NanoXplore primitives are listed in nxLibrary-<variant_name>.vhdp, available in the release archive. The generics and IOs of each primitive are documented in NxDesignSuite 23.5 Library Guide.

It is recommended to add only the nxpackage to the work library, without component declarations, as the components are already declared in the package.

Clock management

NanoXplore FPGAs contain a low-skew network to distribute high-fanout signals such as clock, reset and load signals.

The user must be very careful about how clocks are distributed through the design. It is advised to follow these rules, sorted by level of recommendation:

  1. Use a pad directly connected to the closest CKG (these pads are suffixed by _CLK). Either the user instantiates a PLL or a WFG in the design, or the tool automatically instantiates a WFG in bypass mode.

  2. Otherwise, depending on the variant:

    1. MEDIUM/LARGE: Use a common pad or internal logic with an NX_BD buffer in global_lowskew mode.

    2. ULTRA: Use a common pad or internal logic with an NX_GCK_U in Common to System Converter (CSC) mode.

Please have a look at NxDesignSuite 23.5 Training Package: Application Note, Design/LowskewManagement project.

PLL outputs are connected to the WFGs of the same CKG. The user can either instantiate a WFG or let the tool instantiate a WFG in bypass mode.

WFG outputs are connected to the low-skew network.

Do not use NX_BD (global_lowskew) or NX_GCK (CSC) when the connected input is already in the low-skew network.

In case of clock gating or clock muxing, it is recommended to implement the following architecture:

  • Clock Gating:

    • MEDIUM/LARGE: Use an NX_CKS to gate a system clock signal with a common command signal.

    • ULTRA: Use an NX_GCK_U in Clock Switch (CKS) mode to gate a system clock signal with a common command signal.

  • Clock MUX:

    • ULTRA: Use an NX_GCK_U in Clock MUX (MUX) mode to switch between 2 system clock signals with a common command signal.

Please have a look at NxDesignSuite 23.5 Training Package: Application Note, Component/ClockSwitch project.

Reset management

Like clocks, resets are generally distributed to the whole design through the low-skew network because of their high fanout.

Resets can be asynchronous or synchronous, as the registers of NanoXplore FPGAs support both.

Global signals management

Some high-fanout signals can be mapped into the low-skew network, which may introduce significant delays. This can be a problem for synchronous signals such as load, set, reset, …

To avoid this issue, use the rejectLowskew method:

p.rejectLowskew('inst1|reset_sync')

Memory initialization

There are several ways to initialize a memory (attribute, generic, python method).

It is recommended to use the python method addMemoryInitialization, described in NxDesignSuite 23.5 NxPython Specification, as it works with both inferred and instantiated memories.

Please have a look at NxDesignSuite 23.5 Training Package: Application Note, Init/Ram project.

Memory Inference

Inferring a memory instead of instantiating a NanoXplore primitive lets the user map the same RTL code into RF, RAM, RAM_ECC, … through NXpython constraints. To do so, it is recommended to follow the TrainingPackage Design/MemInfer examples, which provide inference for ROM, SRAM and DPRAM, with and without ECC.

Please have a look at NxDesignSuite 23.5 Training Package: Application Note, Design/MemInfer project.

Hierarchy

It is recommended to divide the design into well-sized modules according to the function they are responsible for.

This makes it much easier to constrain the design module by module afterwards.

Logic depth

It is highly recommended to avoid large logic depth between registers, as it limits the maximum frequency of the design.

In addition, when a LUT is used, the DFF of the same Functional Element is used either as a register or as a buffer, so pipelining the design does not consume any additional instance.

Project creation

Impulse is based on Python scripts: a project is considered as a class, and all options and constraints are methods of this class.

The user can either create a project based on a NanoXplore template (recommended) or start it from scratch.

To use a NanoXplore template, please ask the NanoXplore support team at support@nanoxplore.com.

When starting from scratch, it can be easier to create the project using the GUI in order to get the right basic NXpython methods.

 

The project requires the following information:

  • Top cell library and top cell name

  • All source files, each added to its associated library

  • Options and constraints

 

Whatever the project base, it is recommended to comply with the following rules:

  • Set options before running any progress step.

  • Set IO pad locations and parameters before running any progress step.

  • Save your project after each progress step.

  • Separate your project script into categories (setup, synthesis, placing, routing, sta) and set each constraint in the right category.

Project check

Once the project is launched, it is necessary to check logs and reports to make sure there is no issue leading to errors, undesired optimizations, …

Logs and report files analysis

Errors and warnings

First of all, check that the project runs successfully up to the required progress step.

If not, a message appears in the console. The message is sometimes clear enough, but not always, so the errors.rpt and warning.rpt reports can be very useful to understand the issue.

Note that these reports are generated only if there are messages to report to the user.

If the reported messages are not clear enough to solve the issue, contact the NanoXplore support team at support@nanoxplore.com, providing an archive that reproduces the issue (RTL sources + scripts).

Even when the project is successful, it is advised to read through these files.
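The check described above can be automated with a few lines of plain Python. The sketch below only assumes that errors.rpt and warning.rpt are written to a directory that the caller knows (the text does not say where the tool writes them); the function names are hypothetical helpers and not part of NXpython.

```python
from pathlib import Path

# Report names taken from the text; they exist only when the tool had messages to report.
REPORT_NAMES = ("errors.rpt", "warning.rpt")


def collect_reports(project_dir):
    """Return {report name: list of non-empty lines} for each report that exists."""
    found = {}
    for name in REPORT_NAMES:
        path = Path(project_dir) / name
        if path.is_file():
            lines = path.read_text(encoding="utf-8", errors="replace").splitlines()
            found[name] = [line for line in lines if line.strip()]
    return found


def print_reports(project_dir):
    """Print every reported message, or a short note when no report was generated."""
    reports = collect_reports(project_dir)
    if not reports:
        print("no errors.rpt or warning.rpt found in", project_dir)
    for name, lines in reports.items():
        print(f"{name}: {len(lines)} message line(s)")
        for line in lines:
            print("  " + line)
```

Calling print_reports on the project directory after each progress step gives a quick view of the messages even when the run is successful.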

Complexity

After errors and warnings are solved or understood, check hierarchy.rpt and progress.rpt to verify that the number of instances is consistent with the estimation.

To get the complexity module by module without confining modules in a region, it is possible to use the addModule method described in NXPython User Manual.

Refer to #Floor_planning_Complexity for more details.

IO Location

Check in progress.rpt that no IO has been automatically placed by the tool. An automatically placed IO is marked with an exclamation mark.

Common Errors

Typical errors are:

  • Oversize: Not enough instance capacity in the area

  • Overflow: Not enough routing capacity in the area

  • Blending Tile: Unable to route a path

These errors can be solved by changing the following parameters/options:

  • If modules are constrained with the constrainModule method, increase the area size

  • Change the Seed used for P&R

  • Change the following options: CongestionEffort, DensityEffort, RoutingEffort
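When several seeds and effort settings have to be tried, it can help to enumerate the runs first. The sketch below is plain Python and does not call the tool: it only builds one option dictionary per run. The key "Seed" and the effort values used in the usage note are assumptions for illustration; the accepted names and values are those listed in NXPython Specification.

```python
from itertools import product

# Option names quoted in the text above.
EFFORT_OPTIONS = ("CongestionEffort", "DensityEffort", "RoutingEffort")


def build_sweep(seeds, effort_values, options=EFFORT_OPTIONS):
    """Return one option dictionary per run: every seed with every effort combination."""
    runs = []
    for seed in seeds:
        for combo in product(effort_values, repeat=len(options)):
            run = {"Seed": seed}  # "Seed" as a key name is an assumption
            run.update(zip(options, combo))
            runs.append(run)
    return runs
```

For example, build_sweep([1, 2], ["Low", "High"]) returns 16 dictionaries (2 seeds times 8 effort combinations), where "Low" and "High" are hypothetical values. Each dictionary can then be applied to a fresh copy of the project before place and route.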

STA

Once the project is mature enough, Static Timing Analysis can be done in order to check how fast the design can operate.

STA tool can be launched from any step after Preparing (Placing 1/5) with the following example method:

Timing_analysis = p.createAnalyzer()
Timing_analysis.launch({'conditions': 'typical', 'maximumSlack': 500, 'searchPathsLimit': 10})

Clock creation

The input clock frequencies and other clock parameters must be declared to Impulse for STA.

Please refer to createClock and createGeneratedClock described in NXPython Specification.

If a PLL is used, PLL output clock frequencies are automatically computed by the software.

Special path and clock domains relationship

Declare all false paths and multi-cycle paths so that the analysis ignores or relaxes the corresponding paths between registers.

Please refer to addFalsePath and addMultiCyclePath in NXPython Specification.

Clock groups can be created if clock domains are completely unrelated.

Timing files analysis

When launching the STA tool, the following files are created:

  • Summary_<progress_step>_<conditions>.timing

  • Violation_<progress_step>_<conditions>.timing

  • Timing_Constraints_Report_<progress_step>_<conditions>.timing

  • DOMAIN_Input_to_<clk>_<progress_step>_<conditions>.timing

  • DOMAIN_<clk1>_to_<clk2>_<progress_step>_<conditions>.timing

  • DOMAIN_<clk>_to_Output_<progress_step>_<conditions>.timing

Use the first 3 timing files to get an overview of the project and the other ones to analyze paths.
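The naming patterns listed above can be expanded into concrete file names, which is convenient when a script has to open the timing files after STA. The sketch below is a plain Python helper (not an NXpython method). The progress step string and clock names in the usage note are hypothetical, and the tool may create a DOMAIN file only for clock pairs that actually have paths.

```python
def timing_file_names(progress_step, conditions, clocks):
    """Expand the documented timing file patterns for the given clocks."""
    suffix = f"{progress_step}_{conditions}.timing"
    # Overview files.
    names = [
        f"Summary_{suffix}",
        f"Violation_{suffix}",
        f"Timing_Constraints_Report_{suffix}",
    ]
    # Input-to-clock domains.
    for clk in clocks:
        names.append(f"DOMAIN_Input_to_{clk}_{suffix}")
    # Clock-to-clock domains, including a clock to itself.
    for src in clocks:
        for dst in clocks:
            names.append(f"DOMAIN_{src}_to_{dst}_{suffix}")
    # Clock-to-output domains.
    for clk in clocks:
        names.append(f"DOMAIN_{clk}_to_Output_{suffix}")
    return names
```

For instance, timing_file_names("routed", "typical", ["clk_a", "clk_b"]) yields 11 candidate names, starting with Summary_routed_typical.timing.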

Optimization

Mapping

Operator

Operators are divided into 3 categories:

  • Adder (ADD)

  • LessThan (LTN)

  • Multiplier (MUL)

It is possible to map these operators into LUT, Carry or DSP using the addMappingDirective method described in NXPython Specification.

By default, adders are mapped into Carry and multipliers into DSP. It can sometimes be useful to modify these default mapping directives.

Check #Mapping_Directive_Operator for information about constraint setting.

Memory

Memories can be mapped into logic elements (LUT/DFF), register files (RF), Memory Blocks (RAM) or Memory Blocks protected by EDAC correction using the addMappingDirective method described in NXPython Specification.

By default, small memories (64x16 or smaller) are mapped into RF and bigger ones into RAM. It can sometimes be useful to modify these default mapping directives.

It is highly recommended to size the design according to available depths and heights. Otherwise, too many memories will be instantiated, which will affect routing and STA performance.

A memory with reset assertion has to be mapped into DFF, as RF and RAM do not have a reset input pin.

Check #Mapping_Directive_Memory for information about constraint setting.
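The default rules above can be written as a small decision function, useful for estimating where each memory of a design is expected to land before running the tool. This is a sketch in plain Python under two assumptions: "64x16" is read as a depth of 64 words by a width of 16 bits, with both limits applying, and the returned strings are descriptive labels rather than addMappingDirective argument values.

```python
def expected_memory_target(depth, width, has_reset=False):
    """Return the resource a memory is expected to use, following the rules in the text."""
    if has_reset:
        # RF and RAM have no reset input pin, so a resettable memory needs DFFs.
        return "DFF"
    if depth <= 64 and width <= 16:
        # Small memories go to register files by default (assumed depth x width reading).
        return "RF"
    return "RAM"
```

For example, expected_memory_target(64, 16) returns "RF", while expected_memory_target(128, 16) returns "RAM". The actual decision is reported in memories.rpt.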

Placing

Placing Constraint

In order to improve the maximum clock frequency for each clock domain, it is advised to follow the following steps:

Creating unitary projects and reusing the routed projects as black boxes in the final top project, using addBlackBox described in NXPython Specification, can also have a very positive impact.

 

It is also possible to manually place instances at a specified location using the NXpython placing methods (addDSPLocation, addRAMLocation, addRingLocation, setSite), all described in NXPython Specification.

Placing Option

Some options impact the placing and consequently the routing. In the end, STA performances are directly affected. The following options are concerned:

  • DensityEffort

  • CongestionEffort

  • PolishingEffort

Increasing efforts may optimize placing for STA.

Check NXPython Specification for more details about these options.

Design Complexity

The logic depth of a design must be controlled. The reportDesignComplexity and reportHierarchyComplexity methods are very helpful to get an overview.

Please have a look at the reportDesignComplexity and reportHierarchyComplexity methods in NXPython Specification.

Complexity

In order to get complexity for each module of the design, check the detailed hierarchy of the design.

Complexity is given for each module with the sum of the complexity of each sub-module and the module itself.

For instance:

08:50:49:info | --------------------------------------------
08:50:49:info | - Detailed Hierarchy Statistics -
08:50:49:info | --------------------------------------------
08:50:49:info | ~ hierarchical
08:50:49:info | Resources:
08:50:49:info | NX_LUT : 13
08:50:49:info | NX_DFF : 73
08:50:49:info | NX_IOB : 51
08:50:49:info | NX_BFR : 99
08:50:49:info | NX_WFG : 3
08:50:49:info | ~ |-> row_col_pipe(X212B9C19) [ GEN_HIER0 ]
08:50:49:info | |-> Resources:
08:50:49:info | |-> NX_LUT : 3
08:50:49:info | |-> NX_DFF : 23
08:50:49:info | GEN_HIER0_ROW-0 | |-> timing_pipe(X2A98C8C6) [ GEN_HIER0|GEN_ROW[0].ROW_PIPE ]
08:50:49:info | | |-> Resources:
08:50:49:info | | |-> NX_DFF : 5
08:50:49:info | ~ | |-> timing_pipe(X2A98C8C6) [ GEN_HIER0|GEN_ROW[1].ROW_PIPE ]
08:50:49:info | | |-> Resources:
08:50:49:info | | |-> NX_DFF : 5
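To compare these figures with an estimation, the log excerpt can be turned into a dictionary. The sketch below is plain Python and relies only on the layout visible in the excerpt above (timestamp prefix, "|-> model(id) [ instance path ]" headers, "NX_xxx : count" lines); the real log may contain other forms, so the regular expressions are assumptions. The "<top>" key is a hypothetical label for the top-level entry.

```python
import re

# Timestamp prefix seen on every entry of the excerpt, e.g. "08:50:49:info |".
PREFIX = re.compile(r"\d{2}:\d{2}:\d{2}:info\s*\|")
# Module header such as "|-> row_col_pipe(X212B9C19) [ GEN_HIER0 ]".
MODULE = re.compile(r"\|->\s*(\w+)\(\w+\)\s*\[\s*(.+)\s*\]$")
# Resource count such as "NX_DFF : 23".
RESOURCE = re.compile(r"(NX_\w+)\s*:\s*(\d+)")


def parse_hierarchy(log_text, top_name="<top>"):
    """Map each instance path to its {primitive: count} dictionary."""
    resources = {top_name: {}}
    current = top_name
    # Splitting on the prefix works whether entries are on one line or on separate lines.
    for entry in PREFIX.split(log_text):
        entry = entry.strip()
        module = MODULE.search(entry)
        if module:
            current = module.group(2).strip()  # instance path inside the brackets
            resources[current] = {}
            continue
        resource = RESOURCE.search(entry)
        if resource:
            resources[current][resource.group(1)] = int(resource.group(2))
    return resources


SAMPLE = "\n".join([
    "08:50:49:info | ~ hierarchical",
    "08:50:49:info | Resources:",
    "08:50:49:info | NX_LUT : 13",
    "08:50:49:info | NX_DFF : 73",
    "08:50:49:info | ~ |-> row_col_pipe(X212B9C19) [ GEN_HIER0 ]",
    "08:50:49:info | |-> Resources:",
    "08:50:49:info | |-> NX_LUT : 3",
    "08:50:49:info | |-> NX_DFF : 23",
    "08:50:49:info | | |-> timing_pipe(X2A98C8C6) [ GEN_HIER0|GEN_ROW[0].ROW_PIPE ]",
    "08:50:49:info | | |-> NX_DFF : 5",
])

for instance, counts in parse_hierarchy(SAMPLE).items():
    print(instance, counts)
```

Since each module's figures already include its sub-modules, the counts should be compared level by level rather than summed across the hierarchy.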

 

How to use NXpython constraints methods

Mapping directive

Operator

In order to map an operator into LUT, CY or DSP, follow these steps:

  • Launch your design for the first time without constraint.

  • Grab the operator model or instance in the operators.rpt report. For instance, in “| Operator 'add_3u_3u' | : add_L25 (line 25 in …”, the model name is “add_3u_3u” and the instance name is “add_L25”.

  • Add the constraint specifying the model or instance of the operator to map, for instance “p.addMappingDirective('getModels(add_3u_3u)','ADD','DSP')” or “p.addMappingDirective('getInstances(add_L25)','ADD','DSP')”, and relaunch the project.

  • Check in the operators.rpt report that the constraint matched the desired instance.

Please have a look at NxDesignSuite 23.5 Training Package: Application Note, MappingDirective/Operator project.
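The second and third steps can be scripted. The sketch below, in plain Python, extracts the model and instance names from a report line shaped like the excerpt quoted above and builds the argument tuple for addMappingDirective. The regular expression is an assumption based on that single excerpt, and the two functions are hypothetical helpers, not NXpython methods.

```python
import re

# Matches lines shaped like: | Operator 'add_3u_3u' | : add_L25 (line 25 in ...
OPERATOR_LINE = re.compile(r"Operator\s+'([^']+)'.*?:\s*(\w+)\s*\(line\s+(\d+)")


def parse_operator_line(line):
    """Return the model name, instance name and source line, or None if the line does not match."""
    match = OPERATOR_LINE.search(line)
    if match is None:
        return None
    return {
        "model": match.group(1),
        "instance": match.group(2),
        "line": int(match.group(3)),
    }


def mapping_directive_args(name, operator, target, by_model=True):
    """Build the (selector, operator, target) arguments shown in the text's examples."""
    selector = f"getModels({name})" if by_model else f"getInstances({name})"
    return (selector, operator, target)
```

With a project object p, the result can be unpacked as p.addMappingDirective(*mapping_directive_args('add_L25', 'ADD', 'DSP', by_model=False)), which is equivalent to the second example call in the steps above.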

Memory

In order to map a memory into FE, RF, RAM or RAM_ECC, follow these steps:

  • Launch your design for the first time without constraint.

  • Grab the memory model in the memories.rpt report. For instance, in “| Ram 'RAM_s_mem' Analysis:”, the model name is “RAM_s_mem”.

  • Add the constraint specifying the model of the memory to map, for instance “p.addMappingDirective('getModels(RAM_s_mem)','RAM','RAM_ECC')”, and relaunch the project.

  • Check in the memories.rpt report that the constraint matched the desired instance.

Please have a look at NxDesignSuite 23.5 Training Package: Application Note, MappingDirective/Memory project.

Instance placing

DSP placing

In order to manually place an inferred DSP in a CGB spot, follow these steps:

  • Launch your design for the first time without constraint.

  • Grab the DSP name in the operators.rpt report. For instance, in “in line : mult_L28 (line 28 in […]”, the name is “mult_L28”.

  • Add the constraint specifying the DSP spot (CGB coordinates, then L or R for Left or Right respectively), for instance “p.addDSPLocation('DSP_mult_L28','CGB[8x8]:R')”, and relaunch the project.

  • Check in the preplaced.rpt report that the constraint matched the desired instance.

Please have a look at NxDesignSuite 23.5 Training Package: Application Note, PlacingConstraint/DspLocation project.

RAM placing

In order to manually place an inferred RAM in a CGB spot, follow these steps:

  • Launch your design for the first time without constraint.

  • Grab the RAM name in the memories.rpt report. For instance, in “| RAM Generation for g_loop[0].i_RAM_example|s_mem”, the name is “g_loop[0].i_RAM_example|s_mem|ram0_0_0_0”.

  • Add the constraint specifying the RAM spot (CGB coordinates), for instance “p.addRAMLocation('g_loop[0].i_RAM_example|s_mem|ram0_0_0_0','CGB[8x8]')”, and relaunch the project.

  • Check in the preplaced.rpt report that the constraint matched the desired instance.

Please have a look at NxDesignSuite 23.5 Training Package: Application Note, PlacingConstraint/RamLocation project.

Ring placing

In order to manually place an automatically created WFG in a CKG spot, follow these steps:

  • Launch your design for the first time without constraint.

  • Grab the WFG name in the lowskew.rpt report. For instance, in “rg~clk_divp2 from instance wfg_B_clk_divp2”, the name is “wfg_B_clk_divp2”.

  • Add the constraint specifying the WFG spot (CKG and WFG location), for instance “p.addRingLocation('wfg_B_clk_divp2','CKG3.WFG_C1')”, and relaunch the project.

  • Check in the preplaced.rpt report that the constraint matched the desired instance.

Please have a look at NxDesignSuite 23.5 Training Package: Application Note, PlacingConstraint/RingLocation project.

Tile placing

In order to manually place an inferred TILE instance such as a DFF, LUT or CY, follow these steps:

  • Launch your design for the first time without constraint.

  • Grab the register name in the RegisterSummary.rpt report or in timing files. For instance, “i_cpt_0|s_cpt_out_reg[5]”.

  • Add the constraint specifying the TILE spot (TILE coordinates), for instance “p.setSite('i_cpt_0|s_cpt_out_reg[0]','TILE[2x2]')”, and relaunch the project.

  • Check in the preplaced.rpt report that the constraint matched the desired instance.

Please have a look at NxDesignSuite 23.5 Training Package: Application Note, PlacingConstraint/Site project.

Floor planning

Constrain Module

In order to confine a module (component of the design hierarchy) in a region containing TILE, CGB and MESH, follow these steps:

  • Launch your design for the first time without constraint.

  • Grab the module name in the hierarchy.rpt report. For instance, in “| ~ |-> row_col_pipe(X212B9C19) [ GEN_HIER0 ]”, the name is “|-> row_col_pipe(X212B9C19) [ GEN_HIER0 ]”.

  • Add the constraint specifying the area coordinates, for instance “p.constrainModule('|-> row_col_pipe(X212B9C19) [ GEN_HIER0 ]','GEN_HIER0_ROW_M','Soft',9,6,2,3,'GEN_HIER0_ROW_R',False)”, and relaunch the project.

  • Check in the hierarchy.rpt report that the constraint matched the desired instance.

Please have a look at NxDesignSuite 23.5 Training Package: Application Note, PlacingConstraint/Region project.

Constrain path between registers

In order to confine all instances of a path between 2 registers in a region containing TILE, CGB and MESH, follow these steps:

  • Launch your design for the first time without constraint.

  • Grab the source and target registers in DOMAIN_<clk1>_to_<clk2>_<progress_step>_<conditions>.timing. For instance, for “module0|submodule0|pipe_reg[0].CK” and “module0|submodule0|pipe_reg[1].CK”, the names are “module0|submodule0|pipe_reg[0]” and “module0|submodule0|pipe_reg[1]”.

  • Add the constraint specifying the area coordinates, for instance “p.constrainPath('|-> row_col_pipe(X212B9C19) [ GEN_HIER0 ]','PIPE_REG_M','Soft',9,6,2,3,'PIPE_REG_R',False)”, and relaunch the project.

  • Check in the hierarchy.rpt report that the constraint matched the desired instance.

Please have a look at NxDesignSuite 23.5 Training Package: Application Note, PlacingConstraint/ConstrainPath project.

Preplace IP

In order to preplace a macro IP in an area of the chip so that it reaches the design specifications (delays, maximum frequencies, …) before integrating it in the global design, follow these steps:

  • Define the macro IP as the top cell.

  • Define a minimum aperture and all needed constraints, as if it were a global project, in order to reach the specifications. Save the project file after routed steps.

  • Run the project until the Placing 3/5 step and use the saveIP method to save the preplaced IP.

  • Do not declare the macro IP entity file in the global project. Instead, add the macro IP as a blackbox, specifying the coordinates of the top left corner of the macro IP aperture in the global project, for instance “p.addBlackBox('switch_counter','IP','../switch_counter_preplaced.json','g_inst.i_switch_counter_0:1x8')”.

  • Check in the log that the constraint matched.

Please have a look at NxDesignSuite 23.5 Training Package: Application Note, PlacingConstraint/Preplace project.

STA constraints

Clock declaration

In order to declare a clock in your project and get the required frequencies in logs, follow these steps:

  • Launch your design for the first time without constraint.

  • Grab the clock name in lowskew.rpt. For instance, in “rg~clk_divp2 from instance wfg_B_clk_divp2”, the name is “rg~clk_divp2”.

  • Add the constraint specifying the clock parameters, for instance “p.createClock('getClockNet(rg~clk_divp2)','clk_div_p2',20000,5000,15000)”, and relaunch the project.

  • Check in Summary_<progress_step>_<conditions>.timing that the clock is renamed and that the required frequency is now mentioned.

Please have a look at NxDesignSuite 23.5 Training Package: Application Note, StaConstraint/GeneratedClock project.
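The createClock example above passes three numbers. Assuming they are the period, the rising edge and the falling edge, all in picoseconds (the text does not state this, so check NXPython Specification before relying on it), the values can be derived from a target frequency as sketched below. Both functions are hypothetical helpers written in plain Python.

```python
def period_ps(frequency_mhz):
    """Clock period in picoseconds for a frequency given in MHz."""
    return round(1_000_000 / frequency_mhz)


def edges_ps(period, rising, duty_cycle=0.5):
    """Return (rising, falling) edge times in picoseconds within one period."""
    high_time = round(period * duty_cycle)
    falling = rising + high_time
    if falling > period:
        # Keep both edges inside a single period (assumed requirement).
        raise ValueError("rising edge plus high time exceeds the period")
    return rising, falling
```

Under that assumption, a 50 MHz clock gives period_ps(50) == 20000, and edges_ps(20000, 5000) gives (5000, 15000), which matches the three values used in the example call.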

Generated clock declaration

In order to declare a generated clock in your project and get the required frequencies in logs, follow these steps:

  • Grab the clock name from the hierarchy path to the generated clock.

  • Add the constraint specifying the clock parameters, for instance “p.createGeneratedClock(getClock('clk_main'),getRegisterClock('i_clock_0|counter_reg[0]'), 'clk_fabric',{'DivideBy': 2})”.

  • Check that timing files are created for this new clock domain.

Please have a look at NxDesignSuite 23.5 Training Package: Application Note, StaConstraint/GeneratedClock project.

Constrain path between registers

In order to declare a path between 2 registers as a false path, multi-cycle path, min or max delay path, follow these steps:

  • Launch your design for the first time without constraint.

  • Grab the source and target registers in DOMAIN_<clk1>_to_<clk2>_<progress_step>_<conditions>.timing. For instance, for “module0|submodule0|pipe_reg[0].CK” and “module0|submodule0|pipe_reg[1].CK”, the names are “module0|submodule0|pipe_reg[0]” and “module0|submodule0|pipe_reg[1]”.

  • Add the constraint specifying the source and target registers, for instance “p.addFalsePath('getRegisters(module0|submodule0|pipe_reg[0])','getRegisters(module0|submodule0|pipe_reg[1])')”, and relaunch the project.

  • Check in the DOMAIN_<clk1>_to_<clk2>_<progress_step>_<conditions>.timing report that the path no longer appears.

Please have a look at NxDesignSuite 23.5 Training Package: Application Note, StaConstraint/FalsePath project.
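Turning the pin names found in a timing file into the selectors used by addFalsePath is a simple string operation. The sketch below is plain Python; it assumes that register clock pins always end with ".CK", as in the two examples quoted in the steps, and the helper names are hypothetical.

```python
def register_from_pin(pin_name, pin_suffix=".CK"):
    """Strip the clock pin suffix to obtain the register name."""
    if pin_name.endswith(pin_suffix):
        return pin_name[: -len(pin_suffix)]
    return pin_name


def registers_selector(register_name):
    """Build the getRegisters(...) selector string used in the text's example."""
    return f"getRegisters({register_name})"


def false_path_args(source_pin, target_pin):
    """Return the (source, target) selector pair for a path between two registers."""
    return (
        registers_selector(register_from_pin(source_pin)),
        registers_selector(register_from_pin(target_pin)),
    )
```

With a project object p, the pair can be unpacked as p.addFalsePath(*false_path_args(src_pin, dst_pin)), where src_pin and dst_pin are the two pin names read from the timing file.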

How to improve STA results

STA tool

It is recommended to launch the STA tool after the Placing 1/5 step, named “Preparing”. This shows the number of logic elements crossed by each path and makes it possible to check whether the required performance is reachable, i.e. whether the margin in this most optimistic scenario is high enough.

ClockCreation

Before launching STA tool, all constraints must be defined.

The user must define:

TimingDriven

TimingDriven option can be set with the following constraint:

In addition, the number of iterations in a row to achieve timing goals can be modified through the TimingEffort option with the following example constraint:

Analysis conditions must be set before launching the tool with the following example constraint:

The number of iterations complies with the following association:

  • Low: TimingDriven is disabled

  • Medium: 3 iterations

  • High: 6 iterations

The higher the number of iterations, the better the STA performance can be, but the longer the runtime.

Floor planning

In order to guide the tool, the user can set up a floor plan and apply it either directly on the whole design or through unitary runs that are then integrated in the top design, as specified in #Floor_planning_Preplace_ip.

First of all, the constrainModule method, as explained in #Floor_planning_Constrain_module, must be applied in order to concentrate parts of the design and place them close to the parts they are interconnected with.

In addition, the constrainPath method, as explained in #Floor_planning_Constrain_path_between_registers, must be applied if some paths go through multiple modules or for critical paths.

Logic depth reducing

The higher the required frequency, the lower the number of combinatorial elements between 2 registers must be.

Controlling the logic depth of a design can be iterative: use the method explained in #Design_complexity and adapt the code if the required frequency cannot be achieved.

© NanoXplore 2022