Tuesday, February 3, 2009
Monday, January 26, 2009
Low power and FPGAs
Source
An FPGA designer has just a few knobs to turn. There’s the choice of programming element, e.g. Actel, whose Flash programming cell can have considerably lower operating power than an SRAM-based cell.
Dynamic power is a different issue. “The big culprit in FPGAs is the clocks.”
Another important issue for FPGA dynamic power is, perhaps surprisingly, inrush current. “Battery life is non-linear in current,”
“So large current spikes reduce battery energy disproportionately. You must eliminate them.” That means not only preventing inrush current on power-up, but also, in all modes of the chip, avoiding hazardous transition states that could result in spikes on the supply busses.
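The claim that battery life is non-linear in current can be illustrated with a small sketch. It assumes Peukert's law as the battery model, which the quoted article does not name, and the capacity, rated discharge time, exponent `k`, and both current profiles are made-up illustrative values. The two profiles draw the same average current, but the spiky one drains the modeled battery sooner.

```python
def peukert_runtime_hours(current_a, capacity_ah=1.0, rated_hours=20.0, k=1.2):
    """Runtime at a constant current under Peukert's law: t = H * (C / (I * H)) ** k.

    capacity_ah, rated_hours and k are illustrative assumptions, not data
    for any particular battery.
    """
    return rated_hours * (capacity_ah / (current_a * rated_hours)) ** k


def runtime_for_profile(profile, **battery):
    """Runtime for a repeating profile of (current_a, fraction_of_time) pairs.

    Assumes each slice drains the battery at the rate a constant current of
    that size would (a simplification of real battery behaviour).
    """
    drain_per_hour = sum(
        fraction / peukert_runtime_hours(current, **battery)
        for current, fraction in profile
    )
    return 1.0 / drain_per_hour


# Both profiles average 0.10 A; the second concentrates the load in spikes.
steady = [(0.10, 1.0)]
spiky = [(0.05, 0.9), (0.55, 0.1)]

if __name__ == "__main__":
    print("steady runtime (h):", runtime_for_profile(steady))
    print("spiky runtime (h): ", runtime_for_profile(spiky))
```

With `k=1.0` the model becomes linear and both profiles give the same runtime, so any difference comes entirely from the assumed non-linearity.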
Very high speed FPGAs
Source
At the heart of the FPGAs is a fabric of asynchronous logic. But the fabric has been designed so that its asynchronous nature is invisible to designers and to front-end synthesis tools.
..
The tale starts with one of the less popular approaches to asynchronous logic: two-wire signaling with a separate acknowledge wire, also known as three-wire asynchronous logic.
..
Each LUT, by waiting until its inputs are ready and holding its output until its clients have acknowledged, in effect acts as a self-timed latch.
..
The fabric implements a logic design by turning each logic level into a pipeline stage.
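The behaviour described above, where each logic level holds its output until the next level has taken it, can be sketched as a toy token-flow model. This is only an illustration: `SelfTimedStage` and `run_pipeline` are hypothetical names, and the model abstracts away the actual wire-level signaling, which the excerpt does not detail.

```python
class SelfTimedStage:
    """Toy model of one logic level acting as a self-timed latch (hypothetical)."""

    def __init__(self, fn):
        self.fn = fn        # the logic function of this level
        self.full = False   # True while an output is held, not yet acknowledged
        self.value = None

    def can_accept(self):
        return not self.full

    def accept(self, token):
        # Inputs are ready: compute and hold the output.
        self.value = self.fn(token)
        self.full = True

    def acknowledge(self):
        # The client has taken the output: release it.
        self.full = False
        return self.value


def run_pipeline(stages, inputs):
    """Push tokens through the stages; no clock, only local handshakes."""
    pending = list(inputs)
    outputs = []
    while pending or any(stage.full for stage in stages):
        # The environment always acknowledges the last stage.
        if stages[-1].full:
            outputs.append(stages[-1].acknowledge())
        # Walk back to front so a token only moves into an empty stage.
        for i in range(len(stages) - 2, -1, -1):
            if stages[i].full and stages[i + 1].can_accept():
                stages[i + 1].accept(stages[i].acknowledge())
        if pending and stages[0].can_accept():
            stages[0].accept(pending.pop(0))
    return outputs


if __name__ == "__main__":
    levels = [SelfTimedStage(lambda x: x + 1),
              SelfTimedStage(lambda x: x * 2),
              SelfTimedStage(lambda x: x - 3)]
    print(run_pipeline(levels, [1, 2, 3]))  # [1, 3, 5]
```

Several tokens are in flight at once, one per stage, which is the sense in which every logic level becomes a pipeline stage.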
..
All of the asynchronous signals are local to the logic fabric, and cannot be accessed by users. The fabric is surrounded by a fully synchronous ring of registers which—with a large helping of secret sauce—resynchronize the stuff going on inside the fabric so that at the pins, the device appears to be a fully synchronous,
..
You enter your RTL, synthesize it, and map it onto familiar-looking 4-LUT logic elements. The Achronix back-end tool, in effect, pipelines all of your logic into pipes with 1-gate-delay stages, and turns the clock up accordingly. And you get a fast, apparently fully synchronous FPGA design.
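As a rough back-of-the-envelope sketch of why 1-gate-delay stages allow a faster clock: the achievable rate is set by the delay of the longest stage, so fewer logic levels per stage means a shorter period. The gate delay and logic depth below are made-up numbers, and the model ignores register overhead, routing delay, and everything else the real back-end tool has to account for.

```python
def max_clock_mhz(levels_per_stage, gate_delay_ns):
    """Upper bound on the clock rate when each stage holds `levels_per_stage`
    logic levels of `gate_delay_ns` each (overheads ignored; an assumption)."""
    return 1000.0 / (levels_per_stage * gate_delay_ns)


def latency_ns(total_levels, gate_delay_ns):
    """End-to-end delay through the logic; pipelining does not shorten it."""
    return total_levels * gate_delay_ns


if __name__ == "__main__":
    depth, delay = 8, 0.5  # hypothetical: 8 logic levels, 0.5 ns per level
    print("conventional:", max_clock_mhz(depth, delay), "MHz")
    print("1-gate stages:", max_clock_mhz(1, delay), "MHz")
    print("latency either way:", latency_ns(depth, delay), "ns")
```

In this idealized model the throughput gain equals the original logic depth, while the latency through the logic stays the same.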
..
Because the heart of these FPGAs is self-timed, there are no huge clock networks running through the logic fabric. In fact there are no clocks in the logic fabric at all; they are all in the synchronization ring that surrounds the fabric. That means that the chip does not have the huge power dissipation (both dynamic and static) in clock networks that conventional FPGAs must have. And it does not exhibit the huge supply current spikes characteristic of any large synchronous design.
..
Because the majority of the circuits in the chip are self-timed, you don't have all the logic transitions in a clock domain happening at the same time, on each clock edge. The transitions are smeared out over time. Looking at a trace of supply current vs. time, you simply don't see the huge current spikes aligned with clock edges that drive FPGA designers mad worrying about decap insertion, instantaneous IR drop, signal integrity, and electromigration.
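The difference in the supply-current trace can be illustrated with a toy model. It treats each gate transition as one unit of current in a time bin, puts every transition of the synchronous design on the clock edge, and spreads the self-timed transitions uniformly at random over the cycle. The uniform spread, the gate count, and the bin count are all assumptions made for illustration; real transition times depend on the data flowing through the fabric.

```python
import random
from collections import Counter


def current_profile(switch_times, bins):
    """Number of gates switching in each time bin (a stand-in for supply current)."""
    counts = Counter(switch_times)
    return [counts.get(b, 0) for b in range(bins)]


def synchronous_times(n_gates):
    """Every gate switches in bin 0, i.e. right at the clock edge."""
    return [0] * n_gates


def self_timed_times(n_gates, bins, seed=0):
    """Each gate switches at its own time; uniform spreading is an assumption."""
    rng = random.Random(seed)
    return [rng.randrange(bins) for _ in range(n_gates)]


if __name__ == "__main__":
    gates, bins = 1000, 20  # hypothetical design size and time resolution
    sync = current_profile(synchronous_times(gates), bins)
    smeared = current_profile(self_timed_times(gates, bins), bins)
    print("peak, synchronous:", max(sync))
    print("peak, self-timed: ", max(smeared))
```

The total number of transitions is the same in both cases; only the peak changes, and the peak is what drives the decoupling and IR-drop concerns.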
Monday, January 19, 2009
Programming languages: C, RTL, etc.
Languages often constrain our thinking.
As the programming folks are running into a brick wall trying to parallelize C programs, hardware architects are running into a brick wall trying to capture design specifications and reduce them to synthesizable code. We can do the job just fine if the algorithm happens to be an instance of something RTL code naturally describes, that is, a data path or a synchronous state machine. Otherwise, we are in for a difficult, manual translation process with no guarantee of a positive result.
And just as many programmers tend to think of algorithms in terms of C programs, many architects and designers tend to think of algorithms in terms of a sequence of synchronous registers separated by clouds of asynchronous logic. We've forgotten, if we ever knew, that such structures are one set of constructs that was once particularly useful to the computer industry, not a general description of the class of all algorithms. And that is severely hampering our attempts to elevate design to a level of abstraction above RTL. The problem is too hard because the implementation language is too specific, or perhaps because it is specific to the wrong constructs. It's an interesting view of the issue.
original source : here