Published on Aug 15, 2016
Many signal processing applications need both programmability and efficiency. With conventional technology, either programmability or efficiency is achievable, but not both. ASICs are conventionally used where highly efficient systems are needed.
The problem with an ASIC is that once it is fabricated it cannot be enhanced or changed; every modification requires a new ASIC. The other option is a microprocessor-based or DSP-based design.
These conventional options provide either programmability or efficiency, but not both. Stream processors aim to achieve both simultaneously. This article compares the efficiency and programmability of stream processors with those of other techniques, looks at how a stream processor achieves both, and examines the challenges faced by stream processor architectures.
Complex modern signal and image processing applications require hundreds of GOPS (giga, or billions of, operations per second) within a power budget of a few watts. That corresponds to an efficiency of about 100 GOPS/W (GOPS per watt), or 10 pJ/op (picojoules per operation). To meet this requirement, current media processing applications use ASICs that are tailor-made for a particular application.
Such processors require significant design effort and are difficult to change when a new media processing application or algorithm evolves. The alternative that can meet changing needs is a DSP or microprocessor, which is highly flexible but does not provide the efficiency the application needs. Stream processors address this problem by providing efficiency and programmability simultaneously.
They achieve this by expressing signal processing problems as signal flow graphs, with streams flowing between computational kernels. Stream processors have an efficiency comparable to that of ASICs (200 GOPS/W) while being programmable in a high-level language.
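The signal-flow-graph model described above can be sketched in ordinary Python, purely as an illustration of the programming style and not of any real stream processor toolchain. The two kernels (`scale` and `moving_sum`), the helper `run_graph`, and the sample values are all hypothetical names chosen for this sketch. Python generators stand in for streams; on actual hardware the kernels would run on ALU clusters and the streams would move through on-chip storage.

```python
from collections import deque
from typing import Iterable, Iterator, List


def scale(stream: Iterable[float], gain: float) -> Iterator[float]:
    """Hypothetical kernel 1: multiply every stream element by a constant gain."""
    for sample in stream:
        yield gain * sample


def moving_sum(stream: Iterable[float], taps: int) -> Iterator[float]:
    """Hypothetical kernel 2: sum over a sliding window of `taps` elements."""
    window = deque(maxlen=taps)  # kernel-local state; oldest element drops out
    for sample in stream:
        window.append(sample)
        if len(window) == taps:  # emit only once the window is full
            yield sum(window)


def run_graph(samples: Iterable[float]) -> List[float]:
    """Wire the kernels into a flow graph: samples -> scale -> moving_sum."""
    scaled = scale(samples, gain=2.0)    # stream between kernel 1 and kernel 2
    summed = moving_sum(scaled, taps=3)  # output stream
    return list(summed)


print(run_graph([1.0, 2.0, 3.0, 4.0]))  # prints [12.0, 18.0]
```

Each kernel touches only its input stream, its output stream, and a small amount of local state. That explicit producer-consumer structure is what the stream model exposes to the hardware.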
Many signal processing applications require both efficiency and programmability. The complexity of modern media processing, including 3D graphics, image compression, and signal processing, requires tens to hundreds of billions of computations per second. To achieve these computation rates, current media processors use special-purpose architectures tailored to one specific application.
Such processors require significant design effort and are thus difficult to change as media-processing applications and algorithms evolve. Digital television, surveillance video processing, automated optical inspection, mobile cameras, camcorders, and 3G cellular handsets all have similar needs. The demand for flexibility in media processing motivates the use of programmable processors. However, very large-scale integration (VLSI) constraints limit the performance of traditional programmable architectures. In modern VLSI technology, computation is relatively cheap: thousands of arithmetic logic units (ALUs) that operate at multigigahertz rates can fit on a modestly sized 1 cm² die.
The problem is that delivering instructions and data to those ALUs is prohibitively expensive. For example, only 6.5 percent of the Itanium 2 die is devoted to the 12 integer and two floating-point ALUs and their register files; communication, control, and storage overhead consume the remaining die area. In contrast, the more efficient communication and control structures of a special-purpose graphics chip, such as the NVIDIA GeForce4, enable the use of many hundreds of floating-point and integer ALUs to render 3D images.
Conventional signal processing solutions can provide high efficiency or programmability, but are unable to provide both at the same time. In applications that demand efficiency, a hardwired application-specific processor, either an ASIC (application-specific integrated circuit) or an ASSP (application-specific standard part), has an efficiency of 50 to 500 GOPS/W but offers little if any flexibility.
At the other extreme, microprocessors and DSPs (digital signal processors) are completely programmable but have efficiencies of less than 10 GOPS/W. DSP arrays and FPGAs (field-programmable gate arrays) offer higher performance than individual DSPs, but have roughly the same efficiency. Moreover, these solutions are difficult to program, requiring parallelization, partitioning, and, for FPGAs, hardware design. Applications today must choose between efficiency and programmability.
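The efficiency figures quoted in this article can be cross-checked by converting GOPS/W into energy per operation. Since 1 GOPS/W is 10^9 operations per joule, the energy per operation in picojoules is 1000 divided by the GOPS/W value. The short sketch below applies that conversion to the numbers from the text; the function name `pj_per_op` and the labels are illustrative choices, not part of any standard tool.

```python
def pj_per_op(gops_per_watt: float) -> float:
    """Convert an efficiency in GOPS/W to energy per operation in pJ/op.

    1 GOPS/W = 1e9 operations per joule, so one operation costs
    1 / (gops_per_watt * 1e9) joules = 1000 / gops_per_watt picojoules.
    """
    if gops_per_watt <= 0:
        raise ValueError("efficiency must be positive")
    return 1000.0 / gops_per_watt


# Efficiency figures taken from the text (GOPS/W).
figures = [
    ("application requirement", 100),
    ("stream processor", 200),
    ("ASIC/ASSP, low end", 50),
    ("ASIC/ASSP, high end", 500),
    ("microprocessor/DSP, upper bound", 10),
]

for label, gops in figures:
    print(f"{label}: {gops} GOPS/W = {pj_per_op(gops):g} pJ/op")
```

The 100 GOPS/W requirement works out to 10 pJ/op, matching the figure given earlier, while a processor below 10 GOPS/W spends more than 100 pJ on each operation.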