INTRODUCTION
Given the growing processor-memory performance gap and the awkwardness of
high-capacity DRAM chips, we believe that it is time to consider unifying logic and
DRAM. We call such a chip an "IRAM", standing for Intelligent RAM, since
most of the transistors on this merged chip will be devoted to memory. The reason
to put the processor in DRAM rather than increasing the on-processor SRAM is that
DRAM is in practice approximately 20 times denser than SRAM. (The ratio is much
larger than the transistor ratio because DRAMs use 3D structures to shrink cell
size.) Thus, IRAM enables a much larger amount of on-chip memory than is possible
in a conventional architecture.
Although others have examined this issue in the past, IRAM is attractive today
for several reasons. First, the gap between the performance of processors and
DRAMs has been widening at 50% per year for 10 years, so that despite heroic efforts
by architects, compiler writers, and applications developers, many more applications
are limited by memory speed today than in the past. Second, since the actual processor
occupies only about one-third of the die, the upcoming gigabit DRAM has enough
capacity that whole programs and data sets can fit on a single chip. In the past,
so little memory could fit on-chip with the CPU that IRAMs were mainly considered
as building blocks for multiprocessors. Third, DRAM dies have grown about 50%
each generation; DRAMs are being made with more metal layers to accelerate the
longer lines of these larger chips. Also, the high-speed interface of synchronous
DRAM will require fast transistors on the DRAM chip. These two DRAM trends should
bring logic built on DRAM closer to the speed of logic built in logic fabs than in the past.
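As a rough illustration of how quickly the first of these trends compounds, the sketch below assumes the 50% annual widening quoted above holds steadily for the full 10 years. The function name is illustrative and does not come from the original text.

```python
def relative_gap(annual_growth: float, years: int) -> float:
    """Factor by which the processor-DRAM performance gap grows,
    assuming it widens by the same fraction every year."""
    return (1.0 + annual_growth) ** years


# 50% per year for 10 years, the figures stated in the text.
gap = relative_gap(0.50, 10)
print(f"Gap after 10 years: about {gap:.0f}x")  # about 58x
```

Under that steady-growth assumption, a decade of 50% annual widening leaves the gap roughly 58 times larger than where it started.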
POTENTIAL ADVANTAGES OF IRAM
1) Higher Bandwidth. A DRAM naturally has extraordinary internal bandwidth,
essentially fetching the square root of its capacity each DRAM clock cycle; an
on-chip processor can tap that bandwidth. The potential bandwidth of the gigabit
DRAM is even greater than indicated by its logical organization. Since it is important
to keep the storage cell small, the normal solution is to limit the length of
the bit lines, typically with 256 to 512 bits per sense amp. This quadruples the
number of sense amplifiers. To save die area, each block has a small number of
I/O lines, which reduces the internal bandwidth by a factor of about 5 to 10 but
still meets the external demand. One IRAM goal is to capture a larger fraction
of the potential on-chip bandwidth.
2) Lower Latency. To reduce latency, the wire length should be kept as short
as possible. This suggests that the fewer bits per block, the better. In addition, the
DRAM cells furthest away from the processor will be slower than the closest ones.
Rather than restricting the access timing to accommodate the worst case, the processor
could be designed to be aware when it is accessing "slow" or "fast"
memory. Some additional reduction in latency can be obtained simply by not multiplexing
the address as there is no reason to do so on an IRAM. Also, being on the same
chip with the DRAM, the processor avoids driving the off-chip wires, potentially
turning around the data bus, and accessing an external memory controller. In summary,
the access latency of an IRAM processor does not need to be limited by the same
constraints as a standard DRAM part. Much lower latency may be obtained by intelligent
floor planning, utilizing faster circuit topologies, and redesigning the address/data
bussing schemes. A memory latency of less than 30 ns for random addresses
is possible for a latency-oriented DRAM design on the same chip as the processor;
this is as fast as second-level caches. Recall that the memory latency on the
AlphaServer 8400 is 253 ns.
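The figures in points 1 and 2 can be checked with a little arithmetic. The sketch below takes the square-root rule of thumb literally and assumes "gigabit" means 2^30 bits. Both are simplifying assumptions, and the function names are illustrative only.

```python
import math

GIGABIT = 2**30  # assumption: "gigabit" taken as 2^30 bits


def bits_per_cycle(capacity_bits: int) -> int:
    """Bits fetched per DRAM clock cycle under the square-root rule of thumb."""
    return math.isqrt(capacity_bits)


def latency_ratio(off_chip_ns: float, on_chip_ns: float) -> float:
    """How many times lower the on-chip latency is than the off-chip latency."""
    return off_chip_ns / on_chip_ns


row_bits = bits_per_cycle(GIGABIT)
print(f"{row_bits} bits ({row_bits // 8} bytes) per cycle")  # 32768 bits (4096 bytes)

# 253 ns (AlphaServer 8400) versus the 30 ns upper bound quoted for IRAM.
print(f"Latency improvement: at least {latency_ratio(253, 30):.1f}x")  # 8.4x
```

Because 30 ns is quoted as an upper bound, the ratio of about 8.4 is a lower bound on the improvement relative to the AlphaServer 8400 figure.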
These first two points suggest IRAM offers performance opportunities for two types
of applications: 1. Applications with predictable memory accesses, such as
matrix manipulations, may take advantage of the potential 50X to 100X increase
in IRAM bandwidth; and 2. Applications with unpredictable memory accesses and
very large memory "footprints", such as databases, may take advantage
of the potential 5X to 10X decrease in IRAM latency.
3) Energy Efficiency. Integrating a microprocessor and DRAM memory on the
same die offers the potential for improving the energy efficiency of the memory system.
DRAM is much denser than SRAM, which is traditionally used for on-chip memory.
Therefore, an IRAM will have many fewer external memory accesses, which consume
a great deal of energy to drive high-capacitance off-chip buses. Even on-chip
accesses will be more energy efficient, since DRAM consumes less energy than SRAM.
Finally, an IRAM has the potential for higher performance than a conventional
approach. Since higher performance for some fixed energy consumption can be translated
into equal performance at a lower amount of energy, the performance advantages
of IRAM can be translated into lower energy consumption.
4) Memory Size and Width. Another advantage of IRAM over conventional designs
is the ability to adjust both the size and width of the on-chip DRAM. Rather than
being limited by powers of 2 in length or width, as conventional DRAM is, IRAM
designers can specify exactly the number of words and their width. This flexibility
can improve the cost of IRAM solutions versus memories made from conventional
DRAMs.
5) Board Space. Finally, IRAM may be attractive in applications where board area is
precious, such as cellular phones or portable computers, since it integrates
several chips into one.
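To make the size flexibility of point 4 concrete, the sketch below estimates how much capacity goes unused when a memory must be rounded up to a power of 2. The requirement of 5 Mi words is hypothetical, and the model of a single power-of-two part is a simplification; real systems may combine several conventional parts.

```python
def next_power_of_two(n: int) -> int:
    """Smallest power of two that is greater than or equal to n."""
    if n < 1:
        raise ValueError("n must be positive")
    return 1 << (n - 1).bit_length()


def wasted_fraction(required_words: int) -> float:
    """Fraction of a power-of-two memory left unused by the requirement."""
    provisioned = next_power_of_two(required_words)
    return (provisioned - required_words) / provisioned


required = 5 * 2**20  # hypothetical requirement: 5 Mi words
print(next_power_of_two(required))          # 8388608 (8 Mi words)
print(f"{wasted_fraction(required):.1%}")   # 37.5%
```

An IRAM sized to exactly the required number of words would avoid that unused capacity under this simplified model.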