Given the growing processor-memory performance gap and the awkwardness of high-capacity DRAM chips, we believe that it is time to consider unifying logic and DRAM. We call such a chip an "IRAM", standing for Intelligent RAM, since most of the transistors on this merged chip will be devoted to memory. The reason to put the processor in DRAM rather than increasing the on-processor SRAM is that DRAM is in practice approximately 20 times denser than SRAM. (The ratio is much larger than the transistor ratio because DRAMs use 3D structures to shrink cell size.) Thus, IRAM enables a much larger amount of on-chip memory than is possible in a conventional architecture.
Although others have examined this issue in the past, IRAM is attractive today for several reasons. First, the gap between the performance of processors and DRAMs has been widening at 50% per year for 10 years, so that despite heroic efforts by architects, compiler writers, and applications developers, many more applications are limited by memory speed today than in the past. Second, since the actual processor occupies only about one-third of the die, the upcoming gigabit DRAM has enough capacity that whole programs and data sets can fit on a single chip. In the past, so little memory could fit on-chip with the CPU that IRAMs were mainly considered as building blocks for multiprocessors. Third, DRAM dies have grown about 50% each generation; DRAMs are being made with more metal layers to accelerate the longer lines of these larger chips. Also, the high-speed interface of synchronous DRAM will require fast transistors on the DRAM chip. These two DRAM trends should make logic on DRAM closer to the speed of logic on logic fabs than in the past.
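To give a sense of scale for the first reason, the following minimal sketch compounds the stated growth rate over the stated period. The 50% per year rate and the 10-year span come from the text; the function name `compounded_gap` is hypothetical and chosen only for illustration.

```python
# Compound growth of the processor-DRAM performance gap.
# The rate (50% per year) and span (10 years) are taken from the text;
# the function name is a hypothetical helper for this illustration.

def compounded_gap(annual_growth: float, years: int) -> float:
    """Return the factor by which the gap has grown after `years` years."""
    return (1.0 + annual_growth) ** years


gap = compounded_gap(0.50, 10)
print(f"Gap after 10 years at 50% per year: about {gap:.1f}x")
```

Under these figures the gap grows by a factor of 1.5 to the tenth power, or roughly 58, over the decade.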
POTENTIAL ADVANTAGES OF IRAM
1) Higher Bandwidth. A DRAM naturally has extraordinary internal bandwidth, essentially fetching the square root of its capacity each DRAM clock cycle; an on-chip processor can tap that bandwidth. The potential bandwidth of the gigabit DRAM is even greater than indicated by its logical organization. Since it is important to keep the storage cell small, the normal solution is to limit the length of the bit lines, typically with 256 to 512 bits per sense amp. This quadruples the number of sense amplifiers. To save die area, each block has a small number of I/O lines, which reduces the internal bandwidth by a factor of about 5 to 10 but still meets the external demand. One IRAM goal is to capture a larger fraction of the potential on-chip bandwidth.
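The square-root relationship above can be made concrete with a rough sketch. It assumes that "gigabit" means 2^30 bits and uses a hypothetical 50 ns DRAM cycle time, which is not a figure from the text; the 5 to 10 reduction factors are the ones quoted for the limited I/O lines. The function names are illustrative only.

```python
import math

GIGABIT = 2 ** 30  # assumption: "gigabit" taken as 2^30 bits


def bits_per_cycle(capacity_bits: int) -> int:
    """Bits fetched per DRAM clock cycle, modeled as the square root of capacity."""
    return math.isqrt(capacity_bits)


def bandwidth_gb_per_s(bits: int, cycle_ns: float, reduction: float = 1.0) -> float:
    """Bandwidth in GB/s (10^9 bytes/s) for `bits` fetched every `cycle_ns`
    nanoseconds, divided by `reduction` to model the limited I/O lines per block."""
    bytes_per_cycle = bits / 8
    return bytes_per_cycle / cycle_ns / reduction  # bytes per ns equals GB/s


row = bits_per_cycle(GIGABIT)
print(f"bits fetched per cycle: {row}")  # 32768 bits, i.e. 4096 bytes

CYCLE_NS = 50.0  # hypothetical cycle time, not a figure from the text
for reduction in (1, 5, 10):
    gbps = bandwidth_gb_per_s(row, CYCLE_NS, reduction)
    print(f"reduction {reduction:>2}x: {gbps:.1f} GB/s")
```

The absolute numbers depend entirely on the assumed cycle time; the point of the sketch is only that the I/O-line reduction gives up most of the bandwidth that the sense amplifiers could in principle supply.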
2) Lower Latency. To reduce latency, the wire length should be kept as short as possible. This suggests the fewer bits per block the better. In addition, the DRAM cells furthest from the processor will be slower than the closest ones. Rather than restricting the access timing to accommodate the worst case, the processor could be designed to be aware of whether it is accessing "slow" or "fast" memory. Some additional reduction in latency can be obtained simply by not multiplexing the address, as there is no reason to do so on an IRAM. Also, being on the same chip as the DRAM, the processor avoids driving the off-chip wires, potentially turning around the data bus, and accessing an external memory controller. In summary, the access latency of an IRAM processor does not need to be limited by the same constraints as a standard DRAM part. Much lower latency may be obtained by intelligent floor planning, utilizing faster circuit topologies, and redesigning the address/data bussing schemes. A memory latency of less than 30 ns for random addresses is possible for a latency-oriented DRAM design on the same chip as the processor; this is as fast as second-level caches. Recall that the memory latency on the AlphaServer 8400 is 253 ns.
These first two points suggest IRAM offers performance opportunities for two types of applications:
1. Applications with predictable memory accesses, such as matrix manipulations, may take advantage of the potential 50X to 100X increase in IRAM bandwidth; and
2. Applications with unpredictable memory accesses and very large memory "footprints", such as databases, may take advantage of the potential 5X to 10X decrease in IRAM latency.
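As a quick check on the latency claim, the two latency figures quoted in the text can be compared directly. This is a minimal sketch: the 253 ns and 30 ns values come from the text (the latter is an upper bound, since the text says "less than 30 ns"), and the constant and function names are hypothetical.

```python
# Latency figures quoted in the text; the names are illustrative only.
ALPHASERVER_8400_LATENCY_NS = 253.0
IRAM_LATENCY_NS = 30.0  # upper bound: the text says "less than 30 ns"


def latency_improvement(baseline_ns: float, iram_ns: float) -> float:
    """Return how many times lower the IRAM latency is than the baseline."""
    return baseline_ns / iram_ns


ratio = latency_improvement(ALPHASERVER_8400_LATENCY_NS, IRAM_LATENCY_NS)
print(f"latency improvement: at least {ratio:.1f}x")
```

The resulting ratio of about 8.4 falls inside the 5X to 10X range given for the second class of applications.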
3) Energy Efficiency. Integrating a microprocessor and DRAM memory on the same die offers the potential for improving the energy efficiency of the memory system. DRAM is much denser than SRAM, which is traditionally used for on-chip memory. Therefore, an IRAM will have many fewer external memory accesses, which consume a great deal of energy to drive high-capacitance off-chip buses. Even on-chip accesses will be more energy efficient, since DRAM consumes less energy than SRAM. Finally, an IRAM has the potential for higher performance than a conventional approach. Since higher performance for some fixed energy consumption can be translated into equal performance at a lower amount of energy, the performance advantages of IRAM can be translated into lower energy consumption.
4) Memory Size and Width. Another advantage of IRAM over conventional designs is the ability to adjust both the size and width of the on-chip DRAM. Rather than being limited to powers of 2 in length or width, as conventional DRAM is, IRAM designers can specify exactly the number of words and their width. This flexibility can improve the cost of IRAM solutions versus memories made from conventional DRAMs.
5) Board Space. Finally, IRAM may be attractive in applications where board area is precious (such as cellular phones or portable computers), since it integrates several chips into one.