Emulating a large memory with a collection of small ones

Published 3 Oct 2012 in cs.AR and cs.DC | (1210.1158v3)

Abstract: Sequential computation is well understood but does not scale well with current technology. Within the next decade, systems will contain large numbers of processors, with potentially thousands of processors per chip. Despite this, many computational problems exhibit little or no parallelism, and many existing formulations are sequential. It is therefore essential that highly parallel architectures can support sequential computation by emulating large memories with collections of smaller ones, thus supporting efficient execution of sequential programs or sequential components of parallel programs. This paper demonstrates that a realistic parallel architecture with scalable low-latency communications can execute large-memory sequential programs with a slowdown of only a factor of 2 to 3 compared to a conventional sequential architecture. This overhead seems an acceptable price to pay for the ability to switch between executing highly parallel programs and sequential programs with large memory requirements. Efficient emulation of large memories could therefore facilitate a transition from sequential machines: existing programs could be compiled directly to a highly parallel architecture, and their performance could then be improved by exploiting parallelism in memory accesses and computation.