High Performance Computer Architecture

Introduction

Over the past few years the use of general purpose processors to run computationally intense multimedia applications has increased. In addition to multimedia-only applications, general purpose applications are starting to include multimedia features. This led us to analyze multimedia processors and digital signal processors (DSPs) in order to identify features that could be applied to general purpose processors to improve performance. The inherent nature of media-only applications, however, is strikingly different from that of general purpose business and scientific programs. With respect to the memory subsystem, media-only applications usually do not make good use of traditional caches because little temporal locality or general reuse can be extracted from them. This is primarily a result of repetitive read-and-throw-away algorithms in which large amounts of data are processed once and rarely, if ever, revisited. Such data rarely benefits from a large, flexible cache. Media processors and DSPs typically have small, simple caches (if they have caches at all) that are often software-managed [1], both to maximize their efficiency by controlling what is actually cached and to provide deterministic behavior. In this paper we discuss the main problems affecting the performance of computational kernels and applications, and compiler-based methods for solving them.
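The lack of temporal locality described above can be illustrated with a small simulation. The sketch below is not taken from this paper: it assumes a fully associative LRU cache measured in cache lines, and the function name `lru_hit_rate` and both synthetic traces are hypothetical. It compares a read-and-throw-away stream, in which every line is touched once, with a trace that keeps reusing a small working set.

```python
from collections import OrderedDict


def lru_hit_rate(addresses, capacity):
    """Hit rate of a fully associative LRU cache holding `capacity` lines.

    Simplifying assumption: each address is already a cache-line number,
    so spatial locality within a line is ignored.
    """
    cache = OrderedDict()  # line number -> True, ordered oldest to newest
    hits = 0
    for addr in addresses:
        if addr in cache:
            hits += 1
            cache.move_to_end(addr)        # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used line
            cache[addr] = True
    return hits / len(addresses)


# Hypothetical traces: a media-style stream (each line read exactly once)
# and a general purpose style loop over a 32-line working set.
stream = list(range(10_000))
reuse = [i % 32 for i in range(10_000)]

print("streaming hit rate:", lru_hit_rate(stream, capacity=64))
print("reuse hit rate:    ", lru_hit_rate(reuse, capacity=64))
```

Because the streaming trace never revisits a line, no cache capacity can produce a hit on it under this model, whereas the reuse trace misses only on the first touch of each line.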

Hypothesis

In this paper we propose a new cache management policy based on the algorithms and access patterns found in many media processing streams. We propose a special cache side buffer in which we place data that we anticipate will have little temporal locality; this data will not be allocated in the main L1 data cache. The architecture will perform dynamic runtime analysis of loads to determine whether a load should be allocated in L1 or instead placed in storage outside L1 (the side buffer). The implementation and careful management of the side buffer and L1 data cache will reduce the overall L1 miss rate (cache and side buffer combined) by removing repeating, read-only accesses that would otherwise pollute the L1 cache by evicting needed data. We believe that our configuration will perform similarly to unmodified configurations that have larger caches, and better than unmodified configurations with equal-sized caches.
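As a rough illustration of the proposed policy, the following sketch models an L1 cache with a side buffer. It rests on several assumptions that this text does not specify: both structures are fully associative LRU, addresses are line numbers, and the "dynamic runtime analysis" is stood in for by a hypothetical per-load saturating counter that increases whenever a line brought in by that load is evicted from L1 without having been reused. The class name `SideBufferCache`, the threshold, and the synthetic workload are all invented for the example.

```python
from collections import OrderedDict


class SideBufferCache:
    """Toy model: L1 plus a side buffer for loads predicted to have no reuse."""

    def __init__(self, l1_lines, side_lines, threshold=2):
        self.l1 = OrderedDict()    # line -> [pc of allocating load, reused flag]
        self.side = OrderedDict()  # line -> True
        self.l1_lines = l1_lines
        self.side_lines = side_lines
        self.threshold = threshold
        self.no_reuse = {}         # pc -> 2-bit saturating counter (hypothetical)
        self.hits = 0
        self.accesses = 0

    def _train(self, pc, reused):
        # Assumed training rule: evicted-without-reuse counts up, reuse counts down.
        c = self.no_reuse.get(pc, 0)
        self.no_reuse[pc] = max(c - 1, 0) if reused else min(c + 1, 3)

    def access(self, pc, line):
        self.accesses += 1
        if line in self.l1:
            self.hits += 1
            self.l1[line][1] = True
            self.l1.move_to_end(line)
            return
        if line in self.side:
            self.hits += 1
            self.side.move_to_end(line)
            return
        # Miss: choose where to allocate based on the predictor.
        if self.side_lines > 0 and self.no_reuse.get(pc, 0) >= self.threshold:
            if len(self.side) >= self.side_lines:
                self.side.popitem(last=False)
            self.side[line] = True
        else:
            if len(self.l1) >= self.l1_lines:
                _, (old_pc, reused) = self.l1.popitem(last=False)
                self._train(old_pc, reused)
            self.l1[line] = [pc, False]

    def hit_rate(self):
        return self.hits / self.accesses


def run(cache, rounds=50, hot=32, burst=64):
    """Synthetic trace: two passes over a hot set, then a burst of new stream lines."""
    next_stream = 1000  # stream line numbers never overlap the hot set
    for _ in range(rounds):
        for _ in range(2):
            for line in range(hot):
                cache.access("hot_load", line)
        for _ in range(burst):
            cache.access("stream_load", next_stream)
            next_stream += 1
    return cache.hit_rate()


baseline = run(SideBufferCache(l1_lines=64, side_lines=0))   # plain 64-line L1
with_side = run(SideBufferCache(l1_lines=56, side_lines=8))  # same total capacity
print("baseline:", baseline, "with side buffer:", with_side)
```

On this synthetic trace the streaming load is soon steered to the side buffer, so the hot set stays resident in L1 and the combined hit rate rises above that of the equal-capacity baseline. The sketch never retrains a load once it is classified as streaming, which a real design would have to address, and it says nothing about timing or wire delay.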

Before proposing a solution to the hypothetical problem of inefficient cache management given the changing behavior of general purpose applications, we first needed to establish the validity of the problem as posed. If it was valid, we needed to derive a solution that would not only outperform existing architectures but also scale in an environment of increasing relative wire delay. It would thus serve both as a performance boost and as a means of dealing with the growing scaling problems computer architects currently face. Here we present the problems as we suppose them and our evaluation of their scientific basis. These are the main problems affecting the performance of computational kernels and applications, together with compiler-based methods for solving them.
