
Paper Details

Assume you are redesigning a hardware prefetcher for the unblocked matrix transposition code as in Exercise 5.7. However, in this case we evaluate a simple two-stream sequential prefetcher. If there are level 2 access slots available, this prefetcher will fetch up to 4 sequential blocks after a miss and place them in a stream buffer. Stream buffers that have empty slots obtain access to the level 2 cache on a round-robin basis. On a level 1 miss, the stream buffer that has least recently supplied data on a miss is flushed and reused for the new miss stream.
a. In the steady state of the inner loop, what is the performance (in cycles per iteration) when using a simple two-stream sequential prefetcher assuming performance is limited by prefetching?
b. What percentage of prefetches are useful given the level 2 cache parameters?
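
The replacement policy described above can be explored with a small model. The following is a minimal sketch, not a solution to parts (a) or (b). It ignores timing entirely and assumes a level 2 access slot is always free, so a buffer is refilled instantly and round-robin arbitration never comes into play. It also assumes that allocating a buffer counts as a "use" for the least-recently-supplied ordering, which the exercise does not specify. The class and method names (`StreamBuffer`, `TwoStreamPrefetcher`, `l1_miss`) are hypothetical, and the block numbers in the usage example are illustrative, not the access pattern or cache parameters of Exercise 5.7.

```python
from collections import deque

DEPTH = 4  # "up to 4 sequential blocks after a miss", per the exercise text


class StreamBuffer:
    def __init__(self):
        self.blocks = deque()    # prefetched block numbers, oldest first
        self.next_block = None   # next sequential block to prefetch
        self.last_supplied = -1  # logical time of last supply (or allocation)


class TwoStreamPrefetcher:
    """Timing-free model: refills are instantaneous (an assumption)."""

    def __init__(self, num_streams=2, depth=DEPTH):
        self.depth = depth
        self.buffers = [StreamBuffer() for _ in range(num_streams)]
        self.clock = 0
        self.issued = 0  # prefetches sent to level 2
        self.used = 0    # prefetched blocks that later satisfied a miss

    def _top_up(self, buf):
        # Fill every empty slot with the next sequential block.
        while len(buf.blocks) < self.depth:
            buf.blocks.append(buf.next_block)
            buf.next_block += 1
            self.issued += 1

    def l1_miss(self, block):
        """Handle a level 1 miss; return True if a stream buffer supplied it."""
        self.clock += 1
        for buf in self.buffers:
            if buf.blocks and buf.blocks[0] == block:
                buf.blocks.popleft()
                buf.last_supplied = self.clock
                self.used += 1
                self._top_up(buf)
                return True
        # No buffer head matched: flush the buffer that least recently
        # supplied data and restart it on the new miss stream.
        victim = min(self.buffers, key=lambda b: b.last_supplied)
        victim.blocks.clear()
        victim.next_block = block + 1
        victim.last_supplied = self.clock  # assumption: allocation counts as use
        self._top_up(victim)
        return False

    def useful_fraction(self):
        return self.used / self.issued if self.issued else 0.0


if __name__ == "__main__":
    # Illustrative pattern only: one unit-stride stream interleaved with
    # one large-stride stream, loosely resembling a transpose inner loop.
    p = TwoStreamPrefetcher()
    for i in range(8):
        p.l1_miss(i)               # sequential stream
        p.l1_miss(10_000 + 64 * i) # strided stream
    print(f"useful prefetches: {p.used}/{p.issued}")
```

In this model a purely sequential miss stream keeps one buffer hitting after the first miss, while a stream whose stride exceeds the buffer depth forces a flush on every miss and wastes all four prefetched blocks each time. Feeding in the actual miss sequence and level 2 parameters from Exercise 5.7 would be needed before reading anything quantitative from it.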