Sorry, but try again.
The trace cache's size can't really be stated in bytes, because Intel measures it in micro-ops: 12,000 of them, to be exact. Here is my detailed explanation of the P4 architecture:
http://www.sysopt.com/articles/p4/index.html
Page three discusses the cache architecture.
Quote:
Instead of measuring with the more traditional Kilobyte size rating, Intel has rated the trace cache with the ability to buffer 12,000 micro-operations. To simplify terminology, a micro-operation can be thought of as decoded data ready to be processed by the core execution stage. For most processing functions the average x86 operation is approximately 3.5 bytes large and requires an average of 2 micro-operations for decoding. Let's do a little math: 3.5 bytes * 12,000 / 2 micro-ops = 21 Kilobytes (approximately)
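The quoted estimate works out as follows: 12,000 micro-ops at roughly 2 micro-ops per x86 instruction is about 6,000 instructions, and at about 3.5 bytes each that is about 21,000 bytes of x86 code. A small sketch of that arithmetic is below; the function name and parameter names are made up for illustration, and the 3.5-byte and 2-micro-op averages are the article's rough figures, not fixed properties of the chip.

```python
def trace_cache_equivalent_bytes(uop_capacity: int = 12_000,
                                 avg_x86_bytes: float = 3.5,
                                 avg_uops_per_instr: float = 2.0) -> float:
    """Rough x86-code-byte equivalent of a cache sized in micro-ops.

    Hypothetical helper: the defaults are the quoted article's averages.
    """
    # How many x86 instructions' worth of micro-ops fit in the cache.
    instructions = uop_capacity / avg_uops_per_instr
    # Convert instructions back to encoded x86 bytes.
    return instructions * avg_x86_bytes


total = trace_cache_equivalent_bytes()
print(f"{total:.0f} bytes = {total / 1000:.1f} KB (decimal) = {total / 1024:.1f} KiB")
# prints: 21000 bytes = 21.0 KB (decimal) = 20.5 KiB
```

Changing the averages (for example, denser code or more micro-ops per instruction) moves the figure, which is why the byte number is only ever "approximately" 21 KB.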
The trace cache is 12K µops, 8-way set-associative, with 6 µops per line. Microcode is inserted both into and after the trace cache, the traces it builds span across taken branches, and self-modifying code (SMC), detected at 4 KB granularity, flushes the entire trace cache. The L1 data cache is 8 KB, 4-way, with 64 bytes per line and 1 line per sector.
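For a set-associative cache, the number of lines is capacity divided by line size, and the number of sets is lines divided by ways. The sketch below applies that to the figures above. It assumes "12K" means 12 × 1024 micro-ops, which gives a power-of-two set count; taken literally as 12,000, the same formula gives 2,000 lines and 250 sets. The helper name is hypothetical.

```python
def cache_geometry(capacity: int, ways: int, line_size: int) -> tuple[int, int]:
    """Return (lines, sets) for a set-associative cache.

    capacity and line_size must be in the same unit (bytes or micro-ops).
    """
    if capacity % line_size:
        raise ValueError("capacity must be a whole number of lines")
    lines = capacity // line_size
    if lines % ways:
        raise ValueError("line count must divide evenly into the ways")
    return lines, lines // ways


# Trace cache: ASSUMES "12K" = 12 * 1024 micro-ops, 8-way, 6 micro-ops/line.
tc_lines, tc_sets = cache_geometry(12 * 1024, ways=8, line_size=6)
# L1 data cache: 8 KB, 4-way, 64-byte lines.
dc_lines, dc_sets = cache_geometry(8 * 1024, ways=4, line_size=64)
print(tc_lines, tc_sets)  # 2048 256
print(dc_lines, dc_sets)  # 128 32
```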
The P4 is unique in that the data prefetch mechanism fills the L2 cache, not the L1 data cache, as the trace cache already provides superb efficiency. The trace cache BTB is 512 entries, thus further improving branch prediction and overall cache efficiency.
Robert Richmond
PS: And you guys thought I just sat around watching anime all day long.