PNX1300/01/02/11 Data Book
Philips Semiconductors
5-4
PRELIMINARY SPECIFICATION
5.3.3
Miss Processing Order
When a miss occurs, the data cache fills the block con-
taining the requested word from the critical word first.
The CPU is stalled until the first word is transferred. The
block is then filled up while the CPU keeps running.
5.3.4
Replacement Policies, Coherency
The cache implements a copyback replacement policy
with one dirty bit per 64-byte block. Thus, when a miss
occurs and the block selected for replacement has its
dirty bit set, the dirty block must be written to main mem-
ory to preserve its modified contents. On PNX1300, the
dirty block is written to memory before the needed block
is fetched.
Coherency is not maintained in any way by hardware be-
tween the data cache, the instruction cache, and main
memory. Special operations are available to implement
cache coherency in software. See
Section 5.6,
“
Cache
Coherency,
”
for a discussion of coherency issues.
Write misses are handled with an allocate-on-write poli-
cy
—
the write that caused the miss stores its data in the
cache after the missing block is fetched into the cache.
The cache implements a hierarchical LRU replacement
algorithm to determine which of the eight elements
(blocks) in a set is replaced. The algorithm partitions the
eight set elements into four groups, each group with two
elements. The hierarchical LRU replacement victim is
determined by selecting the least-recently used group of
two elements and then selecting the least-recently used
element in that group. This hierarchical algorithm yields
performance close to full LRU but is simpler to imple-
ment.
See
Section 5.5,
“
LRU Algorithm,
”
for a full discussion of
the LRU algorithm.
5.3.5
Alignment, Partial-Word Transfers,
Endian-ness
The cache implements 32-bit word, 16-bit half-word, and
8-bit byte transfers. All transfers, however, must be to
addresses that are naturally aligned; that is, 32-bit words
must be aligned on 32-bit boundaries, and 16-bit half-
words must be aligned on 16-bit boundaries.
Like other PNX1300 processing units, the CPU has the
capability to use either big- or little-endian byte order. It
is recommended that all units and the CPU run with the
same endian-ness. Detailed endian-ness description
can be found in
Appendix C,
“
Endian-ness.
”
5.3.6
Dual Ports
To allow two accesses to proceed in parallel, the data
cache is quasi-dual ported. The cache is implemented as
eight banks of single-ported memory, but the hardware
allows each bank to operate independently. Thus, when
the addresses of two simultaneous accesses select two
different banks, both accesses can complete simulta-
neously. Bank selection is determined by the three low-
order address bits [4..2] of each address. Thus, the
words in a 64-byte cache block are distributed among the
eight blocks, which prevents conflicts between two simul-
taneously issued accesses to adjacent words in a cache
block. The PNX1300 compiling system attempts to avoid
bank conflicts as much as possible.
The dual-ported cache can execute the load and store
opcodes (ild8d, uld8d, ild16d, uld16d, ld32d, h_st8d,
h_st16d, h_st32d, ild8r, uld8r, ild16r, uld16r, ld32r,
ild16x, uld16x, ld32x) in either or both of the two ports.
The special opcodes alloc, dcb, dinvalid, pref, rdtag and
rdstatus can only be executed in the second port, not in
the first port. Whenever any of these special opcodes is
issued in the second port, there should not be a concur-
rent load or store operation in the first. This is a special
scheduling constraint.
5.3.7
Cache Locking
The data cache allows the contents of up to one-half of
its blocks to be locked. Thus, on PNX1300, up to 8 KB of
the cache can be used as a high-speed local data mem-
ory. Only four out of eight blocks in any set can be
locked.
A locked block is never chosen as a victim by the re-
placement algorithm; its contents remain undisturbed un-
til either (1) the block
’
s locked status is changed explicitly
by software, or (2) a dinvalid operation is executed that
targets the locked block.
Cache locking occurs only for the data in the address
range
described
by
DC_LOCK_ADDR and DC_LOCK_SIZE. The granulari-
ty of the address range is one 64-byte cache block. The
MMIO register DC_LOCK_CTL contains the cache-lock-
ing enable bit DC_LOCK_ENABLE.
Figure 5-5
shows
the layout of the data-cache lock registers. Locking will
occur for an address if locking is enabled and both of the
following are true:
1. The address is greater than or equal to the value in
DC_LOCK_ADDR.
2. The address is less than the sum of the values in
DC_LOCK_ADDR and DC_LOCK_SIZE.
Programmers (or compilers) must combine all data that
needs to be locked into this single linear address range.
Setting DC_LOCK_ENABLE to
‘
1
’
causes the following
sequence of events:
1. All blocks that are in cache locations that will be used
for locking are copied back to main memory (if they
are dirty) and removed from the cache.
2. All blocks in the lock range are fetched from main
memory into the cache. If any block in the lock range
was already in the cache, it
’
s first copied back into
main memory (if it
’
s dirty) and invalidated.
3. The LRU status of any set that contains locked blocks
is set to the initialization value.
4. Cache locking is activated so that the locked blocks
cannot be victims of the replacement algorithm.
This sequence of events is triggered by writing
‘
1
’
to
DC_LOCK_ENABLE even if the enable is already set to
the
MMIO
registers