SGI Techpubs Library

MIPS R10000 Microprocessor User Guide, Version 2.0
(document number: 007-2490-001 / published: 1997-01-30)

1.7 Implications of R10000 Microarchitecture on Software

Speculative Execution


Speculative execution increases parallelism by fetching, issuing, and completing instructions even in the presence of unresolved conditional branches and possible exceptions. Following are some suggestions for increasing program efficiency:


Side Effects of Speculative Execution

To improve performance, the R10000 can fetch and execute instructions speculatively. Speculation has no harmful side effects for cached coherent operations; however, it can cause side effects for cached noncoherent operations. These side effects are described in the sections that follow.

Speculatively fetched instructions, and speculatively executed loads or stores to a cached address, initiate a Processor Block Read Request to the external interface if the address misses in the cache. The speculative operation may modify the cache state, the cached data, or both, and this modification may persist even if the speculation turns out to be incorrect and the instruction is aborted.

Speculative Processor Block Read Request to an I/O Address

Accesses to I/O addresses often have side effects. Typically, such I/O addresses are mapped to an uncached region, and the R10000 performs uncached reads and writes as doubleword, word, or partial-word (non-block) reads and writes. Uncached reads and writes are guaranteed to be non-speculative.

However, if a register holds a "garbage" value and the R10000 speculatively executes a Load or Jump Register instruction that specifies this register, the processor can issue a speculative block read request to an unpredictable physical address. Therefore, speculative block accesses to load-sensitive I/O areas can cause unwanted side effects.

Unexpected Write Back Due to Speculative Store Instruction

When a Store instruction is executed speculatively and its target address misses in the cache, the cache line is refilled and marked Dirty. If the store is later aborted, the refilled data is never actually modified, but the line remains marked Dirty. This can cause a side effect in cases such as the one described below:

  • The processor is storing data sequentially to memory area A, using a code loop that includes Store and conditional branch instructions.
  • A DMA write operation is performed to memory area B.
  • DMA area B is contiguous to the sequential storage area A.
  • The DMA operation is noncoherent.
  • The processor does not cache any lines of DMA area B.
If the processor and the DMA operations are performed in sequence, the following could occur:

1. Due to speculative execution at the exit of the code loop, the line just beyond the end of memory area A (that is, the first line of memory area B) is refilled into the cache. This cache line is then marked Dirty.

2. The DMA operation starts writing noncoherent data into memory area B.

3. A later cache line replacement, caused by other processor activity, writes this line back to the start of area B. Thus, the first line of DMA area B is overwritten by stale cache data, corrupting the DMA transfer.
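The three steps can be reproduced with a toy model in C. This is a minimal sketch and does not model the real R10000 cache. It assumes a single write-back line, an 8-byte line size, and hypothetical helper names (`speculative_store_aborted`, `noncoherent_dma_write`, `replace_line`, `run_scenario`) chosen for illustration.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define LINE_BYTES 8 /* toy line size chosen for illustration only */

/* One write-back cache line. */
typedef struct {
    bool valid;
    bool dirty;
    int line;                 /* index of the memory line held in the cache */
    uint8_t data[LINE_BYTES];
} CacheLine;

/* mem[0] is the last line of area A; mem[1] is the first line of area B. */
typedef struct {
    uint8_t mem[2][LINE_BYTES];
    CacheLine cache;
} Model;

/* Step 1: a speculative store misses, so the line is refilled and marked
   Dirty. The store is then aborted: the data is untouched, but the Dirty
   state is not rolled back. */
static void speculative_store_aborted(Model *m, int line) {
    m->cache.valid = true;
    m->cache.dirty = true;
    m->cache.line = line;
    memcpy(m->cache.data, m->mem[line], LINE_BYTES);
}

/* Step 2: noncoherent DMA writes memory directly; the cache is not checked. */
static void noncoherent_dma_write(Model *m, int line, uint8_t value) {
    memset(m->mem[line], value, LINE_BYTES);
}

/* Step 3: a later replacement writes a Dirty line back to memory. */
static void replace_line(Model *m) {
    if (m->cache.valid && m->cache.dirty)
        memcpy(m->mem[m->cache.line], m->cache.data, LINE_BYTES);
    m->cache.valid = false;
    m->cache.dirty = false;
}

/* Runs steps 1-3 and returns the first byte of area B afterwards. */
static uint8_t run_scenario(uint8_t old_value, uint8_t dma_value) {
    Model m;
    memset(&m, 0, sizeof m);
    memset(m.mem[1], old_value, LINE_BYTES);
    speculative_store_aborted(&m, 1);
    noncoherent_dma_write(&m, 1, dma_value);
    replace_line(&m);
    return m.mem[1][0];
}
```

For example, `run_scenario(0x11, 0xAB)` returns `0x11`. The byte written by the DMA engine is lost because the Dirty line, which still holds the old contents, is written back over it.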

The OS can restrict the writable pages of each user process, and so can prevent a user process from interfering with an active DMA space. The kernel, on the other hand, keeps xkphys and kseg0 addresses in registers, and there is no write protection against the speculative use of the address values in these registers. User processes that have pages mapped to physical address space outside RAM may also encounter side effects. These side effects can be avoided if DMA is coherent.

Speculative Instruction Fetch

A change in a cache line's state caused by a speculative instruction fetch is not reversed if the speculation is aborted. This causes no problem visible to the program except during noncoherent memory operations, where the following side effect exists: if a speculative fetch brings a line from noncoherent space into the cache in the Clean Exclusive state, an external component could subsequently modify that data in memory, and the processor would then hold stale data.
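The stale-data case can be sketched in the same toy style. The model assumes a single cache line that holds one word, and the names `fetch` and `value_seen_after_external_write` are hypothetical. The point it shows is that the line stays Clean Exclusive after the aborted fetch, so a later real access hits the old copy.

```c
#include <stdint.h>

typedef enum { LINE_INVALID, LINE_CLEAN_EXCLUSIVE } LineState;

typedef struct {
    LineState state;
    uint32_t data; /* one word stands in for the whole line */
} Line;

/* A fetch that misses refills the line. Aborting a speculative fetch
   does not return the line to LINE_INVALID. */
static void fetch(Line *l, const uint32_t *mem) {
    if (l->state == LINE_INVALID) {
        l->data = *mem;
        l->state = LINE_CLEAN_EXCLUSIVE;
    }
}

/* Speculative fetch, then a noncoherent external write to memory,
   then the program's real access. Returns the value the processor sees. */
static uint32_t value_seen_after_external_write(uint32_t old_word, uint32_t new_word) {
    uint32_t mem = old_word;
    Line l = { LINE_INVALID, 0 };
    fetch(&l, &mem);   /* speculative fetch, later aborted */
    mem = new_word;    /* external agent updates memory; cache not informed */
    fetch(&l, &mem);   /* real access hits the stale line */
    return l.data;
}
```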

Workarounds for Noncoherent Cached Systems

The suggestions presented below are not exhaustive; the solutions and trade-offs are system dependent. Any one or more of the items listed below might be suitable in a particular system, and testing and simulations should be used to verify their efficacy.

1. The external agent can reject a processor block read request to any I/O location where a speculative load would cause an undesired effect. The rejection is made by returning an external NACK completion response.
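The decision made by the external agent in item 1 can be sketched as an address-range check. The I/O window below is a hypothetical address map, and `block_read_response` is an illustrative name. Real hardware implements this check in the system interface logic. The check covers the whole aligned block, so a request is rejected if any part of the block overlaps the sensitive window.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical physical window holding load-sensitive I/O registers. */
#define IO_SENSITIVE_BASE  0x1F000000ULL
#define IO_SENSITIVE_LIMIT 0x1FFFFFFFULL /* inclusive */

typedef enum { RESP_SUPPLY_DATA, RESP_NACK } BlockReadResponse;

/* Decides how the external agent answers a processor block read request.
   block_bytes must be a power of two (the system's block size). */
static BlockReadResponse block_read_response(uint64_t paddr, uint64_t block_bytes) {
    uint64_t first = paddr & ~(block_bytes - 1); /* block-aligned start */
    uint64_t last = first + block_bytes - 1;      /* inclusive end */
    bool overlaps = first <= IO_SENSITIVE_LIMIT && last >= IO_SENSITIVE_BASE;
    return overlaps ? RESP_NACK : RESP_SUPPLY_DATA;
}
```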

2. A serializing instruction, such as a cache barrier or a CP0 instruction, can be used to prevent speculation beyond the point where speculative stores are allowed to occur. This could be at the beginning of a basic block that includes instructions that can cause a store through an unsafe pointer. (Stores to stack-relative or global-pointer-relative addresses, or through pointers to non-I/O memory, might be safe.) Speculative loads can also cause side effects. To make sure no stale data remains in the cache as a result of undesired speculative loads, the portions of the cache corresponding to the addresses of the DMA read buffers could be flushed after every DMA transfer from the I/O devices.
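Flushing the cache for a DMA read buffer means visiting every cache line that overlaps the buffer. The sketch below computes those lines. It assumes a power-of-two line size. The callback `op` is a hypothetical hook: in a real kernel it would issue the appropriate CACHE instruction for the primary and secondary caches, and that instruction is not shown here. The function returns the number of lines visited.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-line operation (for example, an invalidate). */
typedef void (*line_op_fn)(uintptr_t line_addr);

/* Calls op (if non-NULL) once for every cache line that overlaps
   [buf, buf + len) and returns the number of lines visited. */
static size_t for_each_cache_line(uintptr_t buf, size_t len,
                                  size_t line_bytes, line_op_fn op) {
    if (len == 0)
        return 0;
    uintptr_t start = buf & ~(uintptr_t)(line_bytes - 1); /* round down */
    uintptr_t end = buf + len;                             /* exclusive */
    size_t n = 0;
    for (uintptr_t a = start; a < end; a += line_bytes) {
        if (op)
            op(a);
        n++;
    }
    return n;
}
```

An unaligned buffer touches one more line than its length alone suggests. For example, 32 bytes starting at `0x1010` with 32-byte lines span two lines.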

3. Make references to appropriate I/O spaces uncached by changing the cache coherency attribute in the TLB.
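Changing the attribute in item 3 amounts to rewriting the C (cache algorithm) field of a TLB EntryLo value. The sketch below makes several assumptions. It assumes the field occupies bits 5..3, above the G, V, and D bits. It also assumes these encodings: 2 for uncached, 3 for cacheable noncoherent, and 4 for cacheable coherent exclusive. Check both the bit layout and the encodings against the TLB chapter of this guide before relying on them. The helper names are illustrative.

```c
#include <stdint.h>

/* Assumed EntryLo layout: G bit 0, V bit 1, D bit 2, C bits 5..3. */
#define ENTRYLO_C_SHIFT 3
#define ENTRYLO_C_MASK  (0x7ULL << ENTRYLO_C_SHIFT)

/* Assumed cache algorithm encodings. */
enum {
    CCA_UNCACHED = 2,
    CCA_CACHED_NONCOHERENT = 3,
    CCA_CACHED_COHERENT_EXCL = 4
};

/* Returns entrylo with its cache algorithm field replaced by cca. */
static uint64_t entrylo_set_cca(uint64_t entrylo, unsigned cca) {
    return (entrylo & ~ENTRYLO_C_MASK) |
           (((uint64_t)cca & 0x7) << ENTRYLO_C_SHIFT);
}

/* Extracts the cache algorithm field. */
static unsigned entrylo_get_cca(uint64_t entrylo) {
    return (unsigned)((entrylo & ENTRYLO_C_MASK) >> ENTRYLO_C_SHIFT);
}
```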

4. In general, arbitrary accesses can be controlled by mapping selected addresses through the TLB. However, references to the unmapped, cached xkphys region could have hazardous effects on I/O. A solution is given below:

First, note that the cached and uncached portions of the xkphys region are hard-wired, whereas the cache attributes of the kseg0 region are programmed through the Config register. Therefore, clear the KX bit (to zero) and set the SX and UX bits (to one) in the Status register. This disables access to the xkphys region and restricts access to only the User and Supervisor portions of the 64-bit address space.
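The Status register change can be expressed as a bit manipulation. The sketch assumes the MIPS III/IV bit positions UX = bit 5, SX = bit 6, and KX = bit 7. It operates on a plain value. A kernel would read and write the register (CP0 register 12) with `mfc0`/`mtc0`, and that access is not shown here. `restrict_xkphys` is an illustrative name.

```c
#include <stdint.h>

/* Assumed Status register bit positions. */
#define SR_UX (1u << 5) /* 64-bit User addressing */
#define SR_SX (1u << 6) /* 64-bit Supervisor addressing */
#define SR_KX (1u << 7) /* 64-bit Kernel addressing (includes xkphys) */

/* Clears KX and sets SX and UX, leaving all other bits unchanged. */
static uint32_t restrict_xkphys(uint32_t status) {
    return (status & ~SR_KX) | SR_SX | SR_UX;
}
```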

In general, a system needs either a coherent or a noncoherent protocol, but not both. Therefore, the cache attributes can be used by external hardware to filter accesses to certain parts of the kseg0 region. For instance, the cache attributes for the kseg0 address space might be defined in the Config register as cache coherent, while the cache attributes in the TLB for the rest of virtual space are defined as cached noncoherent or uncached. The external hardware could then be designed to reject all cache-coherent references to memory except those to a predefined safe space in kseg0 in which there is no possibility of an I/O DMA transfer. Then, before the DMA read and before the cache is flushed for the DMA read buffers, the cache attributes in the TLB for the I/O buffer address space are changed from cached noncoherent to uncached. After the DMA read, the attributes are returned to cached noncoherent.
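The filtering rule can be sketched as a predicate evaluated by the external hardware for each request. The safe window is a hypothetical address map. The `RequestAttr` type assumes the agent can tell a coherent block request from a noncoherent or uncached one. Under this scheme only kseg0 generates coherent references, so any coherent reference outside the safe window is rejected.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { REQ_COHERENT, REQ_NONCOHERENT, REQ_UNCACHED } RequestAttr;

/* Hypothetical physical window, reached through kseg0, that never holds
   DMA buffers. */
#define SAFE_BASE  0x00100000ULL
#define SAFE_LIMIT 0x00FFFFFFULL /* inclusive */

/* Returns true if the external hardware should accept the request. */
static bool accept_request(RequestAttr attr, uint64_t paddr) {
    if (attr != REQ_COHERENT)
        return true; /* mapped accesses: controlled by TLB attributes */
    return paddr >= SAFE_BASE && paddr <= SAFE_LIMIT;
}
```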

5. Just before a load or store instruction, use a conditional move instruction that tests for the reverse of the speculated branch's condition, so that any assignment made on an aborted branch path is safe. An example is given below:



In the above example, without the MOVN, the read from the address in register ra could be executed speculatively and later aborted. Such a premature load could be damaging. The MOVN guarantees that if there is a misprediction (r1 is not equal to 0), ra is set to an address from which a read will not be damaging.
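The effect of the MOVN can be modeled in C. `movn rd, rs, rt` copies rs to rd when rt is nonzero and otherwise leaves rd unchanged. The model assumes, as the text implies, that the guarded load sits on the path reached when r1 is zero. `safe_addr` stands for a hypothetical harmless address, and the function names are illustrative.

```c
#include <stdint.h>

/* Model of "movn rd, rs, rt": rd = rs if rt != 0, else rd unchanged. */
static uint64_t movn(uint64_t rd, uint64_t rs, uint64_t rt) {
    return rt != 0 ? rs : rd;
}

/* Address used by the guarded load on the r1 == 0 path. If the processor
   speculates down this path although r1 != 0, MOVN substitutes safe_addr. */
static uint64_t guarded_load_address(uint64_t ra, uint64_t r1, uint64_t safe_addr) {
    return movn(ra, safe_addr, r1);
}
```

When the path is correct (r1 is 0), the load uses ra unchanged. On a misprediction (r1 is nonzero), it uses `safe_addr`.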

6. The following workaround is similar to the conditional-move example above, in that it protects against speculation on only a single branch, but in some instances it may be more efficient than either the conditional-move or the cache-barrier workaround.

This workaround relies on the fact that the R10000 always predicts branch-likely instructions as taken. Thus, any incorrect speculation on a branch-likely always proceeds down the taken path. Sample code is:



The store to r1 will never be to an address referred to by the content of rx, because the store will never be executed speculatively. Thus, the address referred to by the content of rx is protected from any spurious write-backs.
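The prediction rule behind this workaround can be modeled as follows. For an ordinary branch, the speculated direction comes from the dynamic predictor, which is passed in here as a flag. A branch-likely is always predicted taken. As a result, the fall-through block is never entered speculatively. The type and function names are illustrative.

```c
#include <stdbool.h>

typedef enum { BR_ORDINARY, BR_LIKELY } BranchKind;
typedef enum { PATH_TAKEN, PATH_FALLTHROUGH } Path;

/* Path followed while the branch is unresolved. */
static Path speculated_path(BranchKind kind, bool predictor_says_taken) {
    if (kind == BR_LIKELY)
        return PATH_TAKEN; /* branch-likely: always predicted taken */
    return predictor_says_taken ? PATH_TAKEN : PATH_FALLTHROUGH;
}

/* True if the fall-through block can be entered before the branch resolves. */
static bool fallthrough_may_run_speculatively(BranchKind kind, bool predictor_says_taken) {
    return speculated_path(kind, predictor_says_taken) == PATH_FALLTHROUGH;
}
```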

This workaround is most useful when the branch is often taken, or when the protected block contains few instructions that are not memory operations. Note that no instruction in a block following a branch-likely is initiated by speculation on that branch. With the serializing-instruction workaround, by contrast, only memory operations are prevented from being initiated speculatively. With the conditional-move workaround, speculative initiation of all instructions continues unimpeded.

Also, like the conditional-move workaround, this workaround protects a fall-through block only from speculation on the immediately preceding branch. Other mechanisms must be used to ensure that no other branch speculates into the protected block. However, it may be sufficient to show that a block dominating the fall-through block is protected. (Block (a) dominates block (b) if every path from the program entry to (b) passes through (a).) Consider the case where all of the following hold:

  • block (a) dominates block (b);
  • block (b) is the fall-through block shown above;
  • block (a) is the immediately previous block in the program (that is, only the single conditional branch being replaced intervenes between (a) and (b)).

In that case, protecting (a) with a serializing instruction means that a branch-likely can safely be used to protect (b).




Copyright 1996, 1997, MIPS Technologies, Inc. -- 09 DEC 96




