Workarounds for Noncoherent Cached Systems
The suggestions presented below are not exhaustive; the solutions and trade-offs are system-dependent. Any one or more of the items listed below might be suitable in a particular system, and testing and simulation should be used to verify their efficacy.
1. The external agent can reject a processor block read request to any I/O location in which a speculative load would cause an undesired effect. Rejection is made by returning an external NACK completion response.
2. A serializing instruction such as a cache barrier or a CP0 instruction can be used to prevent speculation beyond the point where speculative stores are allowed to occur. This could be at the beginning of a basic block that includes instructions that can cause a store with an unsafe pointer. (Stores to stack-relative addresses, global-pointer-relative addresses, and pointers to non-I/O memory might be safe.) Speculative loads can also cause side effects. To make sure there is no stale data in the cache as a result of undesired speculative loads, the portions of the cache that correspond to the addresses of the DMA read buffers could be flushed after every DMA transfer from the I/O devices.
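The barrier placement described in item 2 might look like the following minimal sketch. It assumes that the R10000 Cache Barrier operation is encoded as CACHE op 0x14 (check this against the CACHE instruction description for the target). The label unsafe_block and the register roles (r5 as a possibly unsafe pointer, r2 as the data) are hypothetical.

```asm
        # Hypothetical basic block that stores through a pointer (r5)
        # which could reference I/O space on a mispredicted path.
unsafe_block:
        cache   0x14, 0(sp)     # Cache Barrier (op 0x14 is an assumption); serializing,
                                # so it is placed first in the block
        sw      r2, 0(r5)       # store with the possibly unsafe pointer follows the barrier
```

Placing the barrier at the head of the block, rather than before each store, keeps the cost to one serializing instruction per basic block.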
3. Make references to appropriate I/O spaces uncached by changing the cache coherency attribute in the TLB.
4. Generally, arbitrary accesses can be controlled by mapping selected addresses through the TLB. However, references to an unmapped cached xkphys region could have hazardous effects on I/O. A solution for this is given below:
First, note that the xkphys region is hard-wired into cached and uncached regions, whereas the cache attributes for the kseg0 region are programmed through the Config register. Therefore, clear the KX bit (to zero) and set the SX and UX bits (to ones) in the Status register. This disables access to the xkphys region and restricts access to only the User and Supervisor portions of the 64-bit address space.
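The Status register change above could be coded along the following lines. This is a sketch only: it assumes that Status is CP0 register 12 and that KX, SX, and UX occupy bits 7, 6, and 5 respectively. The scratch register r8 is hypothetical, and any CP0 hazard padding the system requires is omitted.

```asm
        mfc0    r8, $12         # read Status (assumed to be CP0 register 12)
        ori     r8, r8, 0xE0    # set KX, SX and UX (bits 7, 6, 5)
        xori    r8, r8, 0x80    # toggle KX back to 0; SX and UX stay set
        mtc0    r8, $12         # write Status back
```

The ori/xori pair clears KX without needing a second register to hold an inverted mask.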
In general, the system needs either a coherent or a noncoherent protocol, but not both. Therefore these cache attributes can be used by the external hardware to filter accesses to certain parts of the kseg0 region. For instance, the cache attributes for the kseg0 address space might be defined in the Config register to be cache coherent, while the cache attributes in the TLB for the rest of the virtual space are defined to be cached-noncoherent or uncached. The external hardware could be designed to reject all cache-coherent references to memory except those to a predefined safe space in kseg0 within which there is no possibility of an I/O DMA transfer. Then, before the DMA read process and before the cache is flushed for the DMA read buffers, the cache attributes in the TLB for the I/O buffer address space are changed from cached-noncoherent to uncached. After the DMA read, the attributes are returned to cached-noncoherent.
5. Just before a load/store instruction, use a conditional move instruction that tests for the reverse condition in the speculated branch, and make all aborted branch assignments safe. An example is given below:
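The original listing is not reproduced here. The following is a minimal sketch consistent with the surrounding description, not necessarily the original code. The registers r1 and ra come from the text; rsafe (a register assumed to hold an address that is harmless to read), r4, and the label are hypothetical.

```asm
        bne     r1, r0, label   # if predicted not taken, the fall-through may run speculatively
        nop                     # branch delay slot
        movn    ra, rsafe, r1   # if r1 != 0 (fall-through was mispredicted), ra = rsafe
        ld      r4, 0(ra)       # address depends on the MOVN result, so a mispredicted
                                # load goes to the safe address
label:
```

Because the load address depends on the MOVN result, and the MOVN depends on r1, the load cannot use the original pointer on a path where r1 is nonzero.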

In the above example, without the MOVN the read to the address in register ra could be speculatively executed and later aborted. It is possible that this load could be premature and thus damaging. The MOVN guarantees that if there is a misprediction (r1 is not equal to 0) ra will be loaded with an address to which a read will not be damaging.
6. The following is similar to the conditional-move example given above, in that it protects speculation only for a single branch, but in some instances it may be more efficient than either the conditional move or the cache barrier workarounds.
This workaround uses the fact that branch-likely instructions are always predicted as taken by the R10000. Thus, any incorrect speculation by the R10000 on a branch-likely always occurs on a taken path. Sample code is:
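The original listing is not reproduced here. The following is a minimal sketch of the pattern under stated assumptions: the branch condition (rx equal to zero), the data register r2, and the label are hypothetical, chosen only to match the registers named in the surrounding text.

```asm
        beql    rx, r0, label   # branch-likely: always predicted taken
        nop                     # delay slot (nullified when the branch is not taken)
        sw      r2, 0(r1)       # fall-through: reached only after the branch
                                # has resolved as not taken
label:
```

Since the predicted path always goes to label, the fall-through store is entered only non-speculatively with respect to this branch.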

The store to r1 will never be to an address referred to by the content of rx, because the store will never be executed speculatively. Thus, the address referred to by the content of rx is protected from any spurious write-backs.
This workaround is most useful when the branch is often taken, or when there are few instructions in the protected block that are not memory operations. Note that no instructions in a block following a branch-likely will be initiated by speculation on that branch; by contrast, in the case of the serializing-instruction workaround, only memory operations are prevented from speculative initiation, and in the case of the conditional-move workaround, speculative initiation of all instructions continues unimpeded. Also, like the conditional-move workaround, this workaround only protects fall-through blocks from speculation on the immediately preceding branch. Other mechanisms must be used to ensure that no other branches speculate into the protected block. However, if a block that dominates*1 the fall-through block can be shown to be protected, this may be sufficient. Thus, if block (a) dominates block (b), and block (b) is the fall-through block shown above, and block (a) is the immediately previous block in the program (i.e., only the single conditional branch that is being replaced intervenes between (a) and (b)), then ensuring that (a) is protected by a serializing instruction means a branch-likely can safely be used as protection for (b).