So, without multithreading, this will never run in real time. I'm going to try something that I haven't seen done in the emulation community: "runahead" execution.
The problem with cycle accuracy, and one of the reasons many believe it isn't going to be possible on N64-generation consoles, is lock contention. Let me begin by stating that a cycle-accurate emulator for this console will NEVER, ever run fast enough on a single core... unless those guys at Intel start shipping processors that use graphene as a substrate. (CEN64, in its current state, already demonstrates this!)
So why not multithread? Well, processors are relatively slow at shipping data between cores. On the other hand, cycle-accurate simulation requires all simulated cores to effectively "sync" every cycle to check for interrupts or other events... conflicting ideas. My "runahead" execution rests on the following two assumptions:
(a) Simulated processors take a relatively large number of cycles to perform a bus access. If the VR4300 reads from memory and misses the cache, it's going to spin for a few cycles without doing useful work anyway.
(b) Interrupts are not TOO common. We have to simulate 93.75 million cycles per second. Most of the time, whatever we're simulating isn't going to be bothered by anything else, nor are we going to bother anyone else. And if we do, that access falls under (a): we're in a "dead" period until it's fulfilled.
So, basically, we can run each core in its own thread and allow each core to fall out of sync with the others. To preserve the guarantees of cycle accuracy while doing so, each core keeps a "history log" of what occurred during every cycle it runs ahead. On the off chance we do get interrupted by somebody else, we can use the history log to "micro reboot" ourselves to the exact point in time we got the interrupt.
Certainly a risky idea, but it's the only way I can imagine contention being reduced enough. The idea comes from the reorder buffers that conventional processors use to roll back on a branch misprediction; same game here, just in software! Crossing my fingers...
EDIT: A quick-and-dirty inspection says I can expect only a couple hundred interrupts to occur each second. The only other worry will be external accesses to some devices, but that should be manageable (I can use the history log to determine what the data was at that cycle and, in some cases, not even have to revert).