Dynamic recompilers are not hacks, they do not sacrifice accuracy, and anyone who tells you otherwise is an idiot.
I'm sure none of the below is news to you, MooglyTwo. But for others reading this thread ...
In theory, a dynamic recompiler can be just as accurate as an interpreter in every respect, but such an approach is usually infeasible in practice.
Take a processor where IRQs can be enabled or disabled, triggered or blocked in the middle of any single opcode, even based upon what exactly the opcode is doing. For instance, the second-to-last cycle could disable IRQs right before one was scheduled to occur.
To account for that, you must add native processor code to perform IRQ updates for you after each relevant bus access.
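To make that concrete, here is a rough Python sketch of the idea, not code from any real emulator. The `CPU` class, `poll_irq`, `irq_at`, and the three-cycle opcode are all made up for illustration; `translated_block` stands in for whatever native code a recompiler would actually emit.

```python
class CPU:
    """Toy CPU state. Every name here is hypothetical."""

    def __init__(self):
        self.cycles = 0
        self.irq_enabled = True
        self.irq_at = None        # cycle at which an IRQ becomes pending
        self.irq_taken_at = []    # cycles at which an IRQ was actually serviced
        self.memory = bytearray(0x10000)

    def poll_irq(self):
        # The check a recompiler would have to emit after each bus access.
        if (self.irq_at is not None
                and self.cycles >= self.irq_at
                and self.irq_enabled):
            self.irq_taken_at.append(self.cycles)
            self.irq_at = None

    def bus_read(self, addr):
        self.cycles += 1
        value = self.memory[addr & 0xFFFF]
        self.poll_irq()
        return value

    def bus_write(self, addr, value):
        self.cycles += 1
        self.memory[addr & 0xFFFF] = value & 0xFF
        self.poll_irq()


def translated_block(cpu):
    """Stand-in for the code emitted for one imaginary 3-cycle opcode."""
    cpu.bus_read(0x0010)       # cycle 1
    cpu.bus_read(0x0011)       # cycle 2 (second to last)
    cpu.irq_enabled = False    # the opcode masks IRQs as part of cycle 2
    cpu.bus_write(0x0012, 0)   # cycle 3 (last)


def run(irq_at):
    """Run the block once with an IRQ scheduled at the given cycle."""
    cpu = CPU()
    cpu.irq_at = irq_at
    translated_block(cpu)
    return cpu.irq_taken_at


print(run(1))  # IRQ arrives before the mask: serviced at cycle 1
print(run(3))  # IRQ arrives after the mask: blocked, nothing serviced
```

If the poll only happened at the end of the block, the cycle 1 IRQ would be missed as well, since IRQs are already masked by then. That per-access check is exactly the overhead being talked about here.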
Further, a processor could communicate with another part of the system, in which case you must sync up that other component before the communication occurs, if it is behind. Given that dynamically recompiled code can still perform indirect memory accesses (and hence you might not know ahead of time whether it will communicate with another component), your generated code must also perform these checks -- perhaps by calling specialized memory read/write functions.
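A minimal Python sketch of such a specialized read function follows. Everything in it is an assumption for illustration: the `Machine` and `Coprocessor` classes, the shared I/O range at 0x2000, and the one-cycle-per-read timing do not describe any particular system.

```python
class Coprocessor:
    """Toy second component with its own clock. Hypothetical."""

    def __init__(self):
        self.clock = 0
        self.status = 0

    def run_until(self, target):
        # Catch up one cycle at a time; the status register changes every cycle.
        while self.clock < target:
            self.clock += 1
            self.status = self.clock & 0xFF


class Machine:
    SHARED_BASE = 0x2000   # made-up I/O range visible to both components
    SHARED_END = 0x2100

    def __init__(self):
        self.cpu_clock = 0
        self.cop = Coprocessor()
        self.ram = bytearray(0x10000)
        self.syncs = 0     # how many times we had to catch the coprocessor up

    def read(self, addr):
        """The specialized read that recompiled code would call."""
        self.cpu_clock += 1
        if self.SHARED_BASE <= addr < self.SHARED_END:
            # About to communicate: sync the other component if it is behind.
            if self.cop.clock < self.cpu_clock:
                self.cop.run_until(self.cpu_clock)
                self.syncs += 1
            return self.cop.status
        return self.ram[addr & 0xFFFF]


def translated_block(machine, pointer):
    """Stand-in for emitted code doing an indirect load.

    The address is only known at run time, so the check cannot be
    compiled away.
    """
    return machine.read(pointer)


machine = Machine()
for _ in range(5):
    translated_block(machine, 0x0100)        # plain RAM: coprocessor left behind
value = translated_block(machine, 0x2000)    # shared range: forces a catch-up
print(machine.cop.clock, machine.syncs, value)
```

The coprocessor is only advanced when the CPU actually touches the shared range, which keeps the sync count low, but the range test itself still runs on every indirect access.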
In truth, to reach subcycle-level accuracy with a dynarec, you end up with so much safeguarding that you lose most of the speedup that dynarecs offer in the first place. And you still have to recompile all of the code, where caching only gets you so far. In the end, you end up with a lot of needless complexity for a very small speed gain.
Of course, there are cases where there is very little (or even no!) external communication with other processors, and where the processor has no independent logic units of its own, such as interrupt request or DMA circuitry. There are also systems where subcycle-level accuracy is pointless outside of theory. Even the SNES gets pretty close there. I would say this level of precision on an N64 would be overkill, even for purists. Dynarecs can be used in these cases with no real downsides.
But you would be a goddamn fool to use a dynarec on a 6502 when aiming for subcycle accuracy as a means of gaining speed, for instance.
Typically, if a dynarec is more accurate than an interpreter, the fault lies with the interpreter being poorly written. It has nothing to do with dynarecs being superior for achieving accuracy. And I'm not saying you suggested that, I'm just pointing that out for everyone else.
In fact, a dynarec is quite the opposite of something that helps accuracy. It adds another potential point of failure while providing no benefits to accuracy. Dynarecs exist only as a means to gain speed. But a talented enough programmer can avoid all of the potential pitfalls and still gain a nice speedup with one.
As I said before, I strongly admire those who are willing to attempt to maximize both speed and accuracy. That's clearly the hardest goal of all in emulation.
If you're not all about FR33 G4MEZ!!!!! at FULL SPEED on L33T PENTIUM THR33ZZZ!!!!!
Very depressingly, it goes with the territory. I hate it as much as you do, but that will never, ever change. And the majority of people will stick with the fastest emulator that gets them Super Mario 64 with the most additional, non-hardware related features, regardless of how it does it. To them, it's all about the ends, and not the means.
And yea, emulating at a low level the rsp and the rdp is necessary if you want perfect accuracy. There are some rsp and graphic plugins that are emulating them at a low level (z64 and pj64 beta plugins for instance), so there's nothing in the current emulator structure that is preventing low level emulation except the details that we are discussing in this thread about the synchronization of the rsp with the main cpu.
I'd be happy to lend a hand with synchronization techniques, if you wish. I have a good deal of experience with this.
If you want, we can discuss this here or in private. I'd just need to know how the RSP and RDP communicate (do they share an address bus? do they use a specific I/O port range?), what clock speeds they execute at, and how willing you are to forgo some degree of precision in return for speed. Obviously this also depends on how sensitive the games are.