"Runahead execution", "fast forwarding", SSE optimisation, and still cycle accurate?
Sounds like sh*t just got real :term:.
It seems I have lost track of your progress though. Could you please remind me (us?) of the things that you still need to implement?
VR4300 - needs TLB, FPU, some CP0, timings, some instructions, etc.
RSP - needs a handful more instructions and more work on the dual-issue logic. Also probably has a bug.
RDP - need to fix RSP/RDP communication issue and begin implementing.
Runahead execution - haven't really started, just looked into it.
SSE optimization - present for ~80% of RSP instructions.
Fast forwarding - done, needs testing on commercial ROMs.
Cycle-accuracy - framework in place, but as hack mentioned, caches aren't modeled and delays aren't present, etc. In other words, it's as easy as adding:
Code:
vr4300->pipeline.stalls = 50;
in the VR4300 FPU for instructions that take > 1 cycle, for example. I just haven't actually taken the time to determine the delays of everything and add it in.
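To make that a bit more concrete, here is a minimal sketch in C of how a stall counter like the one above could be consumed by a per-cycle step function. Only `vr4300->pipeline.stalls` comes from the snippet above; the struct layout, `fpu_issue`, `vr4300_cycle`, and the latency table are hypothetical names made up for illustration, and the numbers are placeholders rather than measured VR4300 timings.

```c
#include <stdint.h>

/* Hypothetical, simplified pipeline state (not CEN64's actual layout). */
struct pipeline {
  unsigned stalls;   /* cycles left before the pipeline may advance */
  uint64_t cycles;   /* total cycles simulated so far */
  uint64_t retired;  /* instructions that have completed */
};

struct vr4300 {
  struct pipeline pipeline;
};

enum fpu_op { FPU_ADD, FPU_MUL, FPU_DIV, FPU_SQRT, FPU_NUM_OPS };

/* Extra cycles per operation. Placeholder values, NOT real timings. */
static const unsigned fpu_extra_cycles[FPU_NUM_OPS] = { 2, 4, 28, 28 };

/* Called when a multi-cycle FPU instruction issues: charge its delay. */
static void fpu_issue(struct vr4300 *vr4300, enum fpu_op op) {
  vr4300->pipeline.stalls = fpu_extra_cycles[op];
}

/* Simulate one cycle. Returns 1 if the pipeline advanced, 0 if stalled. */
static int vr4300_cycle(struct vr4300 *vr4300) {
  vr4300->pipeline.cycles++;

  if (vr4300->pipeline.stalls > 0) {
    vr4300->pipeline.stalls--;  /* burn a cycle, do no work */
    return 0;
  }

  vr4300->pipeline.retired++;   /* normal single-cycle progress */
  return 1;
}
```

With this shape, adding a delay for an instruction is just a table entry plus one assignment at issue time, which is why the timing work is mostly a matter of looking up the right numbers.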
I've been trying to "flesh out" everything so that things can drop in place when needed, but keep performance in mind at the same time. If there's some "easy" optimization that will yield enough performance (IMO) to be worth taking advantage of without getting too much in the way of implementing actual functionality, I'll generally do it right away. Any x86 machine from the last couple of years will get > 30 VI/s, and CEN64 isn't capable of playing commercial ROMs yet, so I'm just putting things like runahead execution on the backburner until they're really needed and instead starting to look at things like the RDP.
If that's true, wouldn't that mean the typically-poor-in-high-end-emulation AMD CPUs could actually be...useful? I mean, in Dolphin the Intel CPUs are quite a ways ahead regardless of clockrate:
Dolphin CPU Benchmark Results
My bet is that Intel will still win out, but the difference between AMD/Intel at the same frequency won't be nearly as exaggerated. The goal is to have machines at around 3GHz capable of running everything at 60 VI/s, so hopefully these kinds of benchmarks won't even be necessary.
