
Announcement: Cycle-accurate N64 development underway.

Aerosol

New member
"Runahead execution" has been done by Nemesis. Check it out at www (dot) exodusemulator (dot) com. Granted, he only just released the first version a couple of days ago, but it'll probably be worth getting in touch, no? Even if you're dead set on creating your own framework, Nemesis has already applied "runahead execution" to an emulation project. A cycle-accurate one, at that.
 

XICO2KX

New member
Aerosol said:
"Runahead execution" has been done by Nemesis.
Interesting... Here's what the author of this cycle-accurate MegaDrive/Genesis emulator has to say about that part: :geek:
http://www.exodusemulator.com/joomla30/index.php/what-is-exodus said:
3. Effective multithreading:

Exodus was built from the ground up to solve the unsolvable timing problems, and has a unique approach to timing accuracy. It adopts what I call the optimistic execution model.

This idea isn't new either. The concept is simple, and it goes something like this: "Most of the time, a timing problem is not going to occur. Given that assumption, I want to execute my devices unsynchronized for as long as possible. If something ends up happening in the wrong order, I want to roll back to the previous point, and repeat the operation, this time with fore-knowledge about the timing requirements". This is the execution model Exodus uses. By executing in parallel for as long as possible, we can make use of multiple cores.

The idea of state rollback is implemented as a core part of the platform itself, and is heavily optimized to be as fast as possible where a rollback is not required. Devices can assist the emulation by giving advance notice about significant events, such as interrupt generation, which are likely to affect other devices. System XML definitions can also give additional timing hints, such as forcing particular devices to always remain behind the current execution point of other devices.

A combination of these techniques allows Exodus to achieve 100% timing accuracy, while making effective use of multiple cores to execute devices in parallel for as long as possible.
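A minimal C sketch of the rollback idea described in the quote. Everything here is hypothetical (the `device` struct, `device_step`, `deliver_irq`) and is not taken from Exodus, whose source the thread couldn't find: a device runs ahead optimistically from a checkpoint, and if another device later reports an interrupt timestamped in its past, the device restores the checkpoint, replays up to the interrupt, applies it, and catches back up.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical device with a local clock and some architectural state. */
typedef struct {
  uint64_t cycle;    /* this device's local time */
  uint64_t acc;      /* stand-in for real device state */
  int irq_pending;   /* set once an interrupt has been delivered */
} device;

/* One emulated cycle. The work done depends on whether an IRQ is pending,
   so delivering the IRQ at the wrong time gives a different result. */
static void device_step(device *d) {
  d->acc += d->irq_pending ? 100 : 1;
  d->cycle++;
}

static void device_run_until(device *d, uint64_t target) {
  while (d->cycle < target)
    device_step(d);
}

/* Deliver an interrupt raised by another device at time irq_cycle.
   If d optimistically ran past that point, restore the checkpoint, replay
   up to irq_cycle, apply the interrupt, then catch back up to where d was.
   Returns 1 if a rollback was needed. */
static int deliver_irq(device *d, const device *checkpoint, uint64_t irq_cycle) {
  uint64_t reached = d->cycle;
  int rolled_back = 0;
  if (d->cycle > irq_cycle) {
    assert(checkpoint->cycle <= irq_cycle);  /* checkpoint must predate the IRQ */
    *d = *checkpoint;
    rolled_back = 1;
  }
  device_run_until(d, irq_cycle);
  d->irq_pending = 1;
  device_run_until(d, reached);
  return rolled_back;
}
```

Usage: checkpoint a device at cycle 0, run it to cycle 10, then deliver an interrupt stamped at cycle 4; the replayed result matches running the device in lockstep with the interrupt at cycle 4. A real implementation would also have to undo any effects the device had on other devices after the checkpoint, which this sketch ignores.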
 
OP
MarathonMan

Emulator Developer
Aerosol said:
"Runahead execution" has been done by Nemesis. Check it out at www (dot) exodusemulator (dot) com. Granted, he only just released the first version a couple of days ago, but it'll probably be worth getting in touch, no? Even if you're dead set on creating your own framework, Nemesis has already applied "runahead execution" to an emulation project. A cycle-accurate one, at that.

Hmm yes, this is almost exactly what I was looking into doing. I'll have to look at his project and then perhaps ping them. At least he's proven that it's feasible. :)

EDIT: After looking into it, I can't find the source (?); just binary releases. It seems to support Windows only (ATM?). Performance isn't quite what I was hoping to see, either; for the Genesis, people are reporting that i7 950s are only getting 55-60 FPS. While those requirements are certainly passable for the Megadrive, the N64 needs a lot more horsepower to keep it going.
 

Aerosol

New member
MarathonMan said:
Hmm yes, this is almost exactly what I was looking into doing. I'll have to look at his project and then perhaps ping them. At least he's proven that it's feasible. :)

EDIT: After looking into it, I can't find the source (?); just binary releases. It seems to support Windows only (ATM?). Performance isn't quite what I was hoping to see, either; for the Genesis, people are reporting that i7 950s are only getting 55-60 FPS. While those requirements are certainly passable for the Megadrive, the N64 needs a lot more horsepower to keep it going.

That's not necessarily the fault of Exodus. It could be his emulation cores (which actually aren't entirely finished). Without looking at the sources, I don't know for sure, though. I don't know if he plans on releasing the source to Exodus or any of his cores, either. I sure hope so.
 

bleck

New member
Looking at his "Design Philosophy", it seems like he's going for the "code as documentation", "clean code and clean architecture before performance" approach, which could very well explain the slowness.
 

DETOMINE

New member
MarathonMan said:
Performance isn't quite what I was hoping to see, either; for the Genesis, people are reporting that i7 950s are only getting 55-60 FPS. While those requirements are certainly passable for the Megadrive, the N64 needs a lot more horsepower to keep it going.
As far as I can tell from the screenshots, it also seems to be the debug version. A release without the debug mode should be faster as well.
 
OP
MarathonMan

Emulator Developer
Away on vacation, but got CEN64 working on the Google Chromebook (ARM, Samsung Exynos 5250). I don't have EGL acceleration at my fingertips, so I'm using software Mesa for now, which is chewing up a large fraction of my runtime.

However, being relegated to a slower machine and a different architecture enabled me to both fix some bugs and find alternative ways to squeeze out extra performance. Even on the Chromebook, I'm able to get 6 VI/s for all ROMs with software-based OpenGL. When I disable OpenGL, ROMs are generally able to get 15-20+ VI/s... which is about what should be possible once EGL is available.

I found a way to get a /lot/ of performance out of simple ROMs. Even on the Chromebook, Oman's Pong hits 30+ VI/s easily now. I managed to do this by putting the VR4300 into a "fast forward" mode that only checks a very minimal number of things when the CPU starts executing a lot of "busy loop" code, as it does when it's waiting for interrupts from other devices. As a result, I'm thinking that even modest x86 machines (< 3GHz) should see a full 60 VI/s on all the ROMs that CEN64 is capable of running... I will test when I get home from vacation. :)
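The fast-forward mode described above keeps executing busy-loop code, just with fewer checks. A related and widely used trick is idle-loop skipping: recognize a loop that cannot make progress and advance time straight to the next scheduled event. The C sketch below shows that swapped-in technique, not CEN64's code; the `scheduler` struct and both function names are hypothetical, and only the simplest pattern (a BEQ with identical source registers that branches to itself, with a NOP in its delay slot) is detected.

```c
#include <stdint.h>

/* Hypothetical scheduler state: the CPU's cycle count and the cycle at
   which the next device event (e.g. a VI interrupt) is due. */
typedef struct {
  uint64_t cycle;
  uint64_t next_event;
} scheduler;

/* "loop: beq $rs, $rs, loop" with a NOP in the delay slot is always taken,
   jumps to itself and changes no registers, so the CPU cannot make progress
   until an interrupt arrives. */
static int is_idle_loop(uint32_t pc, uint32_t insn, uint32_t delay_slot) {
  uint32_t opcode = insn >> 26;
  uint32_t rs = (insn >> 21) & 0x1F;
  uint32_t rt = (insn >> 16) & 0x1F;
  /* Sign-extend the 16-bit offset, then shift as unsigned to avoid UB. */
  uint32_t offset = (uint32_t)(int32_t)(int16_t)(insn & 0xFFFF) << 2;
  uint32_t target = pc + 4 + offset;   /* branch target is relative to the delay slot */
  return opcode == 0x04 /* BEQ */ && rs == rt && target == pc && delay_slot == 0;
}

/* If the CPU is spinning in an idle loop, jump straight to the next event
   instead of emulating every iteration. Returns the number of cycles skipped. */
static uint64_t maybe_fast_forward(scheduler *s, uint32_t pc,
                                   uint32_t insn, uint32_t delay_slot) {
  if (!is_idle_loop(pc, insn, delay_slot) || s->next_event <= s->cycle)
    return 0;
  uint64_t skipped = s->next_event - s->cycle;
  s->cycle = s->next_event;
  return skipped;
}
```

Because the skipped cycles are still added to `cycle`, any state derived from the cycle count stays consistent. Busy loops that poll a device register need a more careful check, since the polled value can change while the CPU spins.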
 

Nintendo Maniac

New member
MarathonMan said:
Away on vacation, but got CEN64 working on the Google Chromebook (ARM, Samsung Exynos 5250).
Great to hear CEN64 running on non-x86! :D

MarathonMan said:
I'm thinking that even modest x86 machines (< 3GHz) should see a full 60 VI/s on all the ROMs that CEN64 is capable of running as a result... I will test when I get home from vacation. :)
Erm... you've heard of the MHz myth, right? I mean, my Brisbane @ 3GHz is half as fast as your Sandy Bridge @ 3GHz...
 
OP
MarathonMan

Emulator Developer
Nintendo Maniac said:
Erm... you've heard of the MHz myth, right? I mean, my Brisbane @ 3GHz is half as fast as your Sandy Bridge @ 3GHz...

Of course... that's a first-day Computer Architecture 101 topic. However, CEN64 is different in that it basically measures your processor's ability to execute integer instructions and a handful of very predictable branch instructions. Performance will vary much less across architectures than it does for a traditional application that suffers from cache misses, branch mispredictions, a varied instruction mix, etc.

Haswell might be able to accelerate RSP emulation a tad better due to its large number of vector units, but frequency >>> architecture (when comparing vendors of the same architecture, whose chips were released in the same timeframe, etc.) for this application.
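To illustrate why vector units matter for RSP emulation, here is a simplified, hypothetical C sketch of an RSP-style vector add: eight signed 16-bit lanes with saturation, done by a single SSE2 intrinsic (`_mm_adds_epi16`) when the compiler targets SSE2, with a scalar fallback otherwise. It is not CEN64's implementation; the real RSP add also involves the VCO carry flags and the accumulator, which this sketch omits.

```c
#include <stdint.h>
#ifdef __SSE2__
#include <emmintrin.h>
#endif

/* Eight signed 16-bit lanes, each sum clamped to [-32768, 32767]. */
static void vadd_sat16(int16_t dst[8], const int16_t a[8], const int16_t b[8]) {
#ifdef __SSE2__
  /* One instruction handles all eight lanes, saturation included. */
  __m128i va = _mm_loadu_si128((const __m128i *)a);
  __m128i vb = _mm_loadu_si128((const __m128i *)b);
  _mm_storeu_si128((__m128i *)dst, _mm_adds_epi16(va, vb));
#else
  /* Scalar fallback: widen, add, clamp, one lane at a time. */
  for (int i = 0; i < 8; i++) {
    int32_t s = (int32_t)a[i] + b[i];
    dst[i] = (int16_t)(s > 32767 ? 32767 : (s < -32768 ? -32768 : s));
  }
#endif
}
```

On x86-64, SSE2 is part of the baseline instruction set, so the intrinsic path is the one that normally gets compiled there.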
 

DETOMINE

New member
MarathonMan said:
Away on vacation, but got CEN64 working on the Google Chromebook (ARM, Samsung Exynos 5250). I don't have EGL acceleration at my fingertips, so I'm using software Mesa for now, which is chewing up a large fraction of my runtime.[...]I managed to do this by putting the VR4300 into a "fast forward" mode that only checks a very minimal amount of things when the CPU starts executing a lot of "busy loop" code as it does when it's waiting for interrupts from other devices. I'm thinking that even modest x86 machines (< 3GHz) should see a full 60 VI/s on all the ROMs that CEN64 is capable of running as a result... I will test when I get home from vacation. :)
"Runahead execution", "fast forwarding", SSE optimisation, and still cycle-accurate?
Sounds like sh*t just got real :term: .

I seem to have lost track of your progress, though. Could you please remind me (us?) of the things you still need to implement?
 

Nintendo Maniac

New member
MarathonMan said:
Of course... that's a first-day Computer Architecture 101 topic. However, CEN64 is different in that it basically measures your processor's ability to execute integer instructions and a handful of very predictable branch instructions. Performance will vary much less across architectures than it does for a traditional application that suffers from cache misses, branch mispredictions, a varied instruction mix, etc.

If that's true, wouldn't that mean the typically-poor-in-high-end-emulation AMD CPUs could actually be...useful? I mean, in Dolphin the Intel CPUs are quite a ways ahead regardless of clockrate:

Dolphin CPU Benchmark Results
 
OP
MarathonMan

Emulator Developer
DETOMINE said:
"Runahead execution", "fast forwarding", SSE optimisation, and still cycle-accurate?
Sounds like sh*t just got real :term: .

I seem to have lost track of your progress, though. Could you please remind me (us?) of the things you still need to implement?

VR4300 - needs TLB, FPU, some CP0, timings, some instructions, etc.
RSP - needs a handful more instructions, dual-issue logic worked on. Also probably has a bug.
RDP - need to fix RSP/RDP communication issue and begin implementing.

Runahead execution - haven't really started, just looked into it.
SSE optimization - present for ~80% of RSP instructions.
Fast forwarding - done, needs testing on commercial ROMs.
Cycle-accuracy - framework in place, but as hack mentioned, caches aren't modeled and delays aren't present, etc. In other words, it's as easy as adding:

Code:
vr4300->pipeline.stalls = 50;

in the VR4300 FPU for instructions that take > 1 cycle, for example. I just haven't actually taken the time to determine the delays of everything and add it in.
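A C sketch of how a stall counter like the `vr4300->pipeline.stalls` line above could be consumed. Only the `pipeline.stalls` naming comes from the post; the structs, `vr4300_cycle`, and the latency table are hypothetical, and the latencies are made-up numbers rather than real VR4300 timings. Each emulated cycle either burns one stall or lets the next instruction in.

```c
#include <stdint.h>

/* Hypothetical pipeline state; only the "stalls" field echoes the post. */
typedef struct {
  unsigned stalls;    /* cycles the pipeline must stay frozen */
} pipeline_state;

typedef struct {
  pipeline_state pipeline;
  uint64_t cycles;    /* total emulated cycles */
  unsigned issued;    /* instructions issued so far */
} vr4300_state;

/* Emulate one cycle. latencies[i] is the (made-up) cycle count of the i-th
   instruction; anything above 1 is charged as pipeline stalls. */
static void vr4300_cycle(vr4300_state *vr4300, const unsigned *latencies, unsigned n) {
  vr4300->cycles++;
  if (vr4300->pipeline.stalls > 0) {
    vr4300->pipeline.stalls--;   /* frozen: no instruction advances this cycle */
    return;
  }
  if (vr4300->issued < n) {
    unsigned latency = latencies[vr4300->issued++];
    if (latency > 1)
      vr4300->pipeline.stalls = latency - 1;   /* the issue cycle itself counts as one */
  }
}
```

With latencies {1, 3, 1}, three instructions take five cycles: the 3-cycle operation adds two stall cycles, and every other part of the emulator simply keeps calling `vr4300_cycle` as usual.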

I've been trying to "flesh out" everything so that things can drop into place when needed, while keeping performance in mind at the same time. If there's some "easy" optimization that will yield enough performance (IMO) without getting in the way of implementing actual functionality too much, I'll generally do it right away. Any x86 machine from the last couple of years will get > 30 VI/s, and CEN64 isn't capable of playing commercial ROMs yet, so I'm putting things like runahead execution on the back burner until they're really needed and starting to look at things like the RDP instead.

Nintendo Maniac said:
If that's true, wouldn't that mean the typically-poor-in-high-end-emulation AMD CPUs could actually be...useful? I mean, in Dolphin the Intel CPUs are quite a ways ahead regardless of clockrate:

Dolphin CPU Benchmark Results

My bet is that Intel will still win out, but the difference between AMD/Intel at the same frequency won't be nearly as exaggerated. The goal is to have ~3GHz or so machines capable of running everything at 60 VI/s, so hopefully these kinds of benchmarks won't even be necessary. :)
 

Durza007

New member
I managed to compile it on Windows with MinGW 64! Unfortunately, there seems to be something wrong with my installation, because it couldn't compile with the following optimization flags:
Code:
-flto -fwhole-program -fuse-linker-plugin
Performance is probably suffering because of that. I get ~11.5 VI/s on my Core i5-750 at 3 GHz
 

Attachments

  • tHdekHB[1].jpg
OP
MarathonMan

Emulator Developer
Durza007 said:
I managed to compile it on Windows with MinGW 64! Unfortunately, there seems to be something wrong with my installation, because it couldn't compile with the following optimization flags:
Code:
-flto -fwhole-program -fuse-linker-plugin
Performance is probably suffering because of that. I get ~11.5 VI/s on my Core i5-750 at 3 GHz

Awesome! I was trying to do it with mingw32 yesterday and got too frustrated with it. Did you have to make any changes to the source to get it to build? I'd be happy to merge them into GitHub.

Those optimization flags are, unfortunately, the bread and butter of this simulator. I'm not at all surprised that you're getting less than impressive performance without them. What version of gcc did you use (gcc -v)? I might be able to point you in the right direction if you give me the linker errors, too.
 

Durza007

New member
MarathonMan said:
Awesome! I was trying to do it with mingw32 yesterday and got too frustrated with it. Did you have to make any changes to the source to get it to build? I'd be happy to merge them into github.
I had to make some changes. Most notably, I had to rewrite the renderer to use GL_TEXTURE_2D instead of RECTANGLE because Windows only supports GL 1.1.
GL_UNSIGNED_SHORT_5_5_5_1 was also missing but I just added it manually and hoped it would work.
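For reference, a guarded definition of the missing token in C. `GL_UNSIGNED_SHORT_5_5_5_1` is an OpenGL 1.2 packed-pixel type with the value 0x8034, so GL 1.1 headers such as the ones shipped with Windows toolchains don't define it; defining it only fixes compilation, and whether the driver accepts it at runtime is a separate question. The `pack_rgba5551` helper is hypothetical and just shows the bit layout the token describes.

```c
#include <stdint.h>

/* OpenGL 1.2 packed-pixel token; GL 1.1 headers lack it. */
#ifndef GL_UNSIGNED_SHORT_5_5_5_1
#define GL_UNSIGNED_SHORT_5_5_5_1 0x8034
#endif

/* Pack 8-bit channels into 16-bit RGBA5551: red in bits 15..11,
   green in 10..6, blue in 5..1, and a 1-bit alpha in bit 0. */
static uint16_t pack_rgba5551(uint8_t r, uint8_t g, uint8_t b, int a) {
  return (uint16_t)(((r >> 3) << 11) | ((g >> 3) << 6) | ((b >> 3) << 1) | (a ? 1 : 0));
}
```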
Also, when loading a ROM file I had to change:
Code:
if ((romFile = fopen(filename, "r")) == NULL) {
to
Code:
if ((romFile = fopen(filename, "rb")) == NULL) {
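The "rb" change matters because on Windows a stream opened in text mode ("r") translates "\r\n" pairs to "\n" and treats a 0x1A (Ctrl+Z) byte as end-of-file when reading, both of which corrupt a binary ROM image; on POSIX systems the two modes behave the same. A minimal C sketch of a whole-file ROM loader that opens in binary mode and checks every step (`load_rom` is a hypothetical name, not CEN64's ROM plugin API):

```c
#include <stdio.h>
#include <stdlib.h>

/* Read an entire ROM image into a malloc'd buffer. Returns NULL on failure;
   on success stores the size in *size_out and the caller must free(). */
static unsigned char *load_rom(const char *filename, size_t *size_out) {
  FILE *romFile;
  if ((romFile = fopen(filename, "rb")) == NULL)   /* binary mode: no translation */
    return NULL;
  if (fseek(romFile, 0, SEEK_END) != 0) { fclose(romFile); return NULL; }
  long size = ftell(romFile);
  if (size < 0 || fseek(romFile, 0, SEEK_SET) != 0) { fclose(romFile); return NULL; }

  unsigned char *data = malloc(size ? (size_t)size : 1);
  if (data == NULL) { fclose(romFile); return NULL; }
  if (fread(data, 1, (size_t)size, romFile) != (size_t)size) {
    free(data);                                    /* short read: treat as error */
    fclose(romFile);
    return NULL;
  }
  fclose(romFile);
  *size_out = (size_t)size;
  return data;
}
```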

MarathonMan said:
Those optimization flags are, unfortunately, the bread and butter of this simulator. I'm not at all surprised that you're getting less than impressive performance without them. What version of gcc did you use (gcc -v)? I might be able to point you in the right direction if you give me the linker errors, too.
I tried using mingw32 at first, but it would not work because the code uses uint128_t somewhere, which for some reason isn't supported in the 32-bit version.
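The 32-bit failure is expected: GCC doesn't provide `unsigned __int128` on 32-bit x86 targets such as 32-bit MinGW, and GCC and Clang signal support by defining `__SIZEOF_INT128__`. Assuming CEN64's `uint128_t` is built on that type (the post doesn't say where it's used), one portable option is a fallback like the C sketch below, shown for a 64x64 to 128-bit multiply; `u128_parts` and both function names are hypothetical.

```c
#include <stdint.h>

typedef struct { uint64_t hi, lo; } u128_parts;

/* Portable 64x64 -> 128-bit unsigned multiply built from 32-bit halves. */
static u128_parts mul64x64_portable(uint64_t a, uint64_t b) {
  uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
  uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;
  uint64_t p0 = a_lo * b_lo, p1 = a_lo * b_hi;
  uint64_t p2 = a_hi * b_lo, p3 = a_hi * b_hi;
  /* Middle column: at most three 32-bit quantities, so no overflow. */
  uint64_t mid = (p0 >> 32) + (uint32_t)p1 + (uint32_t)p2;
  u128_parts r;
  r.lo = (mid << 32) | (uint32_t)p0;
  r.hi = p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32);
  return r;
}

/* Use the compiler's native 128-bit type when the target has one. */
static u128_parts mul64x64(uint64_t a, uint64_t b) {
#ifdef __SIZEOF_INT128__
  unsigned __int128 p = (unsigned __int128)a * b;
  u128_parts r = { (uint64_t)(p >> 64), (uint64_t)p };
  return r;
#else
  return mul64x64_portable(a, b);
#endif
}
```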

The error I get looks very generic but I've attached the output I get to this post along with gcc version info.

EDIT: I found a better version of mingw64, which had gcc 4.8. It now works with all the optimization flags.
I get ~16 VI/s now.
 

Attachments

  • compile-error.txt
  • gcc-v.txt
OP
MarathonMan

Emulator Developer
Durza007 said:
I had to make some changes. Most notably, I had to rewrite the renderer to use GL_TEXTURE_2D instead of RECTANGLE because Windows only supports GL 1.1.
GL_UNSIGNED_SHORT_5_5_5_1 was also missing but I just added it manually and hoped it would work.
Also when loading a rom file I had to change:
Code:
if ((romFile = fopen(filename, "r")) == NULL) {
to
Code:
if ((romFile = fopen(filename, "rb")) == NULL) {


I tried using mingw32 at first but it would not work because the code uses uint128_t somewhere which for some reason isn't supported in the 32 bit version.

The error I get looks very generic but I've attached the output I get to this post along with gcc version info.

EDIT: I found a better version of mingw64, which had gcc 4.8. It now works with all the optimization flags.
I get ~16 VI/s now.

I will make the change to the ROM plugin, thanks.

Could you also please send the changes for GL 1.1? I'd include that option in the video repository, too!
 
OP
MarathonMan

Emulator Developer
Found a few bugs that appear to be preventing the RSP from working properly, thus resulting in no display lists being sent to the RDP. RSP is looking good though! :D

Bump: Fixed a slew of RSP bugs. RDP is now receiving commands from commercial ROMs. Currently testing with "Rampage: World Tour".
 
