Announcement: Cycle-accurate N64 development underway.

Nintendo Maniac

New member
TBH that would seem a bit pointless considering no modern GPU even supports 320x240 in fullscreen (my old Radeon 9600 could do it, but my current Radeon 4200 and GeForce 8800GS cannot), and 320x240 windowed is... pretty small.

EDIT: OH! I forgot to mention, I was concerned about the bit of assembly you used, since I had previously mentioned with regard to SSE that x86 lock-in isn't really the best of ideas nowadays... especially with an accuracy-focused emulator that could be useful for who knows how many years to come.
 
OP
MarathonMan

Emulator Developer
Couldn't you add an option to output at 320x240?

Yes, I think so, though I'm not sure what the point would be, since you would have to render first and THEN scale down... there's no performance incentive. The pixel drawing and interpolation is all done on the GPU, so it's incredibly cheap to scale up to a constant resolution.

OH! I forgot to mention, I was concerned about the bit of assembly you used, since I had previously mentioned with regard to SSE that x86 lock-in isn't really the best of ideas nowadays... especially with an accuracy-focused emulator that could be useful for who knows how many years to come.

No other architecture has a processor even remotely capable of running the simulator at its target speed at the moment, so I'm not concerned.
 

Nintendo Maniac

New member
No other architecture has a processor even remotely capable of running the simulator at its target speed at the moment, so I'm not concerned.
There are extremely high-end CPUs that are PowerPC-based, but those high-end CPUs are not used in consumer products. Nintendo also seems to be committed to the PowerPC architecture for its consoles, so I wouldn't be surprised if, 10-20 years from now, they're still using PowerPC-based CPUs.

Also, ARM has no high-end CPU simply because it's always been focused on performance-per-watt, so nobody has really tried to scale it up to a laptop- or desktop-level power envelope (Nvidia was trying with Project Denver, but I have no idea what's going on with that).

Either way, being a simulator, it would be useful for years and years to come, since you can't ever surpass 100% accuracy. Remember, this is N64 emulation we're talking about, where the most accurate video plugin is designed for a GPU released over 10 years ago - I would think by now that requiring a specific hardware architecture would be something to shy away from, and that putting the effort into merely optimizing for a specific architecture would be preferred (optimized != required).
 
OP
MarathonMan

Emulator Developer
Either way, being a simulator, it would be useful for years and years to come, since you can't ever surpass 100% accuracy. Remember, this is N64 emulation we're talking about, where the most accurate video plugin is designed for a GPU released over 10 years ago - I would think by now that requiring a specific hardware architecture would be something to shy away from, and that putting the effort into merely optimizing for a specific architecture would be preferred (optimized != required).

Yes, but I think you're underestimating the amount of tuning it takes to get this thing churning at full speed on even, say, a *fast*, modern Intel processor. The only reason I've been able to pull this off so far is by hammering everything down so tightly that it all just clicks... eliminating SSE support could seriously hamper RSP performance. Regardless, the simulator is implemented in such a way that you can disable SSE support at any time by simply removing the -DUSE_SSE flag in the RSP module (though there's no faithful ANSI C implementation of most instructions yet).

e.g., disabling the flag here:
https://github.com/tj90241/cen64-rsp/blob/master/Makefile#L39

will prevent SSE code from being used in CP2, here:
https://github.com/tj90241/cen64-rsp/blob/master/CP2.c#L233

Again, somebody just needs to implement the non-SSE versions before it's usable. If you compile with debugging macros, the simulator will emit warnings every time an instruction lacking support is executed.
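
For illustration, here's a minimal sketch of what such a compile-time switch might look like. This is not the actual CEN64 code: the function name and wrapping-add semantics are hypothetical, and the real RSP vector instructions also involve carry/saturation behavior that's omitted here.

Code:
#include <stdint.h>

#ifdef USE_SSE
#include <emmintrin.h> /* SSE2 intrinsics */
#endif

/* Add two 8-lane vectors of signed 16-bit elements (the RSP's vector width).
 * With -DUSE_SSE, a single SSE2 instruction covers all eight lanes;
 * otherwise a plain ANSI C loop does the same work, just more slowly. */
static void vec_add_i16(int16_t dst[8], const int16_t a[8], const int16_t b[8]) {
#ifdef USE_SSE
  __m128i va = _mm_loadu_si128((const __m128i *) a);
  __m128i vb = _mm_loadu_si128((const __m128i *) b);
  _mm_storeu_si128((__m128i *) dst, _mm_add_epi16(va, vb));
#else
  for (unsigned i = 0; i < 8; i++)
    dst[i] = (int16_t) (a[i] + b[i]);
#endif
}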
 

Nintendo Maniac

New member
Yes, but I think you're underestimating the amount of tuning it takes to get this thing churning at full speed on even, say, a *fast*, modern Intel processor. The only reason I've been able to pull this off so far is by hammering everything down so tightly that it all just clicks... eliminating SSE support could seriously hamper RSP performance.
But is that not optimization for x86? I'm not against that at all, but it shouldn't be something that actually prevents other architectures from running CEN64, even if it's at a tenth of the speed.

Regardless, the simulator is implemented in such a way that you can disable SSE support at any time by simply removing the -DUSE_SSE flag in the RSP module (though there's no faithful ANSI C implementation of most instructions yet).

But that is exactly what I'm saying in terms of optimized vs. required. In 20 years, I don't think it'd be ridiculous to say we'd have smartphone-like devices that are at least as fast as Ivy Bridge, but could very well be even faster. Considering their history of not being x86, they'd most likely be ARM-based. And considering the inactive history of N64 emulation, I would not be surprised if between now and then nobody created another accuracy-focused N64 emulator (outside of Nintendo themselves) that is compatible with ARM, even though the hardware would be fast enough to handle it.

Even just having CEN64 able to run, even if unoptimized, would make such a future much less likely, especially since optimizing speed for a specific architecture is something many more people are familiar with than writing an accuracy-focused N64 emulator from scratch.

And this is without considering the fact that such a future ARM CPU may be so fast that it could run CEN64 at full speed even without optimizations. Again, being a simulator, it by nature will have lasting value for years to come.
 
OP
MarathonMan

Emulator Developer
In 20 years, I don't think it'd be ridiculous to say we'd have smartphone-like devices that are at least as fast as Ivy Bridge, but could very well be even faster. Considering their history of not being x86, they'd most likely be ARM-based. And considering the inactive history of N64 emulation, I would not be surprised if between now and then nobody created another accuracy-focused N64 emulator (outside of Nintendo themselves) that is compatible with ARM, even though the hardware would be fast enough to handle it.

Even just having CEN64 able to run, even if unoptimized, would make such a future much less likely, especially since optimizing speed for a specific architecture is something many more people are familiar with than writing an accuracy-focused N64 emulator from scratch.

I still don't think you're understanding fully how much of a difference these optimizations are making. If I tweak the compiler flags a little bit, performance can suddenly drop by almost 50%. IIRC, SSE more than doubles the performance of the RSP code when it's executing vector instructions (not only because it processes several data elements in parallel, but also because it frees up non-vector EUs in the x86 pipeline). I dislike vendor lock-in just as much as the next guy, which is why I provided a mechanism to fall all the way back to ANSI C, or to let programmers insert intrinsics for other architectures. At the end of the day, however, if I need vendor lock-ins to get playable performance out of this thing, so be it. They can be changed later.

And this is without considering the fact that such a future ARM CPU may be so fast that it could run CEN64 at full speed even without optimizations. Again, being a simulator, it by nature will have lasting value for years to come.

I doubt even a V8 could run it, even with NEON optimizations. I'm guessing we'd need to move away from silicon before you see the kind of compute power where you're able to run this thing on your phone.
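
For what it's worth, the intrinsic hook mentioned above isn't SSE-specific in principle; a hypothetical NEON counterpart to the earlier sketch could look like the following. Again, this is illustrative only, not code from CEN64.

Code:
#include <stdint.h>
#include <arm_neon.h> /* NEON intrinsics */

/* Hypothetical NEON variant of the 8-lane 16-bit add sketched earlier:
 * one vaddq_s16 handles all eight RSP vector lanes. */
static void vec_add_i16_neon(int16_t dst[8], const int16_t a[8], const int16_t b[8]) {
  int16x8_t va = vld1q_s16(a);
  int16x8_t vb = vld1q_s16(b);
  vst1q_s16(dst, vaddq_s16(va, vb));
}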
 

Nintendo Maniac

New member
I still don't think you're understanding fully how much of a difference these optimizations are making.

Oh, but I do! Again, I'm all for optimizing for specific architectures. My whole point was that there should still be architecture-agnostic fallbacks that work regardless of the performance. This is similar to software rendering in Dolphin or the "Interpreter" CPU engine in many, many emulators.
 

mrmudlord

New member
Oh yes, future-proofing. Just like Crysis when it was released.

Who cares if it runs at 10 seconds per frame, if in 50 years' time it will run fine. We must leave a legacy to our children.
 

Guru64

New member
Again, somebody just needs to implement the non-SSE versions before it's usable. If you compile with debugging macros, the simulator will emit warnings every time an instruction lacking support is executed.
Is there any way to test this code? I might want to try implementing a C version sometime.
 
OP
MarathonMan

Emulator Developer
Is there any way to test this code? I might want to try implementing a C version sometime.

Excellent! Iconoclast/cxd4@github already has a faithful and highly-optimized C implementation of everything. It would be great to see that merged in!

The code is quite easy to test. Clone my RSP assembler (https://github.com/tj90241/rspasm) and build CEN64 with debugging symbols ('make debug'). Make sure you pull the most recent RSP branch. cd into rsp/Tests and issue a 'make' command. You can then feed uCode from the assembler into the RSP, and have the RSP stop and print all of its registers after a specified number of cycles.
 
OP
MarathonMan

Emulator Developer
At the risk of beating a dead horse, but in the interest of clarification... so you ARE saying you have a full C implementation without the x86-exclusive code?

I dislike vendor lock-in just as much as the next guy, which is why I provided a mechanism to fall all the way back to ANSI C, or to let programmers insert intrinsics for other architectures.

The cracks just need to be filled in.
 

Nintendo Maniac

New member
Alright, we finally got that all cleared up now. :)

In other news, I noticed that the video you posted didn't have a framerate counter... how are you able to tell how close you are to running at full speed?
 
OP
MarathonMan

Emulator Developer
Alright, we finally got that all cleared up now. :)

In other news, I noticed that the video you posted didn't have a framerate counter... how are you able to tell how close you are to running at full speed?

Eyeballing it for now... if it looks fast enough, I'm happy. :D
 

Nintendo Maniac

New member
Speaking of eyeballing and performance, I was just thinking... wouldn't rendering in high-res mode (whether enabled in-game or forced via the emulator) require considerably more CPU cycles, since everything is done in software and there are therefore 4 times the pixels? Or is the rendering of graphics nothing compared to the CPU cycles being spent emulating everything else?
 
OP
MarathonMan

Emulator Developer
Speaking of eyeballing and performance, I was just thinking... wouldn't rendering in high-res mode (whether enabled in-game or forced via the emulator) require considerably more CPU cycles, since everything is done in software and there are therefore 4 times the pixels? Or is the rendering of graphics nothing compared to the CPU cycles being spent emulating everything else?

You could leverage OpenGL to scale up the graphics to a desired resolution for you, which would be incredibly cheap. I'm not sure how it would look though. How I draw right now is by generating a texture every time I have to render a frame, then putting that texture on a quad and rendering the quad. So, if you "stretch" the quad, the texture will effectively be stretched as well.

OTOH, if you force the RDP to render in a higher resolution, it'll take more cycles to push the image.
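
To make the draw path concrete, here's a rough sketch of that texture-on-a-quad approach in legacy OpenGL, assuming the RDP has already produced a 320x240 RGBA framebuffer in memory. The function name and pixel format are assumptions for the example, not the actual CEN64 renderer.

Code:
#include <GL/gl.h>
#include <stdint.h>

/* Upload the emulated framebuffer as a texture and draw it on a quad that
 * fills the viewport; the GPU's texture filtering does the scaling, so the
 * output window size costs essentially nothing on the CPU side. */
static void blit_frame(GLuint tex, const uint8_t *framebuffer) {
  glBindTexture(GL_TEXTURE_2D, tex);
  glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, 320, 240, 0,
               GL_RGBA, GL_UNSIGNED_BYTE, framebuffer);
  glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
  glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);

  glEnable(GL_TEXTURE_2D);
  glBegin(GL_QUADS);
    glTexCoord2f(0.0f, 1.0f); glVertex2f(-1.0f, -1.0f);
    glTexCoord2f(1.0f, 1.0f); glVertex2f( 1.0f, -1.0f);
    glTexCoord2f(1.0f, 0.0f); glVertex2f( 1.0f,  1.0f);
    glTexCoord2f(0.0f, 0.0f); glVertex2f(-1.0f,  1.0f);
  glEnd();
}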
 
