Announcement: Cycle-accurate N64 development underway.

OP
MarathonMan

Emulator Developer
I would still advise actually emulating cycles on a cycle accurate emulator ;-)

It should be done as early as possible; otherwise you will end up with something very similar to existing emulators, except that you have split CPU emulation into several stages. BTW, the real benefit of emulating each stage (accuracy as seen from the game's code) is having correct cycle counts and cache behavior. It is also important to take into consideration all possible cases where the pipeline stalls. I remember having a hard time figuring out how to emulate data bypass between stages, especially with the two COP1 register formats: if I remember right, this is one of the cases that involve pipeline stalls. Anyway, I can tell you to be very careful with cache and timing; it really has an effect on the machine as seen from the software, and some ROMs do really weird things (I'd say they were very lucky that all N64s behaved exactly the same; it hides a lot of bugs :) ).
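To make the stage-by-stage idea concrete, here is a minimal toy sketch in C, not taken from any emulator mentioned in this thread. It assumes the VR4300's five stages (IC, RF, EX, DC, WB) and tracks only which instruction occupies each stage. Stages are updated back to front so each one consumes what its predecessor held on the previous cycle, and the hypothetical `stall_ex` flag models a simple interlock: IC, RF and EX hold, DC receives a bubble, and WB keeps draining. Bypassing, caches and COP1 behaviour are not modeled.

```c
#include <string.h>

/* Toy model: each latch holds the ID of the instruction in that stage
 * (0 = bubble). Stage names follow the VR4300's five-stage pipeline. */
enum { IC, RF, EX, DC, WB, NUM_STAGES };

struct pipeline {
  unsigned latch[NUM_STAGES];
  unsigned next_fetch; /* ID of the next instruction to fetch */
  unsigned retired;    /* instructions that have completed WB */
};

static void pipeline_init(struct pipeline *p) {
  memset(p, 0, sizeof *p);
  p->next_fetch = 1; /* IDs start at 1 so 0 can mean "bubble" */
}

/* Advance one cycle, processing stages back to front so each stage reads
 * what its predecessor held on the previous cycle. When stall_ex is set
 * (a hypothetical interlock signal), IC, RF and EX hold their contents,
 * DC receives a bubble, and WB keeps draining. */
static void pipeline_cycle(struct pipeline *p, int stall_ex) {
  if (p->latch[WB] != 0)
    p->retired++; /* the instruction that sat in WB completes */

  p->latch[WB] = p->latch[DC];
  if (stall_ex) {
    p->latch[DC] = 0; /* bubble */
    return;
  }
  p->latch[DC] = p->latch[EX];
  p->latch[EX] = p->latch[RF];
  p->latch[RF] = p->latch[IC];
  p->latch[IC] = p->next_fetch++;
}
```

In this toy, running 10 cycles with a stall on the seventh cycle retires 4 instructions instead of 5, which is exactly the kind of cycle-count difference that only appears once stalls are modeled.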

I am still not convinced I can really run it at full speed. Things like cache and write buffers add many checks that have to be made for nearly every instruction, and keeping the timing synchronized between all the components takes a lot of time. The problem is not really adding delays. It is letting each component do its job at the same time and synchronizing everything at the right number of cycles. I have also tried to emulate the behaviour of the VI interface as correctly as I could, and it is also taking a good amount of CPU time (I am not talking about CRT emulation, which is virtually free as it is done on the GPU). Taking all of that into consideration, I still think it will be extremely slow with the RDP. Then again, I am not doing this project with speed as a requirement at this stage.

I won't share source code right now, as a lot of values are really experimental and I feel that if I spread it, it would keep other people from doing their own tests on a real N64. I think it is quite important now that different people do their own experimentation; later we can share and compare to hopefully fully understand exactly how the hardware behaves. That said, if you have any questions that you feel I can answer, feel free to ask. I will do my best to answer.

Barring cache accesses, coprocessor interlocks and such, I am cycle accurate in the sense that I'm modeling the pipelines, and the ratios of the device clocks are correct (the VR4300 runs at precisely 1.5x the speed of the RCP, etc.). I'm sure the cache timings will make a difference (look at the iQue -- things like Ganon's castle falling in the final scene are MUCH smoother when the system is backed with DDR RAM), but I've already planned out how it will fit in, and I don't see it being much of a problem to come back to at a later time. To be fair, if you want to be really cycle accurate, you also need a cycle-accurate RDRAM model to simulate the latency of the RAMs... which involves keeping track of the refreshes and accesses... I'm okay with saving this for a later time in the interest of better overall support at the moment :p.
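The 1.5x ratio means that every 3 VR4300 cycles line up with 2 RCP cycles. Below is a minimal C sketch of one way to interleave them, assuming the commonly cited 93.75 MHz VR4300 and 62.5 MHz RCP clocks and a hypothetical 187.5 MHz master tick (their least common multiple). The device step functions are placeholders that only count cycles; this is not how CEN64 is necessarily structured.

```c
#include <stdint.h>

/* Placeholder simulator state: only counts cycles. */
struct sim {
  uint64_t master_tick; /* hypothetical 187.5 MHz master clock */
  uint64_t cpu_cycles;
  uint64_t rcp_cycles;
};

static void vr4300_cycle(struct sim *s) { s->cpu_cycles++; } /* hypothetical */
static void rcp_cycle(struct sim *s) { s->rcp_cycles++; }    /* hypothetical */

/* Run `ticks` master ticks: the VR4300 (93.75 MHz) steps on every 2nd tick
 * and the RCP (62.5 MHz) on every 3rd, giving the 3:2 (1.5x) ratio. */
static void run_master_ticks(struct sim *s, uint64_t ticks) {
  for (uint64_t i = 0; i < ticks; i++, s->master_tick++) {
    if (s->master_tick % 2 == 0)
      vr4300_cycle(s);
    if (s->master_tick % 3 == 0)
      rcp_cycle(s);
  }
}
```

A real loop would probably unroll the six-tick pattern instead of using modulo, but the resulting ratio is the same: 6000 master ticks give 3000 CPU cycles and 2000 RCP cycles.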

I haven't bothered with the VI interface at all yet, either. I just do a naive copy to video texture memory and raster it on a quad. Again... planning to do this at a later date but pushing it off in interest of getting to the RSP/RDP/etc. My philosophy is to build a structure that stands as a proof of concept, then paint the walls and fill in the holes. :). Something like the VI interface will hopefully be amenable to buffering and vectorization to help eliminate such performance issues.
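Even a naive copy has to convert the framebuffer format before uploading it as a texture. As a hedged sketch: assuming the VI is in 16-bit mode, the framebuffer holds RGBA 5/5/5/1 pixels, and the host keeps RDRAM in its original big-endian byte order (an assumption; many emulators byte-swap it), the conversion to 8-bit-per-channel texture data could look like this. The function names are hypothetical, and the loop is simple enough that a compiler may auto-vectorize it.

```c
#include <stddef.h>
#include <stdint.h>

/* Expand a 5-bit channel to 8 bits by replicating its top bits. */
static uint8_t expand5(unsigned v) { return (uint8_t)((v << 3) | (v >> 2)); }

/* Convert `count` RGBA5551 pixels, read as big-endian byte pairs from src,
 * into R,G,B,A bytes in dst (4 bytes per pixel). */
static void rgba5551_to_rgba8888(const uint8_t *src, uint8_t *dst, size_t count) {
  for (size_t i = 0; i < count; i++) {
    unsigned px = ((unsigned)src[2 * i] << 8) | src[2 * i + 1];
    dst[4 * i + 0] = expand5((px >> 11) & 0x1F); /* red   */
    dst[4 * i + 1] = expand5((px >> 6) & 0x1F);  /* green */
    dst[4 * i + 2] = expand5((px >> 1) & 0x1F);  /* blue  */
    dst[4 * i + 3] = (px & 1) ? 0xFF : 0x00;     /* 1-bit alpha */
  }
}
```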

I understand your decision to not release the source. I'll let you know if I run into any issues -- thanks for the offer to help!

- M
 

Zuzma

New member
Hmm, so I'm trying to build it on Fedora 18 32-bit, except it's just spitting out a bunch of errors. I'm pretty sure I installed everything I'd need to compile it (gcc, libglfw). Actually, I had to install GLFW 2.7.8 manually because it wasn't installing properly from the package manager; before that, it complained about glfw.h missing. Anyway, I'm sure I just did something wrong; it's just not obvious to me.

CP2.c:1311:3: error: ‘vdVector’ undeclared (first use in this function)
CP2.c:1311:28: error: ‘vsVector’ undeclared (first use in this function)
CP2.c:1312:28: error: expected expression before ‘)’ token
CP2.c:1313:28: error: expected expression before ‘)’ token
CP2.c:1302:13: warning: unused variable ‘vd’ [-Wunused-variable]
CP2.c:1301:13: warning: unused variable ‘acc’ [-Wunused-variable]
CP2.c:1300:19: warning: unused variable ‘vs’ [-Wunused-variable]
CP2.c:1299:19: warning: unused variable ‘vt’ [-Wunused-variable]
make[1]: *** [Objects/CP2.o] Error 1
make: *** [librsp] Error 2
 
OP
MarathonMan

Emulator Developer
Are you compiling in a VM? What processor does your system have?

Also: performance is going to be quite bad on 32-bit systems. I natively implement the 64-bit wide registers and datapath of the VR4300, even though only the lower 32 bits are used the majority of the time. If you have a 64-bit processor, the gains from running in 64-bit mode will be tremendous.
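To illustrate why 64-bit registers hurt on a 32-bit host, here is a minimal sketch (hypothetical names, not CEN64's code) of two MIPS III instructions operating on a 64-bit register file. Even the 32-bit ADDU writes a sign-extended 64-bit result, so every register write touches all 64 bits; on a 32-bit x86 host each such write, and each 64-bit add, is typically split across two 32-bit registers.

```c
#include <stdint.h>

/* Hypothetical register file: 32 general-purpose registers, each 64 bits
 * wide as on the VR4300. r[0] always reads as zero. */
struct gpr_file {
  uint64_t r[32];
};

/* ADDU (32-bit add): the low 32 bits are added and the 32-bit result is
 * sign-extended into the full 64-bit destination. The int32_t conversion
 * is implementation-defined before C23; GCC and Clang wrap modulo 2^32. */
static void op_addu(struct gpr_file *g, unsigned rd, unsigned rs, unsigned rt) {
  uint32_t sum = (uint32_t)g->r[rs] + (uint32_t)g->r[rt];
  if (rd != 0)
    g->r[rd] = (uint64_t)(int64_t)(int32_t)sum;
}

/* DADDU (64-bit add): typically one instruction on an x86-64 host, but an
 * add/adc pair across two 32-bit registers on a 32-bit x86 host. */
static void op_daddu(struct gpr_file *g, unsigned rd, unsigned rs, unsigned rt) {
  if (rd != 0)
    g->r[rd] = g->r[rs] + g->r[rt];
}
```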
 

Zuzma

New member
Ah, sorry, yeah, I'm running it in a VM (VirtualBox). As for my CPU, it's a Core 2 Duo P8400, which supports up to SSE4.1.

Edit: I guess if you're working with it on a 64-bit platform, I'll just switch to that. Really, I just wanted to try it out, not so much to see how fast it would go.
 
OP
MarathonMan

Emulator Developer
Yup, that chip is supported... a 64-bit environment might solve your issue.
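Since a VM's virtual CPU does not necessarily advertise every extension the host chip has, it can be worth checking what the guest actually reports. A minimal sketch using the GCC/Clang builtins `__builtin_cpu_init` and `__builtin_cpu_supports` (x86 only; on other hosts it reports nothing); the struct and function names are hypothetical.

```c
/* Hypothetical helper: which SSE levels does the (possibly virtual) CPU
 * advertise? Uses GCC/Clang builtins; all fields stay 0 on non-x86 hosts. */
struct simd_caps {
  int sse2, ssse3, sse41;
};

static struct simd_caps detect_simd(void) {
  struct simd_caps c = {0, 0, 0};
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
  __builtin_cpu_init();
  c.sse2 = !!__builtin_cpu_supports("sse2");
  c.ssse3 = !!__builtin_cpu_supports("ssse3");
  c.sse41 = !!__builtin_cpu_supports("sse4.1");
#endif
  return c;
}
```

On a P8400 host all three would normally be reported; if `sse41` comes back 0 inside the guest, the VM is not exposing it.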
 

Lajamerr

New member
I've been following this project for a while now. Glad to see it keep on getting better.

I'm just starting out programming-wise. Hope someday I can be of some help (only if you haven't completed it by then).

Anyways, is this link any help?

gamedev.net/page/resources/_/technical/general-programming/practical-cross-platform-simd-math-r3068?recache=true
 
OP
MarathonMan

Emulator Developer
The RSP is already taking advantage of vectorization via SSSE3/SSE4.1: https://github.com/tj90241/cen64-rsp/blob/master/CP2.c#L548

Thanks though!
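For readers wondering what that kind of vectorization looks like, here is a simplified sketch, not the code in the linked file. The RSP's vector registers hold eight 16-bit lanes, which fit one 128-bit SSE register, so a clamped signed add of all eight lanes is a single SSE2 `_mm_adds_epi16`. The real VADD also adds in the carry bits from VCO and updates the accumulator; both are omitted here, and the function name is hypothetical.

```c
#include <emmintrin.h> /* SSE2 */
#include <stdint.h>

/* Simplified RSP-style vector add: eight signed 16-bit lanes, each result
 * clamped to [-32768, 32767]. VCO carry-in and accumulator writes are
 * deliberately left out. */
static void vadd_clamped(int16_t vd[8], const int16_t vs[8], const int16_t vt[8]) {
  __m128i a = _mm_loadu_si128((const __m128i *)vs);
  __m128i b = _mm_loadu_si128((const __m128i *)vt);
  _mm_storeu_si128((__m128i *)vd, _mm_adds_epi16(a, b));
}
```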
 
OP
MarathonMan

Emulator Developer
Found the bug that was preventing my first commercial ROM from booting! :D

Unfortunately, fixing it is making something else happen, and I'm not sure what's going on. I can tell that something's working, as the ROM is now accepting input and, while the output is very corrupted, you can vaguely see bits and pieces of what's supposed to be displayed.
 
OP
MarathonMan

Emulator Developer
SO close.

http://www.emutalk.net/attachment.php?attachmentid=38579&stc=1&d=1367098882

"Your memory pak is corrupted" screen doesn't show, and the splash ties up within a second or so. One of the threads is misbehaving and I can't figure out what's causing it to run amok!

The framerate is FANTASTIC (imo) considering that one thread is basically destroying the simulator. I think I'll be able to get a full 60 VI/s easily once it's fixed.

EDIT: Lots of performance improvements today. Did a pretty hefty rewrite of a lot of the VR4300 core. Some ROMs are now displaying >= 60 VI/s. I'm also 95% certain that I found the bug that is causing Namco not to boot fully. We'll see...
 

ShadowFX

Guardian
Looking good so far. I'm curious to see what more you can squeeze out of your performance hat :)
I read you can now boot a couple of commercial games; I take it this is still without any RDP emulation? I wonder what performance impact that will bring, while trying to be as close to the hardware as possible.
 

XICO2KX

New member
That was a big step.
:)
Seeing games run like that is huge progress! :w00t:
Congratulations on all your work, MarathonMan! :holiday:
By the way, when you add music/sound in the future, will that slow things down or is that already being processed but not being output? :phone:
 
OP
MarathonMan

Emulator Developer

The performance hat is getting harder and harder to reach into. There aren't likely to be huge gains from rearranging code, and the code itself is about as trim and flat as I'll ever be able to make it without losing accuracy. Soon, I'm either going to have to abandon pure interpretation (most likely) or do some aggressive multi-threading. I've been profiling the simulator, and it's executing >2 x86 instructions per cycle (even on older machines), which is approaching the limit of what common x86 processors are capable of. That being said, I've also seen ROMs pushing 2.5+ IPC.
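To put the IPC figure in perspective, here is a rough back-of-the-envelope budget, assuming the VR4300's commonly cited 93.75 MHz clock and treating the host clock and IPC as inputs. The 3.4 GHz value in the note below is a hypothetical example, not a measurement.

```c
/* Back-of-the-envelope numbers, assuming a 93.75 MHz VR4300 clock. */
static const double VR4300_HZ = 93.75e6;

/* Host instructions available per emulated VR4300 cycle. */
static double host_insns_per_cpu_cycle(double host_hz, double host_ipc) {
  return host_hz * host_ipc / VR4300_HZ;
}

/* Emulated VR4300 cycles per VI (video interface) interrupt. */
static double cpu_cycles_per_vi(double vi_per_second) {
  return VR4300_HZ / vi_per_second;
}
```

For example, a 3.4 GHz host sustaining 2 IPC has roughly 72.5 host instructions per emulated VR4300 cycle to cover the CPU, the RSP and everything else, and 60 VI/s works out to 1,562,500 VR4300 cycles per VI.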

Really, as far as performance is concerned, everything depends on how amenable this approach is to multi-threading, and on whether I can thread it well.

This is without any RCP emulation right now: just the bus, interrupts, essential interfaces, PIF, and VR4300 (though I suppose the RSP is also factored into things and cycling, so that's not totally true).

Thanks! Music and sound are actually already being processed and written out to the appropriate places within the simulator; I just don't have any code that takes that data and drops it off on the sound card. Adding this should come with virtually no performance loss at all.
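Dropping the samples off on the sound card is mostly a format conversion. A minimal sketch, assuming the audio interface reads interleaved 16-bit signed stereo samples (left first) stored big-endian in RDRAM, and that the host sound API wants native-endian `int16_t`; the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Convert `frames` stereo frames of big-endian 16-bit PCM (assumed layout:
 * left sample, then right sample) into native-endian int16_t samples.
 * dst must hold 2 * frames samples. The int16_t conversion is
 * implementation-defined before C23; GCC and Clang wrap modulo 2^16. */
static void ai_samples_to_host(const uint8_t *src, int16_t *dst, size_t frames) {
  for (size_t i = 0; i < 2 * frames; i++) {
    uint16_t s = (uint16_t)((src[2 * i] << 8) | src[2 * i + 1]);
    dst[i] = (int16_t)s;
  }
}
```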
 

Nintendo Maniac

New member
AFAIK sound was done on the main MIPS CPU, so by simulating/emulating the CPU you simulate/emulate the audio as well.

In other news, regarding the input, I just hope we are able to map multiple keys to a single N64 button and map multiple N64 buttons to a single key. It particularly annoys me that certain emulators only allow one or the other, but not both.

As for the performance, maybe I'm in the minority since I wouldn't be able to run CEN64 anyway, but I think the performance seems pretty good already considering that you're using "only" a Sandy Bridge and Haswell is coming in like a month or so.
 
OP
MarathonMan

Emulator Developer
Configurable options are a LONG way down the road. I'm fine with static key bindings for now. Once I get some more games working, I'm going to purchase a USB joystick and muck around with that, but even that's off in the future.
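A static binding table can still allow both kinds of mapping Nintendo Maniac asked about, if each host key maps to a bitmask of N64 buttons and the masks of all held keys are ORed together. A minimal sketch; the bit values follow the commonly documented N64 controller status word (A = 0x8000, B = 0x4000, and so on), while the key codes, table and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* N64 controller button bits (subset), as commonly documented. */
enum {
  N64_A = 0x8000, N64_B = 0x4000, N64_Z = 0x2000, N64_START = 0x1000,
  N64_L = 0x0020, N64_R = 0x0010
};

/* Hypothetical static bindings: each host key maps to a *mask* of buttons. */
struct binding {
  int host_key;
  uint16_t buttons;
};

static const struct binding bindings[] = {
  { 'x', N64_A },
  { 'j', N64_A },         /* several keys can press the same button */
  { 'z', N64_B },
  { 'q', N64_L | N64_R }, /* one key can press several buttons */
  { '\n', N64_START },
};

/* OR together the masks of every held key in `pressed`. */
static uint16_t poll_buttons(const int *pressed, size_t n) {
  uint16_t state = 0;
  for (size_t i = 0; i < n; i++)
    for (size_t j = 0; j < sizeof bindings / sizeof bindings[0]; j++)
      if (bindings[j].host_key == pressed[i])
        state |= bindings[j].buttons;
  return state;
}
```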

TBH, newer chips won't help performance all that much. The simulator likes frequency, and frequency only. The only thing that Haswell has to offer for this is a very, very slightly higher IPC and a beefier vector unit (which would accelerate RSP CP2 functions ATM).
 

Nintendo Maniac

New member
If it likes frequency, then overclocked Steamroller? :p

But realistically someone should bench CEN64 on an OC'd Haswell when it launches so that we know for absolute sure if it'd be enough or not.
 
OP
MarathonMan

Emulator Developer
All jokes aside, it's actually probably one of the better processors to run it on.

Haswell will NOT make enough of a difference over SB/IB. You could overclock a low-end SB/IB chip and it'll run circles around the stock Haswell chip as far as the simulator's performance goes.
 
