Announcement: Cycle-accurate N64 development underway.

Fanatic 64

Guest
Decided to license as BSD 3-clause. Do whatever you want with the source.

For updates, follow me on github and star the sources.

There are several (err, seven?) plugins; I'll be pushing them up to github and updating both this post and the first post accordingly. None of them will be of much use until I upload the framework that ties all the plugins together (which will likely be done last to make sure everything checks out).

** Primary Module **
Framework: http://github.com/tj90241/cen64

** Submodules **
AUDIO: http://github.com/tj90241/cen64-audio
BUS: http://github.com/tj90241/cen64-bus
PIF: http://github.com/tj90241/cen64-pif
PI: http://github.com/tj90241/cen64-rom
RSP: http://github.com/tj90241/cen64-rsp
VI: http://github.com/tj90241/cen64-video
VR4300: http://github.com/tj90241/cen64-vr4300

To compile: You only need to clone the framework. git will take care of pulling each individual submodule for you. More TBA.
The link to the framework is broken.

New BSD License? Sounds good to me.
 
Fanatic 64

Guest
It now works. It previously said not found or something like that.
 

DETOMINE

New member
Looks amazing :holiday:

I don't want to sound mean (sorry :blush: ), but there are small typos in your readme files (audio/pif/rom/video): it should say "interface" instead of "inteface" (first line).
I thought you might want to know.
 
OP
MarathonMan

Emulator Developer
Thank you! I will fix this when I get the chance.
 

Mizox

New member
Since, according to your readme at least, we can't really compile this under Windows, do you think you could provide a pre-compiled Windows executable for those of us without Linux to try out?
 
OP
MarathonMan

Emulator Developer
I suppose I should clarify that, too. You *can* build under Windows; you just need to use Cygwin (with gcc >= 4.7). There's not a whole lot to see right now: only a few select freeware ROMs will draw to the framebuffer. This was more of a plea for contributors. :p
 

zilmar

Emulator Developer
Moderator
I had always wanted to get Project64 more accurate .. tho always something else to do ..

I did look at opcode timing at one stage .. From memory, a nop was increasing the count by 1/2 .. so you need to do two nops for each increase in the count reg.

I was curious: are you doing timings on a real N64, or just using documentation?

For example, does dadd take more cycles than add, or how long does ddiv/dmult take .. I know you're looking at the cache side of things as well for stalls.
 
OP
MarathonMan

Emulator Developer
Yep, Count is incremented at half the PClock rate.

Code:
vr4300->cp0.regs.count += (vr4300->pipeline.cycles & 0x01);
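
To make the half-rate behaviour concrete, here is a minimal sketch of the same idea outside the emulator. The `cpu_sketch` struct and the `cpu_step`/`cpu_run` helpers are hypothetical names made up for illustration; they are not taken from the cen64 sources, and this version bumps the cycle counter before testing its low bit, which may not match the order used in the real core.

```c
#include <stdint.h>

/* Hypothetical, pared-down CPU state; the real cen64 structures differ. */
struct cpu_sketch {
    uint64_t cycles; /* pipeline clock ticks elapsed so far */
    uint32_t count;  /* CP0 Count register (32 bits, wraps on overflow) */
};

/* Advance one pipeline cycle. The low bit of the cycle counter is 1 on
 * every other cycle, so Count gains 1 for every two pipeline cycles. */
void cpu_step(struct cpu_sketch *cpu)
{
    cpu->cycles++;
    cpu->count += (uint32_t)(cpu->cycles & 0x01);
}

/* Run n pipeline cycles back to back. */
void cpu_run(struct cpu_sketch *cpu, uint64_t n)
{
    while (n--)
        cpu_step(cpu);
}
```

With this sketch, running ten pipeline cycles from a zeroed state leaves Count at 5, which matches the "two nops per increment" observation above.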

My "real life" job is creating a cycle-accurate ARMv8 simulator, so most of the time I'll draw out on paper whatever I think should be the case. In most cases right now, I'm not as concerned with the delays as I am with getting full pipeline functionality implemented, due to the overwhelming amount of things there are to do. If it bothers me a lot or I feel it's really necessary, I'll cross-verify whatever I thought was true.

A DADD wouldn't take any longer than an ADD; the ALU contains a 64-bit adder. This is needed, for example, to calculate the PC in one cycle for taken branch instructions (if PC = 0x8000 0000 FFFF FFFC, a 32-bit adder would not suffice here). The only instructions that should cause MCIs are most of the CP1 instructions, because they have to normalize on top of whatever else they have to do: too much work for one cycle. This is especially true of things like sqrt and div; they will take SEVERAL cycles. Later on, I need to explore whether CP1 branches are also delayed.

EDIT: The cache delays will be much more difficult. In order to determine those properly, I need to model the RDRAM. The VR4300 has a 4-entry write buffer, but if it misses the write buffer, there will probably be 100+ cycle delays.
 

zilmar

Emulator Developer
Moderator
In essence I have defaulted each op to take 4 pipeline cycles, to average out for the lack of timing for load operations ..

So many timing issues (well, most games are not so bad about them).

In things like the Donkey Kong 64 intro title sequence you can really see it.

There are a few other games where I think what you're doing will make a difference as well.
 
OP
MarathonMan

Emulator Developer
It seems odd that the games would rely on timing that much... A quick and dirty way to get semi-accurate timing would be to model the data/instruction cache. Since it's direct-mapped, you can just mask off the lower bits and memoize the tag in an array. It wouldn't add that much overhead. If you miss in the cache, just "stall" for 100+ cycles.
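
A rough sketch of that tag-array idea follows. The sizes are assumptions chosen for illustration (512 lines of 32 bytes, i.e. a 16 KB direct-mapped cache), `MISS_PENALTY` is just a placeholder for the "100+ cycles" figure, and the `tag_cache` names are hypothetical rather than anything from Project64 or cen64.

```c
#include <stdint.h>
#include <string.h>

#define LINE_SHIFT   5u    /* assumed 32-byte cache lines */
#define NUM_LINES    512u  /* assumed 16 KB / 32 B = 512 lines */
#define MISS_PENALTY 100u  /* placeholder stall, not a measured value */

/* Only tags are tracked; no data is stored, since this models timing only. */
struct tag_cache {
    uint32_t tag[NUM_LINES];
    uint8_t valid[NUM_LINES];
};

/* Invalidate every line. */
void tag_cache_reset(struct tag_cache *c)
{
    memset(c, 0, sizeof *c);
}

/* Returns the number of stall cycles for an access: 0 on a hit,
 * MISS_PENALTY on a miss (the line is then filled with the new tag). */
unsigned tag_cache_access(struct tag_cache *c, uint32_t paddr)
{
    uint32_t line  = paddr >> LINE_SHIFT;     /* drop the byte offset */
    uint32_t index = line & (NUM_LINES - 1u); /* direct-mapped slot */
    uint32_t tag   = line / NUM_LINES;        /* remaining upper bits */

    if (c->valid[index] && c->tag[index] == tag)
        return 0;

    c->valid[index] = 1;
    c->tag[index] = tag;
    return MISS_PENALTY;
}
```

The caller would add the returned value to its cycle counter on every fetch or load. Two addresses that are exactly 16 KB apart map to the same slot here and evict each other, which is the conflict behaviour a direct-mapped cache is expected to show.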
 
OP
MarathonMan

Emulator Developer
FPU is coming along very well. LaC's fire demo is now rendering. :)

http://www.emutalk.net/attachment.php?attachmentid=38558&stc=1&d=1365275206

Here's the scary part: after most of the optimizations that I've been hammering into the core (prior to open-sourcing), this demo seems to be getting closer and closer to native speed. Admittedly, I'm running it on a highly overclocked PC, but it's progress nonetheless!

EDIT: The "messed up colors" are due to the fact that I still don't filter the graphics properly (same thing with the pong issue from a few weeks ago).
 

Attachment: fire-nofilter.png (51.5 KB)

Guru64

New member
Neat. Is there any reason you've implemented the floating point instructions using inline assembly instead of just straight up C? Also, why are you using the x87 FPU instead of SSE?

Your code looks really clean, by the way. If I had more N64 emulation know-how/time, I'd love to help you out. :)
 
OP
MarathonMan

Emulator Developer
Thanks. I've been trying to keep everything clean in hopes of attracting others, and maybe to use it as a resume-builder if I need to.

I really didn't want to resort to the assembly, but I ended up having to. Basically, it all boils down to FPUClearExceptions() and FPUUpdateState(). IEEE FP mandates that you keep a handful of "status" bits and update them based on the result of each FPU operation. They are often used to raise exceptions and whatnot. Anyways, the VR4300 has these status bits too, so I need to simulate them as well.
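
For readers unfamiliar with those status bits, here is a hedged sketch of what the bookkeeping could look like on the emulated side. It follows the usual MIPS FCR31 layout as I understand it (sticky flags in bits 2 to 6, enables in bits 7 to 11, cause in bits 12 to 17). The `FP_*` constants and `fcsr_update` are hypothetical names for illustration, not the actual FPUUpdateState() code, and the `raised` mask is assumed to have been read from the host FPU by some other routine.

```c
#include <stdint.h>

/* Exception kinds, in FCR31 bit order (I, U, O, Z, V). Hypothetical names. */
enum {
    FP_INEXACT   = 1u << 0,
    FP_UNDERFLOW = 1u << 1,
    FP_OVERFLOW  = 1u << 2,
    FP_DIVZERO   = 1u << 3,
    FP_INVALID   = 1u << 4,
    FP_ALL       = 0x1Fu
};

#define FCSR_FLAG_SHIFT   2u   /* sticky flag bits */
#define FCSR_ENABLE_SHIFT 7u   /* trap enable bits */
#define FCSR_CAUSE_SHIFT  12u  /* cause bits for the latest instruction */

/* Fold the exceptions raised by one emulated FPU instruction into the
 * emulated control/status word. Returns 1 if an enabled exception was
 * raised (the emulator would then take a trap), 0 otherwise. */
int fcsr_update(uint32_t *fcsr, unsigned raised)
{
    uint32_t enables = (*fcsr >> FCSR_ENABLE_SHIFT) & FP_ALL;

    raised &= FP_ALL;

    /* The cause field describes only the most recent instruction. */
    *fcsr &= ~((uint32_t)0x3Fu << FCSR_CAUSE_SHIFT);
    *fcsr |= (uint32_t)raised << FCSR_CAUSE_SHIFT;

    if (raised & enables)
        return 1; /* trap: sticky flags are left untouched */

    /* No trap: accumulate into the sticky flag bits. */
    *fcsr |= (uint32_t)raised << FCSR_FLAG_SHIFT;
    return 0;
}
```

The sketch ignores the "unimplemented operation" cause bit beyond clearing it, and it says nothing about how the host flags are captured, which is the part the inline assembly is for.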

The problem is that the compiler is free to reorder the code and it might be the case that even though I've written:
Code:
FPUClearExceptions();
/* ...  */
FPUUpdateState();

there's no guarantee that the code I want will actually get executed between those two function calls. So I could miss raising flags when I should. The solution was to use *volatile* inline assembly blocks to force the compiler to insert code at that exact, specific location.

I used the x87 FPU over SSE as the latter seems less conformant to the IEEE spec. I could be totally off my rocker on that assumption, though.

Help is always welcome!
 

Guru64

New member
I was thinking that the 80-bit precision used by the x87 could potentially cause some problems which you wouldn't have with SSE. Apparently it's possible to set the x87 up to use different precisions (for the mantissa at least), so perhaps that should be looked into. Either way, I guess it's more of a problem if you're calculating a longer floating point formula (the instruction implementations are bound to be rather short mostly) and it's going to convert it to the proper precision after every emulated instruction at least, so there's that.
 
OP
MarathonMan

Emulator Developer
Each register is flushed to a 32/64-bit FPR immediately after the calculation. Thus the extra precision is not an issue. :)
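
As an illustration of the "flush after every instruction" idea, here is a small portable sketch for a single-precision add. It uses a `double` intermediate as a stand-in for the wider x87 register, and `emu_add_s` and `float_bits` are hypothetical helpers, not the inline assembly used in the actual core. It covers the single-precision case only.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a float as its raw 32-bit pattern (what an FPR would hold). */
uint32_t float_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* Emulated single-precision add: operands and result are raw FPR contents. */
uint32_t emu_add_s(uint32_t fs_bits, uint32_t ft_bits)
{
    float fs, ft;
    double wide;

    memcpy(&fs, &fs_bits, sizeof fs);
    memcpy(&ft, &ft_bits, sizeof ft);

    /* The sum is formed in a wider format than the destination register. */
    wide = (double)fs + (double)ft;

    /* Flush: round back to single precision before the result is stored,
     * so no extra precision survives into the next emulated instruction. */
    return float_bits((float)wide);
}
```

Because the result is narrowed before it is written back, the next emulated instruction only ever sees a value that fits the 32-bit register.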
 
OP
MarathonMan

Emulator Developer
Now with working colors... :)

Video:
EDIT: Added a few lines to the fetch unit in the VR4300. Performance is nearly that of native hardware on high-end machines. :) I'm becoming increasingly confident in the feasibility of this.
 

Attachment: fire.png (39 KB)
