
Announcement: Cycle-accurate N64 development underway.

Fanatic 64

Guest
Oh please, Nintendo Maniac, don't start a conflict where there isn't one...
 

Nintendo Maniac

New member
You were provoking Mudlord by insisting he was "spewing hate" to MarathonMan even after he explained it.
Uh, I think you're over-dramatizing what I intended... Basically, I found his seeming change of heart interesting and wanted to know what was up.

No offense or anything, but is it just me, or do you tend to see extra drama in things that didn't really have any in the first place?
 

Nintendo Maniac

New member
Now, I'm more of a hardware guy (last night I rewired a nice, otherwise-working keyboard that had a failing cable; after a wash it's as good as new), so apologies if I sound like a total coding noob. But if I understand correctly, you're optimizing the code and reducing the number of instructions executed, which means faster performance?
 

sanni

New member
Great progress MarathonMan!
I must admit I didn't think it would move along that fast. The last time I checked on CEN64 it barely ran some homebrew, and now we already have some commercial games booting.
:sorcerer:

It built without any problems on the Windows 8.1 Preview.

It also seems to benefit from multiple cores, or at least it distributes the load equally between 2 cores.
 
Well, I was unable to get it to run at all; it crashes for me. My first guess is that you are using SSSE3 (my CPU doesn't support it :p)?
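If the crash really is an unsupported instruction, a startup check can turn it into a readable error. The sketch below is an assumption, not something CEN64 is known to do: it uses the GCC/Clang x86 builtins `__builtin_cpu_init` and `__builtin_cpu_supports`, and the helper names `check_cpu_features` and `require_simd` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Report the first missing SIMD extension, if any. GCC/Clang on x86 only;
 * the feature name passed to __builtin_cpu_supports must be a string literal. */
static int check_cpu_features(const char **missing) {
  assert(missing != NULL);
  __builtin_cpu_init();
  *missing = NULL;
  if (!__builtin_cpu_supports("sse2"))
    *missing = "SSE2";
  else if (!__builtin_cpu_supports("ssse3"))
    *missing = "SSSE3";
  return *missing == NULL;
}

/* Hypothetical startup hook: print a message and fail early instead of
 * dying later on an illegal instruction. */
static int require_simd(void) {
  const char *missing;
  if (!check_cpu_features(&missing)) {
    fprintf(stderr, "This build needs %s, which this CPU does not support.\n",
            missing);
    return 0;
  }
  return 1;
}
```

Calling `require_simd()` once before the emulator starts executing would cover CPUs like the Phenom II, which have SSE2 but not SSSE3.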
 
MarathonMan

Emulator Developer
I wrote a rudimentary profiler to help me determine where all my execution time is currently going.

Surprisingly, it seems like the RSP is the biggest culprit.

From the main screen of marshall's "funnel cube" demo:
Code:
     ACCOUNT_AIF: [       112395964 clocks], 1.896240%
     ACCOUNT_RDP: [      1279115451 clocks], 21.580038%
     ACCOUNT_RSP: [      2894066366 clocks], 48.825981%
     ACCOUNT_VIF: [       420219833 clocks], 7.089556%
  ACCOUNT_VR4300: [      1221510880 clocks], 20.608191%

A simple framebuffer-only ROM:
Code:
     ACCOUNT_AIF: [       115961791 clocks], 3.337749%
     ACCOUNT_RDP: [               0 clocks], 0.000000%
     ACCOUNT_RSP: [       347965952 clocks], 10.015566%
     ACCOUNT_VIF: [       407961688 clocks], 11.742434%
  ACCOUNT_VR4300: [      2602362152 clocks], 74.904251%

Star Fox intro, light beam (I tried to get a section where the RDP has a big scene to render):
Code:
     ACCOUNT_AIF: [       113339853 clocks], 3.086154%
     ACCOUNT_RDP: [      1200997990 clocks], 32.702221%
     ACCOUNT_RSP: [       856690284 clocks], 23.326996%
     ACCOUNT_VIF: [       422129545 clocks], 11.494251%
  ACCOUNT_VR4300: [      1079369744 clocks], 29.390379%

Hmm...
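Each percentage in these tables is one component's clocks divided by the sum over all five accounts (for the funnel cube, 2894066366 of 5927308494 clocks, about 48.8%, goes to the RSP). A minimal per-component accounting profiler along those lines might look like the sketch below; the `ACCOUNT_*` names are taken from the output, but `struct profiler`, `profiler_charge`, `profiler_percent` and `profiler_report` are hypothetical, not CEN64's actual code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Component IDs, named after the labels in the output above. */
enum account {
  ACCOUNT_AIF,
  ACCOUNT_RDP,
  ACCOUNT_RSP,
  ACCOUNT_VIF,
  ACCOUNT_VR4300,
  NUM_ACCOUNTS
};

static const char *const account_names[NUM_ACCOUNTS] = {
  "ACCOUNT_AIF", "ACCOUNT_RDP", "ACCOUNT_RSP", "ACCOUNT_VIF", "ACCOUNT_VR4300",
};

/* Running total of host clocks spent in each component. */
struct profiler {
  uint64_t clocks[NUM_ACCOUNTS];
};

/* Charge the interval [start, end) to one component. In a real build the
 * timestamps would come from a cycle counter such as __rdtsc(). */
static void profiler_charge(struct profiler *p, enum account a,
                            uint64_t start, uint64_t end) {
  assert(a < NUM_ACCOUNTS && end >= start);
  p->clocks[a] += end - start;
}

/* Share of all accounted clocks spent in one component, in percent. */
static double profiler_percent(const struct profiler *p, enum account a) {
  uint64_t total = 0;
  for (int i = 0; i < NUM_ACCOUNTS; i++)
    total += p->clocks[i];
  return total ? 100.0 * (double)p->clocks[a] / (double)total : 0.0;
}

/* Print one line per component in the same shape as the tables above. */
static void profiler_report(const struct profiler *p, FILE *out) {
  for (int i = 0; i < NUM_ACCOUNTS; i++)
    fprintf(out, "%16s: [%16llu clocks], %f%%\n", account_names[i],
            (unsigned long long)p->clocks[i],
            profiler_percent(p, (enum account)i));
}
```

Because the percentages are relative to the sum, a component's share can rise simply because another component got cheaper, so absolute clock counts are worth comparing across runs too.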
 

mrmudlord

New member
A small question.

How much performance do you reckon we would gain by doing some micro-optimizations on the color combiner? I noticed some things in there that definitely could be redone in SSE (like the R/G/B/A shit, which fits perfectly into a vector). I was thinking that since many of those functions are called pretty often, it might help performance a teensy bit (and every bit matters).
 
MarathonMan

Emulator Developer
It's one of the things I was looking at doing eventually, because I'm certain it would help. The catch is that all the colors are passed around as pointers, so you have to co-allocate them on the stack before pulling them into an XMM register.

I've rewritten one or two functions so that they take a pointer to an array of 4 ints and just load from that, but it takes a long time to refactor everything. See spanptr and dincsptr here: https://github.com/tj90241/cen64-rdp/blob/master/TCLod.c#L45
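To illustrate why a contiguous, aligned 4-int color helps: once R, G, B and A sit in one 16-byte array, a single aligned load fills an XMM register and one instruction handles all four channels. The sketch below is hypothetical (`rgba_t`, `combine_scalar`, `combine_sse2` and `mullo_epi32_sse2` are not CEN64 names) and uses the general (A - B) * C + D combiner form with an assumed `+ 0x80 >> 8` normalization rather than the RDP's exact arithmetic. It sticks to SSE2, so it needs no SSSE3.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical color layout: R, G, B, A as four 32-bit ints, 16-byte aligned
 * so the whole color moves with one aligned load or store. */
typedef struct {
  _Alignas(16) int32_t v[4];
} rgba_t;

/* SSE2 lacks a 32-bit low multiply (_mm_mullo_epi32 is SSE4.1), so build one
 * from two _mm_mul_epu32 calls. The low 32 bits of a product are identical
 * for signed and unsigned operands. */
static inline __m128i mullo_epi32_sse2(__m128i a, __m128i b) {
  __m128i even = _mm_mul_epu32(a, b); /* 64-bit products of lanes 0 and 2 */
  __m128i odd = _mm_mul_epu32(_mm_srli_si128(a, 4),
                              _mm_srli_si128(b, 4)); /* lanes 1 and 3 */
  return _mm_unpacklo_epi32(_mm_shuffle_epi32(even, _MM_SHUFFLE(0, 0, 2, 0)),
                            _mm_shuffle_epi32(odd, _MM_SHUFFLE(0, 0, 2, 0)));
}

/* Scalar reference, one channel at a time. Assumes d is 0..255; >> on a
 * negative value is arithmetic on GCC/Clang, matching _mm_srai_epi32. */
static void combine_scalar(rgba_t *out, const rgba_t *a, const rgba_t *b,
                           const rgba_t *c, const rgba_t *d) {
  for (int i = 0; i < 4; i++)
    out->v[i] = ((a->v[i] - b->v[i]) * c->v[i] + (d->v[i] << 8) + 0x80) >> 8;
}

/* Same equation, all four channels at once. */
static void combine_sse2(rgba_t *out, const rgba_t *a, const rgba_t *b,
                         const rgba_t *c, const rgba_t *d) {
  assert(((uintptr_t)a->v & 15) == 0); /* _mm_load_si128 needs alignment */
  __m128i va = _mm_load_si128((const __m128i *)a->v);
  __m128i vb = _mm_load_si128((const __m128i *)b->v);
  __m128i vc = _mm_load_si128((const __m128i *)c->v);
  __m128i vd = _mm_load_si128((const __m128i *)d->v);
  __m128i sum = _mm_add_epi32(mullo_epi32_sse2(_mm_sub_epi32(va, vb), vc),
                              _mm_slli_epi32(vd, 8));
  sum = _mm_add_epi32(sum, _mm_set1_epi32(0x80));
  _mm_store_si128((__m128i *)out->v, _mm_srai_epi32(sum, 8));
}
```

With the colors already laid out like this there is no per-call gathering of separate channel pointers onto the stack, which is the refactoring cost described above.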
 

Nintendo Maniac

New member
I guess I'll wait for an SSE2 version or w/e

Remember, it uses software rendering, so any Intel system since the Core Duo should run it just fine regardless of crappy Intel integrated graphics.

DISCLAIMER: I haven't actually tried doing such a thing yet.

EDIT: Err, CEN64 is 64-bit only, isn't it? The Core Duo was 32-bit only... guess a Core 2-based CPU is the minimum requirement then!

EDIT2: Well, technically the newer 64-bit-capable Atom CPUs could run it, but the minimum I'm referring to is the age of the CPU model, not the performance. :p

EDIT3: Maybe it's just me, but I still think it'll be a bit of a shame that Thuban CPUs won't be able to run CEN64. I personally don't care much for the quad, tri, and dual core Phenom IIs, just the Thubans (and I guess Zosma by association).
 

beannaich

New member
Haven't checked back in a while, and as usual progress has been made! Great job, man; what you're doing here is nothing short of amazing. Have you emulated the pipeline stalls and slips? I'd be very interested to see your method for doing so, as my implementations left a lot to be desired. Also, side note: is there any documentation of these conditions and how they're handled? I kind of just guessed at what would cause problems and how they'd be resolved.
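For readers unfamiliar with the idea, the simplest stall to model is a load-use interlock. The sketch below is a generic illustration, not MarathonMan's method or the VR4300's documented behavior: it borrows the VR4300 stage names (RF, EX, DC, WB, with IC folded into RF), and the one-cycle penalty and bypass once the load reaches DC are assumptions. `struct insn`, `load_use_hazard` and `count_load_stalls` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded instruction; register 0 means "none" ($zero). */
struct insn {
  bool valid; /* false = empty slot or bubble */
  bool is_load;
  uint8_t dest;
  uint8_t src1, src2;
};

/* Load-use hazard: the instruction in RF reads a register that the load
 * currently in EX will not have until DC. */
static bool load_use_hazard(const struct insn *ex, const struct insn *rf) {
  if (!ex->valid || !rf->valid || !ex->is_load || ex->dest == 0)
    return false;
  return rf->src1 == ex->dest || rf->src2 == ex->dest;
}

/* Push a straight-line program through RF/EX/DC/WB latches and return how
 * many stall cycles the interlock inserted. */
static unsigned count_load_stalls(const struct insn *prog, unsigned n) {
  const struct insn empty = {0};
  struct insn rf = empty, ex = empty, dc = empty, wb = empty;
  unsigned pc = 0, stalls = 0;
  while (pc < n || rf.valid || ex.valid || dc.valid || wb.valid) {
    bool stall = load_use_hazard(&ex, &rf);
    wb = dc; /* the later stages always advance */
    dc = ex;
    if (stall) {
      ex = empty; /* inject a bubble; RF keeps its instruction */
      stalls++;
    } else {
      ex = rf;
      rf = pc < n ? prog[pc++] : empty;
    }
  }
  return stalls;
}
```

A load immediately followed by a consumer of its result costs one bubble here, while an unrelated instruction in between hides the latency entirely.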

Even if cycle-accurate N64 emulation isn't usable at playable frame rates, it will still be a massive boon for the speedrunning community! I look forward to more progress; you are doing the lord's work :)
 
