
Announcement: Cycle-accurate N64 development underway.

Rodimus Primal

New member
I have to admit MarathonMan, every time you post an update I get a little excited that a really great running N64 emulator will be out and playable soon enough. I wish I knew enough about coding to help out.
 
OP
MarathonMan

MarathonMan

Emulator Developer
Now, is said SSE optional? Cause you obviously don't want it on ARM and/or PowerPC since SSE is x86-exclusive (though ARM does have NEON).

Optional, but I haven't implemented the "non-SSE" variant in the interest of time. Code is basically:

void func_t() {
#ifdef _USE_SSE_
  _mm_adds_epi16(...);
  ...
#else
  ...
#endif
}

You just need to compile with flags that tell the compiler which variant to compile for:
gcc -D_USE_SSE_ ... -o librsp_sse.a
or
gcc ... -o librsp_generic.a
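
To make the pattern above concrete, here is a minimal sketch of one function carrying both variants. The helper name `vadd_sat16` is hypothetical (it is not from the emulator's source), and the generic branch is only one guess at what a non-SSE fallback could look like, since that variant had not been written yet.

```c
#include <assert.h>
#include <stdint.h>

#ifdef _USE_SSE_
#include <emmintrin.h>  /* SSE2: _mm_adds_epi16 */
#endif

/* Hypothetical helper: saturating add of eight signed 16-bit lanes. */
void vadd_sat16(int16_t dst[8], const int16_t a[8], const int16_t b[8]) {
  assert(dst && a && b);
#ifdef _USE_SSE_
  /* One instruction handles all eight lanes, no loop. */
  __m128i va = _mm_loadu_si128((const __m128i *)a);
  __m128i vb = _mm_loadu_si128((const __m128i *)b);
  _mm_storeu_si128((__m128i *)dst, _mm_adds_epi16(va, vb));
#else
  /* Portable fallback: widen, add, clamp to the int16_t range. */
  for (int i = 0; i < 8; i++) {
    int32_t sum = (int32_t)a[i] + b[i];
    if (sum > INT16_MAX) sum = INT16_MAX;
    if (sum < INT16_MIN) sum = INT16_MIN;
    dst[i] = (int16_t)sum;
  }
#endif
}
```

Building with `-D_USE_SSE_` selects the intrinsic path; leaving the flag off selects the loop. Both paths should produce the same results, which makes the generic build a handy reference when checking the SSE one.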
 
OP
MarathonMan

MarathonMan

Emulator Developer
That sounds like PS2-emulator-level stuff. Pretty sure they have some kind of dynamic recompiler for the system's vector unit. Also, sure, it's fast now, but would hooking it up to the main CPU and RCP slow it down somewhat, or are you not worried about that?

Running dynarec cores isn't at all an issue, so no. I could even have some interpreted cores and dynarec cores working in unison if I wanted.
 
OP
MarathonMan

MarathonMan

Emulator Developer
Yep, turns out the entire RSP vector opcode set maps perfectly to SSE. Not a single one of my vector instructions has a loop when SSE is enabled.

Hope you have SSE4.1 though!

Code:
0000000000000180 <RSPVMADH>:
     180:       89 f2                   mov    %esi,%edx
     182:       89 f0                   mov    %esi,%eax
     184:       c5 f9 6f 15 00 00 00    vmovdqa 0x0(%rip),%xmm2        # 18c <RSPVMADH+0xc>
     18b:       00 
     18c:       c1 ea 10                shr    $0x10,%edx
     18f:       c1 e8 06                shr    $0x6,%eax
     192:       83 e2 1f                and    $0x1f,%edx
     195:       83 e0 1f                and    $0x1f,%eax
     198:       48 c1 e2 04             shl    $0x4,%rdx
     19c:       c5 fa 6f 24 17          vmovdqu (%rdi,%rdx,1),%xmm4
     1a1:       89 f2                   mov    %esi,%edx
     1a3:       c1 ee 0b                shr    $0xb,%esi
     1a6:       83 e6 1f                and    $0x1f,%esi
     1a9:       c1 ea 15                shr    $0x15,%edx
     1ac:       48 c1 e6 04             shl    $0x4,%rsi
     1b0:       83 e2 0f                and    $0xf,%edx
     1b3:       c5 fa 6f 1c 37          vmovdqu (%rdi,%rsi,1),%xmm3
     1b8:       48 c1 e2 04             shl    $0x4,%rdx
     1bc:       c4 e2 59 00 a2 00 00    vpshufb 0x0(%rdx),%xmm4,%xmm4
     1c3:       00 00 
     1c5:       c5 f9 73 dc 08          vpsrldq $0x8,%xmm4,%xmm0
     1ca:       c4 e2 79 23 fc          vpmovsxwd %xmm4,%xmm7
     1cf:       c4 e2 61 00 da          vpshufb %xmm2,%xmm3,%xmm3
     1d4:       c4 e2 79 23 cb          vpmovsxwd %xmm3,%xmm1
     1d9:       c5 e1 73 db 08          vpsrldq $0x8,%xmm3,%xmm3
     1de:       c4 e2 79 23 f3          vpmovsxwd %xmm3,%xmm6
     1e3:       c4 e2 79 23 c0          vpmovsxwd %xmm0,%xmm0
     1e8:       89 c2                   mov    %eax,%edx
     1ea:       c4 e2 41 40 c9          vpmulld %xmm1,%xmm7,%xmm1
     1ef:       c5 fa 6f 9f 00 02 00    vmovdqu 0x200(%rdi),%xmm3
     1f6:       00 
     1f7:       48 c1 e2 04             shl    $0x4,%rdx
     1fb:       c4 e2 79 40 c6          vpmulld %xmm6,%xmm0,%xmm0
     200:       c5 fa 6f a7 10 02 00    vmovdqu 0x210(%rdi),%xmm4
     207:       00 
     208:       c5 d9 61 eb             vpunpcklwd %xmm3,%xmm4,%xmm5
     20c:       c5 d9 69 db             vpunpckhwd %xmm3,%xmm4,%xmm3
     210:       c5 f1 fe cd             vpaddd %xmm5,%xmm1,%xmm1
     214:       c5 f9 fe c3             vpaddd %xmm3,%xmm0,%xmm0
     218:       c5 f1 6b d8             vpackssdw %xmm0,%xmm1,%xmm3
     21c:       c4 e2 61 00 d2          vpshufb %xmm2,%xmm3,%xmm2
     221:       c5 f9 7f 14 17          vmovdqa %xmm2,(%rdi,%rdx,1)
     226:       c5 f9 7f 8f 10 02 00    vmovdqa %xmm1,0x210(%rdi)
     22d:       00 
     22e:       c5 f9 7f 87 00 02 00    vmovdqa %xmm0,0x200(%rdi)
     235:       00 
     236:       89 87 80 02 00 00       mov    %eax,0x280(%rdi)
     23c:       c3                      retq   
     23d:       0f 1f 00                nopl   (%rax)
 
OP
MarathonMan

MarathonMan

Emulator Developer
...yeah, you're going to want to make that dynamically available or something, because even Thuban doesn't have anything above SSE3.

Heh, it'll be on my todo list, no worries. It's just not a foremost concern. I want to get a picture before I start working on features and older hardware.

EDIT: Yeah, just looked... I'm putting it off til later. On the plus side, only two intrinsics are from 4.1 (_mm_cvtepi16_epi32 and _mm_mullo_epi32). Maybe somebody else interested in the project will patch them for me during betas. :p
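
For anyone tempted to patch those two, here is a rough sketch of SSE2-only stand-ins. The function names (`cvtepi16_epi32_sse2`, `mullo_epi32_sse2`, and the array wrappers `sext4` and `mullo4`) are made up for illustration and are not from the project; the scalar branch is only there so the file still builds where `__SSE2__` is not defined.

```c
#include <assert.h>
#include <stdint.h>

#ifdef __SSE2__          /* predefined by GCC/Clang when SSE2 is enabled */
#include <emmintrin.h>   /* SSE2 intrinsics only */

/* Stand-in for _mm_cvtepi16_epi32: sign-extend the low four 16-bit lanes. */
static inline __m128i cvtepi16_epi32_sse2(__m128i x) {
  /* Duplicate each word into both halves of a dword, then shift the copy
     in the upper half down arithmetically so the sign fills the top. */
  return _mm_srai_epi32(_mm_unpacklo_epi16(x, x), 16);
}

/* Stand-in for _mm_mullo_epi32: low 32 bits of each 32x32 product. */
static inline __m128i mullo_epi32_sse2(__m128i a, __m128i b) {
  /* 64-bit products of lanes 0 and 2, then of lanes 1 and 3. */
  __m128i even = _mm_mul_epu32(a, b);
  __m128i odd = _mm_mul_epu32(_mm_srli_si128(a, 4), _mm_srli_si128(b, 4));
  /* Gather the low dword of each product and interleave back into order. */
  return _mm_unpacklo_epi32(
      _mm_shuffle_epi32(even, _MM_SHUFFLE(0, 0, 2, 0)),
      _mm_shuffle_epi32(odd, _MM_SHUFFLE(0, 0, 2, 0)));
}
#endif

/* Hypothetical wrapper: widen four signed 16-bit values to 32 bits. */
void sext4(int32_t out[4], const int16_t in[4]) {
  assert(out && in);
#ifdef __SSE2__
  __m128i x = _mm_loadl_epi64((const __m128i *)in);
  _mm_storeu_si128((__m128i *)out, cvtepi16_epi32_sse2(x));
#else
  for (int i = 0; i < 4; i++) out[i] = in[i];
#endif
}

/* Hypothetical wrapper: lane-wise multiply, keeping the low 32 bits. */
void mullo4(int32_t out[4], const int32_t a[4], const int32_t b[4]) {
  assert(out && a && b);
#ifdef __SSE2__
  __m128i va = _mm_loadu_si128((const __m128i *)a);
  __m128i vb = _mm_loadu_si128((const __m128i *)b);
  _mm_storeu_si128((__m128i *)out, mullo_epi32_sse2(va, vb));
#else
  for (int i = 0; i < 4; i++)
    out[i] = (int32_t)((uint32_t)a[i] * (uint32_t)b[i]);
#endif
}
```

The multiply works because the low 32 bits of a product are the same whether the inputs are treated as signed or unsigned, so the unsigned `_mm_mul_epu32` is enough. Each stand-in costs a few more instructions than the single SSE4.1 one.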

OTOH, anyone who wants to take advantage of SSE acceleration and doesn't have SSSE3 is basically out of luck ATM.
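
If the choice were ever made at run time instead of at build time, one possible shape is sketched below. It assumes GCC or Clang on x86, where `__builtin_cpu_init` and `__builtin_cpu_supports` are available; the `rsp_backend` enum and both function names are hypothetical and not part of the project.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical backend levels, ordered from least to most capable. */
typedef enum { RSP_GENERIC = 0, RSP_SSSE3 = 1, RSP_SSE41 = 2 } rsp_backend;

/* Ask the CPU what it supports and pick the best matching backend. */
rsp_backend rsp_pick_backend(void) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
  __builtin_cpu_init();  /* populate the compiler's CPU feature cache */
  if (__builtin_cpu_supports("sse4.1")) return RSP_SSE41;
  if (__builtin_cpu_supports("ssse3")) return RSP_SSSE3;
#endif
  return RSP_GENERIC;    /* non-x86 targets or older CPUs */
}

/* Human-readable name, e.g. for a startup log line. */
const char *rsp_backend_name(rsp_backend b) {
  switch (b) {
    case RSP_SSE41: return "sse4.1";
    case RSP_SSSE3: return "ssse3";
    default:        return "generic";
  }
}
```

A front end could call `rsp_pick_backend()` once at startup and then route vector opcodes to whichever build of the RSP code matches, so one binary would cover both newer and older CPUs.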
 
OP
MarathonMan

MarathonMan

Emulator Developer
Umm... on an unrelated note, would it be too ridiculous to suggest an overclocking function like that in 1964 Ultra Fast?

That'd be so easy to implement. I'd just call cycle() on all the cores more frequently.

Though I don't think you'll be able to handle it unless you have a _really_ beefy system. This is a cycle-accurate simulator that already has ~100 million cycles on multiple cores to process every second.
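
As a rough illustration of "call cycle() more frequently", the sketch below scales the number of cycles run per slice of emulated time. Everything except the name `cycle()` is an assumption made for the example: the `core_t` struct, `run_slice`, and the percentage-based knob are not taken from the emulator.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical core state; a real core would hold registers, pipeline, etc. */
typedef struct {
  uint64_t cycles;  /* cycles executed so far */
} core_t;

/* Advance one core by exactly one clock cycle. */
static void cycle(core_t *c) { c->cycles++; }

/* Run one slice of emulated time across all cores.
   base_cycles:   stock cycle count for the slice
   overclock_pct: 100 = stock speed, 150 = 1.5x, and so on */
void run_slice(core_t *cores, int ncores, uint32_t base_cycles,
               uint32_t overclock_pct) {
  assert(cores && ncores > 0);
  uint64_t n = (uint64_t)base_cycles * overclock_pct / 100;
  for (uint64_t i = 0; i < n; i++)
    for (int k = 0; k < ncores; k++)
      cycle(&cores[k]);  /* every core steps together, in lockstep */
}
```

Since the host has to execute every one of those extra cycles, the cost grows linearly with the multiplier, which is why a beefy system matters here.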
 

Nintendo Maniac

New member
Just thinking that World Driver Championship seemed to have a variable framerate, so I'd like to take advantage of that in the future. Also Smash Bros. 64 tourney players would probably appreciate it since we can't expect the cartridges and consoles to last forever. (4-player apparently doesn't run at full speed on an N64 and requires an overclocked system for full speed)

But I gotta ask, if it's so easy to implement then why has no other emulator implemented it? Is it a limitation of HLE or something?
 
OP
MarathonMan

MarathonMan

Emulator Developer
Just thinking that World Driver Championship seemed to have a variable framerate, so I'd like to take advantage of that in the future. Also Smash Bros. 64 tourney players would probably appreciate it since we can't expect the cartridges and consoles to last forever. (4-player apparently doesn't run at full speed on an N64 and requires an overclocked system for full speed)

But I gotta ask, if it's so easy to implement then why has no other emulator implemented it? Is it a limitation of HLE or something?

Cycle accuracy just makes it easier because I directly control the clockrate.

I imagine that it wouldn't be _that_ much harder; maybe there's not enough of a push for it? I don't know.
 

XICO2KX

New member
I have a feeling that my cycle-accurate core is faster than PJ64's interpreted core, and continually nearing the recompiler core. Whenever/if I implement dynarec, the performance should be comparable. :)
That's great news, MarathonMan! :drool:
I hope we'll soon see N64 games running flawlessly using your project! :happy:
Keep up the really awesome work, man! ;)
 

XICO2KX

New member
Now, is said SSE optional? Cause you obviously don't want it on ARM and/or PowerPC since SSE is x86-exclusive (though ARM does have NEON).
After the SSE implementation is done, if someone wants to port it to Altivec (used in PowerPC), these technical docs from Apple might make it easier! :shifty:
Code:
h**p://developer.apple.com/legacy/mac/library/documentation/Performance/Conceptual/Accelerate_sse_migration/Accelerate_sse_migration.pdf#page=25
It explains Altivec->SSE migration, but you can also go the other way around! :teehee:
I'm curious to see if this will ever run (at decent speeds) on the Wii! :rolleyes:
 

Nintendo Maniac

New member
Wow, you joined all the way back in 2006 yet made only one post and then made your second and third 7 years later, one after another? o_O

Did someone happen to link you here from another site by any chance?


As for Altivec, won't that only be useful once next-gen consoles get homebrew'd? (Wii U is only running homebrew in Wii Mode currently) I don't think the PS360 have enough CPU grunt considering Marcan of Wii homebrew fame claims they have Pentium4-esque IPC.
 

DaFox

New member
I signed up in 2005 and my second post was in this thread; I don't see why it's a big deal. This is really one of the few interesting things to happen in N64 emulation since around then anyway.

There's a whole lot of technical info going on at forum.pj64-emu.com/showthread.php?t=3445, with some great back-and-forth later on in the thread once the trolls die out.
 

XICO2KX

New member
Wow, you joined all the way back in 2006 yet made only one post and then made your second and third 7 years later, one after another? o_O

Did someone happen to link you here from another site by any chance?
Yes, but better late than never! :p
This thread seemed worthy of my 1st post! ;)
Actually, I made all of my first 3 posts on the same day! :happy:
As for Altivec, won't that only be useful once next-gen consoles get homebrew'd? (Wii U is only running homebrew in Wii Mode currently) I don't think the PS360 have enough CPU grunt considering Marcan of Wii homebrew fame claims they have Pentium4-esque IPC.
Yes, you are right! :unsure:
At least the Wii's Broadway CPU (an IBM PowerPC 750CL variation) does not include Altivec! :ermm:
Sorry, my mistake! :whistling:
But it does support some SIMD instructions named Paired Single instructions, which you can find more about in this technical doc! :satisfied:
On the other hand, both the Xbox 360 and PS3 CPUs support those Altivec SIMD instructions! :shifty:
But like you said, I'm not sure if they have enough speed to run this kind of accurate N64 emulation! :geek:
 
