Announcement: Cycle-accurate N64 development underway.

OP
MarathonMan

Emulator Developer
The RDP renders into RDRAM. The VIF never writes to RDRAM; it reads pixels from RDRAM, processes them, and outputs one color component per VI cycle. The VIF part of the job is handled by my function rdp_update(), which reads pixels from the emulated RDRAM, processes them, and writes them into an off-screen DirectDraw surface that represents the VI coordinate space or, if you will, the TV image coordinate space. There is thus no conceptual difference between the real hardware and my plugin, so your claim is unsubstantiated. rdp_update() correctly implements the various filters that the VI applies, but it might not update certain VI registers properly, mainly because Zilmar's plugin spec doesn't support the required level of synchronization between the RDP and the CPU.

I never said the VIF writes into RDRAM:

instead of copying the frame to RDRAM and allowing the VIF to display the image

The DMAs aren't emulated, so that's a bit of an oversimplification.

I didn't know it was writing to an offscreen surface, though.
 

Fanatic 64

Guest
Daedalus is already on both PC x86 and PSP MIPS. Java, OS X, Linux and Android (ARM, x86, MIPS) ports are also planned. That will be 5 platforms, with a sub-platform (Java), on 3 different processors and 6 total ports. We already have 2, and 3 different coders are working on three of the others.
And this is relevant to this thread because...?
 
OP
MarathonMan

Emulator Developer
Things have been at a standstill for the last few weeks. I've been busier than usual and haven't found the time to work on CEN64 recently.

I'm now able to simulate homebrew ROMs at full speed on fast systems.

Namco Arcade is holding >= ~50 VI/s.

The best part is that everything will appear to run "fast" right now because cycle delays are not implemented yet. Once they are, CEN64 will have even more headroom than it does presently, enabling it to run on lower-spec machines.
 
OP
MarathonMan

Emulator Developer
But will it blend? :p

IDK :D

The slow pace of this project has been frustrating me somewhat lately. So, I've decided to change things up a little.

I'm going to work on porting/writing plugins with cycle accuracy and high performance in mind, but for HLE emulators. There will be no immediate accuracy benefits for some time, but it will let me test my plugins without the frustration of not knowing which component of my emulator is broken.

EDIT: Looking into testing with Mupen64Plus ATM, since it seems the most cross-platform friendly.
 

Fanatic 64

Guest
You could later (when the plugins are actually accurate) make a Zilmar Spec port; as far as I know it's pretty easy.

:afro:
 
OP
MarathonMan

Emulator Developer
You could later (when the plugins are actually accurate) make a Zilmar Spec port; as far as I know it's pretty easy.

:afro:

AFAIK, Mupen64Plus and Zilmar-spec are more or less the same.

EDIT: This was a fantastic idea. Beating down bugs like nobody's business. :D
 

beannaich

New member
This was a fantastic idea. Beating down bugs like nobody's business. :D

Always a good idea (whenever possible) to test your code against a working emulator. Whether it be following along with an execution log, or writing plug-ins. That's how I've always made my emulators :)
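
For anyone wanting to try the execution-log approach, here is a minimal sketch in C. It assumes both emulators dump one executed instruction per line in the same text format; the function name first_divergence and that log format are made up for illustration, not taken from any particular emulator.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Returns the 1-based line number where two trace logs first differ,
 * or 0 if they match line for line and end together.
 * Assumption: one executed instruction per line, and every line is
 * shorter than 255 characters. */
static unsigned long first_divergence(FILE *reference, FILE *candidate) {
  char ref_line[256], cand_line[256];
  unsigned long line_no = 0;

  assert(reference != NULL && candidate != NULL);

  for (;;) {
    char *r = fgets(ref_line, sizeof ref_line, reference);
    char *c = fgets(cand_line, sizeof cand_line, candidate);

    if (r == NULL && c == NULL)
      return 0;        /* both logs ended without a mismatch */

    line_no++;
    if (r == NULL || c == NULL || strcmp(ref_line, cand_line) != 0)
      return line_no;  /* one log is shorter, or the lines differ */
  }
}
```

The returned line number is where to start looking; the instruction just before it is usually the one that left the two machines in different states.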
 
OP
MarathonMan

Emulator Developer
Always a good idea (whenever possible) to test your code against a working emulator. Whether it be following along with an execution log, or writing plug-ins. That's how I've always made my emulators :)

I'm guilty of premature optimization. I was doing unit testing (or at least trying), but was being rather careless about it. There are all these really latent bugs everywhere that the execution logs weren't revealing. Example:

Code:
static const ShuffleKey VectorOperandsArray[16] = { 
  /* -- */ {0x0,0x1,0x2,0x3,0x4,0x5,0x6,0x7,0x8,0x9,0xA,0xB,0xC,0xD,0xE,0xF},
  /* -- */ {0x0,0x1,0x2,0x3,0x4,0x5,0x6,0x7,0x8,0x9,0xA,0xB,0xC,0xD,0xE,0xF},
  /* 0q */ {0x0,0x1,0x0,0x1,0x4,0x5,0x4,0x5,0x8,0x9,0x8,0x9,0xC,0xD,0xC,0xD},
  /* 1q */ {0x2,0x3,0x2,0x3,0x6,0x7,0x6,0x7,0xA,0xB,0xA,0xB,0xE,0xF,0xF,0xF},
  /* 0h */ {0x0,0x1,0x0,0x1,0x0,0x1,0x0,0x1,0x8,0x9,0x8,0x9,0x8,0x9,0x8,0x9},
  /* 1h */ {0x2,0x3,0x2,0x3,0x2,0x3,0x2,0x3,0xA,0xB,0xA,0xB,0xA,0xB,0xA,0xB},
  /* 2h */ {0x4,0x5,0x4,0x5,0x4,0x5,0x4,0x5,0xC,0xD,0xC,0xD,0xC,0xD,0xC,0xD},
  /* 3h */ {0x6,0x7,0x6,0x7,0x6,0x7,0x6,0x7,0xE,0xF,0xE,0xF,0xE,0xF,0xE,0xF},
  /* 0w */ {0x0,0x1,0x0,0x1,0x0,0x1,0x0,0x1,0x0,0x1,0x0,0x1,0x0,0x1,0x0,0x1},
  /* 1w */ {0x2,0x3,0x2,0x3,0x2,0x3,0x2,0x3,0x2,0x3,0x2,0x3,0x2,0x3,0x2,0x3},
  /* 2w */ {0x4,0x5,0x4,0x5,0x4,0x5,0x4,0x5,0x4,0x5,0x4,0x5,0x4,0x5,0x4,0x5},
  /* 3w */ {0x6,0x7,0x6,0x7,0x6,0x7,0x6,0x7,0x6,0x7,0x6,0x7,0x6,0x7,0x6,0x7},
  /* 4w */ {0x8,0x9,0x8,0x9,0x8,0x9,0x8,0x9,0x8,0x9,0x8,0x9,0x8,0x9,0x8,0x9},
  /* 5w */ {0xA,0xB,0xA,0xB,0xA,0xB,0xA,0xB,0xA,0xB,0xA,0xB,0xA,0xB,0xA,0xB},
  /* 6w */ {0xC,0xD,0xC,0xD,0xC,0xD,0xC,0xD,0xC,0xD,0xC,0xD,0xC,0xD,0xC,0xD},
  /* 7w */ {0xE,0xF,0xE,0xF,0xE,0xF,0xE,0xF,0xE,0xF,0xE,0xF,0xE,0xF,0xE,0xF}
};

The 4th row (1q) has 0xF,0xF,0xF at the end, when it should be 0xF,0xE,0xF. The only way to reveal this bug was to use specific RSP instructions with a specific element specifier in the VT operand. Spotted it right away when it tried to render graphics and audio, though!
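
One way to catch a typo like this without waiting for a game to hit it is to generate each row from a rule and compare it with the hand-written table. The sketch below is only that, a sketch: the lane rule is inferred from the pattern in the rows above (pairs for q, groups of four for h, a single lane for w), and build_shuffle_key and row_is_consistent are hypothetical names, not CEN64 functions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Fills key[16] with the byte shuffle for element specifier e (0..15).
 * Assumption: the rule is inferred from the table rows, not from the
 * emulator's source. Eight 16-bit lanes, two bytes per lane. */
static void build_shuffle_key(unsigned e, uint8_t key[16]) {
  unsigned lane;

  assert(e < 16 && key != NULL);

  for (lane = 0; lane < 8; lane++) {
    unsigned src;

    if (e < 2)
      src = lane;                    /* --: every lane maps to itself  */
    else if (e < 4)
      src = (lane & ~1u) | (e & 1);  /* 0q,1q: one lane per pair       */
    else if (e < 8)
      src = (lane & ~3u) | (e & 3);  /* 0h..3h: one lane per group of 4 */
    else
      src = e & 7;                   /* 0w..7w: one lane for all eight */

    key[2 * lane]     = (uint8_t)(2 * src);      /* first byte of source lane  */
    key[2 * lane + 1] = (uint8_t)(2 * src + 1);  /* second byte of source lane */
  }
}

/* Returns 1 if a hand-written row matches the generated one, else 0. */
static int row_is_consistent(unsigned e, const uint8_t row[16]) {
  uint8_t expected[16];

  build_shuffle_key(e, expected);
  return memcmp(expected, row, sizeof expected) == 0;
}
```

Looping e from 0 to 15 and calling row_is_consistent on each table row at startup (or in a unit test) would flag the 1q row shown above.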
 
OP
MarathonMan

Emulator Developer
Before SSE optimizations:

Code:
xxxxx@yyyyyy:~/Projects/mupen64plus  
$ du -b test/mupen64plus-rsp-cxd4.so 
81944	test/mupen64plus-rsp-cxd4.so

After (~halfway done) SSE optimizations:

Code:
$ du -b mupen64plus-rsp-cxd4.so 
74232	mupen64plus-rsp-cxd4.so

ROMs that push the RSP are much smoother on my machine. I've been testing with Conker's BFD and running with an LLE RDP plugin. Animations look ever so slightly smoother; it seems the bottleneck is more on the RDP than on the RSP. But still good gains so far. :)
 

beannaich

New member
I'm guilty of premature optimization. I was doing unit testing (or at least trying), but was being rather careless about it. There are all these really latent bugs everywhere that the execution logs weren't revealing.

I'm also guilty of this, and it almost always leads to bugs :p I hate bugs that you find eventually and wonder how anything was working at all! I once found a silly mistake with the BIT opcode for my 65816 emulator, and fixing it didn't change as much as you'd think.
 
OP
MarathonMan

Emulator Developer
This is much more of an improvement than I would have imagined. I'm currently using a "scalar->SSE->scalar" layer that incurs an additional cost on each RSP vector instruction call. I removed it and benchmarked instruction times:

Did 100,000,000 iterations of VMADM:

Code:
$ time ./main-nosse

real	0m1.562s
user	0m1.560s
sys	0m0.000s

$ time ./main-sse

real	0m0.256s
user	0m0.252s
sys	0m0.000s

Same goes for VMADH, but even more so:

Code:
$ time ./main-nosse

real	0m1.554s
user	0m1.552s
sys	0m0.000s


$ time ./main-sse

real	0m0.166s
user	0m0.164s
sys	0m0.000s

The differences aren't nearly as great now because of the cost of the layer, but it'll be exciting to see how much faster games are once I remove the layer.
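
To picture what a "scalar->SSE->scalar" layer costs, here is a rough sketch. It is not VMADM or any real RSP instruction: a plain lane-wise 16-bit multiply stands in for the vector op, and mul_lanes_scalar and mul_lanes_layered are hypothetical names. The SSE2 path uses standard intrinsics from emmintrin.h and falls back to the scalar loop when SSE2 is not available.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Plain scalar version: low 16 bits of each lane-wise product. */
static void mul_lanes_scalar(const uint16_t a[8], const uint16_t b[8],
                             uint16_t out[8]) {
  int i;

  for (i = 0; i < 8; i++)
    out[i] = (uint16_t)((uint32_t)a[i] * b[i]);
}

/* "Layered" version: registers live in scalar arrays, so every call
 * pays for two loads and one store around the single SSE operation. */
static void mul_lanes_layered(const uint16_t a[8], const uint16_t b[8],
                              uint16_t out[8]) {
  assert(a != NULL && b != NULL && out != NULL);
#if defined(__SSE2__)
  {
    __m128i va = _mm_loadu_si128((const __m128i *)a);  /* scalar -> SSE  */
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    __m128i vr = _mm_mullo_epi16(va, vb);              /* the actual work */
    _mm_storeu_si128((__m128i *)out, vr);              /* SSE -> scalar  */
  }
#else
  mul_lanes_scalar(a, b, out);  /* fallback when SSE2 is unavailable */
#endif
}
```

If the emulated vector registers were stored as __m128i in the first place, the loads and the store would no longer be needed on every call, which is the cost the layer adds.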
 
OP
MarathonMan

Emulator Developer
Enjoy! :D

Only partially vectorized, but massive speedups in PJ64.
 

Attachments

  • rsp.zip
    18.7 KB

Fanatic 64

Guest
Is this FatCat's RSP with SSE or your own RSP ported to Zilmar Spec?
 

grivy

New member
I don't know whether this information is of any use to you at this time, but the rotating mask before the start screen in Majora's Mask (E) (M4) [!] is not showing. This is on Project64 v2.1 with the provided dll.
 
