angrylion's Per-Pixel RDP with OpenGL

OP
Iconoclast

New member
https://github.com/cxd4/rsp

That is if you are talking about the 32-/64-bit RSP code I was just talking about.

For the present time, I'm still not interested in discussing the pixel-accurate RDP with GitHub. The RSP is on GitHub.

As far as building on Windows, the RSP plugin compiles fastest if you have MinGW installed to C:\MinGW and simply double-click the batch file `make_w32.cmd'. A 64-bit Windows build is slightly harder...I think you either need to install MinGW-64 or MSYS2 with 64-bit in it and change the %MinGW% batch variable in my command script to refer to that to get it to compile the 64-bit DLL. I will have to confirm which method is best by the time I do a release of that plugin here as well.

Either way, the latest RSP recompiler plugin for current versions of Project64 still seems faster than my interpreter. I thought for sure I saw a couple of cases where an SSSE3 build of mine was faster than the recompiler plugin in a game I tried, but in general, as long as you simply need full speed, the recompiler plugin seems best.
 

fla56

New member
Hi. Thanks for this but something seems to have changed drastically for me with the VI filters on SuperMario -getting all kinds of weird jaggy pixel crawl that wasn't there in 1.2, esp obvious on 'Mario head' screen (on GTX970, OGL4.5)

-doesn't seem to make any difference with/without nearest neighbour or screen res settings in case that helps...(can't seem to post screenshots either):huh:
 
OP
Iconoclast

New member
fla56 said:
Hi. Thanks for this but something seems to have changed drastically for me with the VI filters on SuperMario -getting all kinds of weird jaggy pixel crawl that wasn't there in 1.2, esp obvious on 'Mario head' screen (on GTX970, OGL4.5)

-doesn't seem to make any difference with/without nearest neighbour or screen res settings in case that helps...(can't seem to post screenshots either):huh:

I have attached a new release to the original post of the thread which addresses this, along with a few minor changes.

Thanks for bringing this difficult-to-confirm VI divot filtering discrepancy to my attention. It was a problem with the MinGW/GCC build; prior to the creation of this thread here on EmuTalk, I was always using MSVC 2013 to do the binary releases. However, on July 24th, 2014, it seems I made quite a significant typo which affected non-Visual-Studio builds of the plugin (which, back then, was all that was possible).

Code:
#ifdef _MSC_VER
#define ZERO_MSB(word)    ((unsigned)(word) >> (8*sizeof(word) - 1))
#define SIGN_MSB(word)    ((signed)(word) >> (8*sizeof(word) - 1))
#define FULL_MSB(word)    SIGN_MSB(word)
#else
#define ZERO_MSB(word)    ((word < 0) ? +1 :  0)
#define SIGN_MSB(word)    ((word < 0) ? -0 :  0) /* the typo: -0 where -1 was meant */
#define FULL_MSB(word)    ((word < 0) ? ~0 :  0)
#endif
I'd intended these macros for portability of the RDP outside of a two's complement CPU, but I seem to have overlooked typing what I actually meant: -1, not -0. Since -0 is just 0 for C integers, SIGN_MSB evaluated to 0 for every input on non-MSVC builds.

I also changed the #ifdef _MSC_VER to #if defined(USE_SSE_SUPPORT), since any build using SSE2 is necessarily bound to the Intel CPU architecture and its conventions anyway. Therefore, divot filtering should now be optimized on MinGW/GCC and Linux builds, not just Windows MSVC builds.
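To make the effect of the typo concrete, here is a minimal sketch of the portable branch with the intended -1 restored, next to the broken variant. The names SIGN_MSB_TYPO, select_if_negative and check_sign_macros are hypothetical and exist only for this illustration; they are not taken from the plugin source.

```c
#include <assert.h>
#include <stdint.h>

/* Portable forms: no reliance on how the compiler shifts negative values. */
#define ZERO_MSB(word)      (((word) < 0) ? +1 :  0)   /* 1 or 0          */
#define SIGN_MSB(word)      (((word) < 0) ? -1 :  0)   /* -1 or 0         */
#define FULL_MSB(word)      (((word) < 0) ? ~0 :  0)   /* all bits or none */

/* Hypothetical name: the typo'd variant, kept only for comparison. */
#define SIGN_MSB_TYPO(word) (((word) < 0) ? -0 :  0)   /* always 0        */

/* Hypothetical helper: branch-free select driven by the sign mask.
 * Returns if_neg when x < 0, otherwise if_nonneg. */
int32_t select_if_negative(int32_t x, int32_t if_neg, int32_t if_nonneg)
{
    int32_t mask = FULL_MSB(x);              /* ~0 when x < 0, else 0 */
    return (if_neg & mask) | (if_nonneg & ~mask);
}

/* Usage example: the corrected macro tracks the sign, the typo never does. */
void check_sign_macros(void)
{
    assert(SIGN_MSB(-5) == -1);
    assert(SIGN_MSB(7) == 0);
    assert(SIGN_MSB_TYPO(-5) == 0);
}
```

Any code that used the result as a mask or as a -1/0 adjustment would silently see 0 with the typo'd variant, which would fit a filtering glitch that only appears on non-MSVC builds.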

Thanks for the help guys.
I get no crash with your static interpreter plugin.

Then it was another problem with the RSP re-compiler.


By the way folks, yesterday I modified post #3 of this thread to include my PixD utility as an attachment, near the bottom with the instructions on using the "Write DRAM" button in the configuration UI. It is a ZIP archive of a Windows-only binary release. The Linux binary can be obtained by making sure you have the freeglut headers and libraries installed, then executing ./make.sh to build it yourself. As for building it on Windows, it uses Mark Kilgard's non-free glut32 library instead of freeglut, which you can officially obtain from the latest NVIDIA SDK.
 

neverwind

New member
Hello,

I'm getting an 'unidentified microcode' error whatever I try to do.
Could anyone help?

PJ:2.2
OS: Win 7
GFXCard: ASUS GTX970
 

neverwind

New member
Disable High Level graphics emulation?
Hey, thanks, that worked!

I've tried WWF - No Mercy - at first it was awfully slow. After enabling the RSP plugin, it's now a lot better but still it has some slowdowns.
However, I noticed my Video Card's clocks stay the same as if it is in idle mode all the time? Could this be the reason for the slowdowns?
 

Alex Cherkasoff

New member
Just installed it on 2.2 (it's not working on 2.1 though)
It works, but so hard
PJ64 2.2 is slow by itself (slower than 1.6)
with this plugin I'm only getting 15-20fps
Audio is stuttering
Is there a way to make SkipFrame option?
 
OP
Iconoclast

New member
neverwind said:
However, I noticed my Video Card's clocks stay the same as if it is in idle mode all the time? Could this be the reason for the slowdowns?

It's faster to use the CPU than it is to use the video card just to specify one pixel at a time.
So I think the reason for what you said is that the video card is mostly used to draw the final screen as a texture, but not to emulate the RDP itself.

Otherwise, it would not be a per-pixel plugin. Really, video cards are proficient at handling vectors, floating-point, things like that. Just forcing pixel-exact correctness is more of a scalar operation. It is always conceivable for some video driver/implementation, somewhere, to have some way to use hardware-accelerated OpenGL to do pixel-accurate emulation of the RDP. That would not work on every implementation, though--it would be pixel-accurate, hardware-accelerated rendering on some video hardware but most likely none of our own. ziggy did seem pretty interested in giving it a shot before he left, though. His hardware-accelerated plugin was a fantastic attempt, but it seems to have started too early in time, back when the MAME implementation and some reversing of the RDP hardware were still too incomplete.
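As a rough sketch of that division of labor: the CPU decides every pixel in a scalar loop, and only the finished frame would be handed to the video card as a single texture. The function names and the checkerboard rule below are hypothetical stand-ins, not the plugin's code, and the texture upload itself is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the RDP's per-pixel work: an 8x8 checkerboard.
 * The real pipeline (texturing, blending, coverage...) is far more involved. */
uint32_t shade_pixel(int x, int y)
{
    return ((x ^ y) & 8) ? 0xFFFFFFFFu : 0xFF000000u;
}

/* Scalar, one-pixel-at-a-time rendering into a CPU-side frame buffer. */
void render_frame_cpu(uint32_t *fb, int width, int height)
{
    assert(fb != NULL && width > 0 && height > 0);
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            fb[(size_t)y * (size_t)width + (size_t)x] = shade_pixel(x, y);
        }
    }
    /* Only now would the finished frame go to the video card, once,
     * as a texture stretched over the window. That step is omitted here. */
}
```

With this structure the video card has very little to do per frame, which would be consistent with its clocks staying near idle.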

From time to time I do think about the idea, but I think it would benefit more if I first gained practical experience with the RDP's triangle span rendering and stripped out more deadwood for speed, rather than doing those afterwards.

Alex Cherkasoff said:
Just installed it on 2.2 (it's not working on 2.1 though)

Update Project64.rdb, if you want to test on 2.1.

Virtually all of the "best" RDB settings for Project64 1.6 were tested with HLE plugins. When you use LLE, some extra stuff gets executed, so it's more prone to the effects of bad re-compiler settings in the Project64 CPU and the like.

Alex Cherkasoff said:
PJ64 2.2 is slow by itself (slower than 1.6)
with this plugin I'm only getting 15-20fps
Audio is stuttering

All three of those things coincide. If audio, graphics, RSP, the core, or any other component of emulation slows down the overall emulation, then audio with most plugins will also slow down and "stutter". I'm quite used to it by now. :p Actually, I haven't yet found an audio plugin that doesn't stutter if emulation isn't at full speed.

Alex Cherkasoff said:
Is there a way to make SkipFrame option?

It's documented in post #3: You can skip every N triangles or every N vertical refreshes of the screen.

The speed gain from skipping every N refreshes is less observable, of course, if you just bypass the DAC filters, which also is faster.
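For readers wondering what such a skip setting amounts to, here is a minimal sketch of an every-N counter. It assumes one plausible reading of the option (process one event out of every N and drop the rest); the plugin's actual semantics are documented in post #3, not here, and the type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical skip state, usable for triangles or for vertical refreshes. */
typedef struct {
    unsigned interval;  /* N: process 1 out of every N events; 1 = no skipping */
    unsigned counter;   /* position within the current group of N events      */
} skip_state;

/* Returns 1 if the current event should be processed, 0 if it is skipped. */
int skip_should_process(skip_state *s)
{
    assert(s != NULL && s->interval > 0);
    int process = (s->counter == 0);
    s->counter = (s->counter + 1) % s->interval;
    return process;
}
```

Calling skip_should_process once per vertical refresh with interval 3 would draw the first refresh and drop the next two, then repeat.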
 

fla56

New member
Thanks for the patch, looks perfect now :tup:

On the subject of GPU vs CPU just wondering -is it possible/beneficial to use the GPU/OpenGL shaders to do the VI filters? Maintain per-pixel accuracy with a mostly software RDP but a healthy speedup overall?
 
OP
Iconoclast

New member
That probably is possible. For the VI DAC filters, some of us have talked about using OpenGL hardware features to handle them instead of doing them all on the CPU. RetroArch is in the best position to investigate that, I believe. The reasons why I do not invest in it are:

  • lack of GLSL or other forms of modern OpenGL knowledge as well as any interest in having it, along with any other of Khronos' illogical choices of function replacement and/or deprecation
  • the fact that they can just be bypassed in the plugin settings--emulating no filters at all is even faster than OpenGL-accelerated emulation of them.
  • fundamental portability: OpenGL is intended to be cross-platform because it is an open specification, but systems are free to decline implementing OpenGL. In this case, doing the VI filters on the CPU as standard C code is more portable than using OpenGL 2.0+ or GLSL to do it.

However, it is an interesting idea anyway, just one that I prefer not to put my time into. The mupen64plus-libretro core usable in the RA frontend may be able to offer it sometime, though for the time being I prefer to keep my application of OpenGL down to just the portable (which is already very hard, when you're working with OpenGL) necessities for putting an image out to the screen.
 

fla56

New member
Interesting...as you say, need the speed so I turn the filters off but then there are games like Beetle Rally that need them on -would a neater solution be to run them on separate CPU thread(s)?

On a separate note unfortunately still looks like RDP bugs are still there, think I missed it first time I retested sorry -still getting pixel crawl++ in SM64 :-(
 
OP
Iconoclast

New member
fla56 said:
Interesting...as you say, need the speed so I turn the filters off but then there are games like Beetle Rally that need them on -would a neater solution be to run them on separate CPU thread(s)?

In what way does Beetle Rally need them on? Does it just look too ugly with them off?

Re: run VI/RDP on multi CPU threads, I would rather not because I'm not a multi-threading guy. The simplest and purest pieces of software engineering are written for one thread.

fla56 said:
On a separate note unfortunately still looks like RDP bugs are still there, think I missed it first time I retested sorry -still getting pixel crawl++ in SM64 :-(

Hm, what RDP bugs in SM64?

Could you perhaps post a jumbled up link (http : // www . blah blah) to a screenshot showing this, in case EmuTalk tries to filter out links in posts from new users?

Also, if it's an RDP bug, then you'll see the glitchy graphics with VI filters turned off, not just on. I would turn them off so that the filtering doesn't interfere with the raw RDP frame buffer output, giving a sharper image to study the bug more clearly. However, with that said, I am confident that there are no RDP bugs at all with this release. :)
 

fla56

New member
Iconoclast said:
In what way does Beetle Rally need them on? Does it just look too ugly with them off?

Hm, what RDP bugs in SM64?

Could you perhaps post a jumbled up link (http : // www . blah blah) to a screenshot showing this, in case EmuTalk tries to filter out links in posts from new users?

Also, if it's an RDP bug, then you'll see the glitchy graphics with VI filters turned off, not just on. I would turn them off so that the filtering doesn't interfere with the raw RDP frame buffer output, giving a sharper image to study the bug more clearly. However, with that said, I am confident that there are no RDP bugs at all with this release. :)

Hi -Beetle Racing has a VI-driven sliding film effect in the menus but have just tested and it seems to work with the option on or off 'oops/great!' -take it the VI filters are separate from other parts of the VI (apparently some football games do similar things too)

Re: RDP -think I'm getting my phraseology wrong -by 'RDP' think I really mean 'RDP plugin' -as you suggest the errors only occur with VI filters on therefore if there is a bug it's in the VI part of the RDP plugin code...

Will post screenshots to the old PJ64 forum
 

elchhome

New member
wrong aspect ratio

If I use PAL games then the picture is stretched vertically while the NTSC (US) version has the right aspect ratio.
You can easily verify this with Zelda OOT Eur version vs. US.

PS.
This doesn't happen with the previous version of the plugin (angrylion's RDP with OpenGL 1.2) from the pj64-emu.com thread.
RSP is always the static interpreter.
 

Attachments

  • zelda.png (212.3 KB)
OP
Iconoclast

New member
I've been working on an update to fix 64-bit Linux run-time collisions (never had problems with 32-bit Mupen64 0.5 on Linux). I was able to test the build in Mupen64Plus 1.5 x64 but unable to get any video frames to send through the VI, because Mupen64Plus 1.5 keeps failing to update VI_V_SYNC_REG, causing a constantly negative number of active video lines--therefore, there is no way that this plugin can be ported to Mupen64Plus 1.5+ without fixing the regression in their core. Maybe I will look into trying a plain 64-bit build of Hacktarux's interpreter in Mupen64 0.5, as the performance gains here with the RCP outweigh whether the CPU is being re-compiled or interpreted anyway.

Sorry about delays by the way. It's fun when you can contribute to a bunch of projects besides just your own. :)

elchhome said:
If I use PAL games then the picture is stretched vertically while the NTSC (US) version has the right aspect ratio.
You can easily verify this with Zelda OOT Eur version vs. US.

PS.
This doesn't happen with the previous version of the plugin (angrylion's RDP with OpenGL 1.2) from the pj64-emu.com thread.

Yes elchhome, thanks for mentioning it. This experiment of mine is something I've spent the past few days wondering what to do with.

It was for games that draw NTSC-proportionate frame buffers, like 40 Winks, that would look squished if you had a 640x480 window instead of a 640x576 window. (There are 576 lines based on VI_V_SYNC_REG when PAL mode is detected.) You can get a pretty good comparison if you turn VI filters off.
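The window-height decision described above can be sketched as follows. This is only an illustration under stated assumptions: the 10-bit mask, the typical VI_V_SYNC_REG values (525 for NTSC, 625 for PAL) and the 550 threshold are my assumptions rather than something quoted in this thread, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define LINES_NTSC            480
#define LINES_PAL             576
/* Assumption: any V_SYNC value clearly above the NTSC one means PAL. */
#define V_SYNC_PAL_THRESHOLD  550

/* Returns nonzero when the register value looks like a PAL video mode. */
int is_pal_from_v_sync(uint32_t vi_v_sync)
{
    return (vi_v_sync & 0x3FFu) > V_SYNC_PAL_THRESHOLD;
}

/* Picks the window height that matches the detected video standard. */
int window_height_from_v_sync(uint32_t vi_v_sync)
{
    assert((vi_v_sync & 0x3FFu) != 0);   /* register should have been set */
    return is_pal_from_v_sync(vi_v_sync) ? LINES_PAL : LINES_NTSC;
}
```

Forcing a 640x480 window for a PAL game simply overrides the 576 result, which is where the squished or stretched comparison comes from.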

OOT PAL forced to 480 lines, VI filters off:
6yZAbpSBV7ldxdqJWKjypvX2Jw4Si.bmp


OOT PAL at 576 lines, VI filters off:
6yZAaM6KjJV7r7D5mMK4xcv8ddcrt.bmp


The same can be observed with angrylion's unmodified plugin, except that due to the VI filters always being emulated, this kind of discrepancy isn't so noticeable. Therefore it's not a bug, but I do wonder about whether I should change to a 640x576 window automatically. Care to help me decide? You should be able to configure the graphics DLL to set a "User-defined" resolution, and enter 640x480 to the right to force the NTSC window size for OOT PAL and other games. Which do you think looks better, with or without filters?
 
OP
Iconoclast

New member
Changes since the last attachment to the thread. --updated 2015.06.13 --

  • renamed extension-loaded OpenGL functions to fix Mupen64Plus collisions
  • exporting non-zilmar-spec `ReadScreen' to fix SIGSEGV's on Mupen64Plus 1.5
  • removed the automatic window size adjustment for PAL images to 640x576
  • fixed premature blanking of the rainbow colored bars boot screen :(
  • added some more forced OpenGL state configuration commands
  • fixed leftover VI garbage seen for an instant when switching between ROMs
  • staticized texture coverage LOD fraction calculations
  • optimized filling large buffers of memory with a single byte
  • merged r84 from upstream angrylions-stuff/mylittle-nocomment
  • many optimizations to YUV texture element fetching and YUV-RGB conversion
  • merged r85 from upstream angrylions-stuff/mylittle-nocomment
  • improved 16-bit R5G5B5 frame buffer reads with dynamic branch weighing
  • optimized coverage buffer allocation and YUV texture LUT decoding
  • merged r86 from upstream angrylions-stuff/mylittle-nocomment
  • folded RGB comparisons for dithering
  • merged most of r87 from upstream angrylions-stuff/mylittle-nocomment
  • improved combiner equation performance with dynamic branch prediction
  • removed the 3-bit restriction on color dither randomization masks
  • supports newly discovered VI pixel destruction on clamped hres or h_start
  • merged r88 from upstream angrylions-stuff/mylittle-nocomment
  • statically preserving x_start initial value in case of RCP timing issues
  • some finalized corrections to horizontal pass max-clamping
  • merged r89 from upstream angrylions-stuff/mylittle-nocomment
  • guaranteeing sufficient VI divot and aliasing buffer sizes
  • passes some more technical RCP hardware tests involving x line caching

It works in Mupen64Plus 1.5 32-bit and earlier versions, but 64-bit Mupen64 needs to be compiled differently due to corrupt allocation of the RCP registers (most easily ignored as a result of routinely practicing HLE of most components).
 

songokou

New member
Wow! What a truly impressive plugin! If it's all right to make a suggestion, I think that a scanline feature would be awesome for this. It's just a thought though. I look forward to seeing more of this promising plugin.
 
OP
Iconoclast

New member
I had thought about it a year ago, but, being unfamiliar with the way things like that work, I was puzzled on a few issues.

  • how the scanlines should originate...alternating even or odd lines starting from the top or the bottom
  • how OpenGL should "render" the scanlines
  • various feedback from people preferring 50% scanlines, 25 or 75% or different concentrations

I haven't tried to implement them, but there could be a way to minimize their weight against performance, in the event that most people would prefer to play games without them.
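One way to picture the open questions above is a CPU-side sketch where each of them becomes a parameter: which line parity is darkened, and by how much. To be plain about the swap, this darkens the frame buffer on the CPU before it is displayed rather than having OpenGL render the scanlines, and it assumes a 32-bit pixel with three 8-bit color channels in the low bytes. The function name and parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Darkens every other line of a 32-bit frame buffer in place.
 * first_line:   0 darkens even lines, 1 darkens odd lines (counted from the top)
 * strength_pct: 0..100, e.g. 25, 50 or 75; the top byte of each pixel is kept */
void apply_scanlines(uint32_t *fb, int width, int height,
                     int first_line, unsigned strength_pct)
{
    assert(fb != NULL && width > 0 && height > 0);
    assert((first_line == 0 || first_line == 1) && strength_pct <= 100);

    const unsigned keep = 100 - strength_pct;   /* share of brightness kept */
    for (int y = first_line; y < height; y += 2) {
        uint32_t *row = fb + (size_t)y * (size_t)width;
        for (int x = 0; x < width; x++) {
            uint32_t p  = row[x];
            uint32_t c0 = (( p        & 0xFFu) * keep) / 100;
            uint32_t c1 = (((p >>  8) & 0xFFu) * keep) / 100;
            uint32_t c2 = (((p >> 16) & 0xFFu) * keep) / 100;
            row[x] = (p & 0xFF000000u) | (c2 << 16) | (c1 << 8) | c0;
        }
    }
}
```

Because only half the lines are touched and the pass is skipped entirely at 0% strength, the cost for people who play without scanlines could be kept at a single branch per frame.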

I've had a number of interesting projects intervening here and there, but I still mean to get back to this one as well.
 
