  1. #1
    EmuTalk Member
    Join Date
    Jan 2006
    Posts
    10

    Running into some walls, here

    So this summer I thought I'd take it upon myself to begin my very first emulation project-- I am writing a dynamic recompiling SNES emulator for the PSP.

    I didn't quite realize when I started how ambitious this project is. A few quarters ago I took a computer organization course that dealt with MIPS, so I thought I'd be relatively prepared to jump into this. I was particularly confident since I chose to (initially, at least) take a dictionary-type translation approach. No real profiling or optimization at first. I just want something that works for now.

    I now see that even without some crazy profiler/optimization utility, this is not a trivial undertaking and I'm at the point where I could really use some advice from people who have a good understanding of how this all works.



    So, my first problem, I guess. I spent the past two or so weeks writing "emit" functions (for example, emit_lda() ) that spit out MIPS-encoded instructions that perform 65c816-equivalent operations according to the address mode of interest. This weekend I wrote a MIPS disassembler/interpreter and they seem to do what they're supposed to do. However, these functions more or less assume that all ROM and RAM data are stored contiguously in memory. After reading a little more about the SNES memory map (I'm only implementing LoROM support at the moment), I realize that this approach won't really work.

    The way I see it, I have two options:

    1) Allocate a huge wad of memory. Cut the ROM data up into 32k chunks and distribute them according to their proper bank locations (the first 32k chunk would be located at 00.8000, the second at 01.8000, etc. ).

    2) Go the typical route of writing read/writemem() functions.

    The obvious problem with #1 is that I'd be using a ton of memory, and then any writes to mirrored memory would have to be duplicated. #2 seems like it'd be the clear alternative, but the 65c816->MIPS translation already yields an approximate 1:20 instruction ratio. I want to avoid any unnecessary function calls or additional code generation, and I figured the method might lend itself to an elegant solution of some kind. But maybe not?
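For reference, the LoROM layout described in option 1 can be expressed as a pure address calculation rather than a copy. The following is only a rough sketch under stated assumptions: `lorom_offset` is a hypothetical helper name, the modulo wrap for small ROMs is a simplification, and it ignores cartridges that also expose ROM in the lower half of banks 40-7D.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: translate a 24-bit SNES address (bank:offset) into an
 * offset inside a LoROM image. Returns -1 when the address is not ROM. */
static long lorom_offset(uint32_t snes_addr, size_t rom_size)
{
    uint32_t bank   = (snes_addr >> 16) & 0xFF;
    uint32_t offset = snes_addr & 0xFFFF;

    assert(rom_size > 0);

    if (bank == 0x7E || bank == 0x7F)
        return -1;                  /* work RAM occupies these two banks */
    if (offset < 0x8000)
        return -1;                  /* RAM mirror, hardware registers, etc. */

    /* Each bank contributes one 32 KB chunk; bit 7 of the bank is a mirror. */
    size_t rom_off = ((size_t)(bank & 0x7F) << 15) | (offset & 0x7FFF);
    return (long)(rom_off % rom_size);   /* simplification: wrap small ROMs */
}
```

With this, 00:8000 maps to ROM offset 0 and 01:8000 to offset 0x8000, matching the chunk placement described above, without duplicating the ROM data.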

    Suggestions are greatly appreciated.
    Last edited by phytoporg; June 21st, 2006 at 01:53.



  2. #2
    Boring person
    Join Date
    May 2006
    Posts
    194
    I haven't done anything with SNES before, but I looked at some memory maps and it has 256 64 KB banks (16 MB total space), right? And these banks can be split into 4 KB regions corresponding to different hardware locations. Some hardware registers have to be trapped, so you can't map those directly to memory. What I would recommend doing is to allocate a 256-wide array of pointers to 16-wide arrays that holds the current bank map.

    When you change banks, change this, and when you access memory, use the upper 4 bits of the 16-bit offset to determine which sub-area it's part of. In these tables either store a pointer into the memory array it's part of, or a null pointer to indicate that it has to be handled with more logic than just a memory read/write.

    Check to see if the pointer is null. If it isn't, offset it by the lower 12 bits of the address (the address with everything above them masked off) and reference this location. If it is, call a function to handle it.

    Since this uses a kind of page table, you don't have to change the way memory areas are represented in the PSP's memory and you don't have to duplicate mirrored writes or anything like that. It also only uses a relatively small amount of memory, 256 * 16 * 4 bytes (16 KB).
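The page-table idea above could look roughly like this in C. It is a minimal sketch, not the poster's actual code: the names `page_table`, `map_page`, `read8`, `read_io` and `map_low_wram` are all made up for illustration, the table is flattened into one 256 x 16 array, and the I/O handler is a stub.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BANKS          256
#define PAGES_PER_BANK 16            /* 64 KB bank / 4 KB page */

/* page_table[bank][page] points at the host memory backing that 4 KB page,
 * or is NULL when the access has to be trapped (hardware registers). */
static uint8_t *page_table[BANKS][PAGES_PER_BANK];

static uint8_t wram[0x2000];         /* just the 8 KB used in the example */

/* Stub for the slow path; a real emulator would dispatch on the address. */
static uint8_t read_io(uint32_t addr) { (void)addr; return 0xFF; }

static void map_page(uint8_t bank, unsigned page, uint8_t *host)
{
    assert(page < PAGES_PER_BANK);
    page_table[bank][page] = host;
}

/* Mirror the first 8 KB of work RAM into bank 00 and bank 7E: both banks
 * point at the same host memory, so no write ever has to be duplicated. */
static void map_low_wram(void)
{
    for (unsigned page = 0; page < 2; page++) {
        map_page(0x00, page, wram + page * 0x1000);
        map_page(0x7E, page, wram + page * 0x1000);
    }
}

static uint8_t read8(uint32_t addr)
{
    uint8_t *base = page_table[(addr >> 16) & 0xFF][(addr >> 12) & 0xF];
    if (base != NULL)
        return base[addr & 0xFFF];   /* fast path: plain memory */
    return read_io(addr);            /* NULL entry: trapped access */
}
```

In recompiled code the fast path is a shift, a table load, a null check and the actual load, so the function call only happens for trapped pages.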

    So, 1:20 ratios? Are you sure it's even worth recompiling anymore at that point? Could you give me an example of what kind of 65c816 instructions are taking those amounts of MIPS instructions?

    EDIT: I'm currently working on an emulator with the intention of porting it to PSP one day (that's why I started it) and doing a dynarec inevitably is very much planned as well.. I'd love to share ideas, if you have AIM could you IM me? Screenname's Exophase.
    Last edited by Exophase; June 21st, 2006 at 11:17.

  3. #3
    EmuTalk Member
    Join Date
    Jan 2006
    Posts
    10
    Oh, I think I saw something like that going on in the sneq source, but I couldn't completely make out what he was doing. That's a neat idea, I'll give it a shot.

    The 1:20 ratio is only that way 'cause there isn't any profiling going on. For the moment, all 65c816 registers are statically mapped to the s-registers, and the translation includes encoded MIPS instructions to calculate the m and x flags, with appropriate branches to perform the necessary masking. Not to mention the fact that, with these statically mapped registers, I'm not taking any steps to avoid the clutter caused by the accumulator-based architecture.

    So, for example, here's my current emit_lda function (which will need some fixing to take into consideration proper memory mapping):

    Code:
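    void emit_lda( unsigned int **location, ADDRESS_MODE am ) {
    	/**
    	 * --$t3 = effective address--
    	 * --$t0 = *(effective address)--
    	 * 0  andi $t1, S, MFLAG
    	 * 4  bne  $t1, $zero, 24
    	 * 8  nop                   (branch delay slot)
    	 * 12 add  A, $zero, $t0    (16-bit: A = $t0)
    	 * 16 j    32
    	 * 20 nop                   (jump delay slot)
    	 * 24 andi A, A, 0xFF00     (8-bit: keep the high byte of A)
    	 * 28 or   A, A, $t0
    	 * 32 nop
    	 **/
    
    	emit_load_op( memPtr, location, T3, am );
    	emit_deref_mflag( location, T0, T3 );
    
    	unsigned int initLoc = (unsigned int)*location;
    
    	AND_I( location, T1, S, MFLAG );
    	BNE( location, T1, ZERO, 4 );	/* offset counted in instructions from the delay slot */
    	NOP( location );		/* delay slot: otherwise the ADD also runs when the branch is taken */
    	ADD( location, A, ZERO, T0 );
    	J( location, initLoc + 32 );
    	NOP( location );		/* delay slot: otherwise the ANDI also runs on the 16-bit path */
    	AND_I( location, A, A, 0xFF00 );
    	OR( location, A, A, T0 );
    	NOP( location );
    }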
    The instruction macros look something like:

    Code:
    #define ADD_I( location, rt, rs, immediate ) \
    **location = ITYPE( ADD_I_OP, rs, rt, immediate ); \
    ++(*location);

    Once this all works, even if it's super slow, the emit functions are going to see a pretty big overhaul so I can work them into some kind of profiling scheme for on-the-fly instruction reordering, dynamic register allocation and general basic block optimizations. Until then, things will look a little messy.
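For anyone following along, here is a rough sketch of what an I-type encoder behind macros like these might look like. The field layout (6-bit opcode, 5-bit rs, 5-bit rt, 16-bit immediate) and the `addi`/`andi` opcode values are standard MIPS; the names `itype`, `emit`, `OP_ADDI`, `REG_T0` and so on are hypothetical and not taken from the poster's source.

```c
#include <assert.h>
#include <stdint.h>

/* Standard MIPS register numbers and opcodes; the identifiers are made up. */
enum { REG_ZERO = 0, REG_T0 = 8, REG_T1 = 9, REG_S0 = 16 };
enum { OP_ADDI = 0x08, OP_ANDI = 0x0C };

/* I-type layout: opcode[31:26] rs[25:21] rt[20:16] immediate[15:0] */
static uint32_t itype(unsigned op, unsigned rs, unsigned rt, unsigned imm)
{
    assert(op < 64 && rs < 32 && rt < 32);
    return ((uint32_t)op << 26) | ((uint32_t)rs << 21) |
           ((uint32_t)rt << 16) | (imm & 0xFFFFu);
}

/* Writes one instruction word and advances the cursor. A function (or a
 * do { ... } while (0) macro) avoids the pitfall of a two-statement macro
 * used inside an unbraced if/else. */
static void emit(uint32_t **location, uint32_t insn)
{
    **location = insn;
    ++*location;
}
```

For example, `emit(&p, itype(OP_ADDI, REG_ZERO, REG_T0, 5))` writes the word for `addi $t0, $zero, 5` and moves `p` to the next slot.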

    And sure, I'll shoot you an IM when I get the chance. Right now it looks like you're idle, though. D;

  4. #4
    Boring person
    Join Date
    May 2006
    Posts
    194
    I'm not sure what's wrong with static allocation, assuming that you end up using more than just the s registers. You should have an entire 29 registers at your disposal; you have to preserve them appropriately across calls to C code, but your recompiled code should ideally stay within either its own code or safe ASM you created yourself for as long as it can. 65c816 only has A, X, Y, S, DB, D, PB, and P, right? The first three of the last four aren't accessed explicitly as often in comparison. As for the flags, if you're just going to check whether they're zero or non-zero most of the time, you may as well keep them in a P register, and only keep the ones that you calculate often (N, V, C, Z) in their own registers. You need maybe two or three others as temporaries and you'll want to keep things like a cycle counter, translated bank address, etc. in registers, but you should have more than enough registers to manage all this.

    I'm not sure that instruction reordering or dynamic register allocation is really going to get you much of anything, and the former especially is a rather expensive optimization. Like I said, you have plenty of registers, so you can map each 65c816 register to a fixed MIPS register (i.e., A -> $s0, X -> $s1, etc.), which is what you seem to be doing. Most likely the best optimization you'll be able to perform is redundant flag elimination. The 8bit vs 16bit stuff looks rather expensive, you'd be well off to try to determine it at runtime as much as possible. If SNES games are usually in 16bit modes (I don't know either way) then it'd be best to have the modes be global compile-time known values and flush the entire recompilation cache when they change. However, barring that, you can keep one register that holds either an 0xFF or 0xFFFF mask and another that holds its complement, update both whenever the M flag changes, and do this:

    lda t0, memory_base, offset
    and t0, t0, mask
    and a, a, not_mask
    or a, a, t0
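In C terms, the branch-free merge above amounts to the following. This is only an illustrative sketch; `width_mask` and `lda_merge` are made-up names, and the emitted MIPS would keep `mask` and its complement in dedicated registers instead of recomputing them.

```c
#include <assert.h>
#include <stdint.h>

/* 0x00FF when M = 1 (8-bit accumulator), 0xFFFF when M = 0 (16-bit). */
static uint16_t width_mask(int m_flag)
{
    return m_flag ? 0x00FF : 0xFFFF;
}

/* Merge a loaded value into A without branching on the M flag:
 * bits inside the mask come from memory, bits outside it are kept. */
static uint16_t lda_merge(uint16_t a, uint16_t loaded, uint16_t mask)
{
    assert(mask == 0x00FF || mask == 0xFFFF);
    return (uint16_t)((a & (uint16_t)~mask) | (loaded & mask));
}
```

In 8-bit mode the high byte of A survives the load; in 16-bit mode the complement is zero, so the whole accumulator is replaced.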

    Something I'm confused about, when you grab the MFLAG do you mean to do it from P and not S?

  5. #5
    Moderator Cyberman's Avatar
    Join Date
    Nov 2001
    Posts
    1,824
    Instead of forging on, I suggest sitting back and looking at what is going on within a typical SNES game. Quite a bit. You might want to look at the SNES9X source code for some ideas as well.

    1) Other things, the PSP is too small for NOT doing instruction optimization.
    2) I suggest using intermediate data instead of emitting immediately. This allows you to do some analytical cleanup later on, before emitting actual code. I.e., bank switching routines can be converted to direct code leaps etc.
    3) DATA DATA DATA! Think about how the game accesses data. Why? Because you can save yourself some banking headaches this way. Banking data can be simplified the same way code can be. It's just a matter of 'seeing' what you are looking at.
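As one concrete example of the cleanup an intermediate form makes possible, here is a rough sketch of the redundant flag elimination mentioned earlier in the thread, done as a backward pass over one basic block. The `IrOp` layout, the flag bit values and the function name are all assumptions made for illustration; a real IR would carry operands and addressing modes too.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { FLAG_N = 1, FLAG_V = 2, FLAG_Z = 4, FLAG_C = 8, FLAG_ALL = 15 };

typedef struct {
    uint8_t opcode;      /* 65c816 opcode, kept for the later emit step */
    uint8_t flags_read;  /* flags this instruction consumes */
    uint8_t flags_set;   /* flags this instruction defines */
    uint8_t flags_emit;  /* output: flags that really must be computed */
} IrOp;

/* Walk the block backwards, tracking which flags are still needed.
 * A flag result is only worth computing if something reads it before
 * the next instruction that overwrites it. */
static void eliminate_dead_flags(IrOp *ops, size_t count)
{
    uint8_t live = FLAG_ALL;   /* conservative: all flags live at block exit */

    assert(ops != NULL || count == 0);
    for (size_t i = count; i-- > 0; ) {
        ops[i].flags_emit = ops[i].flags_set & live;
        live = (uint8_t)((live & ~ops[i].flags_set) | ops[i].flags_read);
    }
}
```

For a block like LDA / ADC / BEQ, the pass marks the N and Z results of the LDA as unnecessary, because the ADC overwrites them before anything reads them.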

    Think HLE a bit too. Not everything is straightforward in a bank-switched or segmented beast like the SNES, but you can certainly make things a bit faster. Remember, most of these games were coded modularly, likely using a compiler.

    First pick a simple game, i.e. one that doesn't use a lot of banking (yeah, like FF4 (smirk)).

    Tales of Phantasia is a good game to test your system with. It abuses the heck out of the SNES and is the largest game apart from Star Ocean for the SNES (6 megabytes). It's a Japanese-only game but there is a patch for an English translation.
    Last edited by Cyberman; June 22nd, 2006 at 20:20.
    Progress (n.):
    The process through which the Internet has evolved from smart people in front of dumb terminals to dumb people in front of smart terminals.
    -------------------------------------------------------------------
    Recursive (adj):
    see Recursive

  6. #6
    EmuTalk Member
    Join Date
    Jan 2006
    Posts
    10
    I guess my naming convention is a little off-- I call it S for "status register" (I use SP for the stack pointer). I guess I should change that, heh.

    I'm currently only mapping the s-registers to the 65c816 regs you named in addition to the program counter (I'm actually storing the PB as the high eight bits in a larger, "full" PC register). I wanted to keep a lot of t-registers around for use elsewhere, but it's becoming more apparent that I don't need very many for general use. I'm going to be redoing my emit functions to take into account the memory map anyway (I implemented your idea yesterday, by the way-- thanks a lot!), so mapping more registers might help. I'm not completely sure, though, 'cause switching from the emitted code to compiled C-code would then mean more stack manipulation on a very frequent basis, and I want to keep memory traffic to a minimum.

    And yeah, the 8-bit/16-bit mode switching is a pain, but unfortunately it happens very often since the switch needs to be done to access individual bytes rather than words and vice-versa. Once a profiler comes into play, it'll be possible to determine during run-time the certain states of the relevant flags in a code block to avoid the extra branching.

    EDIT:
    Quote Originally Posted by Cyberman
    1) Other things, the PSP is too small for NOT doing instruction optimization.
    2) I suggest using data instead of emitting immediately. This allows you to do some analytical clean up. IE bank switching functions can be trimmed out, and direct code leaps instead etc.
    3) DATA DATA DATA! Think about how the game accesses data. Why? Because you can save yourself some banking headaches this way. Banking data can be simplified the same way code can be. It's just a matter of 'seeing' what you are looking at.
    Whoah, hello. Sorry, was still writing the above when you got your post in.

    1) I do intend to do instruction optimization, just later in the game. There are probably a lot of little things I can even do initially, like omitting flag calculations where they don't need to be done and that kind of thing.
    2 & 3) Yeah, I put together something similar to Exophase's suggestion in his first reply, which practically eliminates the need to even acknowledge the existence of banks and fragmented data. My only issue now is dealing with the hardware registers and other locations that can't just directly be mapped somewhere.

    I'm actually shooting to get Super Mario World working first, and I use "working" lightly, here. It's more or less the goal for the end of this summer. I'm not totally sure how realistic that is, but I'm crossing my fingers.
    Last edited by phytoporg; June 22nd, 2006 at 20:21.

  7. #7
    EmuTalk Member
    Join Date
    Jun 2006
    Location
    Santiago
    Posts
    13
    Quote Originally Posted by Cyberman
    1) Other things, the PSP is too small for NOT doing instruction optimization.
    Are you sure? I don't know too much about SNES nor PSP, but a 333 MHz MIPS CPU should be enough to emulate a 65c816 at 3.58 MHz (and the video, audio, etc. hardware).

    I'm working on a GameBoy emulator (Z80 4.194 MHz) for my Palm Tungsten T2 (ARM 144 MHz). I don't do dynamic recompilation, just a simple interpreter written in C++ and I get 1250 fps on my P4 1.4 GHz machine (according to some benchmarks ---XviD decoding and memory speed--- the P4 1.40 GHz CPU is only 3 or 4 times faster than the ARM 144 MHz in my PDA).

  8. #8
    Moderator Cyberman's Avatar
    Join Date
    Nov 2001
    Posts
    1,824
    Quote Originally Posted by huarifaifa
    Are you sure? I don't know too much about SNES nor PSP, but a 333 MHz MIPS CPU should be enough to emulate a 65c816 at 3.58 MHz (and the video, audio, etc. hardware).
    Unlike a PC, it doesn't have 256 to 4096 megs of memory to spew data into. It is limited. Although its memory is close in size to the largest SNES ROMs, you still need to fit the emulator and other things within it. Think EMBEDDED system.
    Quote Originally Posted by huarifaifa
    I'm working on a GameBoy emulator (Z80 4.194 MHz) for my Palm Tungsten T2 (ARM 144 MHz). I don't do dynamic recompilation, just a simple interpreter written in C++ and I get 1250 fps on my P4 1.4 GHz machine (according to some benchmarks ---XviD decoding and memory speed--- the P4 1.40 GHz CPU is only 3 or 4 times faster than the ARM 144 MHz in my PDA).
    A Game Boy is a much smaller system in comparison. It's not too hard to load the contents of an entire ROM and run an interpreter. Dynamic recompilation generates code on the fly, and this code may be 2 to 3 instructions for every original instruction. In addition, the instruction sizes are 32 bit. Your code size will double instruction for instruction almost immediately. So you may have more memory, but you also have more memory concerns.

    Cyb

  9. #9
    Boring person
    Join Date
    May 2006
    Posts
    194
    For systems with limited memory especially (like PSP) it's a good idea to only recompile frequently used blocks. StrmnNrmn takes this approach for Daedalus and the translation cache ends up being pretty small compared to the overall ROM size. This could improve speed as well, depending on how fast your recompilation is (if a lot of optimizations are thrown in you really don't want to waste time recompiling things that are only executed once or a few times, like initialization code).
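A minimal sketch of the "only recompile frequently used blocks" policy, assuming the emulator interprets cold code and bumps a counter at each block entry. The threshold, the table size and the name `should_recompile` are hypothetical tuning choices, not anything taken from Daedalus.

```c
#include <assert.h>
#include <stdint.h>

#define HOT_THRESHOLD 16u      /* hypothetical tuning value */
#define COUNTER_SLOTS 4096u    /* must be a power of two so we can mask */

/* One small counter per slot; different block addresses may share a slot,
 * which only makes a block look hot slightly earlier than it really is. */
static uint8_t exec_count[COUNTER_SLOTS];

/* Call on every interpreted entry to the block starting at `pc`.
 * Returns 1 exactly once, on the entry that reaches the threshold;
 * until then (and afterwards) it returns 0. */
static int should_recompile(uint32_t pc)
{
    uint8_t *count = &exec_count[pc & (COUNTER_SLOTS - 1)];

    assert((COUNTER_SLOTS & (COUNTER_SLOTS - 1)) == 0);
    if (*count < HOT_THRESHOLD)
        return ++*count == HOT_THRESHOLD;
    return 0;
}
```

Initialization code that runs once never reaches the threshold, so it never takes up room in the translation cache.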

    Cyberman; PSP's available RAM is much larger than the largest SNES ROMs, isn't it? Plus, with bank switching required for > 4MB ROMs it's easier to load the banks on/off memstick, although I don't know what kind of speed hit this would incur. Sadly, phytoporg is currently looking at far more than 2-3 MIPS instructions per 65c816 instruction. I guess the other question is how much of an SNES ROM consists of code vs. data (my guess would be usually not much for most games, they don't get awfully complex).

    huarifaifa; This is off topic, but I'm skeptical that an ARM9 based CPU at 144MHz is really only 1/4th to 1/3rd of a P4 1.4GHz, generally speaking. It sounds like you were running very memory bandwidth limited tests, which are not necessarily representative of how an interpretative emulator will behave. Most of the working data set is highly localized (registers, flags) and the locality of the emulated program itself lends itself to this as well. The ARM CPU will surely perform the same thing in significantly fewer instructions, but probably at higher cycles per instruction counts as well. I am interested to see what the comparison is like in the end though...
    Last edited by Exophase; June 24th, 2006 at 12:21.

  10. #10
    EmuTalk Member
    Join Date
    Mar 2006
    Posts
    13
    I'd advise against a recompiler (unless you are doing it for fun, then go for it! ). The benefits for 65c816 are small enough (8/16 mode switch playing a big part here), and the overhead for going out to memory large enough that it's not really worth it. Plus with the fun of keeping the APU synced (not to mention specialty chips) to the main core, you want to be pretty close to cycle accurate.

    On the DS (a 66 MHz ARM9) both snesds and snezzids use a hand-written ARM asm core, and it runs at 60fps on everything out there. Mind you, that is using the DS's graphics hardware to draw the tiled backgrounds and sprites, but with a 333 MHz processor a hand-coded asm core should leave plenty of horsepower to spare for sound & gfx.

    Edit: Just to mention why I recommend against it, debugging a recompiler is a *pain* unless you have some nice single-stepping tools. I wrote a dynarec for the SPC-700 CPU on the GBA (256 KB RAM, 16 MHz ARM7) and it was a substantial amount of effort to get it debugged and running well.
    Last edited by gladius; June 27th, 2006 at 06:10.
