August 10th, 2014, 20:21
That e-2 bit was probably Nintendo trying to encourage good program design. The JR instructions occupy 2 bytes in machine code. By the time JR executes, the PC is already pointing to the instruction right after it. If e is specified as -1, the PC would land on the address of JR's own immediate, which would then be fetched as an opcode on the next cycle and could lead to some wacky code execution. It's probably perfectly legal to do what I described above, but Nintendo probably didn't want games messing up, so they opted to keep programmers safe to begin with.
So what I think they meant to explain was that the signed immediate (e) should always be calculated as the distance you want to jump, measured from the address of the JR instruction itself, minus 2. So if you wanted to jump the PC to one byte before the JR instruction, you would not use -1 (that jumps the PC into JR's immediate) but -3. This concept is poorly communicated in the Nintendo docs though, and it took me some head-scratching to figure out what they meant.
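To make the arithmetic concrete, here is a minimal C sketch of that rule. The function names jr_target and jr_operand are made up for illustration and don't come from any Nintendo doc or from anyone's emulator; it also assumes the jump doesn't wrap around the ends of the 16-bit address space.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: address the CPU ends up at after a 2-byte JR
   located at jr_addr. The PC has already moved past the operand byte,
   so the signed operand is added to jr_addr + 2. */
static uint16_t jr_target(uint16_t jr_addr, int8_t operand)
{
    return (uint16_t)(jr_addr + 2 + operand);
}

/* Hypothetical helper: operand byte needed to reach `target` from a JR
   at jr_addr. Assumes no wrap-around at 0x0000/0xFFFF. */
static int8_t jr_operand(uint16_t jr_addr, uint16_t target)
{
    /* e is the distance measured from the JR instruction itself. */
    int32_t e = (int32_t)target - (int32_t)jr_addr;

    /* The stored byte is e - 2 and must fit in a signed 8-bit value,
       so e itself has to be in the range -126..+129. */
    assert(e - 2 >= -128 && e - 2 <= 127);
    return (int8_t)(e - 2);
}
```

With a JR at 0x0150, jr_operand(0x0150, 0x014F) gives -3, while an operand of -1 makes jr_target return 0x0151, which is the JR's own immediate byte, and -2 jumps back onto the JR itself.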
September 3rd, 2014, 09:57
I ported my emulator to the STM32F429I-Discovery board,
which has a 180MHz Cortex-M4 with 256KB RAM and 2MB flash, a 320x240 LCD and 8MB SDRAM.
It runs but it's very slow, I guess single-digit FPS. There is no input, no sound and the
game has to fit inside the flash together with the emulator.
The board can be programmed and debugged over usb, which makes things considerably easier,
but programming larger files doesn't work reliably and anything over 1MB almost always breaks.
Next up are input and some missing rendering stuff.
Getting acceptable performance will probably not be as easy and I may have to rewrite the critical
parts in assembly. It really shows that I did not write the code with speed in mind.
September 6th, 2014, 07:33
Hey, that's very cool.
The closest thing I ever did was port my emulator to the Dingoo A320 a long time ago (400MHz MIPS CPU). I must admit though, 180MHz seems awfully limited, you'd have to make every cycle count. ARM assembly is pretty easy to get to grips with, and implementing the GB's CPU instructions in assembly shouldn't be that difficult. Any big plans for your STM32F429I-Discovery board?
October 30th, 2014, 12:30
After seeing your emulator on the STM32F429I-Discovery board I just had to register here, because I'm currently working on a very similar project. About 10 months ago I started creating my own Gameboy emulator with a friend. The basics were working very quickly, but getting every minor detail of the Gameboy hardware right is very time consuming. I also wanted to port our emulator to a microcontroller, and after a first failed attempt with a 32 MHz ATxmega we switched to the STM32F4-Discovery board with a 168MHz Cortex-M4 and 192 KB RAM. The emulator is supposed to be a full Gameboy Color emulator. Currently it already runs at twice the speed of the real Gameboy, but only with rendering disabled and in normal speed mode. With rendering enabled (currently not very optimized) we are at ~70% of the real speed. The emulator is written in C++ and currently doesn't use any assembler to improve performance. Instead we try to be very clever and only do work when it's absolutely necessary.
Too much work and no motivation and therefore no progress. I added SGB borders back in but that's about it.
@sebi707: that sounds promising. What are you using as display? I also have the STM32F4-Discovery lying around, but never did anything useful with it.
I did some profiling on the PC version of the emulator with Zelda DX, the values are the percentage of the total execution time:
Memory access is at 4.5% for reading and 0.6% for writing, so that is better than expected. (Or everything else is just much slower.)
Sound is disabled on STM32 so optimizing video seems to be the best bet. The current code uses a brute force approach and should allow for some speed gains.
I was a bit surprised to see that SDL2 spends about 17% of the total time on updating the display, even without vsync. The STM32 port writes directly to the framebuffer and should be faster.
Still, I'm not too optimistic that I can get reasonable speeds without rewriting most of it. Speed is probably single-digit fps, I will try to get some timing information on the board next.
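For the timing step, a rough sketch of how a measured cycle count per frame could be turned into FPS and a percentage of real speed. The helper names are hypothetical, and the sketch assumes the standard DMG figures of a 4194304 Hz clock and 70224 clocks per frame (about 59.73 FPS). Where the host cycle count comes from is left open; on a Cortex-M4 the DWT cycle counter would be one option.

```c
#include <stdint.h>

/* Assumed Game Boy (DMG) timing constants. */
#define GB_CLOCK_HZ          4194304u  /* master clock in Hz */
#define GB_CYCLES_PER_FRAME  70224u    /* 154 scanlines * 456 clocks */

/* Hypothetical helper: emulated frames per second, given the host clock
   and the number of host cycles spent on one emulated frame. */
static double emu_fps(uint32_t host_hz, uint32_t host_cycles_per_frame)
{
    return (double)host_hz / (double)host_cycles_per_frame;
}

/* Hypothetical helper: emulation speed as a percentage of real hardware. */
static double emu_speed_percent(double fps)
{
    double real_fps = (double)GB_CLOCK_HZ / (double)GB_CYCLES_PER_FRAME;
    return 100.0 * fps / real_fps;
}
```

As an example, 18 million host cycles per frame at 180MHz works out to 10 FPS, which is a bit under 17% of real speed.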