Performance Questions...

ZeroEffect

New member
I've started working on a GameBoy emulator for fun and would like some opinions on several performance related questions:

Opcodes:
Should I directly handle all opcodes in one giant switch statement? The compiler (VC++ in my case) should convert this into a jump table improving speed. Or, would I have better performance breaking down the switch statement into multiple switch statements (based on the opcode) adding in a small amount of overhead for the function call? Obviously, in both cases I will be losing code readability but I'm looking for straight performance.

What about the idea of a function call lookup table?

Although I really shouldn't have to worry about performance for a GB emulator on my current system I would like the practice if/when I move onto a more recent console.

Graphics:
When would be the best time to call my drawing routine? If I want to lock it in at 60fps max then I would call immediately before/after the VBlank interrupt. However, if I wanted to max my frame rate, should I call a draw after each instruction... This seems wasteful, is there a better way?

Thanks,
ZeroEffect
 

ector

Emulator Developer
Big switch or function pointer table, either is fine. Hierarchical switch won't buy you much.
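For comparison, here is a minimal sketch of both dispatch styles side by side. The `Cpu` struct, the handler names and the three opcodes shown are assumptions chosen for illustration, and flag updates are left out to keep it short. It is not a complete core.

```c
#include <stdint.h>

/* Hypothetical CPU state; the field names are illustrative only. */
typedef struct {
    uint8_t  a, b;
    uint16_t pc;
    uint8_t  mem[0x10000];
} Cpu;

/* Each handler returns the number of clock cycles it consumed. */
typedef int (*OpHandler)(Cpu *cpu);

static int op_nop(Cpu *cpu)    { (void)cpu; return 4; }          /* 0x00 NOP    */
static int op_inc_b(Cpu *cpu)  { cpu->b++; return 4; }           /* 0x04 INC B  */
static int op_ld_a_b(Cpu *cpu) { cpu->a = cpu->b; return 4; }    /* 0x78 LD A,B */
static int op_unimpl(Cpu *cpu) { (void)cpu; return 0; }          /* placeholder */

static OpHandler op_table[256];

/* Fill every slot so an unknown opcode never calls a null pointer. */
static void init_op_table(void) {
    for (int i = 0; i < 256; i++)
        op_table[i] = op_unimpl;
    op_table[0x00] = op_nop;
    op_table[0x04] = op_inc_b;
    op_table[0x78] = op_ld_a_b;
}

/* Style 1: function pointer table. */
static int step_table(Cpu *cpu) {
    uint8_t opcode = cpu->mem[cpu->pc++];
    return op_table[opcode](cpu);
}

/* Style 2: one big switch; compilers commonly lower a dense switch
   like this to a jump table. */
static int step_switch(Cpu *cpu) {
    uint8_t opcode = cpu->mem[cpu->pc++];
    switch (opcode) {
    case 0x00: return 4;
    case 0x04: cpu->b++; return 4;
    case 0x78: cpu->a = cpu->b; return 4;
    default:   return 0;
    }
}
```

Both versions do one indexed jump per instruction, so which one is faster is best settled by profiling the finished core rather than by guessing.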
You should render a scanline to your buffer after emulating the number of z80 cycles a scanline takes on the real thing. Then after the last line, copy the buffer to the screen.
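A rough sketch of that loop, assuming the usual Game Boy timing of 456 clock cycles per scanline and 154 lines per frame (144 visible, 10 in V-Blank). `cpu_step`, `render_scanline` and `present_frame` are hypothetical stand-ins for the real CPU core, the tile/sprite renderer and the blit to the window.

```c
#include <stdint.h>

enum {
    CYCLES_PER_LINE = 456,  /* clock cycles per scanline            */
    VISIBLE_LINES   = 144,  /* lines that are actually drawn        */
    TOTAL_LINES     = 154,  /* visible lines plus 10 V-Blank lines  */
    SCREEN_W        = 160
};

static uint8_t framebuffer[VISIBLE_LINES][SCREEN_W];
static int frames_presented;
static int cycle_budget;    /* overshoot carried from line to line */

/* Stand-in for the CPU core: pretend every instruction takes 4 cycles. */
static int cpu_step(void) { return 4; }

/* Stand-in renderer: writes a placeholder shade (0..3) per pixel. */
static void render_scanline(int line) {
    for (int x = 0; x < SCREEN_W; x++)
        framebuffer[line][x] = (uint8_t)(line & 3);
}

/* Stand-in for copying the buffer to the screen. */
static void present_frame(void) { frames_presented++; }

static void run_frame(void) {
    for (int line = 0; line < TOTAL_LINES; line++) {
        /* Run the CPU for one scanline's worth of cycles. */
        while (cycle_budget < CYCLES_PER_LINE)
            cycle_budget += cpu_step();
        cycle_budget -= CYCLES_PER_LINE;

        if (line < VISIBLE_LINES)
            render_scanline(line);
        if (line == VISIBLE_LINES - 1)
            present_frame();    /* last visible line done: V-Blank starts */
    }
}
```

With this structure the drawing cost is paid once per scanline and once per frame, regardless of how many instructions run in between.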
 

bcrew1375

New member
Draw the screen after every INSTRUCTION? Your emulator would be doing like .00000000001 FPS :p. Drawing the screen after every instruction would slow the emulation down (tremendously), not speed it up :p. The only way to get a high frame rate is to draw after the V-Blank interrupt and not restrict the frames per second. You should "draw" the scanlines to a buffer, and then actually draw it after the V-Blank interrupt.
 
BGNG

New member
ector means to draw a scanline only after all Z80 instructions executable in one scanline's time are executed, not after each instruction.
 

bcrew1375

New member
Actually, I was referring to this:

ZeroEffect said:
Graphics:
When would be the best time to call my drawing routine? If I want to lock it in at 60fps max then I would call immediately before/after the VBlank interrupt. However, if I wanted to max my frame rate, should I call a draw after each instruction... This seems wasteful, is there a better way?
 

ZeroEffect

New member
Thanks for the input. Coming from a 3D graphics background I'm used to redrawing the screen each time through the game loop. Trying hard to get out of that mindset :)

As I've been looking at implementing the CPU I've realized that I will be setting the carry & half carry bits often. I've been struggling with the fastest implementation for this and all I can come up with is a brute force technique (masking and comparing) that seems overly complex. From the C language perspective what is the least costly approach?
 

bcrew1375

New member
The only way I know of setting the flags is to use an inclusive OR on the F register. Z is 0x80, N is 0x40, H is 0x20, and C is 0x10; the other 4 bits aren't used. To turn them off, I use predefined masks. So, to turn off flag Z, I use an AND on the F register with 0xFF - 0x80 = 0x7F.
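As a small sketch of that OR/AND approach, using the bit values given above. The macro and helper names are made up for illustration.

```c
#include <stdint.h>

/* Flag bits in the F register, as listed in the post above. */
#define FLAG_Z 0x80u
#define FLAG_N 0x40u
#define FLAG_H 0x20u
#define FLAG_C 0x10u

/* Turn flags on with an inclusive OR. */
static uint8_t flag_set(uint8_t f, uint8_t mask) {
    return (uint8_t)(f | mask);
}

/* Turn flags off with an AND against the inverted mask
   (for Z this is 0xFF - 0x80 = 0x7F). */
static uint8_t flag_clear(uint8_t f, uint8_t mask) {
    return (uint8_t)(f & (uint8_t)~mask);
}

/* Non-zero if any of the flags in mask are set. */
static int flag_test(uint8_t f, uint8_t mask) {
    return (f & mask) != 0;
}
```

For example, `flag_clear(flag_set(0, FLAG_Z | FLAG_C), FLAG_Z)` leaves only the carry bit set.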
 

ZeroEffect

New member
Sorry, I read through my question again and realized I asked the wrong thing. The problem I'm having is how to determine, in the most efficient manner, whether the carry and half carry flags need to be set (or not). If I add two numbers together, how can I tell that the respective bits have been carried? What I came up with is:

carry = ((value1 & 0x000000FF) + (value2 & 0x000000FF)) >> 8

where value1 and value2 are unsigned ints. Similarly:

halfCarry = ((value1 & 0x0000000F) + (value2 & 0x0000000F)) >> 4

I have set up my flags as unsigned ints (TRUE or FALSE) so that I can eliminate the masking instructions required to determine if a flag is set.
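Written out as compilable C, the two expressions above might look like the sketch below for an 8-bit add. The `AddResult` struct and the function names are assumptions for illustration. The second function shows a commonly used alternative that derives the half carry from the already computed sum using XOR, so the addition itself is only done once.

```c
#include <stdint.h>

/* Hypothetical result bundle: the 8-bit sum plus each flag as 0 or 1. */
typedef struct {
    uint8_t  result;
    unsigned carry;
    unsigned half_carry;
    unsigned zero;
} AddResult;

static AddResult add8(uint8_t a, uint8_t b) {
    AddResult r;
    unsigned sum = (unsigned)a + b;                    /* up to 9 bits wide */
    r.result     = (uint8_t)sum;
    r.carry      = sum >> 8;                           /* carry out of bit 7 */
    r.half_carry = ((a & 0x0Fu) + (b & 0x0Fu)) >> 4;   /* carry out of bit 3 */
    r.zero       = (r.result == 0);
    return r;
}

/* Alternative: a ^ b ^ sum has a 1 wherever a carry came INTO that bit,
   so bit 4 of it is exactly the half carry. */
static unsigned half_carry_xor(uint8_t a, uint8_t b) {
    unsigned sum = (unsigned)a + b;
    return ((a ^ b ^ sum) >> 4) & 1u;
}
```

Either form costs only a handful of ALU operations, so the choice is mostly a matter of which one reads more clearly in the core.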

Thanks again
 

zenogais

New member
Efficiency doesn't really factor in on the Gameboy as much as it would on a much more complex console like the Nintendo 64. Chances are when you finish your CPU and get Video running you'll have FPS to spare. So worrying about the efficiency of determining a carry and half-carry is really a moot point. Of course this is just my opinion, but I would think correctness would be more important than speed during your first revision of the CPU core.
 

ZeroEffect

New member
I agree with you that, on a GB emulator, I will have fps to spare. But, knowing that I will be setting these flags in >50% of the instructions and knowing that these instructions will be called millions of times, I would like to approach the problem in the most effective way possible. If I can save several operations in each instruction then it may save me having to revise my CPU too many times.

That being said, I firmly believe that premature optimization is very dangerous but proper planning and selection of algorithms from the beginning is a must.
 

aprentice

Moderator
ZeroEffect said:
I agree with you that, on a GB emulator, I will have fps to spare. But, knowing that I will be setting these flags in >50% of the instructions and knowing that these instructions will be called millions of times, I would like to approach the problem in the most effective way possible. If I can save several operations in each instruction then it may save me having to revise my CPU too many times.

That being said, I firmly believe that premature optimization is very dangerous but proper planning and selection of algorithms from the beginning is a must.

If anything, you lose most of your speed calculating the opcode 3 times, once for every flag, not from something as petty as 1 or 2 extra instructions from flag settings. If you're really an established programmer you should have known this already and how to get around it.
 
hap

New member
The fastest way (I hope :p) is to not emulate the status register at all, store the results/flags in individual vars, and check them later when they need to be checked. You've got to be very careful though.

eg. the AND opcode in my 6502 core:

instead of something like:
cpu.A&=DATA;
cpu.P&=BIN_01111101;
cpu.P|=cpu.A&0x80;
cpu.P|=(cpu.A==0)<<1;

i do:
cpu.P_NZ=cpu.A&=DATA;

and when negative/zero needs to be checked, or when status register gets pushed on the stack, which is rarely: if (cpu.P_NZ==0) do stuff.
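A minimal sketch of this lazy-flag idea, loosely following the 6502 AND example above. The struct layout and the names `p_nz`, `p_rest` and `pack_status` are assumptions, not the actual core. It assumes the 6502 layout where N is bit 7 (0x80) and Z is bit 1 (0x02) of the status register.

```c
#include <stdint.h>

/* Hypothetical 6502 state: N and Z are not stored as bits at all. */
typedef struct {
    uint8_t a;
    uint8_t p_nz;    /* last result that affects N and Z  */
    uint8_t p_rest;  /* all other status bits, kept as-is */
} Cpu6502;

/* AND: a single assignment records everything needed for N and Z. */
static void op_and(Cpu6502 *cpu, uint8_t data) {
    cpu->p_nz = cpu->a &= data;
}

/* Flags are derived only when something actually tests them. */
static int flag_zero(const Cpu6502 *cpu)     { return cpu->p_nz == 0; }
static int flag_negative(const Cpu6502 *cpu) { return (cpu->p_nz & 0x80) != 0; }

/* Rebuild the real status byte, e.g. when it is pushed on the stack. */
static uint8_t pack_status(const Cpu6502 *cpu) {
    uint8_t p = cpu->p_rest & 0x7D;             /* clear N (0x80) and Z (0x02) */
    p |= cpu->p_nz & 0x80;                      /* N = bit 7 of last result    */
    p |= (uint8_t)((cpu->p_nz == 0) << 1);      /* Z = last result was zero    */
    return p;
}
```

One place where care is needed: a single result byte cannot represent N and Z both being set, which can happen when the status register is restored from the stack, so the restore path needs its own handling.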
 

ZeroEffect

New member
Originally Posted by: aprentice
If anything, you lose most of your speed calculating the opcode 3 times, once for every flag, not from something as petty as 1 or 2 extra instructions from flag settings. If you're really an established programmer you should have known this already and how to get around it.

I don't think I implied that I was an "established" programmer. While I consider myself intermediate at higher level programming, I am quite new to low-level emu programming. This is why I am asking such questions. The approach I gave above was brute force; what I am looking for is a better approach. It is my hope that I will find "established" low-level programmers that can help me improve my skills. Hopefully I will be able to return the favour in the future as my skills mature.

hap: Thank you for your thoughts :)
 