HyperHacker

Raving Lunatic
Inline functions get copied directly into the code in place of calling them, so in theory they should be as fast as if you didn't use a function at all. Hopefully compilers would be smart enough not to bother with the usual register saving and restoring when they do this.
 

Exophase

Emulator Developer
Wow... awesome post, thanks for all the information!

Thanks.


Maybe.
The best solution is to create an empty image and write the pixels at runtime.
Then you can just render the final image without having to check the color of each pixel.
I think it's much faster...

Sorry, but I don't think I understand what you mean. Can you elaborate?


Hmm... I'm not the best coder, of course, but how much faster can this be:

Code:
switch(opcode)
{
	case 0: /* ... */ break;
	/* ... one case per opcode ... */
	case 255: /* ... */ break;
}

which translates to:

if (opcode == 0) ... else if (opcode == 1) ... else if (opcode == 255) ...

than this:
Code:
//stored at 0x00
forceinlined void OP_NOP(TVirtualMachine* vm)
{
	vm->pc++;
}

then executing the opcode directly:
Code:
	if( (currentOP>=0x00)&&(currentOP<= max) )CPU_FUNCS[currentOP](vm);

I think it's much faster...

Who taught you that switches translate to this? That's a common misconception. Translating a switch to something faster than that is a very simple compiler operation. What GCC will translate it to is this:

Code:
labels switch_table[] =
{
  CASE_LABEL_1, CASE_LABEL_2, CASE_LABEL_3, ...
};

if(index > LAST_CASE_NUMBER)
  goto BREAK;
goto switch_table[index];

CASE_LABEL_1:
  ...

CASE_LABEL_2:
  ...

BREAK:
  ...
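
To make the comparison concrete, here is a minimal C sketch of both dispatch styles side by side. The `Cpu` struct, the two opcodes chosen, and all function names are assumptions for illustration only, not code from any emulator in this thread. With the switch, the compiler can see every handler and is free to inline it at the jump target; with the function table, each opcode goes through an indirect call that generally can't be inlined at the call site.

```c
#include <stdint.h>

/* Hypothetical minimal CPU state, for illustration only. */
typedef struct {
    uint16_t pc;
    uint8_t  a;
} Cpu;

/* Handlers are static inline so the compiler may fold them into the switch. */
static inline void op_nop(Cpu *cpu)   { cpu->pc++; }
static inline void op_inc_a(Cpu *cpu) { cpu->a++; cpu->pc++; }

/* Style 1: one big switch. A dense switch like this is typically
   lowered to a bounds check plus an indexed jump. */
static void step_switch(Cpu *cpu, uint8_t opcode)
{
    switch (opcode) {
    case 0x00: op_nop(cpu);   break;  /* NOP */
    case 0x3C: op_inc_a(cpu); break;  /* INC A */
    default:   cpu->pc++;     break;  /* unimplemented: just skip it */
    }
}

/* Style 2: table of function pointers, indexed by opcode. */
typedef void (*OpFn)(Cpu *);

static void step_table(Cpu *cpu, uint8_t opcode)
{
    /* Entries not listed are NULL and treated as unimplemented. */
    static const OpFn table[256] = {
        [0x00] = op_nop,
        [0x3C] = op_inc_a,
    };
    OpFn fn = table[opcode];
    if (fn)
        fn(cpu);       /* indirect call */
    else
        cpu->pc++;
}
```

Both functions behave identically here; which one is faster in practice depends on the compiler and is worth measuring rather than assuming.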

In games, we usually delay the update system for about ".1" ms, and it improves
performance a lot.
I think it might work well with a GB emu.

You should wait however long you need to in order to achieve realtime speed for Gameboy. If you can't achieve realtime speed, then either the computer you're running on is 10+ years old or your emulator is way too slow. Adding waits won't make something faster, though; that's kind of counter-intuitive. If it's multithreaded it'll let other things run, but there are more explicit ways to do this without delaying, so that's actually not a good approach.
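
As a rough sketch of "wait however long you need": the original Game Boy runs 70224 clock cycles per frame at 4194304 Hz, so one frame lasts about 16.7 ms. The function names below are hypothetical, and measuring the elapsed time and doing the actual sleep are left to whatever timer and sleep calls your platform provides.

```c
#include <stdint.h>

#define GB_CLOCK_HZ          4194304u  /* DMG CPU clock */
#define GB_CYCLES_PER_FRAME    70224u  /* 154 scanlines * 456 cycles */

/* Length of one emulated frame in whole microseconds. */
static uint32_t frame_duration_us(void)
{
    return (uint32_t)((uint64_t)GB_CYCLES_PER_FRAME * 1000000u / GB_CLOCK_HZ);
}

/* Given how long the host took to emulate one frame, return how long
   to wait before starting the next one. Returns 0 when the emulator
   is already running behind, since waiting would only make it slower. */
static uint32_t frame_wait_us(uint32_t elapsed_us)
{
    uint32_t target = frame_duration_us();
    return elapsed_us < target ? target - elapsed_us : 0;
}
```

The point of the design is that the wait is computed from the time left in the frame, not a fixed delay.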
 

Exophase

Emulator Developer
Inline functions get copied directly into the code in place of calling them, so in theory they should be as fast as if you didn't use a function at all. Hopefully compilers would be smart enough not to bother with the usual register saving and restoring when they do this.

Are you referring to what I said about function tables? If so, you can't inline a function that's called indirectly. Or at least, it won't be inlined where you call it, of course.
 

CodeSlinger

New member
Exophase is correct about the large switch being faster than an array of function pointers. As he said, the compiler converts a switch statement into a jump table and not a huge if/else-if mess. That said, some compilers do compile it into a huge if/else-if mess, but these compilers should be avoided like the plague.

Personally I prefer using functions inside the switch statement to handle the opcodes because they can be tailored to be used for more than one opcode. You can use the function Opcode8BitAdd( ) to handle Add a,b and Add a,c etc. This minimises duplicate code so one bug fix fixes all. These functions should be made inline, which is something I've yet to do on my gameboy and master system emulator. I'll start optimizing my code after I've emulated sound; this way I can study exactly how many cycles I'm saving. I only ever optimize after I've finished coding.
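
A minimal sketch of that idea, assuming a hypothetical `Regs` struct and helper names (this is not CodeSlinger's actual code). One inline helper does the addition and sets the flags, and each ADD opcode in the switch only picks the operand.

```c
#include <stdint.h>

/* Flag bits in the F register of the Game Boy CPU. */
#define FLAG_Z 0x80  /* zero */
#define FLAG_N 0x40  /* subtract */
#define FLAG_H 0x20  /* half carry */
#define FLAG_C 0x10  /* carry */

/* Hypothetical register file, reduced to what the example needs. */
typedef struct { uint8_t a, f, b, c; } Regs;

/* Shared by every 8-bit ADD A,x opcode: one place to fix flag bugs. */
static inline void opcode_8bit_add(Regs *r, uint8_t value)
{
    unsigned sum = (unsigned)r->a + value;

    r->f = 0;                                   /* ADD always clears N */
    if ((sum & 0xFF) == 0)
        r->f |= FLAG_Z;
    if (((r->a & 0x0F) + (value & 0x0F)) > 0x0F)
        r->f |= FLAG_H;                         /* carry out of bit 3 */
    if (sum > 0xFF)
        r->f |= FLAG_C;                         /* carry out of bit 7 */
    r->a = (uint8_t)sum;
}

static void execute_add(Regs *r, uint8_t opcode)
{
    switch (opcode) {
    case 0x80: opcode_8bit_add(r, r->b); break; /* ADD A,B */
    case 0x81: opcode_8bit_add(r, r->c); break; /* ADD A,C */
    default:   break;                           /* not an ADD handled here */
    }
}
```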
 

givemeachance

New member
Sorry, but I don't think I understand what you mean. Can you elaborate?

I will take as example "Miracle GB's" source.

This function :
Code:
// Function name: UpdateScreen
// Variables: None
// Purpose: Update the Gameboy LCD screen.
int UpdateScreen()
{
	int i;
	int j;

	SDL_Rect plotRectangle;

	for (j = 0; j < 144; j++)
	{
		for (i = 0; i < 160; i++)
		{
			plotRectangle.x = i * 3;
			plotRectangle.y = j * 3;
			plotRectangle.w = 3;
			plotRectangle.h = 3;

			switch(screenData[(j * 160) + i])
			{
			case 0:
				{
					SDL_FillRect(screen, &plotRectangle, color_white);
				}
			break;
			case 1:
				{
					SDL_FillRect(screen, &plotRectangle, color_light_grey);
				}
			break;
			case 2:
				{
					SDL_FillRect(screen, &plotRectangle, color_dark_grey);
				}
			break;
			case 3:
				{
					SDL_FillRect(screen, &plotRectangle, color_black);
				}
			break;
			}
		}
	}

	SDL_Flip(screen);
	return 0;
}

It loops through all the pixels and renders every single one individually.

Now, imagine having this:

Code:
static void updateScreen(const SDL_Surface*& img,const SDL_Surface*& screen)
{
	renderSurface(img);
	SDL_Flip(screen);
}

Then, in CPU.C, you can just write the pixels into the surface at once:

Code:
SDL_Surface* screen_surface = NULL; // declared outside main

Then we modify the following code:
Code:
		if (IOregister_LCDC & BIT_0)
		{
			//----------------------------------------//
			// Draw the background into the screen    //
			// buffer.  These are always 8 x 8.       //
			//----------------------------------------//
			while (x < 160)
			{
				data1 = memory[BGTileData + (tileNumber * 16) + (borderY * 2)];
				data2 = memory[BGTileData + (tileNumber * 16) + (borderY * 2) + 1];

				while (borderX > 0)
				{
					color = (data1 & borderX) ? 1 : 0;
					color += (data2 & borderX) ? 2 : 0;

					if (color == 3)
						screenData[(y * 160) + x] = ((IOregister_BGP & BIT_7) >> 6) + ((IOregister_BGP & BIT_6) >> 6);
					if (color == 2)
						screenData[(y * 160) + x] = ((IOregister_BGP & BIT_5) >> 4) + ((IOregister_BGP & BIT_4) >> 4);
					if (color == 1)
						screenData[(y * 160) + x] = ((IOregister_BGP & BIT_3) >> 2) + ((IOregister_BGP & BIT_2) >> 2);
					if (color == 0)
						screenData[(y * 160) + x] = ((IOregister_BGP & BIT_1)) + ((IOregister_BGP & BIT_0));

					x++;

					if (x >= 160)
						break;

					borderX >>= 1;
				}
			}
		}

Into this:

Code:
forceinlined void fillScreenSurface(SDL_Surface*& s,const TVirtualMachine* vm)
{
	if ( !(vm->regs.IOregister_LCDC & vm->flags.BIT_0) )return;
	//----------------------------------------//
	// Draw the background into the screen    //
	// buffer.  These are always 8 x 8.       //
	//----------------------------------------//
	while (x < 160)
	{
		// Two bytes per tile row; the byte type is assumed here.
		const unsigned char data1 = vm->memory[vm->tileInfo->BGTileData + (vm->tileInfo->tileNumber * 16) + (vm->tileInfo->borderY * 2)];
		const unsigned char data2 = vm->memory[vm->tileInfo->BGTileData + (vm->tileInfo->tileNumber * 16) + (vm->tileInfo->borderY * 2) + 1];

		// Stop at the end of the tile row or at the right edge of the screen.
		// x is advanced once per pixel, inside this loop only.
		for (;vm->tileInfo->borderX > 0 && x < 160;vm->tileInfo->borderX >>= 1)
		{
			color = (data1 & vm->tileInfo->borderX) ? 1 : 0;
			color += (data2 & vm->tileInfo->borderX) ? 2 : 0;
			switch(color)
			{
				case black :
				// write 0x00 , 0x00 , 0x00 to surface
				// etc
			}
			x++;
		}
		// (advancing to the next tile and resetting borderX omitted, as in the code above)
	}
}

I think you get my point...
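
As an aside on the palette lookup in the original code above: the four `if` tests against `IOregister_BGP` each extract one 2-bit field of the register, so they can be collapsed into a single shift and mask. A minimal sketch; `bgp_shade` is a hypothetical helper name, not part of Miracle GB.

```c
#include <stdint.h>

/* Map a 2-bit tile colour index (0..3) to a shade (0..3) using the
   BGP palette register: bits 1-0 hold the shade for colour 0,
   bits 3-2 for colour 1, bits 5-4 for colour 2, bits 7-6 for colour 3. */
static inline uint8_t bgp_shade(uint8_t bgp, uint8_t color)
{
    return (uint8_t)((bgp >> ((color & 0x03) * 2)) & 0x03);
}
```

Inside the pixel loop this would replace the whole `if (color == ...)` chain with one call, for example `screenData[(y * 160) + x] = bgp_shade(IOregister_BGP, color);`.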
 

Exophase

Emulator Developer
But his code scales, as far as I can tell yours doesn't. Not that his is the proper way to do it. If you mean that the surface should be scaled after it's completely written to then I would generally agree. There are a lot of other things that can improve the code though.
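
A minimal sketch of "scale after the surface is completely written", using plain buffers so it does not depend on any particular graphics library. The ARGB values for the four shades and the function names are assumptions for illustration. The emulator writes shades at native 160x144 resolution, converts them to pixels with one table lookup each, and only then upscales the finished frame.

```c
#include <stdint.h>
#include <stddef.h>

enum { GB_W = 160, GB_H = 144 };

/* Assumed ARGB8888 colours for shades 0 (white) to 3 (black). */
static const uint32_t shade_argb[4] = {
    0xFFFFFFFFu, 0xFFAAAAAAu, 0xFF555555u, 0xFF000000u
};

/* Step 1: convert the native-resolution shade buffer to pixels.
   A table lookup replaces the per-pixel switch. */
static void shades_to_pixels(const uint8_t *shades, uint32_t *pixels)
{
    for (size_t i = 0; i < (size_t)GB_W * GB_H; i++)
        pixels[i] = shade_argb[shades[i] & 3];
}

/* Step 2: nearest-neighbour upscale by an integer factor.
   dst must hold (GB_W * scale) * (GB_H * scale) pixels. */
static void scale_nearest(const uint32_t *src, uint32_t *dst, int scale)
{
    int dst_w = GB_W * scale;

    for (int y = 0; y < GB_H * scale; y++) {
        const uint32_t *src_row = src + (size_t)(y / scale) * GB_W;
        uint32_t *dst_row = dst + (size_t)y * dst_w;

        for (int x = 0; x < dst_w; x++)
            dst_row[x] = src_row[x / scale];
    }
}
```

The scaled buffer can then be copied to the screen in one operation instead of issuing one rectangle fill per Game Boy pixel.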
 

givemeachance

New member
But his code scales, as far as I can tell yours doesn't. Not that his is the proper way to do it. If you mean that the surface should be scaled after it's completely written to then I would generally agree. There are a lot of other things that can improve the code though.

Yeah, with scaling of course...

Can you please share some more tips? I'm really, really interested!!

Thanks.
 

Pixman

New member
Would be cool if at the end of this everyone could post their efforts for others to educate themselves and get some ideas... I long for it! :)
Greets,
Pix
 

electrophyte

New member
Hey guys

I started to program my gameboy emulator at the beginning of this year. I have already implemented most of the system. The only thing I have left to do now is to implement sound.
I intended to do this with DirectSound but I have no idea how to create sounds with it. I think I have to create a rectangular wave in a buffer and then let it play, but how should I do this? (I guess I can't allocate memory for each new tone.) Can anyone help me?
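
One possible approach to the buffer part of the question, sketched in plain C with no DirectSound calls: keep a single reusable sample buffer and refill it with a square wave, carrying a phase value from one refill to the next so the tone continues without a click. The function name, the 16-bit signed sample format, and the 50% duty cycle are assumptions for illustration; how the buffer is handed to the audio API is outside this sketch.

```c
#include <stdint.h>
#include <stddef.h>

/* Fill buf with `count` samples of a 50% duty square wave.
   `phase` is a position within one period, measured in units where
   a full period equals `sample_rate`. Pass 0 for the first call and
   the returned value for each following call. */
static uint32_t fill_square_wave(int16_t *buf, size_t count,
                                 uint32_t sample_rate, uint32_t freq_hz,
                                 int16_t amplitude, uint32_t phase)
{
    for (size_t i = 0; i < count; i++) {
        /* First half of the period is high, second half is low. */
        buf[i] = (phase < sample_rate / 2) ? amplitude : (int16_t)-amplitude;
        phase = (phase + freq_hz) % sample_rate;
    }
    return phase;
}
```

Changing the tone then only means calling the function with a different `freq_hz` on the next refill, so no new memory is needed per tone.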
 

KickTheChair

New member
Hi! I've been testing my emulator recently and I found that something weird happens while emulating Tetris. If anyone has that ROM, check the opcode at address 0x02F0. For some reason, when I load it with my app or with a hex editor it shows opcode 0x28 (JR Z), but when I load it in the no$gmb debugger it shows 0x76 (HALT). :eek: Can anyone tell me why this happens?
 

HyperHacker

Raving Lunatic
Your app and the hex editor both show the same thing, but it works fine in No$GMB? Is that address actually being executed?
 

KickTheChair

New member
Yes it is. When I run it in no$gmb everything works fine and it goes into the halt state, but when I run it in my emu it goes into a loop, because instead of HALT I've got the JR Z opcode.

Edit:
I changed 0x28 to 0x76 in the ROM and it works OK now, although I still wonder why no$gmb changed that opcode by itself.
 

Exophase

Emulator Developer
It didn't work okay before? It's supposed to be 0x28.

My first guess was that the transformation that No$ made is an example of "idle loop elimination." The little loop that is entered can only be exited if the byte at 0xFF85 is zero. Because this is HRAM, only software can set this location, and because of that it can only happen when an interrupt occurs.

However, it seems that it starts this way as soon as the game is loaded, so it could just be a speed hack done for Tetris. I tried changing the ROM so it has backup RAM in the hopes that I'd fool it but that halt is still there. So I don't really know what the deal is.
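
For context on why swapping the loop for a HALT is safe when only an interrupt can change the polled byte: HALT stops instruction execution until an interrupt that is enabled in IE (0xFFFF) is also flagged in IF (0xFF0F). A minimal sketch of that wake-up check, with a hypothetical state struct and function names:

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical slice of CPU state, just what the halt logic needs. */
typedef struct {
    bool    halted;  /* set when a HALT instruction executes */
    uint8_t ie;      /* interrupt enable register, 0xFFFF */
    uint8_t iflag;   /* interrupt request flags, 0xFF0F */
} CpuIrqState;

/* Five interrupt sources use bits 0 to 4. */
static bool interrupt_pending(const CpuIrqState *s)
{
    return (s->ie & s->iflag & 0x1F) != 0;
}

/* Call once per emulation step. Returns true when the CPU should
   fetch and execute an instruction, false while it sleeps in HALT. */
static bool cpu_should_execute(CpuIrqState *s)
{
    if (s->halted && interrupt_pending(s))
        s->halted = false;  /* an enabled interrupt wakes the CPU */
    return !s->halted;
}
```

While halted, the emulator can skip straight to the next event instead of spinning through the polling loop, which is where the speed-up would come from.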
 

KickTheChair

New member
Well, there's one thing I still don't understand: if it changes 0x28 to 0x76, then why does it leave the next byte unchanged? After the JR Z opcode comes the value that's added to PC. For some reason the next byte is left as 0xFB, which in this case is the EI opcode, so it doesn't really matter, because interrupts are already enabled by the time the program gets to this part of the code. But it could just as well have been some other value that would mess with the execution of the game.
 

Aire83

New member
Hi guys, I'm trying to emulate the CPU, but I don't know when to increase the program counter or by how much.
 

CodeSlinger

New member
The opcodes are only 8 bits wide, so fetching one only needs a single increment of the program counter (any operand bytes that follow an opcode each need their own increment as you read them). It is up to you whether you increase the program counter before or after executing the instruction. I believe the original hardware does it after the execution of the instruction, but I chose to do it before as I found it easier to work with and have had no issues.

However, remember that the gameboy Z80 has the CB opcode prefix, so you'll also need to increment the program counter when reading a CB opcode.
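
A minimal sketch of the "increment as you read" approach, assuming a hypothetical `FetchCpu` struct with a flat memory pointer. Every byte read through `fetch8` advances the program counter, so opcodes, operands, and the byte after a CB prefix are all accounted for without per-instruction bookkeeping. Only four opcodes are decoded, just to show how far the counter moves.

```c
#include <stdint.h>

/* Hypothetical CPU state: program counter plus flat memory. */
typedef struct {
    uint16_t       pc;
    const uint8_t *mem;
} FetchCpu;

/* Read one byte at PC and advance PC past it. */
static inline uint8_t fetch8(FetchCpu *cpu)
{
    return cpu->mem[cpu->pc++];
}

/* Read a little-endian 16-bit operand, advancing PC by two. */
static inline uint16_t fetch16(FetchCpu *cpu)
{
    uint8_t lo = fetch8(cpu);
    uint8_t hi = fetch8(cpu);
    return (uint16_t)(lo | (hi << 8));
}

/* Fetch one instruction and return how many bytes it consumed.
   The instructions' effects are deliberately not emulated here. */
static int step_length(FetchCpu *cpu)
{
    uint16_t start = cpu->pc;
    uint8_t opcode = fetch8(cpu);

    switch (opcode) {
    case 0x00:                     break;  /* NOP: opcode only */
    case 0x3E: (void)fetch8(cpu);  break;  /* LD A,n: one operand byte */
    case 0xC3: (void)fetch16(cpu); break;  /* JP nn: two operand bytes */
    case 0xCB: (void)fetch8(cpu);  break;  /* CB prefix: second opcode byte */
    default:                       break;
    }
    return (int)(uint16_t)(cpu->pc - start);
}
```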
 

Aire83

New member
I've coded the CPU but I'm still confused by the paired registers.

To emulate these registers, do you make a variable for each of them, like BC, DE...?

For example, opcode 0x01:
LD n , nn

Do you guys load nn into register BC, or into B and C separately and then combine the two later?

Somehow my emulator executes these opcodes (ignore the line numbers):

Line Number : 1 Executing Opcode 0
Line Number : 2 Executing Opcode C3
Line Number : 3 Executing Opcode F3
Line Number : 4 Executing Opcode 31
Line Number : 5 Executing Opcode 21
Line Number : 6 Executing Opcode 1
Line Number : 7 Executing Opcode AF
Line Number : 8 Executing Opcode CD
Line Number : 9 Executing Opcode 57
Line Number : 10 Executing Opcode 7A
Line Number : 11 Executing Opcode 22
Line Number : 12 Executing Opcode B
Line Number : 13 Executing Opcode 79
Line Number : 14 Executing Opcode B0
Line Number : 15 Executing Opcode 20
Line Number : 16 Executing Opcode 7A
Line Number : 17 Executing Opcode 22
Line Number : 18 Executing Opcode B
Line Number : 19 Executing Opcode 79
Line Number : 20 Executing Opcode B0
Line Number : 21 Executing Opcode 20
Line Number : 22 Executing Opcode 7A
Line Number : 23 Executing Opcode 22
Line Number : 24 Executing Opcode B
Line Number : 25 Executing Opcode 79
Line Number : 26 Executing Opcode B0
Line Number : 27 Executing Opcode 20
Line Number : 28 Executing Opcode 7A
Line Number : 29 Executing Opcode 22
Line Number : 30 Executing Opcode B
...

It just keeps on repeating:
Line Number : 18 Executing Opcode B
Line Number : 19 Executing Opcode 79
Line Number : 20 Executing Opcode B0
Line Number : 21 Executing Opcode 20
Line Number : 22 Executing Opcode 7A
Line Number : 23 Executing Opcode 22

When it reaches opcode 0x20 to do the conditional relative jump,
the Z flag is always 1, so it keeps on jumping.
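
For reference, opcode 0x20 on the Game Boy CPU is JR NZ: it jumps only while the Z flag is clear, and its operand is a signed 8-bit offset from the address after the instruction. In the trace above, the flag it tests is set by opcode 0xB0 (OR B), since 0x0B (DEC BC) does not touch the flags. A minimal sketch of the jump itself, with a hypothetical function name:

```c
#include <stdint.h>

#define FLAG_Z 0x80  /* zero flag, bit 7 of F */

/* Compute the next PC for JR NZ.
   pc_after: address of the byte following the 2-byte instruction.
   offset:   raw operand byte, interpreted as signed two's complement.
   flags:    current value of the F register. */
static uint16_t jr_nz_target(uint16_t pc_after, uint8_t offset, uint8_t flags)
{
    /* Sign-extend the operand without relying on implementation-defined casts. */
    int disp = (offset & 0x80) ? (int)offset - 256 : (int)offset;

    if (flags & FLAG_Z)
        return pc_after;                   /* Z set: fall through */
    return (uint16_t)(pc_after + disp);    /* Z clear: take the jump */
}
```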
 

CodeSlinger

New member
The pair registers and the singular registers should not be treated separately. If you really wanted to do it that way then you can, but you'll be in for a world of debugging hurt. The pair registers are just the singular registers combined together to double their size. If you change the value of one of the singular registers, it will modify the pair register. If you change the value of the pair register, it will change the values of both singular registers.

The best way to emulate this (assuming you are using C++ or C) is with unions. A union is like a structure, but the difference is that all of its elements share the same memory space. This is perfect for what you need to emulate the registers.

Code:
union Register
{
  unsigned short reg ;   // the full 16-bit pair
  struct
  {
   // lo comes first in memory: this layout assumes a little-endian host
   unsigned char lo ;
   unsigned char hi ;
  };
};

Register AF;

The unsigned short variable labelled "reg" takes up 2 bytes of memory (which is exactly the same as the pair register) and each of the "lo" and "hi" variables takes up 1 byte of memory; however, this memory is shared with the "reg" variable.

This means if you change the value of "reg" it will also change the values of both "hi" and "lo". However, if you change the value of "hi" it won't change the value of "lo", but it will change the value of the hi byte in "reg".

This way, whenever you need to work with register A you can refer to it as AF.hi. Whenever you want to work with register F you can refer to it as AF.lo, and whenever you want to work with pair register AF you can refer to it as AF.reg.

This is the best way (imo) to emulate the Z80 registers.
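
Because the union relies on the host's byte order, one alternative (not what the post above uses) is to store the two halves separately and build the 16-bit value with shifts. It costs a little more typing but behaves the same on any host. `RegPair`, `pair_get`, and `pair_set` are hypothetical names for this sketch.

```c
#include <stdint.h>

/* A register pair stored as two explicit bytes, e.g. B and C. */
typedef struct {
    uint8_t hi;
    uint8_t lo;
} RegPair;

/* Read the pair as one 16-bit value, e.g. BC. */
static inline uint16_t pair_get(const RegPair *r)
{
    return (uint16_t)((r->hi << 8) | r->lo);
}

/* Write a 16-bit value into both halves, e.g. for LD BC,nn. */
static inline void pair_set(RegPair *r, uint16_t value)
{
    r->hi = (uint8_t)(value >> 8);
    r->lo = (uint8_t)(value & 0xFF);
}
```

Writing to `hi` or `lo` directly still changes what `pair_get` returns, so the single and paired views stay in sync just as they do with the union.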

Good luck and have fun
 

ShizZy

Emulator Developer
So it's been a while since I have posted here. Basically, I found out that a software engineering course I am taking next term is Java based, and I had never written a line of Java before in my life. So I decided to write a quick GB emu in it to pick up the language. Here's about 2 days' worth of work:



I'm hoping to give it another day or two and at least get a few commercial games working. Sadly, I think I've learned all the Java I can from this project, and the rest is just tedious debugging :-(

(Oh and... after having to convert 2's complement signed values from mem by hand, it's safe to say I hate Java)
 
