
128 -> 64 Bits

maurojk

New member
OK... Hi all, I'm new here on the forum. I'm Brazilian, so sorry about my bad English.

I'm introducing myself to ask all the members, and mainly the staff, a good question...

Now that Windows Vista is about to arrive, Windows will finally run pure 64-bit applications.

When a 64-bit application runs on a pure 64-bit architecture (an Athlon 64 X2 4400+ under Windows Vista), the theory says that every 128-bit conversion is much easier than converting 128 bits to 32 bits (a Pentium 4 under Windows XP).


So if this conversion is that much "cheaper", we believe the CPU bottleneck will be reduced.

And the FPS problems in emulation will be smashed...

But the real question is......

Is the staff planning to release a 64-bit version for this architecture?

Is this theory real, or is it an insane claim?

Can I hope to play Zelda: The Wind Waker on my

Athlon 64 X2 4400+
1 GB 400 MHz dual channel
GF FX6800 PCI-E 16x 512 MB eXTreme


?????????????????????????


Thanks for your attention, and sorry about my bad English...

cya guys
 

Lightning

Emulator Developer
In reality, the CPU still does 128-bit processing via the SSE instruction set, and that runs at the same speed in 32-bit and 64-bit mode. The improvement you see going from 32-bit to 64-bit comes from two things: normal memory accesses read/write twice as much data at once, and in 64-bit mode you get 8 extra general-purpose registers (GPRs) to keep data in, which means fewer memory accesses. When emulating, those extra registers are beneficial.

Now, even when running under 64-bit, Dolphin is currently built for 32-bit, so it will not use the extra GPRs and will keep accessing memory just as often. In fact, last I knew, Dolphin's dynarec only used the general-purpose and floating-point registers and never used any of the extended instructions, so the ability to work on multiple values at once is not even there.
 

spotanjo3

Moderator
Don't worry about your English. If they do not like it, then that's their problem. We must respect a person's dialect, grammar, and vocabulary. If he says "he done it", it is not "bad" English; the meaning is crystal clear. It is simply not Standard English. The learner must understand, however, that he will be discriminated against in some quarters if he does not use Standard English.

Understand? Good! :p

You need a lot of patience, because there is a lot of work to do on this emulator. Can you buy a Nintendo GameCube? If not, then just wait, OK? ;)

Yes, I speak Portuguese! But this forum is English only. :)

Cheers,

Rockmangames
 

Doomulation

maurojk said:
When a 64-bit application runs on a pure 64-bit architecture (an Athlon 64 X2 4400+ under Windows Vista), the theory says that every 128-bit conversion is much easier than converting 128 bits to 32 bits (a Pentium 4 under Windows XP).
When not using SSE or any other extended instruction set? Yes. Because each register on a 32-bit processor is 32 bits wide, the program has to store a 64-bit number in two registers. The same would hold true going from 64 to 128 bits. However, this penalty is so small that you will *not* notice it. It might mean a few more instructions here and there, but because the values stay in registers it's ultra fast.
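To make the two-register point concrete, here is a minimal C sketch (the `u64_pair` type and both function names are made up for illustration) of a 64-bit add done with two 32-bit halves, the way a 32-bit CPU has to, next to the single add a 64-bit CPU can do. On x86 the split version roughly corresponds to an ADD followed by an ADC.

```c
#include <stdint.h>

/* Hypothetical: a 64-bit value held in two 32-bit "registers". */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} u64_pair;

/* Add the low halves, detect the carry out, then add the high halves
   plus that carry (on x86: ADD then ADC). */
static u64_pair add_u64_pair(u64_pair a, u64_pair b)
{
    u64_pair r;
    r.lo = a.lo + b.lo;                 /* wraps modulo 2^32 */
    uint32_t carry = (r.lo < a.lo);     /* 1 if the low add wrapped */
    r.hi = a.hi + b.hi + carry;
    return r;
}

/* On a 64-bit CPU the same operation is one add on one register. */
static uint64_t add_u64_native(uint64_t a, uint64_t b)
{
    return a + b;
}
```

Either way it is only one or two extra instructions operating on registers, which is why the penalty is hard to notice.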
 

BlueFalcon7

New member
So let me get this straight: Dolphin doesn't support SSE2, SSE3, 64-bit, or dual core? After seeing how the Mac uses SSE2 and SSE3 in Rosetta for PowerPC emulation, I'm pretty sure Dolphin (or any GC emulator *hint hint*) could use those instruction sets. In fact, I read up on it, and Rosetta allows up to G3 apps, and the Gekko CPU was a modification of the G3. So if it did support SSE2 and SSE3, I'm pretty sure there would be a big increase in FPS if it were optimized correctly.
 

maurojk

New member
Of course, the register refills will go down and performance will go up... but what about the math conversions?
Surely math conversions are needed in floating-point procedures, which have a lot of precision at 128 bits, and that precision has to be brought down to 32 bits to draw the graphics...

So...

Wouldn't converting this math to 64-bit floating point take a lot fewer instructions?

And relieve our CPU bottleneck?
 

Doomulation

Your post doesn't make much sense, but...
For floating-point operations, SSE or the like is used to process the data. Those registers are typically 128-bit.
Data transfer can be argued. It all depends on the amount of data processed. Commonly used data is stored in the processor's cache, but if you use a lot of data, it cannot all fit in the cache.
Aside from that, on a 32-bit processor transferring 8 bytes probably takes 2 cycles, whereas on a 64-bit processor it takes 1 cycle (because a 64-bit register holds all 8 bytes, so one transfer fills one register).

In reality, though, you might get a ~10% speed boost. It helps a little, but not that much.
 

Lightning

Emulator Developer
64-bit does allow for higher precision; however, that same precision can be obtained with the SSE instructions in 32-bit mode. Besides, the floating-point number has to be converted back to 32-bit (single precision instead of double precision) for the emulated GX chip's data buffer. The general-purpose registers of the PowerPC are also 32-bit, not 64-bit. So going to 64-bit does not give you anything more than extra registers to work with.
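A rough C sketch of that last conversion step, assuming a made-up `gx_fifo` buffer rather than Dolphin's or Gekko's actual GX emulation: whatever precision the host used for the math, each value is narrowed to a 32-bit float on its way into the buffer.

```c
#include <stddef.h>

/* Hypothetical stand-in for an emulated GX data buffer of 32-bit floats. */
typedef struct {
    float data[64];
    size_t count;
} gx_fifo;

/* Narrow a double-precision result to single precision as it enters the
   buffer; any extra precision computed on the host is discarded here. */
static int gx_push(gx_fifo *fifo, double value)
{
    if (fifo->count >= sizeof fifo->data / sizeof fifo->data[0])
        return -1;                         /* buffer full */
    fifo->data[fifo->count++] = (float)value;
    return 0;
}
```

For example, 16777217.0 is exact as a double but is stored as 16777216.0f, so the extra precision never reaches the graphics side.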

As a result, there is still very little benefit in going from 32-bit to 64-bit chips when emulating the PowerPC, apart from the extra registers. The true benefit is using SSE for the PowerPC's paired-single instructions (working on 2 separate floats at the same time). SSE makes it easier to emulate A[0,1] = B[0,1] + C[0,1] on floating-point numbers instead of doing A0 = B0 + C0, A1 = B1 + C1. The SSE version is 2 reads, 1 addition and 1 write; the non-SSE version is 4 memory reads, 2 additions and 2 writes. You can see how quickly that starts to eat into the emulation speed.
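A minimal C sketch of the paired-single case, assuming a hypothetical `paired_single` struct for one Gekko register. The scalar version handles each half separately; the SSE version (compiled only when the compiler defines `__SSE__`) puts both halves in one XMM register so a single ADDPS does both additions. It is illustrative only: packing values from two separate struct fields still costs extra moves, so a real dynarec would likely want the paired values kept together.

```c
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Hypothetical model of one Gekko paired-single register: two floats. */
typedef struct {
    float ps0;
    float ps1;
} paired_single;

/* Scalar emulation: each half is read, added and written separately. */
static void ps_add_scalar(paired_single *a, const paired_single *b,
                          const paired_single *c)
{
    a->ps0 = b->ps0 + c->ps0;
    a->ps1 = b->ps1 + c->ps1;
}

/* SSE version: one ADDPS adds both halves; falls back to scalar code. */
static void ps_add_sse(paired_single *a, const paired_single *b,
                       const paired_single *c)
{
#if defined(__SSE__)
    __m128 vb = _mm_setr_ps(b->ps0, b->ps1, 0.0f, 0.0f);  /* lanes 2,3 unused */
    __m128 vc = _mm_setr_ps(c->ps0, c->ps1, 0.0f, 0.0f);
    float out[4];
    _mm_storeu_ps(out, _mm_add_ps(vb, vc));
    a->ps0 = out[0];
    a->ps1 = out[1];
#else
    ps_add_scalar(a, b, c);
#endif
}
```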

Now, for all of this to be beneficial, it takes time to code, test, and enhance. I have not even begun on the dynarec for Gekko yet, simply because I'm trying to correct bugs and flaws that would slow me down. I wish I knew what kind of performance we will get out of the dynarec. I know it will be better than Dolphin's simply due to such enhancements, but there is no way to even guess at the performance yet, since that depends on how often instructions like the paired-single opcodes are executed. You still have the standard 32-bit math to do on integer values, reading and translating flags (overflow, underflow, equal, to name a few), etc.
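As an example of that flag translation, here is a hedged C sketch of computing PowerPC condition register field 0 for a record-form integer instruction such as `add.`: the result is compared as signed against zero to set LT, GT or EQ, and SO is copied from XER. The constant and function names are hypothetical; the field layout (LT, GT, EQ, SO from most to least significant bit) follows the PowerPC architecture. x86 keeps its own flags in EFLAGS with a different layout, so the emulator has to compute this explicitly.

```c
#include <stdint.h>

/* 4-bit CR field: LT is the most significant bit, SO the least. */
enum { CR_LT = 0x8, CR_GT = 0x4, CR_EQ = 0x2, CR_SO = 0x1 };

/* Hypothetical helper: CR0 for a record-form instruction from its 32-bit
   result and the XER summary-overflow bit. */
static uint32_t compute_cr0(uint32_t result, int xer_so)
{
    /* Signed compare against zero; the cast assumes two's complement,
       which mainstream compilers use (and C23 guarantees). */
    int32_t s = (int32_t)result;
    uint32_t cr = s < 0 ? CR_LT : s > 0 ? CR_GT : CR_EQ;
    if (xer_so)
        cr |= CR_SO;
    return cr;
}
```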
 

BlueFalcon7

New member
Keep talking, Lightning, you're making a little bit of sense to me. I know a little bit about SSE and floating-point conversions. I was wondering a few things, though. First of all, do you know which SSE sets, if any, Dolphin uses for optimization? And second, why are we talking about this in the Dolphin forums? Is it just in the hope that F|RES comes across this one day and takes it into consideration?

But seriously, I have also read a little bit about Rosetta, and what I read says that SSE3 is crucial for efficiency. It's the difference between being able to run a program semi-fluidly and not being able to play an MP3 with an EQ. Do you have plans for using SSE3 in Gekko at all in the first release?

Lastly, you were talking about the dynarec. That sort of falls under the category of manipulating and deciphering code, so I was wondering if you planned to incorporate any sort of HLE, such as rewriting code and stuff like that. Or whether you were going to have the emulator skip code that is less necessary for running the game (kind of like the effect of the teaser release of Dolphin versus 1.03.2), except the emulator would be smarter, and the game won't crash because it would know which code is needed to keep the game running.
 

Doomulation

Skipping instructions is the worst thing an emulator can do. A dynarec would be way better than skipping instructions. Skipping can also be very dangerous.
 

Lightning

Emulator Developer
F|RES can correct me if I am wrong, but I'm not aware of Dolphin using any SSE instructions; as far as I know it relies only on the general-purpose and floating-point registers. SSE3 is not required for emulation, as it does not provide any real benefit, although it could be used for some small enhancements. For example, the ADDSUBPD instruction computes {A0 - B0, A1 + B1}, where A and B are different XMM registers. However, the instructions required to load A and B with the proper 4 values may not be worth the effort compared to doing a sub and then an add.
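A small C model of what ADDSUBPD computes, assuming a hypothetical two-lane `xmm_pd` struct. When compiled with SSE3 enabled (`__SSE3__` defined, e.g. via `-msse3` on GCC or Clang), it uses the real `_mm_addsub_pd` intrinsic from `<pmmintrin.h>`; otherwise it falls back to one subtract and one add, which is the alternative described above.

```c
#if defined(__SSE3__)
#include <pmmintrin.h>
#endif

/* Hypothetical two-lane view of an XMM register holding two doubles. */
typedef struct {
    double lane[2];
} xmm_pd;

/* Computes {a0 - b0, a1 + b1}, the ADDSUBPD result. */
static xmm_pd addsub_pd(xmm_pd a, xmm_pd b)
{
    xmm_pd r;
#if defined(__SSE3__)
    __m128d va = _mm_loadu_pd(a.lane);
    __m128d vb = _mm_loadu_pd(b.lane);
    _mm_storeu_pd(r.lane, _mm_addsub_pd(va, vb));  /* one ADDSUBPD */
#else
    r.lane[0] = a.lane[0] - b.lane[0];  /* lane 0: subtract */
    r.lane[1] = a.lane[1] + b.lane[1];  /* lane 1: add */
#endif
    return r;
}
```

Note that getting the operands into `a.lane` and `b.lane` in the right order is exactly the setup cost that may cancel out the gain.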

A dynarec simply takes a block of PowerPC instructions, decodes them, generates x86 code that produces the same result, and then runs the x86 version. This cuts out a lot of the back-and-forth work that an interpreter does (a lot of memory hits). The dynarec can also optimize a bit; for example, the PowerPC requires 2 instructions to move a 32-bit value into a register, while the x86 only needs 1.
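To illustrate the 2-versus-1 point, here is a hypothetical C peephole a dynarec might use: it recognises `lis rX, hi` (an `addis` with rA = 0, primary opcode 15) followed by `ori rX, rX, lo` (primary opcode 24) and folds them into one 32-bit constant that could then be emitted as a single x86 `mov reg, imm32`. The function names are made up; only the D-form field layout and the two opcodes come from the PowerPC instruction set.

```c
#include <stdbool.h>
#include <stdint.h>

/* D-form fields: opcode in the top 6 bits, then two 5-bit register
   fields, then a 16-bit immediate. */
static unsigned ppc_opcode(uint32_t insn) { return insn >> 26; }
static unsigned ppc_reg_d(uint32_t insn)  { return (insn >> 21) & 31; }
static unsigned ppc_reg_a(uint32_t insn)  { return (insn >> 16) & 31; }
static uint32_t ppc_imm16(uint32_t insn)  { return insn & 0xFFFFu; }

/* If `first` is "lis rX, hi" and `second` is "ori rX, rX, lo", report the
   register and the combined constant so one host instruction can load it. */
static bool fold_lis_ori(uint32_t first, uint32_t second,
                         unsigned *reg, uint32_t *value)
{
    if (ppc_opcode(first) != 15 || ppc_reg_a(first) != 0)  /* not lis */
        return false;
    unsigned rd = ppc_reg_d(first);
    /* ori: source register in bits 6-10, destination in bits 11-15 */
    if (ppc_opcode(second) != 24 || ppc_reg_d(second) != rd ||
        ppc_reg_a(second) != rd)
        return false;
    *reg = rd;
    *value = (ppc_imm16(first) << 16) | ppc_imm16(second);
    return true;
}
```

For example, `lis r3,0x1234` / `ori r3,r3,0x5678` (encoded 0x3C601234 / 0x60635678) folds to r3 = 0x12345678.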

HLE simply lets a common function run more efficiently. A simple example is memcpy. There is a PowerPC version of memcpy embedded in all the games. You can either run the embedded memcpy through the dynarec or interpreter, wasting cycles decoding and running its opcodes, or you can scan for the memcpy function and set a flag that the dynarec or interpreter looks for. When memcpy is called, the normal x86 C version of memcpy is used instead. You have cut out all the time required to interpret and run the function but produced the same result. The same can be done for any common function that all games use, as long as a native version is written into the emulator (OS interrupt handling, DVD access, string functions, 3D math, you get the idea).
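A minimal C sketch of the hook idea, with everything hypothetical: a tiny `cpu_state` whose flat RAM is indexed directly by guest address, a table mapping guest function addresses to native handlers, and a native memcpy replacement that reads its arguments from r3, r4 and r5, where the PowerPC calling convention passes them. The scan that finds the guest's memcpy is not shown, and any address used with it is made up.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical emulated state: flat RAM plus the 32 GPRs. */
typedef struct {
    uint8_t  ram[0x1000];
    uint32_t gpr[32];
} cpu_state;

typedef void (*hle_func)(cpu_state *cpu);

/* Native replacement for guest memcpy(dst, src, n): args in r3, r4, r5.
   r3 already holds dst, which is memcpy's return value. No bounds checks. */
static void hle_memcpy(cpu_state *cpu)
{
    uint32_t dst = cpu->gpr[3], src = cpu->gpr[4], n = cpu->gpr[5];
    memcpy(&cpu->ram[dst], &cpu->ram[src], n);
}

/* Hook table, filled by a (not shown) scan for known guest functions. */
typedef struct {
    uint32_t guest_addr;
    hle_func func;
} hle_hook;
static hle_hook hooks[8];
static size_t hook_count;

static void hle_register(uint32_t guest_addr, hle_func func)
{
    if (hook_count < sizeof hooks / sizeof hooks[0]) {
        hooks[hook_count].guest_addr = guest_addr;
        hooks[hook_count].func = func;
        hook_count++;
    }
}

/* Called on a branch-and-link to `target`. Returns 1 if a native handler
   ran (the caller then resumes at the link register), 0 otherwise. */
static int hle_try_call(cpu_state *cpu, uint32_t target)
{
    for (size_t i = 0; i < hook_count; i++) {
        if (hooks[i].guest_addr == target) {
            hooks[i].func(cpu);
            return 1;
        }
    }
    return 0;
}
```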

HLE is useless, of course, until you have something functional, since you need to know the interpreted version works before HLEing it. That way the HLE version can produce the same result in the registers and memory.

As Doomulation said, skipping instructions is bad. You are better off finding more efficient ways to process the instructions and get the same result.

BlueFalcon7, if you want, come by #Gekko on EFNet. I'm usually around and will freely chat :)
 
