
Books on assembly language for Win32 OS's

cufunha

New member
Do you know of any good books on Win32 assembly that cover modern processors, Windows XP, common assembly algorithms, MMX, etc.?
 

smcd

Active member
"Art of Assembly" is pretty good, and it's free online. "Assembly Language: Step by Step" is also a decent book on the subject. For the Windows platform specifically, you'll probably be best off googling (Iczelion's tutorials are great). Intel also makes paper versions of its processor manuals available; get these for reference as well (they're free!).
 

Cyberman

Moderator
cufunha said:
Do you know of any good books on Win32 assembly that cover modern processors, Windows XP, common assembly algorithms, MMX, etc.?
First, there are tons of books on assembly language for modern processors.
Let's try some manufacturers' sites first :)
AMD Athlon XP (bleah bleah): MMX + 3DNow! extensions
AMD Opteron (bleah bleah): various information regarding usage

Intel P4 stuff: skip down to MANUAL and pick what you feel best suits your needs.

If you are new to programming... hmmm, then maybe you should start with something less prone to locking up your computer. MMX instructions can be FATAL to your computer's instruction execution. You cannot mix floating-point and MMX instructions, for example. The MMX registers basically shadow the same register stack the FPU uses, so you either perform floating-point or MMX instructions, not both. Also, you need to disable interrupts during MMX instruction sequences. After the sequences you must set a context switch to turn off MMX in case any FP instructions are executed when interrupts are enabled. I know, fun stuff, but that's life. You can't expect other programs to check for the MMX execution state.

Algorithm-wise, it depends on what you are doing. You might be better off looking at books on digital signal processing for audio and video instead. An algorithm is a method of performing an action and is inherently not processor specific, although implementing an algorithm optimally can be highly processor dependent.

Also note that you cannot rely on the processor intrinsics MS supplies with their compiler. Some brain-dead programmer wrote them for their compiler and, to be blunt, the result is TERRIBLE. There are ZERO, none, NYET, zilch optimizations when using the processor intrinsics. Essentially you may as well NOT use MMX, because MS's implementation is worse than not using processor intrinsics at all for embarrassingly parallel code. As one person put it, "It's utterly pathetic".

I also believe you will have to implement your assembly separately from the C code instead of inline. It has been said that MS has removed inline pass-through assembly from their compilers after VC++ 6. I guess they decided their MMX intrinsics optimizations were good enough (sigh). This means you cannot put your assembly within a function and have the compiler handle the function construction and the parameter substitution for your MMX/SSE code. You have to handle all the frame pointer manipulation etc. yourself. I guess that's yet another reason to use Borland's compiler.

Cyb

PS: I edited your post's subject to better fit your question; you might get more responses that way!
 

Doomulation

?????????????????????????
Hmm, so what I gather from that post of yours is that the MMX instructions generated by the MS compiler are TERRIBLE, and thus we would be better off without them? Although we can agree that MMX, 3DNow!, and SSE are good optimizations if used correctly? But it seems they might cause trouble when using inline assembly?
 

Cyberman

Moderator
Doomulation said:
Hmm, so what I gather from that post of yours is that the MMX instructions generated by the MS compiler are TERRIBLE, and thus we would be better off without them? Although we can agree that MMX, 3DNow!, and SSE are good optimizations if used correctly? But it seems they might cause trouble when using inline assembly?

An example, maybe?
From VirtualDub's pages:
Code:
#include <xmmintrin.h>
unsigned premultiply_alpha(unsigned px) {
	__m64 px8 = _m_from_int(px);
	__m64 px16 = _m_punpcklbw(px8, _mm_setzero_si64());
	__m64 alpha = px16;

	alpha = _m_punpckhwd(alpha, alpha);
	alpha = _m_punpckhwd(alpha, alpha);

	__m64 result16 = _m_psrlwi(_m_pmullw(px16, alpha), 8);
	unsigned x = _m_to_int(_m_packuswb(result16, result16));
	_mm_empty();
	return x;
}
Here are your 'results' from using their compiler intrinsics:
Example_Intrensics.png

Notice emms is in the WRONG place even in the generated code.
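
For anyone who doesn't read MMX intrinsics: as far as I can tell, the function above multiplies each of the four bytes of the pixel by the top (alpha) byte and keeps the high 8 bits of each product. A rough plain-C sketch of the same arithmetic follows. The names `premultiply_alpha_scalar` and `check_premultiply_alpha_scalar` are made up for illustration and are not from VirtualDub, and the sketch assumes a 32-bit `unsigned` with alpha in bits 24-31.

```c
#include <assert.h>

/* Scalar sketch of the intrinsic version: every 8-bit channel,
   including the alpha channel itself, becomes (channel * alpha) >> 8.
   Assumes unsigned is 32 bits wide and alpha lives in the top byte. */
unsigned premultiply_alpha_scalar(unsigned px) {
	unsigned alpha = (px >> 24) & 0xFFu;
	unsigned out = 0;
	int shift;

	for (shift = 0; shift < 32; shift += 8) {
		unsigned channel = (px >> shift) & 0xFFu;
		/* 255 * 255 = 65025 fits in 16 bits, just like pmullw's lanes */
		out |= ((channel * alpha) >> 8) << shift;
	}
	return out;
}

/* Hand-computed spot checks (hypothetical helper, for illustration). */
void check_premultiply_alpha_scalar(void) {
	/* alpha = 0 wipes every channel */
	assert(premultiply_alpha_scalar(0x00FFFFFFu) == 0x00000000u);
	/* alpha = 0x80 roughly halves each channel: 0xFF -> 0x7F, 0x80 -> 0x40 */
	assert(premultiply_alpha_scalar(0x80FFFFFFu) == 0x407F7F7Fu);
}
```

Note that shifting by 8 divides by 256 rather than 255, so a fully opaque pixel comes out one step darker (0xFF becomes 0xFE). That is a property of the original arithmetic, not of this sketch.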

Cyb
 

smcd

Active member
I've noticed this as well with several programs. I tried recompiling VBA with MMX/SSE optimizations in VC++ .NET 2003 to see if there was any noticeable difference, and it made a good number of games lock up and die.
 

Cyberman

Moderator
sethmcdoogle said:
I've noticed this as well with several programs. I tried recompiling VBA with MMX/SSE optimizations in VC++ .NET 2003 to see if there was any noticeable difference, and it made a good number of games lock up and die.
I don't think GCC supports MMX and SSE, but it does support inline assembly, just not as easily as I would like :p

I've used MMX and SSE in emulator work, for a software GPU for the PSX (not much to say for it really, but it did have 24-bit color output ;) and did sort of work). Anyhow, one must be aware of poor compiler design when it comes to MS. And now you know why it locked up: the emms instruction came before the end of the MMX instruction stream, so MMX mode continues. Whenever an FP instruction is then executed, your CPU will DIE. Simper Fatalis

Cyb
 

smcd

Active member
(This thread is sort of heading off topic, but) eh well, I just compile with "regular" instructions in VC++ anyhow; I was just testing whether the SSE/MMX stuff would make a difference. I also use MinGW, it just depends on my mood. I agree, GCC's inline assembly is rather disgusting in my opinion, but then again I don't like AT&T syntax either (you can use Intel syntax, but you have to tell it so)...
 

Doomulation

?????????????????????????
Cyberman said:
Example maybe?
From Virtual Dub's pages ..
Code:
#include <xmmintrin.h>
unsigned premultiply_alpha(unsigned px) {
	__m64 px8 = _m_from_int(px);
	__m64 px16 = _m_punpcklbw(px8, _mm_setzero_si64());
	__m64 alpha = px16;

	alpha = _m_punpckhwd(alpha, alpha);
	alpha = _m_punpckhwd(alpha, alpha);

	__m64 result16 = _m_psrlwi(_m_pmullw(px16, alpha), 8);
	unsigned x = _m_to_int(_m_packuswb(result16, result16));
	_mm_empty();
	return x;
}
Here are your 'results' from using their compiler intrinsics:
Example_Intrensics.png

Notice emms is in the WRONG place even in the generated code.

Cyb
Okay, I see that the compiler does indeed do a poor job here, but the thing is I can read neither version of the code. I don't understand what it's supposed to do or what all the functions do. But in any case, both should do the same thing, no? Speed is not that much of an issue on today's computers, and (afaik?) in some cases the compiler can generate better code than hand-written assembly when it comes to really complex stuff? Shrug, I don't know, and I don't care. The compiler may be bad, but it does its job, now doesn't it? :)
 

ector

Emulator Developer
If you are new to programming... hmmm, then maybe you should start with something less prone to locking up your computer. MMX instructions can be FATAL to your computer's instruction execution. You cannot mix floating-point and MMX instructions, for example. The MMX registers basically shadow the same register stack the FPU uses, so you either perform floating-point or MMX instructions, not both. Also, you need to disable interrupts during MMX instruction sequences. After the sequences you must set a context switch to turn off MMX in case any FP instructions are executed when interrupts are enabled. I know, fun stuff, but that's life. You can't expect other programs to check for the MMX execution state.
Some errors here. You do have to do an emms after running a (preferably big) bunch of MMX instructions, and you can't mix them with floats, but turning off interrupts? No, you don't have to do that with any modern OS. Context switching works automatically nowadays. I'm not sure exactly how it works, but it does :p
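
A minimal sketch of that pattern, assuming an x86 compiler that ships `<mmintrin.h>` (GCC, Clang and MSVC for 32-bit targets do): keep the MMX work together, issue `_mm_empty()` (which emits emms) as the last MMX step, and only then touch floating point. No interrupt handling is involved. The function names here are invented for illustration; the intrinsics themselves (`_mm_cvtsi32_si64`, `_mm_adds_pu8`, `_mm_cvtsi64_si32`, `_mm_empty`) are the standard MMX ones.

```c
#include <assert.h>
#include <mmintrin.h>

/* Adds the four packed bytes of a and b with unsigned saturation
   (paddusb), then leaves MMX state before returning. */
unsigned add_pixels_saturated(unsigned a, unsigned b) {
	__m64 va = _mm_cvtsi32_si64((int)a);
	__m64 vb = _mm_cvtsi32_si64((int)b);
	__m64 sum = _mm_adds_pu8(va, vb);
	unsigned result = (unsigned)_mm_cvtsi64_si32(sum);

	_mm_empty();	/* emms: last MMX step, the x87 registers are usable again */
	return result;
}

/* Floating-point math happens only after the MMX section has been closed. */
double average_low_channels(unsigned a, unsigned b) {
	unsigned sum = add_pixels_saturated(a, b);
	unsigned c0 = sum & 0xFFu;
	unsigned c1 = (sum >> 8) & 0xFFu;
	unsigned c2 = (sum >> 16) & 0xFFu;

	return (c0 + c1 + c2) / 3.0;
}

/* Hand-computed spot check (hypothetical helper, for illustration). */
void check_add_pixels_saturated(void) {
	/* 0x20+0x30 = 0x50, 0xF0+0x20 saturates to 0xFF, 0x10+0x10 = 0x20 */
	assert(add_pixels_saturated(0x10F0F020u, 0x10202030u) == 0x20FFFF50u);
}
```

On 64-bit builds the compiler normally does its float math in SSE registers rather than on the x87 stack, so a missing emms is less likely to bite there; the ordering shown is still the safe habit.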
 

RJARRRPCGP

The Rocking PC Wiz
Is it just me, or does SSE literally suck on Athlon XP processors? It seems that for Athlon XP processors, just raw C++ code is the best.

I've noticed that Athlon XPs always seem to lose to Pentium 4s with SSE. They even lose to lower-range Pentium 4 processors (around 2.4 GHz to 3.0 GHz).


What's up with that?
 
