AMD has supported SSE3 since Athlon 64 stepping E3/E4, according to Wikipedia.
<MooglyGuy> Reznor007 isn't even remotely technically apt and should not be speaking on topics he knows fuckall about
<MooglyGuy> If he doesn't even know the difference between the RDP (which I haven't vectorized at all) and the RSP (which I have started vectorizing), he should stay the hell out of the thread. I've already been discussing RSP SSE optimizations with MarathonMan, and there are a handful of opcodes he has vectorized which I haven't, and I have a handful of opcodes I've vectorized that he hasn't
Right, though I thought you meant normal SSE3 at first. I think CEN64 only uses SSE2/3 at the moment anyway, so AMD not supporting SSSE3 until Bulldozer in 2011 shouldn't have been a major hindrance to anyone working on a similar project.
ewvars[EW_R] = (ewdata[8] & 0xffff0000) | ((ewdata[12] >> 16) & 0x0000ffff);
ewvars[EW_G] = ((ewdata[8] << 16) & 0xffff0000) | (ewdata[12] & 0x0000ffff);
ewvars[EW_B] = (ewdata[9] & 0xffff0000) | ((ewdata[13] >> 16) & 0x0000ffff);
ewvars[EW_A] = ((ewdata[9] << 16) & 0xffff0000) | (ewdata[13] & 0x0000ffff);
...
ewdxvars[EWDX_DRDX] = (ewdata[10] & 0xffff0000) | ((ewdata[14] >> 16) & 0x0000ffff);
ewdxvars[EWDX_DGDX] = ((ewdata[10] << 16) & 0xffff0000) | (ewdata[14] & 0x0000ffff);
ewdxvars[EWDX_DBDX] = (ewdata[11] & 0xffff0000) | ((ewdata[15] >> 16) & 0x0000ffff);
ewdxvars[EWDX_DADX] = ((ewdata[11] << 16) & 0xffff0000) | (ewdata[15] & 0x0000ffff);
ewData1 = _mm_load_si128((__m128i*) (ewdata + 8));
ewData2 = _mm_load_si128((__m128i*) (ewdata + 12));
ewDataLo = _mm_unpacklo_epi64(ewData1, ewData2);
ewDataHi = _mm_unpackhi_epi64(ewData1, ewData2);
ewDataLo = _mm_shuffle_epi8(ewDataLo, ewShuffleKey);
ewDataHi = _mm_shuffle_epi8(ewDataHi, ewShuffleKey);
_mm_store_si128((__m128i*) (ewvarstest + 0), ewDataLo);
_mm_store_si128((__m128i*) (ewdxvarstest + 0), ewDataHi);
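For anyone following along, here is a rough, self-contained sketch of how the two snippets above could be checked against each other. The thread never shows `ewShuffleKey` or the `EW_*` / `EWDX_*` index values, so the key and the enums below are assumptions worked out from the scalar code for a little-endian layout, not the actual definitions. It also assumes GCC or Clang on x86, and uses unaligned loads and stores where the original uses the aligned variants.

```c
#include <stdint.h>
#include <string.h>
#include <tmmintrin.h> /* SSSE3 intrinsics, including _mm_shuffle_epi8 */

/* Hypothetical index values: the thread does not show the real definitions. */
enum { EW_R, EW_G, EW_B, EW_A };
enum { EWDX_DRDX, EWDX_DGDX, EWDX_DBDX, EWDX_DADX };

/* Scalar reference: each output word takes one 16-bit half of ewdata[i] as
 * its upper half and the matching half of ewdata[i + 4] as its lower half. */
void ew_unpack_scalar(const uint32_t *ewdata, uint32_t *ewvars, uint32_t *ewdxvars)
{
    ewvars[EW_R] = (ewdata[8] & 0xffff0000) | ((ewdata[12] >> 16) & 0x0000ffff);
    ewvars[EW_G] = ((ewdata[8] << 16) & 0xffff0000) | (ewdata[12] & 0x0000ffff);
    ewvars[EW_B] = (ewdata[9] & 0xffff0000) | ((ewdata[13] >> 16) & 0x0000ffff);
    ewvars[EW_A] = ((ewdata[9] << 16) & 0xffff0000) | (ewdata[13] & 0x0000ffff);

    ewdxvars[EWDX_DRDX] = (ewdata[10] & 0xffff0000) | ((ewdata[14] >> 16) & 0x0000ffff);
    ewdxvars[EWDX_DGDX] = ((ewdata[10] << 16) & 0xffff0000) | (ewdata[14] & 0x0000ffff);
    ewdxvars[EWDX_DBDX] = (ewdata[11] & 0xffff0000) | ((ewdata[15] >> 16) & 0x0000ffff);
    ewdxvars[EWDX_DADX] = ((ewdata[11] << 16) & 0xffff0000) | (ewdata[15] & 0x0000ffff);
}

/* SSSE3 version. target("ssse3") is a GCC/Clang attribute that enables SSSE3
 * for this one function without building the whole file with -mssse3. */
__attribute__((target("ssse3")))
void ew_unpack_ssse3(const uint32_t *ewdata, uint32_t *ewvars, uint32_t *ewdxvars)
{
    /* Assumed key: result byte n is copied from source byte key[n]. */
    const __m128i key = _mm_setr_epi8(10, 11, 2, 3, 8, 9, 0, 1,
                                      14, 15, 6, 7, 12, 13, 4, 5);

    __m128i d1 = _mm_loadu_si128((const __m128i *)(ewdata + 8));
    __m128i d2 = _mm_loadu_si128((const __m128i *)(ewdata + 12));
    __m128i lo = _mm_unpacklo_epi64(d1, d2); /* words 8, 9, 12, 13 */
    __m128i hi = _mm_unpackhi_epi64(d1, d2); /* words 10, 11, 14, 15 */

    _mm_storeu_si128((__m128i *)ewvars, _mm_shuffle_epi8(lo, key));
    _mm_storeu_si128((__m128i *)ewdxvars, _mm_shuffle_epi8(hi, key));
}

/* Returns 1 when both paths give identical output for a test pattern
 * in which every input byte is distinct. */
int ew_paths_agree(void)
{
    uint32_t ewdata[16], a[4], adx[4], b[4], bdx[4];
    for (uint32_t i = 0; i < 16; i++)
        ewdata[i] = ((4 * i) << 24) | ((4 * i + 1) << 16) |
                    ((4 * i + 2) << 8) | (4 * i + 3);

    ew_unpack_scalar(ewdata, a, adx);
    ew_unpack_ssse3(ewdata, b, bdx);
    return memcmp(a, b, sizeof a) == 0 && memcmp(adx, bdx, sizeof adx) == 0;
}
```

Calling `ew_paths_agree()` from a small test program is enough to confirm that a key like this matches the scalar path. If the real enums are ordered differently, the key would need to be permuted to match.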
No love for Thuban?
processors that don't have the SSSE3 instruction don't have the performance capabilities anyways.
Before two nights of stupid obvious vectorization:
$ du -b cen64
212064 cen64
... and after:
$ du -b cen64
208736 cen64
I find it interesting that mrmudlord here is no longer spewing hate at MarathonMan...
Several of your posts back in March-April seemed quite... I'll let them speak for themselves:
Mainly because MarathonMan is on the level. I have zero issue with cycle accuracy itself.
Meh, performance isn't the problem.
3 GHz Ivy Bridges are no problem at all. The issue should be accuracy. Since everyone cares about that, who gives a toss about speed? Byuu clearly doesn't give a damn, nor do mamedev. So neither should you.
Don't forget, there is also the pixel/cycle exact RDP too to emulate.
Oh yes, future proofing. Just like Crysis when it was released.
Who cares if it runs 10 seconds per frame if in 50 years time it will run fine. We must leave a legacy to our children.
If you are not an idiot, I can give you the oman archive alegend.
but people like Exophase are here, so they can [expletive removed].
I sense irony in mrmudlord's posts.
Several of your posts back in March-April seemed quite... I'll let them speak for themselves:
Ah, even in real life I have trouble identifying when things are irony, sarcasm, etc.
I sense irony in mrmudlord's posts.
Anyway, I was always under the impression that you could always have (slower) fallbacks, so that the newest SSE extensions aren't a flat-out requirement. Was my impression wrong? If it wasn't, then I REALLY don't understand why you wouldn't want to take advantage of said newer SSEs...
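To make the fallback idea concrete, here is a minimal sketch of run-time dispatch, assuming GCC or Clang on x86. The byte-swap kernel and every `bswap32_*` name are made up for illustration and are not taken from CEN64 or any other emulator. `__builtin_cpu_supports` and the `target` attribute are compiler extensions, so other toolchains would need a different detection method.

```c
#include <stddef.h>
#include <stdint.h>
#include <tmmintrin.h> /* SSSE3 intrinsics, including _mm_shuffle_epi8 */

/* Portable fallback: reverse the byte order of each 32-bit word. */
void bswap32_scalar(uint32_t *w, size_t n)
{
    for (size_t i = 0; i < n; i++)
        w[i] = (w[i] >> 24) | ((w[i] >> 8) & 0x0000ff00u) |
               ((w[i] << 8) & 0x00ff0000u) | (w[i] << 24);
}

/* SSSE3 path: one byte shuffle handles four words at a time. */
__attribute__((target("ssse3")))
void bswap32_ssse3(uint32_t *w, size_t n)
{
    const __m128i key = _mm_setr_epi8(3, 2, 1, 0, 7, 6, 5, 4,
                                      11, 10, 9, 8, 15, 14, 13, 12);
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128i v = _mm_loadu_si128((const __m128i *)(w + i));
        _mm_storeu_si128((__m128i *)(w + i), _mm_shuffle_epi8(v, key));
    }
    bswap32_scalar(w + i, n - i); /* leftover words, if any */
}

typedef void (*bswap32_fn)(uint32_t *, size_t);

/* Choose an implementation once, based on what the CPU reports. */
bswap32_fn bswap32_select(void)
{
    __builtin_cpu_init();
    return __builtin_cpu_supports("ssse3") ? bswap32_ssse3 : bswap32_scalar;
}
```

A caller would store the result of `bswap32_select()` in a function pointer at startup and call through it from then on. The cost is an indirect call per invocation, so this only pays off when each call does a reasonable amount of work.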
You see? This is exactly the issue I have with byuu and angrylion. They just don't give a fuck about optimization at all. No matter how simple.
It boggles the mind as to how truly incompetent they really are, and how they are blinded by their dogmatic practices.
Seriously, they think SSE/NEON/etc is the devil. And they tout code == documentation. I mean, ffs.