c++ - Properly specify constraint for rotate?
I'm investigating potential speedups with respect to a constant-time rotate that does not violate the standards.
A rotate on x86/x64 has the following properties. For simplicity, I'm going to discuss rotating a byte (so we don't get tangled up in immediate-8 versus 16, 32 or 64):
- the "value" can be in a register or in memory
- the "count" can be in a register or an immediate
The processor expects the count in cl when a register is used. The processor performs the rotate after masking off all but the lower 5 bits of the count.
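The count handling described above can be modelled in portable C++. This is only a sketch for illustration: `rotl8_model` is a hypothetical helper name, not something from the original post, and it assumes an 8-bit unsigned operand.

```cpp
#include <cstdint>

// Hypothetical helper (not from the original post): models an 8-bit left
// rotate using the same count handling the hardware applies.
inline std::uint8_t rotl8_model(std::uint8_t x, unsigned int y) {
    unsigned int count = y & 31u;  // the CPU keeps only the low 5 bits of the count
    count &= 7u;                   // for an 8-bit operand, a rotate by 8 changes nothing
    // Both shift amounts stay in the range 0..7; when count is 0 the two
    // halves are both x, so the result is x.
    return static_cast<std::uint8_t>((x << count) | (x >> ((8u - count) & 7u)));
}
```

For example, `rotl8_model(0x81, 1)` yields `0x03`, and a count of 8 or 0 returns the value unchanged.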
In the function below, the value is x and the count is y.
template<> inline byte rotleft<byte>(byte x, unsigned int y) { __asm__ __volatile__("rolb %b1, %0" : "=mq" (x) : "ci" (y), "0" (x)); return x; }
Since x is both read and written, I think I should be using + somewhere, but I can't get the assembler to take it.
My question is: are the constraints represented correctly?
EDIT: based on Jester's feedback, the function was changed to:
template<> inline byte rotleft<byte>(byte x, unsigned int y) { __asm__ __volatile__("rolb %b1, %0" : "+mq" (x) : "ci" (y)); return x; }
You should use a correctly sized type for the operands rather than trying to force the register to the correct size with an operand modifier. In this case the cast will also truncate an immediate operand to the correct size if it's too big. As David Wohlferd said, you also don't want to make the asm statement volatile, because that prevents the optimizer from removing it when the result is unused.
template<> inline byte rotleft<byte>(byte x, unsigned int y) { asm ("rolb %1, %0" : "+mq" (x) : "ci" ((byte)y)); return x; }
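To try the constraints above in isolation, a self-contained sketch might look like the following. The `byte` typedef, the names `rotl8_asm` and `rotl8_portable`, and the preprocessor guard are assumptions added for illustration; the asm statement itself uses the same template and constraints as the answer. On compilers or targets without GCC-style x86 inline asm, the sketch falls back to a portable rotate.

```cpp
#include <cstdint>

typedef std::uint8_t byte;  // assumption: the post's `byte` is an 8-bit unsigned type

// Portable reference (hypothetical helper), also used as the fallback.
inline byte rotl8_portable(byte x, unsigned int y) {
    y &= 7u;  // an 8-bit rotate only depends on the count modulo 8
    return static_cast<byte>((x << y) | (x >> ((8u - y) & 7u)));
}

inline byte rotl8_asm(byte x, unsigned int y) {
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    // "+mq": x is both read and written, and may live in memory or in a
    //        byte-addressable register.
    // "ci":  the count goes in CL or is emitted as an immediate; the cast to
    //        byte gives the operand the right size without a %b modifier.
    // No volatile: the compiler may drop the statement if x is never used.
    __asm__("rolb %1, %0" : "+mq"(x) : "ci"(static_cast<byte>(y)));
    return x;
#else
    return rotl8_portable(x, y);
#endif
}
```

Comparing `rotl8_asm` against `rotl8_portable` over all 256 values and a range of counts is a quick way to check that the constraints produce the intended instruction.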