6.29.5 AMD64 (x86_64) Assembler

The AMD64 assembler is a slightly modified version of the 386 assembler, and as such shares most of the syntax. Two new prefixes, .q and .qa, are provided to select 64-bit operand and address sizes respectively. 64-bit sizes are the default, so normally you only have to use the other prefixes. Also there are additional register operands R8-R15.

The registers lack the ’e’ or ’r’ prefix; even in 64 bit mode, rax is called ax. Additional register operands are available to refer to the lowest-significant byte of all registers: R8L-R15L, SPL, BPL, SIL, DIL.

The Linux-AMD64 calling convention is to pass the first 6 integer parameters in rdi, rsi, rdx, rcx, r8 and r9 and to return the result in rax and rdx; to pass the first 8 FP parameters in xmm0–xmm7 and to return FP results in xmm0–xmm1. So abi-code words get the data stack pointer in di and the address of the FP stack pointer in si, and return the data stack pointer in ax. The other caller-saved registers are: r10, r11, xmm8-xmm15. This calling convention reportedly is also used in other non-Microsoft OSs.

Windows x64 passes the first four integer parameters in rcx, rdx, r8 and r9 and return the integer result in rax. The other caller-saved registers are r10 and r11.

On the Linux platform, according to https://uclibc.org/docs/psABI-x86_64.pdf page 21 the registers AX CX DX SI DI R8 R9 R10 R11 are available for scratch.

The addressing modes for the AMD64 are:

\ running word A produces a memory error as the registers are not initialised ;-)
ABI-CODE A  ( -- )
    500        #               AX  MOV     \ immediate
        DX              AX  MOV     \ register
        200             AX  MOV     \ direct addressing
        DX  )           AX  MOV     \ indirect addressing
    40  DX  D)          AX  MOV     \ base with displacement
        DX  CX      I)  AX  MOV     \ scaled index
        DX  CX  *4  I)  AX  MOV     \ scaled index
    40  DX  CX  *4  DI) AX  MOV     \ scaled index with displacement

        DI              AX  MOV     \ SP Out := SP in
                            RET
END-CODE

Here are a few examples of an AMD64 abi-code words:

abi-code my+  ( n1 n2 -- n3 )
\ SP passed in di, returned in ax,  address of FP passed in si
8 di d) ax lea        \ compute new sp in result reg
di )    dx mov        \ get old tos
dx    ax ) add        \ add to new tos
ret
end-code
\ Do nothing
ABI-CODE aNOP  ( -- )
       DI  )       AX      LEA          \ SP out := SP in  
                           RET
END-CODE
\ Drop TOS
ABI-CODE aDROP  ( n -- )
   8   DI  D)      AX      LEA          \ SPout := SPin - 1
                           RET
END-CODE
\ Push 5 on the data stack
ABI-CODE aFIVE   ( -- 5 )
   -8  DI  D)      AX      LEA          \ SPout := SPin + 1
   5   #           AX  )   MOV          \ TOS := 5
                           RET
END-CODE
\ Push 10 and 20 into data stack
ABI-CODE aTOS2  ( -- n n )
   -16 DI  D)      AX      LEA          \ SPout := SPin + 2
   10  #       8   AX  D)  MOV          \ TOS - 1 := 10
   20  #           AX  )   MOV          \ TOS := 20
                           RET
END-CODE
\ Get Time Stamp Counter as two 32 bit integers
\ The TSC is incremented every CPU clock pulse
ABI-CODE aRDTSC   ( -- TSCl TSCh )
                           RDTSC        \ DX:AX := TSC
   $FFFFFFFF #     AX      AND          \ Clear upper 32 bit AX
  0xFFFFFFFF #     DX      AND          \ Clear upper 32 bit DX
       AX          R8      MOV          \ Tempory save AX
   -16 DI  D)      AX      LEA          \ SPout := SPin + 2
       R8      8   AX  D)  MOV          \ TOS-1 := saved AX = TSC low
       DX          AX  )   MOV          \ TOS := Dx = TSC high
                           RET
END-CODE
\ Get Time Stamp Counter as 64 bit integer
ABI-CODE RDTSC   ( -- TSC )
                           RDTSC        \ DX:AX := TSC
   $FFFFFFFF #     AX      AND          \ Clear upper 32 bit AX
   32  #           DX      SHL          \ Move lower 32 bit DX to upper 32 bit
       AX          DX      OR           \ Combine AX wit DX in DX
   -8  DI  D)      AX      LEA          \ SPout := SPin + 1
       DX          AX  )   MOV          \ TOS := DX
                           RET
END-CODE
VARIABLE V

\ Assign 4 to variable V
ABI-CODE V=4 ( -- )
       BX                  PUSH         \ Save BX, used by gforth
   V   #           BX      MOV          \ BX := address of V
   4   #           BX )    MOV          \ Write 4 to V
       BX                  POP          \ Restore BX
       DI  )       AX      LEA          \ SPout := SPin
                           RET
END-CODE
VARIABLE V

\ Assign 5 to variable V
ABI-CODE V=5 ( -- )
   V   #           CX      MOV          \ CX := address of V
   5   #           CX )    MOV          \ Write 5 to V
   DI )            AX      LEA          \ SPout := SPin
                           RET
END-CODE
ABI-CODE TEST2  ( -- n n )
   -16 DI  D)  AX          LEA          \ SPout := SPin + 2
   5   #       CX          MOV          \ CX := 5
   5   #       CX          CMP
   0= IF
       1   #   8   AX  D)      MOV      \ If CX = 5 then TOS - 1 := 1  <--
   ELSE
       2   #   8   AX  D)      MOV      \ else TOS - 1 := 2
   THEN
   6   #       CX          CMP
   0= IF
       3   #       AX  )       MOV      \ If CX = 6 then TOS := 3
   ELSE
       4   #       AX  )       MOV      \ else TOS := 4  <--
   THEN
                           RET
END-CODE
\ Do four loops. Expect : ( 4 3 2 1 -- )
ABI-CODE LOOP4  ( -- n n n n )
       DI          AX      MOV          \ SPout := SPin
   4   #           DX      MOV          \ DX := 4  loop counter
   BEGIN
       8   #           AX      SUB      \ SP := SP + 1
           DX          AX  )   MOV      \ TOS := DX
       1   #           DX      SUB      \ DX := DX - 1
   0= UNTIL
                           RET
END-CODE

Here’s a AMD64 example that deals with FP values:

abi-code my-f+  ( r1 r2 -- r )
\ SP passed in di, returned in ax,  address of FP passed in si
si )       dx mov         \ load fp
8 dx d)  xmm0 movsd       \ r2
dx )     xmm0 addsd       \ r1+r2
xmm0  8 dx d) movsd       \ store r
8 #      si ) add         \ update fp
di         ax mov         \ sp into return reg
ret
end-code