Memory allocation in ASM
I am currently working on a long arithmetic (big integer) problem at university. The real problem is much more complicated than the task I am about to describe, but here’s the thing: I needed a chunk of memory to be allocated from within my function, and I needed this to be done in assembly.
So I wrote this snippet:
; void addition(int* x, int x_len, int* y, int y_len, int* &z, int* z_len);
extern malloc ; malloc() comes from the C library
global _Z8additionPiiS_iRS_S_
_Z8additionPiiS_iRS_S_:
enter 0, 0
%define p_x [ebp + 8]
%define x_len [ebp + 12]
%define p_y [ebp + 16]
%define y_len [ebp + 20]
%define p_z [ebp + 24]
%define p_z_len [ebp + 28]
addition_allocate_mem:
; push x_len * 4 ; bytes to allocate
push 3 * 4 ; bytes to allocate
call malloc ; call malloc()
add esp, 4 ; undo push
mov edx, eax ; save returned address from malloc
mov eax, p_z
mov [eax], edx ; z = malloc(...)
mov eax, p_z_len
mov [eax], dword 3 ; *z_len = elements
addition_fill_mem:
; fill with sample values: z[0] = 4, z[1] = 3, z[2] = 2
mov eax, p_z
mov eax, [eax] ; eax = z (the address returned by malloc)
mov [eax], dword 4 ; z[0] = 4
add eax, 1 * 4 ; step to z[1]
mov [eax], dword 3 ; z[1] = 3
add eax, 1 * 4 ; step to z[2]
mov [eax], dword 2 ; z[2] = 2
leave
ret
There are, however, a few really interesting things in this code:
- how C++ function names look from the assembly side (name mangling)
- memory allocation itself
- returning data from a function via pointers… in assembly!
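For comparison, here is roughly what the assembly routine does, written in C++. Treat it as a sketch, not as part of the real project: the name addition_sketch and the NULL check are mine, and like the assembly version it ignores the operands and just returns three sample digits.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical C++ counterpart of the assembly routine (the name is made up).
// It ignores x and y and hands back three sample digits via the out-parameters.
void addition_sketch(int* x, int x_len, int* y, int y_len, int*& z, int* z_len) {
    (void)x; (void)x_len; (void)y; (void)y_len;  // unused in this sample
    assert(z_len != nullptr);

    // z = malloc(3 * 4): the caller's pointer is updated through the reference
    z = static_cast<int*>(std::malloc(3 * sizeof(int)));
    if (z == nullptr) {  // the assembly version skips this check
        *z_len = 0;
        return;
    }

    *z_len = 3;  // *z_len = number of elements
    z[0] = 4;
    z[1] = 3;
    z[2] = 2;
}
```

With g++ on x86 a reference parameter is passed as a plain address, which is why the assembly treats z like a pointer to a pointer. The caller owns the returned buffer and should free() it when done.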
To demonstrate how this stuff works, we need some C++ code which uses our assembly function:
#include <stdio.h>
#include <stdlib.h>
// our addition function for BIG integers, implemented in assembly
// arguments come in (digits, length) pairs: the first two pairs are the operands
// and the last pair describes the returned big integer
// thus, the result is z = x + y
// note: no extern "C" here, so g++ looks for the mangled name _Z8additionPiiS_iRS_S_
void addition(int* x, int x_len, int* y, int y_len, int* &z, int* z_len);
// helper function to convert BIG integers to strings
char* bigint2str(int* x, int len) {
char *res = (char*) malloc((len + 1) * sizeof(char));
for (int i = 0; i < len; i++) {
res[i] = x[i] + '0';
}
res[len] = '\0';
return res;
}
int main() {
int* a = 0;
int a_len = 0;
// here we add nothing to nothing
// and store the result in the big integer `a`
addition(0, 0, 0, 0, a, &a_len);
printf("a = %s\n", bigint2str(a, a_len));
return 0;
}
The comments in the code describe the important points.
To compile both files and link them into one executable, run:
$ nasm -g -felf32 test.asm -o test_asm.o
$ g++ -g test.cpp -c -m32 -o test_c.o
$ g++ -g -m32 -o test test_asm.o test_c.o
Now, let’s talk about name mangling. It is really important. I will not cover it in full depth, only the parts relevant to this article.
We see that our function,
void addition(int* x, int x_len, int* y, int y_len, int* &z, int* z_len);
is known as _Z8additionPiiS_iRS_S_ in the assembly code.
What the..? What are all these strange prefixes, you might ask.
Here’s the convention:
- a mangled name starts with the prefix _Z (an underscore and an uppercase letter)
- the length of the function name and the name itself follow that prefix (8addition)
- arguments are stored as their types only; argument names are not kept
The argument types are encoded as well. In our example, we see these:
- Pi literally means pointer to integer
- i stands for integer
- S_ is the same as Pi here. It does not mean signed integer: it is a substitution, a back-reference to the first compound type that has already appeared in the name, which is Pi. g++ always emits this compressed form, which is why the linker will not find your function if you spell it out as Pi again
- RS_ is a reference to a pointer to integer: R for reference, followed by S_
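One way to watch g++ encode repeated parameter types is to ask the compiler itself. The sketch below assumes g++ or clang on a platform using the Itanium C++ ABI, where typeid(...).name() returns the encoded type; for a function type that is F, the return type, the parameter types, then E. The helper names encoded_signature and check_substitution are made up for this example.

```cpp
#include <cassert>
#include <string>
#include <typeinfo>

// Returns the implementation's encoded name for the function type F.
// Assumption: g++ or clang, where this is the Itanium-style encoding.
template <typename F>
std::string encoded_signature() {
    return typeid(F).name();
}

// Checks the encoding of the first few parameters of our function.
inline void check_substitution() {
    // A single int* parameter is spelled out as Pi.
    assert(encoded_signature<void(int*, int)>() == "FvPiiE");
    // The second int* is replaced by the back-reference S_.
    assert(encoded_signature<void(int*, int, int*)>() == "FvPiiS_E");
}
```

Asking for the full signature, encoded_signature<void(int*, int, int*, int, int*&, int*)>(), should give FvPiiS_iRS_S_E, which carries the same parameter encoding as the tail of our symbol name.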
To get to know these conventions better, you might refer to the g++ internals reference.
You can decode mangled (encoded) function names as well. Just use the c++filt utility:
$ c++filt -n _Z8additionPiiS_iRS_S_
addition(int*, int, int*, int, int*&, int*)
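The same decoding can be done from inside a program. Here is a minimal sketch, assuming g++ or clang, where the header cxxabi.h provides abi::__cxa_demangle; the wrapper name demangle is my own.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <cxxabi.h>

// Hypothetical wrapper around abi::__cxa_demangle (GCC/Clang runtime).
// Returns an empty string if the name cannot be demangled.
std::string demangle(const char* mangled) {
    assert(mangled != nullptr);
    int status = 0;
    // With a null output buffer, the result is allocated with malloc for us.
    char* raw = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || raw == nullptr) {
        return std::string();
    }
    std::string readable(raw);
    std::free(raw);  // we own the buffer, so release it
    return readable;
}
```

Calling demangle("_Z8additionPiiS_iRS_S_") should return addition(int*, int, int*, int, int*&, int*), the same signature c++filt prints.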
‘til next time!