Example program

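/* a.c */
int shared = 1;
extern int shared;   /* redundant: shared is already defined above */

/* swap() is defined in b.c and is called here without a declaration.
   The object dumps below come from this exact source. C99 and later
   require a declaration, and newer compilers may reject the call. */
int main() {
    int a = 100;
    swap(&a, &shared);
}

/* b.c */
/* Chained XOR swap. In C this form modifies *b twice without sequencing,
   which is undefined behavior; the safe form uses three separate statements. */
void swap(int *a, int *b) {
    *b ^= *a ^= *b ^= *a;
}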

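The linking process

It has two main steps:

1. Space and address allocation

2. Symbol resolution and relocation

Space and address allocation

This step merges similar sections, allocates virtual address space, and builds the mapping between the two. Note that VMAs are only assigned after linking: they are all zero in a.o and b.o, but filled in for ab.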

root@L:/home/l/c++# objdump -h ab

ab: file format elf64-x86-64

Sections:
Idx Name Size VMA LMA File off Algn
0 .note.gnu.property 00000020 00000000004001c8 00000000004001c8 000001c8 2**3
CONTENTS, ALLOC, LOAD, READONLY, DATA
1 .text 00000084 0000000000401000 0000000000401000 00001000 2**0
CONTENTS, ALLOC, LOAD, READONLY, CODE
2 .eh_frame 00000058 0000000000402000 0000000000402000 00002000 2**3
CONTENTS, ALLOC, LOAD, READONLY, DATA
3 .data 00000004 0000000000404000 0000000000404000 00003000 2**2
CONTENTS, ALLOC, LOAD, DATA
4 .comment 0000002b 0000000000000000 0000000000000000 00003004 2**0
CONTENTS, READONLY
root@L:/home/l/c++# objdump -h a.o

a.o: file format elf64-x86-64

Sections:
Idx Name Size VMA LMA File off Algn
0 .text 00000035 0000000000000000 0000000000000000 00000040 2**0
CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
1 .data 00000004 0000000000000000 0000000000000000 00000078 2**2
CONTENTS, ALLOC, LOAD, DATA
2 .bss 00000000 0000000000000000 0000000000000000 0000007c 2**0
ALLOC
3 .comment 0000002c 0000000000000000 0000000000000000 0000007c 2**0
CONTENTS, READONLY
4 .note.GNU-stack 00000000 0000000000000000 0000000000000000 000000a8 2**0
CONTENTS, READONLY
5 .note.gnu.property 00000020 0000000000000000 0000000000000000 000000a8 2**3
CONTENTS, ALLOC, LOAD, READONLY, DATA
6 .eh_frame 00000038 0000000000000000 0000000000000000 000000c8 2**3
CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA
root@L:/home/l/c++# objdump -h b.o

b.o: file format elf64-x86-64

Sections:
Idx Name Size VMA LMA File off Algn
0 .text 0000004f 0000000000000000 0000000000000000 00000040 2**0
CONTENTS, ALLOC, LOAD, READONLY, CODE
1 .data 00000000 0000000000000000 0000000000000000 0000008f 2**0
CONTENTS, ALLOC, LOAD, DATA
2 .bss 00000000 0000000000000000 0000000000000000 0000008f 2**0
ALLOC
3 .comment 0000002c 0000000000000000 0000000000000000 0000008f 2**0
CONTENTS, READONLY
4 .note.GNU-stack 00000000 0000000000000000 0000000000000000 000000bb 2**0
CONTENTS, READONLY
5 .note.gnu.property 00000020 0000000000000000 0000000000000000 000000c0 2**3
CONTENTS, ALLOC, LOAD, READONLY, DATA
6 .eh_frame 00000038 0000000000000000 0000000000000000 000000e0 2**3
CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA

Symbol resolution and relocation

First, the result.

The symbol values have now become VMAs.

root@L:/home/l/c++# readelf -s ab

Symbol table '.symtab' contains 9 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 FILE LOCAL DEFAULT ABS a.c
2: 0000000000000000 0 FILE LOCAL DEFAULT ABS b.c
3: 0000000000401035 79 FUNC GLOBAL DEFAULT 2 swap
4: 0000000000404000 4 OBJECT GLOBAL DEFAULT 4 shared
5: 0000000000404004 0 NOTYPE GLOBAL DEFAULT 4 __bss_start
6: 0000000000401000 53 FUNC GLOBAL DEFAULT 2 main
7: 0000000000404004 0 NOTYPE GLOBAL DEFAULT 4 _edata
8: 0000000000404008 0 NOTYPE GLOBAL DEFAULT 4 _end
root@L:/home/l/c++# readelf -s a.o

Symbol table '.symtab' contains 6 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 FILE LOCAL DEFAULT ABS a.c
2: 0000000000000000 0 SECTION LOCAL DEFAULT 1 .text
3: 0000000000000000 4 OBJECT GLOBAL DEFAULT 3 shared
4: 0000000000000000 53 FUNC GLOBAL DEFAULT 1 main
5: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND swap
root@L:/home/l/c++# readelf -s b.o

Symbol table '.symtab' contains 4 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 FILE LOCAL DEFAULT ABS b.c
2: 0000000000000000 0 SECTION LOCAL DEFAULT 1 .text
3: 0000000000000000 79 FUNC GLOBAL DEFAULT 1 swap

Relocation table

#include <stdint.h>

typedef uint64_t Elf64_Addr;
typedef uint64_t Elf64_Xword;
typedef int64_t Elf64_Sxword;

typedef struct
{
    Elf64_Addr r_offset;   /* Address */
    Elf64_Xword r_info;    /* Relocation type and symbol index */
    Elf64_Sxword r_addend; /* Addend */
} Elf64_Rela;

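The example in this program

A tool can list the places that need relocation. One thing puzzled me at first: every value is the symbol address minus 4, although a call should be relative to the address of the next instruction. The reason is that R_X86_64_PC32 is computed as S + A - P, where P is the address of the 4-byte field being patched, not of the next instruction. The next instruction starts 4 bytes after that field, so an addend A of -4 turns S - P into S - (P + 4).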

root@L:/home/l/c++# objdump -r a.o

a.o: file format elf64-x86-64

RELOCATION RECORDS FOR [.text]:
OFFSET TYPE VALUE
000000000000001a R_X86_64_PC32 shared-0x0000000000000004
000000000000002a R_X86_64_PLT32 swap-0x0000000000000004


RELOCATION RECORDS FOR [.eh_frame]:
OFFSET TYPE VALUE
0000000000000020 R_X86_64_PC32 .text


root@L:/home/l/c++# objdump -d a.o

a.o: file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <main>:
0: f3 0f 1e fa endbr64
4: 55 push %rbp
5: 48 89 e5 mov %rsp,%rbp
8: 48 83 ec 10 sub $0x10,%rsp
c: c7 45 fc 64 00 00 00 movl $0x64,-0x4(%rbp)
13: 48 8d 45 fc lea -0x4(%rbp),%rax
17: 48 8d 15 00 00 00 00 lea 0x0(%rip),%rdx # 1e <main+0x1e>
1e: 48 89 d6 mov %rdx,%rsi
21: 48 89 c7 mov %rax,%rdi
24: b8 00 00 00 00 mov $0x0,%eax
29: e8 00 00 00 00 call 2e <main+0x2e>
2e: b8 00 00 00 00 mov $0x0,%eax
33: c9 leave
34: c3 ret
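/*
Next, look at how the entries are stored in hex.
The offset and addend fields match what objdump printed (mind the little-endian byte order).
The field fcffffff ffffffff is 0xfffffffffffffffc, i.e. -4, the addend.
The field 02000000 03000000 is r_info = 0x0000000300000002. Going by the symbol table, the high 32 bits are the symbol index (3 = shared) and the low 32 bits are the relocation type (2 = R_X86_64_PC32), which selects how the relocation is computed.
*/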
root@L:/home/l/c++# readelf -x 2 a.o

Hex dump of section '.rela.text':
0x00000000 1a000000 00000000 02000000 03000000 ................
0x00000010 fcffffff ffffffff 2a000000 00000000 ........*.......
0x00000020 04000000 05000000 fcffffff ffffffff ................

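Summary

By now I have a rough picture of the path from assembly to linking. I have not read any linker source, but my rough guess is that the most important structures are the section header table and the ELF header: together they seem able to locate everything in an ELF file, while items inside a section only need an offset relative to that section. Each later step then just has to keep those two tables up to date. At first the string table looked entirely dispensable, but linking needs names to identify symbols; an executable, on the other hand, can probably drop a lot of this.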

/* The string tables are still present in the executable */
root@L:/home/l/c++# readelf -S ab
There are 9 section headers, starting at offset 0x3188:

Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[ 0] NULL 0000000000000000 00000000
0000000000000000 0000000000000000 0 0 0
[ 1] .note.gnu.pr[...] NOTE 00000000004001c8 000001c8
0000000000000020 0000000000000000 A 0 0 8
[ 2] .text PROGBITS 0000000000401000 00001000
0000000000000084 0000000000000000 AX 0 0 1
[ 3] .eh_frame PROGBITS 0000000000402000 00002000
0000000000000058 0000000000000000 A 0 0 8
[ 4] .data PROGBITS 0000000000404000 00003000
0000000000000004 0000000000000000 WA 0 0 4
[ 5] .comment PROGBITS 0000000000000000 00003004
000000000000002b 0000000000000001 MS 0 0 1
[ 6] .symtab SYMTAB 0000000000000000 00003030
00000000000000d8 0000000000000018 7 3 8
[ 7] .strtab STRTAB 0000000000000000 00003108
0000000000000032 0000000000000000 0 0 1
[ 8] .shstrtab STRTAB 0000000000000000 0000313a
000000000000004d 0000000000000000 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
L (link order), O (extra OS processing required), G (group), T (TLS),
C (compressed), x (unknown), o (OS specific), E (exclude),
D (mbind), l (large), p (processor specific)
/* There is an option (-s) that strips .symtab and .strtab; only .shstrtab remains. Sigh, the security industry has it rough. */
root@L:/home/l/c++# ld a.o b.o -e main -o ab -s
root@L:/home/l/c++# readelf -S ab
There are 7 section headers, starting at offset 0x3070:

Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[ 0] NULL 0000000000000000 00000000
0000000000000000 0000000000000000 0 0 0
[ 1] .note.gnu.pr[...] NOTE 00000000004001c8 000001c8
0000000000000020 0000000000000000 A 0 0 8
[ 2] .text PROGBITS 0000000000401000 00001000
0000000000000084 0000000000000000 AX 0 0 1
[ 3] .eh_frame PROGBITS 0000000000402000 00002000
0000000000000058 0000000000000000 A 0 0 8
[ 4] .data PROGBITS 0000000000404000 00003000
0000000000000004 0000000000000000 WA 0 0 4
[ 5] .comment PROGBITS 0000000000000000 00003004
000000000000002b 0000000000000001 MS 0 0 1
[ 6] .shstrtab STRTAB 0000000000000000 0000302f
000000000000003d 0000000000000000 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
L (link order), O (extra OS processing required), G (group), T (TLS),
C (compressed), x (unknown), o (OS specific), E (exclude),
D (mbind), l (large), p (processor specific)

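About COMMON

The book also covers how same-named weak symbols are handled and introduces the COMMON block mechanism. GCC no longer uses COMMON by default (since GCC 10 the default is -fno-common, which places uninitialized globals in .bss), and weakness is marked directly in the Bind column for symbols declared with __attribute__((weak)).

The -fcommon option still restores the old placement: in the output below the uninitialized global weak has Ndx COM, but its Bind is GLOBAL rather than WEAK, which is not what I expected.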

int printf(const char* format, ...);
extern int ext;
int weak;
int strong = 1;
__attribute__((weak)) int weak2 = 2;
int week2 = 3;

int main ()
{
    printf("%d",week2);
    return 0;
}
root@L:/home/l/c++# gcc -nostdlib -fno-exceptions -fno-unwind-tables -fno-stack-protector -fcommon -c a.c -o a.o
root@L:/home/l/c++# readelf -s a.o

Symbol table '.symtab' contains 10 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 FILE LOCAL DEFAULT ABS a.c
2: 0000000000000000 0 SECTION LOCAL DEFAULT 1 .text
3: 0000000000000000 0 SECTION LOCAL DEFAULT 5 .rodata
4: 0000000000000004 4 OBJECT GLOBAL DEFAULT COM weak
5: 0000000000000000 4 OBJECT GLOBAL DEFAULT 3 strong
6: 0000000000000004 4 OBJECT WEAK DEFAULT 3 weak2
7: 0000000000000008 4 OBJECT GLOBAL DEFAULT 3 week2
8: 0000000000000000 43 FUNC GLOBAL DEFAULT 1 main
9: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND printf

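Miscellaneous notes

Duplicate code elimination

A very high-level discussion, just to get an impression.

C++ template instantiation and function-level linking both involve eliminating code; without it, the same template instance compiled in several translation units would cause duplicate-definition errors. The usual approach is to put each template instance in its own section, give identical instances the same section name, and have the linker keep only one copy. Function-level linking places every function in its own section so the linker can discard functions that are never used, at the cost of more work for the assembler and the linker.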

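C++ and the ABI

In one sentence: an API is about source-level compatibility. On any OS that provides the interface, the interface behaves the same way. POSIX, for example, is a family of standards specifying which interfaces an operating system should provide, and a C library written against those interfaces is easy to port across operating systems. (A rough introduction.)

An ABI (application binary interface) is about binary-level compatibility. I first read this as binaries being portable across platforms, so that, say, Windows could handle Linux ELF files. The file-format half seemed plausible, since the format is just a set of definitions in C headers. But even with a shared file format, how would one machine execute two kinds of machine code? A virtual machine? And it would have to cover many platforms, not just two. A narrower reading resolves this: an ABI fixes things such as calling conventions, data layout, symbol naming, and the object file format for one architecture and OS, so that binaries built by different compilers for that same ABI can be linked and run together. Running code built for a different instruction set is a separate problem that needs emulation.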