Content of the ELF chapter of «深入理解Android:Java虚拟机ART»
As you can see from the description above, an ELF file consists of two parts: an ELF header and file data. The file data can contain a program header table describing zero or more segments, a section header table describing zero or more sections, and the data referred to by entries of the program header table and the section header table. ==Each segment contains information needed for run-time execution of the file, while sections contain important data for linking and relocation==. Figure 1 illustrates this schematically.
The ELF header is 52 bytes long in 32-bit files and 64 bytes long in 64-bit files, and identifies the format of the file. It starts with four magic bytes: 0x7F followed by 0x45, 0x4c, and 0x46, the last three being the ASCII codes of the letters E, L, and F.
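Among other values, the header also indicates whether the file is 32-bit or 64-bit, its byte order, the ELF version, the target OS/ABI, and the machine architecture.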
Debian GNU/Linux offers the readelf command, which is provided in the GNU ‘binutils’ package. With the switch -h (short for “--file-header”) it displays the header of an ELF file. Listing 3 illustrates this for the command touch.
real command:
Sdk\ndk-bundle\toolchains\llvm\prebuilt\windows-x86_64\bin\x86_64-linux-android-readelf.exe -h git\demo\xCrash\src\native\libxcrash\obj\local\x86_64\libxcrash.so
ELF Header:
Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
Class: ELF64
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: DYN (Shared object file)
Machine: Advanced Micro Devices X86-64
Version: 0x1
Entry point address: 0x0
Start of program headers: 64 (bytes into file)
Start of section headers: 382432 (bytes into file)
Flags: 0x0
Size of this header: 64 (bytes)
Size of program headers: 56 (bytes)
Number of program headers: 8
Size of section headers: 64 (bytes)
Number of section headers: 39
Section header string table index: 38
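The fields printed by readelf -h map directly onto the fixed 64-byte layout of a 64-bit ELF header. The following is a minimal sketch, assuming a little-endian ELF64 file; the helper name parse_elf64_header is hypothetical, and the sample bytes are rebuilt from the output above rather than read from the real libxcrash.so.

```python
import struct

# Elf64_Ehdr, little endian: 16-byte e_ident, then e_type, e_machine,
# e_version, e_entry, e_phoff, e_shoff, e_flags, e_ehsize, e_phentsize,
# e_phnum, e_shentsize, e_shnum, e_shstrndx.
ELF64_EHDR = struct.Struct("<16sHHIQQQIHHHHHH")

FIELD_NAMES = ("type", "machine", "version", "entry", "phoff", "shoff",
               "flags", "ehsize", "phentsize", "phnum",
               "shentsize", "shnum", "shstrndx")


def parse_elf64_header(data):
    """Decode the ELF header found at the start of `data`."""
    fields = ELF64_EHDR.unpack_from(data, 0)
    ident = fields[0]
    if ident[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # ident[4] is the class (2 = ELF64), ident[5] the data encoding (1 = little endian).
    if ident[4] != 2 or ident[5] != 1:
        raise ValueError("this sketch handles only little-endian ELF64")
    return dict(zip(FIELD_NAMES, fields[1:]))


# Synthetic header rebuilt from the readelf output above:
# type 3 is DYN, machine 62 is x86-64.
ident = bytes.fromhex("7f454c46020101000000000000000000")
sample = ELF64_EHDR.pack(ident, 3, 62, 1, 0, 64, 382432, 0,
                         64, 56, 8, 64, 39, 38)
header = parse_elf64_header(sample)
print(header["phnum"], header["shnum"], hex(header["shoff"]))
```

To try it on a real file, pass the first 64 bytes read in binary mode instead of the synthetic sample.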
The program header table shows the segments used at run-time and tells the system how to create a process image. The ELF header output above shows that the file has 8 program headers of 56 bytes each, and that the first one starts at byte 64.
Again, the readelf command helps to extract the information from the ELF file. The switch -l (short for --program-headers or --segments) reveals more details, as shown in Listing 4.
real command:
Sdk\ndk-bundle\toolchains\llvm\prebuilt\windows-x86_64\bin\x86_64-linux-android-readelf.exe -l git\demo\xCrash\src\native\libxcrash\obj\local\x86_64\libxcrash.so
Elf file type is DYN (Shared object file)
Entry point 0x0
There are 8 program headers, starting at offset 64
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
PHDR 0x0000000000000040 0x0000000000000040 0x0000000000000040
0x00000000000001c0 0x00000000000001c0 R 8
LOAD 0x0000000000000000 0x0000000000000000 0x0000000000000000
0x00000000000135e8 0x00000000000135e8 R E 1000
LOAD 0x0000000000013830 0x0000000000014830 0x0000000000014830
0x00000000000009a0 0x0000000000001648 RW 1000
DYNAMIC 0x0000000000013a68 0x0000000000014a68 0x0000000000014a68
0x0000000000000240 0x0000000000000240 RW 8
NOTE 0x0000000000000200 0x0000000000000200 0x0000000000000200
0x00000000000000bc 0x00000000000000bc R 4
GNU_EH_FRAME 0x0000000000013154 0x0000000000013154 0x0000000000013154
0x0000000000000494 0x0000000000000494 R 4
GNU_STACK 0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 RW 10
GNU_RELRO 0x0000000000013830 0x0000000000014830 0x0000000000014830
0x00000000000007d0 0x00000000000007d0 RW 10
Section to Segment mapping:
Segment Sections...
00
01 .note.android.ident .note.gnu.build-id .dynsym .dynstr .hash .gnu.version .gnu.version_d .gnu.version_r .rela.dyn .rela.plt .plt .text .rodata .eh_frame .eh_frame_hdr
02 .fini_array .data.rel.ro .init_array .dynamic .got .got.plt .data .bss
03 .dynamic
04 .note.android.ident .note.gnu.build-id
05 .eh_frame_hdr
06
07 .fini_array .data.rel.ro .init_array .dynamic .got .got.plt
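Each row of the program header listing is one 56-byte Elf64_Phdr record. The sketch below decodes such a record, assuming little-endian ELF64; parse_phdr and flags_to_text are hypothetical helper names, and the sample is the second LOAD segment from the output above. It also checks that 8 entries of 56 bytes give the 0x1c0 bytes reported for the PHDR segment.

```python
import struct

# Elf64_Phdr: p_type, p_flags, p_offset, p_vaddr, p_paddr,
# p_filesz, p_memsz, p_align.
ELF64_PHDR = struct.Struct("<IIQQQQQQ")

# Only the standard types that appear in the listing above.
PT_NAMES = {1: "LOAD", 2: "DYNAMIC", 4: "NOTE", 6: "PHDR"}


def flags_to_text(p_flags):
    """Render PF_R (4), PF_W (2), PF_X (1) the way readelf does."""
    letters = ((4, "R"), (2, "W"), (1, "E"))
    return "".join(ch if p_flags & bit else " " for bit, ch in letters).rstrip()


def parse_phdr(data, offset=0):
    """Decode one program header entry starting at `offset`."""
    (p_type, p_flags, p_offset, p_vaddr,
     _p_paddr, p_filesz, p_memsz, p_align) = ELF64_PHDR.unpack_from(data, offset)
    return {
        "type": PT_NAMES.get(p_type, hex(p_type)),
        "flags": flags_to_text(p_flags),
        "offset": p_offset,
        "vaddr": p_vaddr,
        "filesz": p_filesz,
        "memsz": p_memsz,
        "align": p_align,
        # Bytes present in memory but not in the file (zero-filled, e.g. .bss).
        "zero_fill": p_memsz - p_filesz,
    }


# Second LOAD segment from the listing above.
raw = ELF64_PHDR.pack(1, 6, 0x13830, 0x14830, 0x14830, 0x9a0, 0x1648, 0x1000)
segment = parse_phdr(raw)
print(segment["type"], segment["flags"], hex(segment["zero_fill"]))
```

The difference between MemSiz and FileSiz is the part of the segment that has no bytes in the file and is zero-filled by the loader, which is where .bss lives.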
The third part of the ELF structure is the section header table, which lists the individual sections of the binary. The switch -S (short for --section-headers or --sections) lists the different headers. For the touch command there are 27 section headers, and Listing 5 shows only the first four of them plus the last one; the libxcrash.so output below lists all 39. Each entry covers the section name, type, address, file offset, size, entry size, flags, link, info, and alignment.
Windows real command:
Sdk\ndk-bundle\toolchains\llvm\prebuilt\windows-x86_64\bin\x86_64-linux-android-readelf.exe -S git\demo\xCrash\src\native\libxcrash\obj\local\x86_64\libxcrash.so
Ubuntu real command:
Sdk/ndk/21.3.6528147/toolchains/llvm/prebuilt/linux-x86_64/bin/x86_64-linux-android-readelf
Or use the one found through the PATH environment variable:
~/Android/Source/android-9.0.0_r3$ which readelf
/usr/bin/readelf
There are 39 section headers, starting at offset 0x5d5e0:
Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[ 0] NULL 0000000000000000 00000000
0000000000000000 0000000000000000 0 0 0
[ 1] .note.android.ide NOTE 0000000000000200 00000200
0000000000000098 0000000000000000 A 0 0 2
[ 2] .note.gnu.build-i NOTE 0000000000000298 00000298
0000000000000024 0000000000000000 A 0 0 4
[ 3] .dynsym DYNSYM 00000000000002c0 000002c0
0000000000000b70 0000000000000018 A 4 1 8
[ 4] .dynstr STRTAB 0000000000000e30 00000e30
00000000000005fc 0000000000000000 A 0 0 1
[ 5] .hash HASH 0000000000001430 00001430
0000000000000374 0000000000000004 A 3 0 8
[ 6] .gnu.version VERSYM 00000000000017a4 000017a4
00000000000000f4 0000000000000002 A 3 0 2
[ 7] .gnu.version_d VERDEF 0000000000001898 00001898
000000000000001c 0000000000000000 A 4 1 4
[ 8] .gnu.version_r VERNEED 00000000000018b4 000018b4
0000000000000040 0000000000000000 A 4 2 4
[ 9] .rela.dyn RELA 00000000000018f8 000018f8
0000000000000708 0000000000000018 A 3 0 8
[10] .rela.plt RELA 0000000000002000 00002000
0000000000000990 0000000000000018 AI 3 11 8
[11] .plt PROGBITS 0000000000002990 00002990
0000000000000670 0000000000000010 AX 0 0 16
[12] .text PROGBITS 0000000000003000 00003000
000000000000c6f6 0000000000000000 AX 0 0 16
[13] .rodata PROGBITS 000000000000f700 0000f700
0000000000002070 0000000000000000 A 0 0 16
[14] .eh_frame PROGBITS 0000000000011770 00011770
00000000000019e4 0000000000000000 A 0 0 8
[15] .eh_frame_hdr PROGBITS 0000000000013154 00013154
0000000000000494 0000000000000000 A 0 0 4
[16] .fini_array FINI_ARRAY 0000000000014830 00013830
0000000000000010 0000000000000008 WA 0 0 8
[17] .data.rel.ro PROGBITS 0000000000014840 00013840
0000000000000220 0000000000000000 WA 0 0 16
[18] .init_array INIT_ARRAY 0000000000014a60 00013a60
0000000000000008 0000000000000008 WA 0 0 8
[19] .dynamic DYNAMIC 0000000000014a68 00013a68
0000000000000240 0000000000000010 WA 4 0 8
[20] .got PROGBITS 0000000000014ca8 00013ca8
0000000000000010 0000000000000000 WA 0 0 8
[21] .got.plt PROGBITS 0000000000014cb8 00013cb8
0000000000000348 0000000000000000 WA 0 0 8
[22] .data PROGBITS 0000000000015000 00014000
00000000000001d0 0000000000000000 WA 0 0 16
[23] .bss NOBITS 0000000000015200 00014200
0000000000000c78 0000000000000000 WA 0 0 64
[24] .comment PROGBITS 0000000000000000 000141d0
0000000000000065 0000000000000001 MS 0 0 1
[25] .debug_str PROGBITS 0000000000000000 00014235
0000000000006eff 0000000000000001 MS 0 0 1
[26] .debug_loc PROGBITS 0000000000000000 0001b134
0000000000014b4c 0000000000000000 0 0 1
[27] .debug_abbrev PROGBITS 0000000000000000 0002fc80
00000000000026fa 0000000000000000 0 0 1
[28] .debug_info PROGBITS 0000000000000000 0003237a
0000000000018d5f 0000000000000000 0 0 1
[29] .debug_ranges PROGBITS 0000000000000000 0004b0d9
00000000000018b0 0000000000000000 0 0 1
[30] .debug_macinfo PROGBITS 0000000000000000 0004c989
0000000000000012 0000000000000000 0 0 1
[31] .debug_pubnames PROGBITS 0000000000000000 0004c99b
00000000000013f5 0000000000000000 0 0 1
[32] .debug_pubtypes PROGBITS 0000000000000000 0004dd90
00000000000026ed 0000000000000000 0 0 1
[33] .debug_line PROGBITS 0000000000000000 0005047d
0000000000009205 0000000000000000 0 0 1
[34] .debug_aranges PROGBITS 0000000000000000 00059682
0000000000000060 0000000000000000 0 0 1
[35] .note.gnu.gold-ve NOTE 0000000000000000 000596e4
000000000000001c 0000000000000000 0 0 4
[36] .symtab SYMTAB 0000000000000000 00059700
0000000000002058 0000000000000018 37 224 8
[37] .strtab STRTAB 0000000000000000 0005b758
0000000000001cd7 0000000000000000 0 0 1
[38] .shstrtab STRTAB 0000000000000000 0005d42f
00000000000001ac 0000000000000000 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
L (link order), O (extra OS processing required), G (group), T (TLS),
C (compressed), x (unknown), o (OS specific), E (exclude),
l (large), p (processor specific)
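Two pieces of arithmetic help when reading this table: the Flags column is a bitmask, and Size divided by EntSize gives the number of records in a table-like section. A minimal sketch follows; flags_to_letters and entry_count are hypothetical helper names, and only the flag bits up to C (compressed) are covered.

```python
# sh_flags bits and the letters readelf prints for them.
SHF_LETTERS = (
    (0x1, "W"), (0x2, "A"), (0x4, "X"), (0x10, "M"), (0x20, "S"),
    (0x40, "I"), (0x80, "L"), (0x100, "O"), (0x200, "G"),
    (0x400, "T"), (0x800, "C"),
)


def flags_to_letters(sh_flags):
    """Turn an sh_flags bitmask into the letter string shown by readelf."""
    return "".join(letter for bit, letter in SHF_LETTERS if sh_flags & bit)


def entry_count(size, entsize):
    """Number of fixed-size records in a section, or 0 if it has none."""
    return size // entsize if entsize else 0


# Values taken from the section header listing above.
print(flags_to_letters(0x3))        # .data: write + alloc
print(entry_count(0xb70, 0x18))     # records in .dynsym
print(entry_count(0x990, 0x18))     # records in .rela.plt
print(entry_count(0x670, 0x10))     # 16-byte slots in .plt
```

With the sizes above, .plt holds one more slot than .rela.plt has relocations, which is consistent with one special PLT[0] entry followed by one entry per imported function.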
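The sections most relevant to hooking are:
– .dynstr: stores the strings needed for dynamic linking, such as symbol names.
– .dynsym: stores symbol information (symbol type, start address, size, the index of the symbol name in .dynstr, and so on). A function is also a kind of symbol.
– .text: the machine instructions produced by compiling the program code.
– .dynamic: information used by the dynamic linker. It records the external dependencies of the current ELF and the start positions of the other important sections.
– .got: Global Offset Table. It records the entry addresses of external calls. When the dynamic linker (linker) performs relocation, the real absolute addresses of the external calls are filled in here.
– .plt: Procedure Linkage Table. It holds the trampolines for external calls and mainly supports lazy-binding relocation of external calls. (On Android, currently only the MIPS architecture supports lazy binding.)
– .rel.plt: relocation information for direct calls to external functions.
– .rel.dyn: relocation information other than .rel.plt (for example, calling an external function through a global function pointer).
graph LR
dynamic(".dynamic")-->deps("external dependencies of the current ELF")
dynamic-->addrs("start positions of the other important sections, etc.")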
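The relationship between .dynsym and .dynstr can be sketched as follows: each 24-byte .dynsym record stores an offset into .dynstr where the NUL-terminated symbol name begins. This is a minimal illustration assuming little-endian ELF64; the tables are synthetic, and the symbol my_func with its address and size is hypothetical.

```python
import struct

# Elf64_Sym: st_name, st_info, st_other, st_shndx, st_value, st_size.
ELF64_SYM = struct.Struct("<IBBHQQ")


def read_cstring(strtab, offset):
    """Read the NUL-terminated string that starts at `offset`."""
    end = strtab.index(b"\x00", offset)
    return strtab[offset:end].decode("ascii")


def find_symbol(dynsym, dynstr, wanted):
    """Return (index, info) for the symbol named `wanted`, or None."""
    for index in range(len(dynsym) // ELF64_SYM.size):
        st_name, st_info, _other, st_shndx, st_value, st_size = \
            ELF64_SYM.unpack_from(dynsym, index * ELF64_SYM.size)
        if read_cstring(dynstr, st_name) == wanted:
            return index, {
                "bind": st_info >> 4,    # 1 = GLOBAL
                "type": st_info & 0xF,   # 2 = FUNC
                "shndx": st_shndx,       # 0 = undefined (imported)
                "value": st_value,
                "size": st_size,
            }
    return None


# Synthetic tables: entry 0 is the reserved null symbol.
dynstr = b"\x00malloc\x00my_func\x00"
dynsym = b"".join([
    ELF64_SYM.pack(0, 0, 0, 0, 0, 0),
    ELF64_SYM.pack(1, 0x12, 0, 0, 0, 0),           # imported malloc
    ELF64_SYM.pack(8, 0x12, 0, 12, 0x3000, 0x20),  # hypothetical my_func in .text
])
print(find_symbol(dynsym, dynstr, "malloc"))
```

The index returned here is the value that relocation entries refer to, which is why a hook library first has to find it.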
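With the dynamic linking process understood, let us return to what ".got" and ".plt" actually mean.
PLT and GOT entries correspond one to one, and after the first resolution the GOT entry holds the real address of the called function. So what is the PLT for? In a sense it gives us lazy loading: when a dynamic library is first loaded, none of its function addresses have been resolved. Let us walk through the first call to a function with the help of the figure; note that black arrows in the figure are jumps and purple arrows are pointers.
An ordinary PLT entry PLT[n] contains:
– The instruction that jumps through the GOT (jmp *GOT[n]).
– Preparation of the arguments for the address-resolving routine reached through entry 0.
– A call to PLT[0]; the real address of the resolver is stored in GOT[2].
Before resolution, GOT[n] points straight at the instruction following jmp *GOT[n]. Once resolution completes we have the real address of func; the dynamic loader writes that address into GOT[n] and then calls func.
If the call flow above is still unclear, see the article 《GOT 表和 PLT 表》, which contains a very clear diagram.
After the first call, later calls to func are much simpler and faster: PLT[n] is called, it executes jmp *GOT[n], and because GOT[n] now points directly at func the call completes efficiently.
To summarize: many functions, such as error handlers or rarely used feature modules, may never be called during the lifetime of a program, so linking every function up front is wasteful. To improve the performance of dynamic linking, the PLT can be used to implement lazy binding.
The real run-time address of a function is still obtained through the GOT. The simplified flow is as follows:
By now you probably have a first idea of how to hack this process. The industry usually distinguishes GOT Hook from PLT Hook according to whether the GOT entry or the PLT entry is modified, but the underlying principle is nearly the same.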
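The first-call and later-call behaviour, and the idea behind a GOT hook, can be modelled in a few lines. This is only a toy model in which Python callables stand in for machine code; the class LazyBinder and all of its members are hypothetical names, not part of any real linker.

```python
class LazyBinder:
    """Toy model of PLT/GOT lazy binding."""

    def __init__(self, exports):
        self.exports = exports   # name -> callable, stands in for loaded libraries
        self.names = []          # slot n -> symbol name
        self.got = []            # slot n -> current jump target
        self.resolve_count = 0

    def add_slot(self, name):
        """Create PLT[n]/GOT[n] for an imported function, still unresolved."""
        n = len(self.got)
        self.names.append(name)
        # Before resolution GOT[n] leads back into the stub, which reaches the resolver.
        self.got.append(lambda *args, n=n: self._resolve_and_call(n, *args))
        return n

    def _resolve_and_call(self, n, *args):
        """Stand-in for the resolver: patch GOT[n], then call the target."""
        self.resolve_count += 1
        target = self.exports[self.names[n]]
        self.got[n] = target
        return target(*args)

    def call_plt(self, n, *args):
        """PLT[n]: jump through GOT[n]."""
        return self.got[n](*args)


binder = LazyBinder({"func": lambda x: x + 1})
slot = binder.add_slot("func")
first = binder.call_plt(slot, 1)    # goes through the resolver
second = binder.call_plt(slot, 1)   # goes straight to func

# A GOT hook in this model: overwrite the slot, keep the old target for chaining.
original = binder.got[slot]
binder.got[slot] = lambda x: original(x) * 10
hooked = binder.call_plt(slot, 1)
print(first, second, hooked, binder.resolve_count)
```

The hook works for every caller that goes through this slot, and callers never notice, because they only ever jump to whatever address the slot currently holds.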
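On Android the dynamic linker program is linker. Its source code is here.
The rough steps of dynamic linking (for example, when dlopen is executed) are:
1. Check the list of already loaded ELFs.
2. Read the list of ELFs that libtest.so depends on from its .dynamic section.
3. Load the ELFs in the list one by one. The loading steps are:
– Use mmap to reserve a block of memory large enough for mapping the ELF later (MAP_PRIVATE mode).
– Read the program header table (PHT) of the ELF and use mmap to map every segment of type PT_LOAD into memory in turn.
– Read the entries of the .dynamic segment, mainly the relative virtual addresses of the sections, and compute their absolute addresses.
– Perform relocation. The relocation information may live in one or more of these sections: .rel.plt, .rela.plt, .rel.dyn, .rela.dyn, .rel.android, .rela.android. The dynamic linker has to process the relocation requests in these .relxxx sections one by one. Using the information of the already loaded ELFs, it looks up the address of each required symbol (for example, the symbol malloc needed by libtest.so) and, once found, writes the address into the target address named in .relxxx. These "target addresses" usually lie in .got or .data.
– Increment the reference count of the ELF.
4. Call the constructors of the ELFs in the list one by one (the entries of type DT_INIT and DT_INIT_ARRAY). The constructors are called layer by layer in dependency order: first those of the ELFs being depended on, and finally that of libtest.so itself. (An ELF can also define its own destructors, which are called automatically when the ELF is unloaded.)
graph LR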
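dlopen("dynamic linking (executing dlopen)")-->check("check the list of already loaded ELFs")
dlopen-->read("read the list of ELFs that libtest.so depends on from the .dynamic section")
dlopen-->loadEach("load the ELFs in the list one by one; loading steps")
loadEach-->mmap("use mmap to reserve a block of memory large enough for mapping the ELF later")
loadEach-->mmapPT_LOAD("read the PHT of the ELF and use mmap to map every segment of type PT_LOAD into memory in turn")
loadEach-->dynamic("read the entries of the .dynamic segment, mainly the relative virtual addresses of the sections, and compute the absolute addresses")
loadEach-->|relocate|relocate("process the relocation requests in the .relxxx sections one by one: look up the address of each required symbol and write it into the target address named in .relxxx, in .got or .data")
loadEach-->ELFRefCount("increment the reference count of the ELF")
dlopen-->cons("call the constructors of the ELFs in the list one by one")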
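The relocation step can be sketched as a loop over 24-byte Elf64_Rela records. This is a simplified model, assuming x86-64, where the GLOB_DAT and JUMP_SLOT relocation types simply store the symbol address in the target slot; apply_relocations is a hypothetical helper, the mapped ELF is a plain bytearray, and all addresses are made up.

```python
import struct

# Elf64_Rela: r_offset, r_info, r_addend (signed).
ELF64_RELA = struct.Struct("<QQq")
R_X86_64_GLOB_DAT = 6
R_X86_64_JUMP_SLOT = 7


def apply_relocations(image, base, rela, symbol_names, resolved):
    """Patch `image` (the ELF as mapped at `base`) and return patched addresses.

    symbol_names: .dynsym index -> name; resolved: name -> absolute address.
    """
    patched = []
    for pos in range(0, len(rela), ELF64_RELA.size):
        r_offset, r_info, _addend = ELF64_RELA.unpack_from(rela, pos)
        sym_index, rel_type = r_info >> 32, r_info & 0xFFFFFFFF
        if rel_type not in (R_X86_64_GLOB_DAT, R_X86_64_JUMP_SLOT):
            continue  # other relocation types use different formulas
        address = resolved[symbol_names[sym_index]]
        # r_offset is relative to the load base, so image[r_offset] is the slot.
        struct.pack_into("<Q", image, r_offset, address)
        patched.append(base + r_offset)
    return patched


# Hypothetical setup: two slots at offsets 0x40 and 0x48.
base = 0x7F0000100000
image = bytearray(0x100)
symbol_names = ["", "malloc", "free"]
resolved = {"malloc": 0x7F0000001000, "free": 0x7F0000002000}
rela = b"".join([
    ELF64_RELA.pack(0x40, (1 << 32) | R_X86_64_JUMP_SLOT, 0),
    ELF64_RELA.pack(0x48, (2 << 32) | R_X86_64_GLOB_DAT, 0),
])
slots = apply_relocations(image, base, rela, symbol_names, resolved)
print([hex(s) for s in slots])
```

A PLT hook reuses exactly this bookkeeping: it finds the relocation record for a symbol, computes base + r_offset, and writes a different address into that slot.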
graph LR
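issue("directly overwriting the function at an address raises three problems")-->baseAddr("base address")-->|solution|maps("/proc/self/maps")
issue-->perm("memory access permissions")-->|solution|mprotect
issue-->icache("instruction cache")-->|solution|__builtin___clear_cache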
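The first two problems come down to text parsing and address arithmetic. The sketch below finds a library's base address in /proc/self/maps style text and computes the page-aligned range that an mprotect call would have to cover. The helper names, the sample maps lines, and their addresses are hypothetical, a 4096-byte page is assumed, and the sketch does not itself call mprotect or __builtin___clear_cache.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def find_base_address(maps_text, library):
    """Start address of the mapping of `library` whose file offset is 0."""
    for line in maps_text.splitlines():
        parts = line.split(None, 5)
        if len(parts) < 6 or not parts[5].endswith(library):
            continue
        start = int(parts[0].split("-")[0], 16)
        if int(parts[2], 16) == 0:
            return start
    return None


def page_range(address, length=8):
    """Page-aligned (start, size) covering `length` bytes at `address`."""
    start = address & ~(PAGE_SIZE - 1)
    end = (address + length + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1)
    return start, end - start


# Hypothetical maps content for a process that loaded libxcrash.so.
sample_maps = """\
7f1c2a000000-7f1c2a014000 r-xp 00000000 fd:01 1234   /data/app/demo/lib/x86_64/libxcrash.so
7f1c2a014000-7f1c2a016000 rw-p 00013000 fd:01 1234   /data/app/demo/lib/x86_64/libxcrash.so
7f1c2a016000-7f1c2a017000 rw-p 00000000 00:00 0
"""
base = find_base_address(sample_maps, "libxcrash.so")
# A slot inside .got.plt, which starts at virtual address 0x14cb8 in the listing above.
slot = base + 0x14cb8 + 0x18
print(hex(base), hex(slot), page_range(slot))
```

Because the first LOAD segment of this library has virtual address 0, the mapping start can be used directly as the base; for a library whose first LOAD segment has a non-zero virtual address, that address would have to be subtracted.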
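To summarize, the flow of a PLT hook in xhook is:
– Find the segment of type PT_LOAD whose offset is 0 and compute the ELF base address.
– Find the segment of type PT_DYNAMIC, obtain the .dynamic section from it, and from the .dynamic section obtain the memory addresses of the other sections.
– In the .dynstr section, find the index corresponding to the symbol to hook.
– Use mprotect to change the access permissions to readable and writable.
– Where mprotect was used to change the memory access permissions, restore them to their previous values.
https://linux.die.net/man/1/readelf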
https://linuxtools-rst.readthedocs.io/zh_CN/latest/tool/readelf.html
https://en.wikipedia.org/wiki/Executable_and_Linkable_Format
https://github.com/iqiyi/xHook/blob/master/docs/overview/android_plt_hook_overview.zh-CN.md