Printf: The Gift That Keeps on Giving (Away Your Data)

9 Jun

So, in an effort to be productive, I decided it would be cool to see if I could find some more vulnerabilities. I’ve published a couple in recent months and am awaiting CVE number assignment for a few more; MITRE appears to have ground to a halt of late when it comes to CVE assignment. It makes me wonder whether I should just accept that my requests have fallen into some unfathomable abyss. I won’t hold it too much against them, though; they’ve been going through a lot of late.

Anyway, I had to first pick a target. Of course, I always want to find bugs that are useful and exploitable and in software that is frequently used. Something that may hold value to the offensive team or wider security community. So, without too much further consideration I decided to check out the latest CVEs that were being actively exploited and use that as a mechanism to extrapolate a good target.

CISA (https://www.cisa.gov/known-exploited-vulnerabilities-catalog) is invaluable in this regard and produces regular reports offering a list of currently exploited vulnerabilities. I picked up some reports from the last few months and then worked through them. Eventually, after a short while and with the help of our trusty and super unbiased AnchorAI, I landed on a viable target.

I won’t go into detail about what the software was because currently the vulnerability is undisclosed. This article’s purpose is to just talk about some interesting things identified while further research is conducted.

Anyway, I got started on the initial analysis: identifying all the binaries and deciding which ones to look at first. I picked out the binary that I thought would handle login and opened it in my disassembler of choice.

I’m pretty comfortable with IDA Pro. I’ve used it a lot in my career and find it to be generally quite wonderful. Some of the features, plugins and updates that have come out in later versions have been fantastic. However, in an effort to really see how accessible this kind of thing is, and to fully appreciate its power, I’ve been opening Ghidra more often lately and using that for my efforts.

It didn’t take long to follow some call paths until I reached a pretty big function that seemed to take a parameter, compare it to a list of strings and then call functions depending on what the string was.

void HandlerFunction(int param_1)
{
  int iVar1;
  FILE *pFVar2;
  size_t sVar3;
  tm *ptVar4;
  char *__format;
  char *pcVar5;
  char local_50 [32];
  time_t atStack_30 [3];

  pcVar5 = *(char **)(param_1 + 0xc);
  iVar1 = strcmp(pcVar5,"<REDACTED>");
  if (iVar1 == 0) {
    pcVar5 = FUN_00406f9c();
LAB_00405fe8:
    pcVar5 = FUN_00406d0c(pcVar5);
  }
  else {
    iVar1 = strcmp(pcVar5,"<REDACTED>");
    if (iVar1 == 0) {
      pcVar5 = (char *)FUN_00407240();
      goto LAB_00405fe8;
    }
    iVar1 = strcmp(pcVar5,"<REDACTED>");
    if (iVar1 == 0) {
      FUN_00402aa4();
      return;
    }
    iVar1 = strcmp(pcVar5,"<REDACTED>");
    if (iVar1 == 0) {
      FUN_00405eb4();
      return;
    }
    iVar1 = strcmp(pcVar5,"<REDACTED>");
    if (iVar1 == 0) {
      FUN_00402d98();
      return;
    }
  }

Decompilation of Handler Function

This function went on for miles, and much of it looked like the decompilation shown above. It did get a bit more interesting once I got lower, though.

fprintf(pFVar2,"%s:%s:%d:argv[4] %s\n\n", <REDACTED>,0x7f8,
            *(undefined4 *)(param_1 + 0x10));
    fclose(pFVar2);
  }
  iVar1 = access("<REDACTED>",0);
  if ((iVar1 == 0) && (pFVar2 = fopen("/dev/console","w+"), pFVar2 != (FILE *)0x0)) {
    pcVar5 = getenv("<REDACTED>");
    fprintf(pFVar2,"%s:%s:%d:<REDACTED>=%s\n\n",<REDACTED>,<REDACTED>,0x7f9,pcVar5);
    fclose(pFVar2);
  }
  if (**(char **)(param_1 + 0x10) == '\0') {
    getenv("<REDACTED>");
  }
  FUN_00407280();
  return;
}
iVar1 = strcmp(pcVar5,"<REDACTED>");
if (iVar1 == 0) {
  pcVar5 = (char *)FUN_0040174c();
  __format = "%d";
  goto LAB_00406804;
}
iVar1 = strcmp(pcVar5,"<REDACTED>");
if (iVar1 == 0) {
  _ftext();
  _ftext();
  sleep(1);
  _ftext();
  _ftext();
  iVar1 = access("<REDACTED>",0);
  if ((iVar1 == 0) && (pFVar2 = fopen("/dev/console","w+"), pFVar2 != (FILE *)0x0)) {
    fprintf(pFVar2,"%s:%s:%d:<REDACTED> %lld,%lld\n\n",<REDACTED>,0x810);
    fclose(pFVar2);
  }
  printf("%lld,%lld");
  return;
}
iVar1 = strcmp(pcVar5,"<REDACTED>");
if (iVar1 == 0) {
  time(atStack_30);

Clearly, there’s a lot more going on in this part of the function. But there’s a very obvious bug here (two, in fact), although one is a bit more serious than the other.

I’ll give you a minute to read the code and figure out what the problem is.

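Okay, so for the people who did read the code, the vulnerability is in the fprintf and printf calls that use the %lld,%lld format. Both format strings contain %lld conversions for which no matching arguments are passed. Per the C standard’s definition of the printf family of functions, this is undefined behaviour. In practice, what generally happens is that the function prints whatever sits where the missing arguments would have been: leftover register contents or values from the stack, depending on the architecture.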
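For contrast, here is a minimal sketch of what a well-formed version of that call could look like. The helper name format_pair and its parameters are my own invention for illustration; I don’t know what two values the original code meant to print, only that it needed to pass two of them.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper, not taken from the target binary: formats two
 * values the way the vulnerable call presumably intended to. */
int format_pair(char *out, size_t out_len, long long first, long long second)
{
    assert(out != NULL && out_len > 0);
    /* Two %lld conversions, two long long arguments: nothing is left
     * for the printf machinery to fetch from an unspecified location. */
    return snprintf(out, out_len, "%lld,%lld", first, second);
}
```

With a literal format string like this one, GCC and Clang will also warn about a missing argument when format checking is enabled (-Wformat, which -Wall turns on), so this class of bug is usually catchable at build time.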

I won’t go into too many details here until the research has been fully finished and/or a CVE has been published and the issue(s) have been fixed. However, what precisely is dumped, and how, depends heavily on both the architecture of the system and where the output goes (stdout, stderr, etc.). But we can walk through an example on a typical Linux machine.

I wrote a small example program that would simulate the same kind of usage. The system is a 64-bit Linux machine. The purpose of this program is to demonstrate that variadic functions such as fprintf and printf are unsafe when the format string asks for more values than the call supplies. The missing values are read from wherever the next arguments would have been; on x86-64 Linux, for example, that means leftover argument registers first and the stack after that. If those locations hold pointers, memory addresses or just useful data, an attacker can use them to learn about the state of the process.

The offending calls are shown below. As noted, they’re not identical to the production decompilation we saw earlier, but they act as a good representation of the vulnerability in a controlled environment.

FILE *fp = fopen("/dev/console", "w+");
if (!fp) fp = stderr;
fprintf(fp, "[LEAK] %s:%d: %lld,%lld\n", "demo", 42, local_int);
printf("[INFO] fprintf done, check /dev/console or stderr for leaks\n");
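if (fp != stderr) fclose(fp);  // don't close stderr when the fallback was used

// Vulnerable printf: two %lld conversions, no arguments supplied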
printf("[LEAK] %lld,%lld\n");
printf("[INFO] printf done, check stdout for leaks\n");

// VULNERABLE PRINTF: heap pointer passed explicitly, followed by two %lld with no arguments
printf("[HEAP-LEAK] %s:%d: %lld,%lld,%lld\n", "heapdemo", 123, (long long)heap_buf);
//    ^ only 3 args provided for 5 format specifiers -> the last two %lld print values that were never passed

printf("[SAFE] heap_buf: %p\n", heap_buf);

The first fprintf statement requires 4 arguments (%s, %d, %lld, %lld) but only provides 3, meaning that the final %lld will print a value that was never passed.
The second vulnerable call mirrors the printf in the decompilation exactly: it requires 2 arguments (%lld, %lld), neither of which is provided.
Finally, the last vulnerable call passes a heap pointer as its one %lld argument and requires 5 arguments in total (%s, %d, %lld, %lld, %lld), with only 3 being provided.

In all cases, the call prints values that the caller never supplied.
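The counting I did by hand above can be sketched as a small helper. This is a simplified illustration rather than a full printf parser: count_format_args is a name I made up, and it ignores positional conversions such as %1$d.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts how many arguments a printf-style format
 * string consumes. Handles "%%" and '*' width/precision, but not
 * positional ("%1$d") conversions. */
size_t count_format_args(const char *fmt)
{
    size_t count = 0;
    assert(fmt != NULL);
    for (const char *p = fmt; *p != '\0'; p++) {
        if (*p != '%')
            continue;
        p++;
        if (*p == '%')
            continue;          /* literal percent sign, no argument */
        /* Skip flags, width, precision and length modifiers. */
        while (*p != '\0' && strchr("-+ #0123456789.hljztL*", *p) != NULL) {
            if (*p == '*')
                count++;       /* '*' takes an int argument of its own */
            p++;
        }
        if (*p == '\0')
            break;             /* truncated conversion at end of string */
        count++;               /* the conversion character itself */
    }
    return count;
}
```

Comparing this count with the number of arguments actually passed at a call site is, in essence, the check that compiler format warnings perform for literal format strings.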

Below I’ll show how the variables are set up for this function.

void leak_function(int param) {
  //stack
  char local_buf[32] = "secret-data";
  long long local_int = 0xdeadbeef;
  void *local_ptr = &local_int;
  //heap
  char *heap_buf = malloc(64);
  strcpy(heap_buf, "heap-secret");

Now, when executing the code, we’ll see a bunch of pointers to these variables, as well as the leaked values.

[INFO] main: dummy = 0x12345678 at 0x7fffe5796044
[INFO] local_buf: secret-data at 0x7fffe5796000
[INFO] local_int: 0xdeadbeef at 0x7fffe5795fe0
[INFO] local_ptr: 0x7fffe5795fe0 at 0x7fffe5795fe8
[INFO] heap_buf: heap-secret at 0x5fc419a756b0
[INFO] param: 305419896 at 0x7fffe5795fdc
[LEAK] demo:42: 3735928559,7
[INFO] fprintf done, check /dev/console or stderr for leaks
[LEAK] 0,129719538294272
[INFO] printf done, check stdout for leaks
[HEAP-LEAK] heapdemo:123: 105295848625840,100,0
[SAFE] heap_buf: 0x5fc419a756b0
[INFO] main: execution complete

The [INFO] tags are coded within the example to reveal data and pointers to compare with leaks.

In the code example, “dummy” refers to the value passed in the first parameter of the function. The rest are obvious.

The lines marked [LEAK] are where the mismatched (f)printf calls happen. As can be seen, there are some odd values there that don’t immediately look like anything more than large numbers. The keen-eyed might realise that %lld prints in decimal, so there is an extra step to take: converting the values into hex.
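The conversion itself is trivial, but for completeness, a minimal sketch of it is below. The function names are hypothetical and this helper is not part of the demo program.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: parses a leaked %lld value from its decimal text.
 * %lld prints signed values, so parse as signed and reinterpret. */
unsigned long long leak_value(const char *decimal)
{
    assert(decimal != NULL);
    return (unsigned long long)strtoll(decimal, NULL, 10);
}

/* Renders the same value in hex so it can be compared with addresses. */
int format_leak_hex(const char *decimal, char *out, size_t out_len)
{
    assert(out != NULL && out_len > 0);
    return snprintf(out, out_len, "0x%llx", leak_value(decimal));
}
```

Fed the decimal values from the output above, this gives 0xdeadbeef for 3735928559, 0x75faaf001e00 for 129719538294272 and 0x5fc419a756b0 for 105295848625840.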

For Leak 1

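“[LEAK] demo:42: 3735928559,7”
Converted into hex, 3735928559 becomes 0xdeadbeef. That is local_int, which the call passes as its third argument, so it confirms the output lines up with the variables rather than being a leak in itself. The leaked value is the trailing 7: no argument was supplied for the second %lld, so fprintf printed whatever was sitting in the next argument slot.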

For Leak 2

The first value is 0, but the second, 129719538294272, converts to 0x75faaf001e00, which is an address in libc. That is a fantastic thing to leak, because it reveals where libraries are loaded in memory.

For Leak 3

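The first value dumped in HEAP-LEAK converts to 0x5FC419A756B0, which is the address of heap_buf on the heap and matches the [SAFE] line. That argument was passed explicitly, so the values that actually leaked are the trailing 100 and 0. Had a heap pointer happened to be sitting in one of those slots, it would have been printed in exactly the same way.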

So, while this is a contrived example, it demonstrates that malformed variadic print statements can be the cause of vulnerabilities that give extra information to attackers. How an attacker collects the information is dependent on the vector. In some cases it may be returned in an HTTP response, in others in a network packet or perhaps even written to a file somewhere. How the information is returned can obviously make or break an exploitation attempt.

So what exactly are memory leaks used for?

Memory leaks like these are typically used to defeat exploit mitigations that rely on memory layout randomisation, such as ASLR. By leaking a value from the stack, heap, or a shared library like libc, an attacker can determine the base address of critical memory regions. This is essential when preparing a reliable exploit, as it allows for precise calculation of offsets and function addresses needed for things like return-oriented programming or controlled memory writes.
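The arithmetic behind that is simple enough to sketch. Everything below is illustrative: the function names are mine, and any offsets you feed in have to come from the exact library build in use, which I am not providing here.

```c
#include <assert.h>
#include <stdint.h>

/* All offsets passed to these helpers are hypothetical placeholders.
 * Real values depend on the exact library build on the target. */

/* Base address of a module, given one leaked pointer into it and the
 * known offset of that pointer's target within the module's image. */
uint64_t module_base(uint64_t leaked_addr, uint64_t known_offset)
{
    assert(leaked_addr >= known_offset);
    return leaked_addr - known_offset;
}

/* Runtime address of any other symbol once the base is known. */
uint64_t resolve_symbol(uint64_t base, uint64_t symbol_offset)
{
    return base + symbol_offset;
}

/* Load bases are page aligned, which gives a cheap sanity check. */
int looks_page_aligned(uint64_t base)
{
    return (base & 0xfffULL) == 0;
}
```

As a made-up example, a leaked pointer of 0x75faaf001e00 with an assumed offset of 0x201e00 would give a base of 0x75faaee00000, and every other address in that module then follows from a single addition.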

On most modern systems, reliable arbitrary code execution is hard to achieve without first bypassing these protections. A leak provides the necessary information to make the next stage of an exploit feasible. For example, once the address of libc is known, standard gadgets or functions such as system() can be located. If a heap address is leaked, it may help with heap-based techniques such as exploiting use-after-free conditions. Essentially, memory leaks are a common and often necessary step toward building a stable and predictable exploit chain.

This initial analysis has demonstrated that malformed variadic function calls can lead to meaningful memory disclosures. While the example here is simplified, the impact in a production environment is clear: leaked memory can expose critical runtime information that significantly lowers the bar for exploitation. These types of bugs often serve as the foundation for more complex exploit chains, particularly when used to bypass mitigations like ASLR.

The next step is to continue deeper analysis of the system, expanding coverage beyond the initial binary and identifying additional vulnerable patterns. Now that there’s a clear path to information disclosure, the focus will shift to determining whether code execution is achievable, and under what conditions. The goal is to fully map the attack surface, establish exploit reliability, and ultimately assess the overall risk this vulnerability presents.

Why write this at all?

Mostly, because I thought it was cool. But also because I was surprised to find such an obvious example of C functions being used in an unsafe way. It’s 2025, and despite a world that (at least to me) seems obsessed with cybersecurity, vulnerabilities are still everywhere, just as they’ve always been and always will be.

Gareth C