Dune comes with a library to query OS-specific information, called configurator. It is able to evaluate C expressions and turn them into OCaml value. Surprisingly, it even works when compiling for a different architecture. How can it do that?

A CD-ROM problem

Let’s take an old school example: suppose we want to eject a CD-ROM drive. On Linux, the way to do that is to open the device file such as /dev/cdrom and to call ioctl(fd, CDROMEJECT, 0) on it. The CDROMEJECT part is a constant defined in <linux/cdrom.h>.

To do the same in OCaml, it is possible to define a C function that calls ioctl directly. Or this can be done directly using ctypes, but we need to know the value of the CDROMEJECT constant; configurator can be used to do that.

Enter configurator

How to use configurator in a dune project is a bit out of scope for this article, but at the core is a function C_define.import that can read the value of some C expressions, including macros.

The following program uses configurator to fetch and display the value of the CDROMEJECT constant.

let () =
  let open Configurator.V1 in
  main ~name:"c_test" (fun t ->
      let result =
        C_define.import t ~includes:["linux/cdrom.h"] [("CDROMEJECT", Int)]
      in
      match result with
      | [(_, Int n)] -> Printf.printf "%d\n" n
      | _ -> assert false )

Note that just getting the constants could be done by parsing the header files themselves. But this also supports constant C expressions (such as 1 << 8) and some C features such as sizeof(int).

So, how does it work?

An almost correct solution

It is certainly necessary to generate and compile some C to do this. A first version is to generate a short C program such as the following one.

#include <stdio.h>
#include <linux/cdrom.h>

int main(void)
{
	printf("%d\n", CDROMEJECT);
	return 0;
}

By running this program and parsing the output, configurator can get the correct value.

Except that dune supports cross-compilation: when compiling an unikernel for an ESP32 CPU, it could be handy to have the value of constants such as ESP_ERR_WIFI_PASSWORD that are only available using a foreign toolchain. But it is not possible to run ESP32 binaries on the host system.

A better solution

Since it is necessary to use a C compiler, but not to run a program, the solution is looking at the compiled code:

  • generate a C file containing the expressions to extract
  • build it using the target C compiler
  • parse the resulting binary

This is what configurator does. Since parsing compiled code is difficult (and not all targets use the same binary format), the values are stored in constant strings, between known markers.

Here is the generated C file. Note that unlike in the previous attempt, this is not a complete executable, just a file to be built with -c.

#include <stdio.h>
#include <linux/cdrom.h>

#define D0(x) ('0'+(x/1         )%10)
#define D1(x) ('0'+(x/10        )%10), D0(x)
#define D2(x) ('0'+(x/100       )%10), D1(x)
#define D3(x) ('0'+(x/1000      )%10), D2(x)
#define D4(x) ('0'+(x/10000     )%10), D3(x)
#define D5(x) ('0'+(x/100000    )%10), D4(x)
#define D6(x) ('0'+(x/1000000   )%10), D5(x)
#define D7(x) ('0'+(x/10000000  )%10), D6(x)
#define D8(x) ('0'+(x/100000000 )%10), D7(x)
#define D9(x) ('0'+(x/1000000000)%10), D8(x)

const char s0[] = {
  'B', 'E', 'G', 'I', 'N', '-', '0', '-',
  D9((CDROMEJECT)),
  '-', 'E', 'N', 'D'
};

The Dn(x) macros seem daunting at first, but remember that we need a string constant, so it is necessary to convert the integer value to a list of characters. The comma operator ensures that the result will look like '1', '2', '3', '4' which will be inserted in the array initializer.

After compiling this file, the string is visible directly in the binary:

00000000: 7f45 4c46 0201 0100 0000 0000 0000 0000  .ELF............
00000010: 0100 3e00 0100 0000 0000 0000 0000 0000  ..>.............
00000020: 0000 0000 0000 0000 a801 0000 0000 0000  ................
00000030: 0000 0000 4000 0000 0000 4000 0a00 0900  ....@.....@.....
00000040: 4245 4749 4e2d 302d 3030 3030 3032 3132  BEGIN-0-00000212
00000050: 3537 2d45 4e44 0047 4343 3a20 2844 6562  57-END.GCC: (Deb
00000060: 6961 6e20 382e 322e 302d 3133 2920 382e  ian 8.2.0-13) 8.
00000070: 322e 3000 0000 0000 0000 0000 0000 0000  2.0.............
00000080: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000090: 0100 0000 0400 f1ff 0000 0000 0000 0000  ................

It is even possible to parse it using plain Unix tools.

% strings x.o | grep BEGIN
BEGIN-0-0000021257-END

The actual configurator library will parse it using a very simple lexer. It uses the number just after BEGIN (-0- above) to distinguish between the different constants that have been requested.

It also supports more types of bindings, such as strings. In this case, the string is directly inserted between BEGIN-0- and -END.

Conclusion

Binary file formats can seem tricky to parse, but for some cases this is the correct solution. In the context of dune when it is not always possible to execute the output binaries, this is the correct solution to extract information from the target system.

As far as I know, this technique has been borrowed from ctypes where it had been implemented by @whitequark. Thanks!