Processor Topology & Affinity for OCaml

API Online

This library lets you query the processor topology and set the
processor affinity for the current process. It does not depend on
OCaml 5 (Multicore), but it can be used within OCaml 5 Domains as
expected.

The topology can identify individual threads (smt), cores and sockets,
as well as P-cores (performance cores) and E-cores (energy-efficient
cores) on both AMD64 and Apple's ARM64 (M1, M2 & friends).


The library is split into 3 main modules:


Processor.Query

Retrieves the count of threads, cores and sockets.

utop # Processor.Query.cpu_count;;
- : int = 8
utop # Processor.Query.core_count;;
- : int = 4
utop # Processor.Query.socket_count;;
- : int = 1


Processor.Topology

Builds the actual topology of the machine. Each Cpu.t describes a
logical CPU with a logical id (id), a thread id (smt), a core id
(core), a socket id (socket) and a kind, which can be P-core or
E-core. The kind is only relevant for Intel Alder Lake and Apple's
ARM64 machines.

The topology is built when the module is loaded and stays static for
the lifetime of the program.

utop # Processor.Topology.t;;
- : Processor.Cpu.t list =
[{id = 0; kind = Processor.Cpu.P_core; smt = 0; core = 0; socket = 0};
 {id = 1; kind = Processor.Cpu.P_core; smt = 0; core = 1; socket = 0};
 {id = 2; kind = Processor.Cpu.P_core; smt = 0; core = 2; socket = 0};
 {id = 3; kind = Processor.Cpu.P_core; smt = 0; core = 3; socket = 0};
 {id = 4; kind = Processor.Cpu.P_core; smt = 1; core = 0; socket = 0};
 {id = 5; kind = Processor.Cpu.P_core; smt = 1; core = 1; socket = 0};
 {id = 6; kind = Processor.Cpu.P_core; smt = 1; core = 2; socket = 0};
 {id = 7; kind = Processor.Cpu.P_core; smt = 1; core = 3; socket = 0}]
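
As a rough illustration of how the three Query counts relate to the topology list, the sketch below rebuilds the listing above with a local record type and derives the counts from it. The record is a stand-in that mirrors the printed fields, not the library's own Cpu.t definition, and count_distinct is a hypothetical helper.

```ocaml
(* Local stand-in mirroring the printed fields; not the library's Cpu.t. *)
type kind = P_core | E_core

type cpu = { id : int; kind : kind; smt : int; core : int; socket : int }

(* The 4-core, 2-way SMT machine from the listing above. *)
let topology =
  List.init 8 (fun id ->
      { id; kind = P_core; smt = id / 4; core = id mod 4; socket = 0 })

(* Hypothetical helper: number of distinct values of [key] in the list. *)
let count_distinct key cpus =
  List.length (List.sort_uniq compare (List.map key cpus))

let cpu_count = List.length topology

(* A core is identified by its (socket, core) pair. *)
let core_count = count_distinct (fun c -> (c.socket, c.core)) topology

let socket_count = count_distinct (fun c -> c.socket) topology

let () =
  Printf.printf "cpu_count: %d\ncore_count: %d\nsocket_count: %d\n"
    cpu_count core_count socket_count;
  Printf.printf "cpus-per-core: %d\n" (cpu_count / core_count)
```

Keying cores on the (socket, core) pair keeps the count right on multi-socket machines, where core ids may repeat across sockets.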


Processor.Affinity

Sometimes it is useful to see what happens when you restrict your
application to a set of CPUs. Maybe you don't want your threads to
cross a socket, or maybe you want to see how the application behaves
without two threads fighting for the resources of the same core.

Affinity applies to the running context that sets it, so if you are
using Domains, it must be set individually within each domain.

Say you want to restrict execution to the threads of core 0:

utop # Processor.Affinity.set_cpus (Processor.Cpu.from_core 0 Processor.Topology.t);;
- : unit = ()
utop # Processor.Affinity.get_cpus ();;
- : Processor.Cpu.t list =
[{id = 0; kind = Processor.Cpu.P_core; smt = 0; core = 0; socket = 0};
 {id = 4; kind = Processor.Cpu.P_core; smt = 1; core = 0; socket = 0}]
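
Judging only from the output above, selecting the CPUs of a core amounts to filtering the topology list on the core field. The sketch below shows that kind of selection on a local stand-in record. The helpers on_core, on_socket and first_thread_only are hypothetical names that do not call the library; the resulting lists are the sort of value one would pass to an affinity setter.

```ocaml
(* Local stand-in mirroring the printed fields; not the library's Cpu.t. *)
type kind = P_core | E_core

type cpu = { id : int; kind : kind; smt : int; core : int; socket : int }

(* Same 4-core, 2-way SMT layout as in the listings above. *)
let topology =
  List.init 8 (fun id ->
      { id; kind = P_core; smt = id / 4; core = id mod 4; socket = 0 })

(* Hypothetical selectors, named for illustration only. *)
let on_core n cpus = List.filter (fun c -> c.core = n) cpus
let on_socket n cpus = List.filter (fun c -> c.socket = n) cpus

(* One logical CPU per core, so no two threads share a core. *)
let first_thread_only cpus = List.filter (fun c -> c.smt = 0) cpus

let ids cpus = List.map (fun c -> c.id) cpus

let () =
  let show name cpus =
    Printf.printf "%s: %s\n" name
      (String.concat "," (List.map string_of_int (ids cpus)))
  in
  show "core 0" (on_core 0 topology);
  show "socket 0" (on_socket 0 topology);
  show "one thread per core" (first_thread_only topology)
```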


A simple binary called ocaml-processor-dump is provided:

$ ocaml-processor-dump
cpu_count: 8
core_count: 4
socket_count: 1
cpus-per-core: 2
cpus-per-socket: 8
cores-per-socket: 4
cpu0: smt=0 core=0 socket=0 kind=P_core
cpu1: smt=0 core=1 socket=0 kind=P_core
cpu2: smt=0 core=2 socket=0 kind=P_core
cpu3: smt=0 core=3 socket=0 kind=P_core
cpu4: smt=1 core=0 socket=0 kind=P_core
cpu5: smt=1 core=1 socket=0 kind=P_core
cpu6: smt=1 core=2 socket=0 kind=P_core
cpu7: smt=1 core=3 socket=0 kind=P_core

Implementation Details

It turns out all of this is harder than it should be. There are
basically no portable APIs, and even the consensus on what a CPU
thread is gets sketchy across architectures.

Linux, FreeBSD

On AMD64 we visit each CPU by pinning our current context to it, and
then do the whole CPUID dance manually. The only thing we need from
the system is a working pthread_setaffinity_np. Query and Topology
will be accurate as long as the process doesn't start with an already
restricted affinity.
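
To give an idea of what the CPUID dance boils down to, here is a minimal sketch of one common approach on x86: splitting an APIC id into thread, core and socket fields using shift widths that CPUID reports (for example through the extended topology leaf 0x0B on processors that implement it). Whether the library decodes ids exactly this way is an assumption. The location record, the decode function and the shift values in the usage are hypothetical.

```ocaml
(* Hypothetical record for the decoded fields. *)
type location = { smt : int; core : int; socket : int }

(* [smt_shift]  : number of low bits holding the thread id.
   [core_shift] : number of low bits holding thread and core ids together.
   Both widths are assumed to have been read from CPUID beforehand. *)
let decode ~smt_shift ~core_shift apic_id =
  let mask bits = (1 lsl bits) - 1 in
  { smt = apic_id land mask smt_shift;
    core = (apic_id lsr smt_shift) land mask (core_shift - smt_shift);
    socket = apic_id lsr core_shift }

let () =
  (* Example widths: 1 bit for the thread id, 2 bits for the core id. *)
  let l = decode ~smt_shift:1 ~core_shift:3 5 in
  Printf.printf "smt=%d core=%d socket=%d\n" l.smt l.core l.socket
```

The id has to be read while running on the CPU in question, which is why each CPU is visited by pinning first.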

On anything other than AMD64 we will build a fake topology by using
Query, each CPU will be its own core and everyone will be on the
same socket.
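
The fallback described above can be sketched as follows, using a local stand-in record rather than the library's Cpu.t. The function name fake_topology is hypothetical, and reporting every CPU as a P-core is an assumption, since the text doesn't say which kind the fallback uses.

```ocaml
(* Local stand-in mirroring the printed fields; not the library's Cpu.t. *)
type kind = P_core | E_core

type cpu = { id : int; kind : kind; smt : int; core : int; socket : int }

(* Each logical CPU becomes its own single-threaded core on socket 0.
   kind = P_core is an assumption made for this sketch. *)
let fake_topology cpu_count =
  List.init cpu_count (fun id ->
      { id; kind = P_core; smt = 0; core = id; socket = 0 })

let () =
  List.iter
    (fun c ->
      Printf.printf "cpu%d: smt=%d core=%d socket=%d\n" c.id c.smt c.core
        c.socket)
    (fake_topology 4)
```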

Initially I added support for parsing /proc/cpuinfo on Linux for
other architectures, but the format is not standardized, so it isn't
worth it.

NetBSD, OpenBSD, DragonflyBSD

On these systems Query is accurate for cpu_count, but core_count and
socket_count are faked, the topology is faked, and affinity is a
no-op. NetBSD and DragonflyBSD could have affinity support, but I
don't want to maintain it. OpenBSD has no support for it.


macOS

Apple doesn't support affinity/pinning, so to retrieve the actual
apicid on AMD64 we have to go through Apple's horrible ioreg stuff,
which we do. On Apple ARM64 we also go through ioreg to retrieve the
relationship between E-cores and P-cores. On Apple, Query and
Topology will always be accurate.

Future Work

  • Windows support. Hopefully I will work on this once I have a more
    comfortable Windows environment to develop in.

  • Cache topology would be welcome as well.

  • CPU model/brand, there is some support but I want to make it right before

  • CI/CD setup.

  • SPARC64 and RiscOS support would be welcome.

If you want to work on cache topology, I'll send you beers.

08 Jul 2022