Thomas Leonard's work notes

(see roscidus.com for my main blog)

(new contract, uring, eio, mdx)

Introduction

I've got a new contract to do some Eio maintenance and related work for 3 months, and I intend to post rough notes about what I'm working on here.

Unix FDs

I've been having a look at the backlog of PRs for ocaml-uring. ocaml-uring#129 and ocaml-uring#130 were two different attempts at dealing with file descriptors (working around OCaml's lack of a safe way to convert between int and Unix.file_descr). After trying the proposed API out with Eio, I went with a modified version of #129:

  • Add Uring.Res module to handle result values #131.

Ideally, Eio would use the new features to handle results more safely, but until the new version is released I just added a cast so that Eio will work with either version:

  • Update for new Uring.Res API #839.
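As a rough illustration of the idea (not the actual Uring.Res API, whose details are not shown here), a result wrapper could look like the sketch below. The module name Res_sketch and both of its functions are hypothetical. The only fact relied on is that an io_uring completion carries a single integer, which is a negated errno on failure and a non-negative value on success.

```ocaml
(* Hypothetical sketch: NOT the real Uring.Res module. *)
module Res_sketch : sig
  type t
  (* Abstract, so callers cannot use a raw completion value
     without first deciding how to interpret it. *)

  val of_int : int -> t
  (* Wrap the raw integer from a completion. *)

  val to_result : t -> (int, int) result
  (* [Ok v] for a non-negative value, [Error errno] otherwise. *)
end = struct
  type t = int

  let of_int x = x

  let to_result x =
    if x >= 0 then Ok x      (* success: byte count, descriptor number, ... *)
    else Error (-x)          (* failure: the kernel reports -errno *)
end

let describe r =
  match Res_sketch.to_result r with
  | Ok v -> Printf.sprintf "ok %d" v
  | Error e -> Printf.sprintf "errno %d" e
```

Keeping the type abstract means the compiler rejects code that treats an error value as a file descriptor by accident.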

Detecting hangs in MDX tests

MDX reads a markdown document and runs all the examples in it, checking that they give the expected result. However, if a test hangs then MDX doesn't tell you where the problem was. If a test hangs in CI, you just get a message at the end of an hour saying that the tests timed out. That was making it difficult to find out why the uring tests were hanging on riscv.

I made a PR to MDX that shows the current test location on Ctrl-C, and also prints warning messages if a test is taking a long time:

  • Report location of hangs #476.
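The Ctrl-C part of this can be sketched in a few lines of OCaml. This is only an illustration of the technique, not the code from the MDX PR: the names current_location, describe, install_handler and run_block are all made up for the example, and the long-running warning is left out.

```ocaml
(* Hypothetical sketch: remember which block is running, and report it
   when the user presses Ctrl-C. Not the actual MDX implementation. *)

(* Location (file, line) of the block currently being executed, if any. *)
let current_location : (string * int) option ref = ref None

let describe () =
  match !current_location with
  | None -> "no test running"
  | Some (file, line) -> Printf.sprintf "%s, line %d" file line

(* Install a SIGINT handler that prints the current location and exits.
   130 is the conventional exit status for "killed by SIGINT". *)
let install_handler () =
  Sys.set_signal Sys.sigint
    (Sys.Signal_handle (fun _ ->
         prerr_endline ("Interrupted while running: " ^ describe ());
         exit 130))

(* Run [f], recording its location for the duration of the call. *)
let run_block ~file ~line f =
  current_location := Some (file, line);
  Fun.protect f ~finally:(fun () -> current_location := None)
```

A test runner would call install_handler once at start-up and wrap each block in run_block. As noted, this only helps if the process receives a signal it can handle, which is not the case for SIGKILL.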

Unfortunately, it doesn't work very well when used with dune yet (dune uses SIGKILL to terminate MDX, and hides output until the whole file completes by default).

The riscv hang turned out to be because the CI machines are running a very old version of Linux. The tests tried to write a byte to a pipe using uring and then read it back. The write failed because Linux was too old to support that API, but MDX continued anyway with the read, which then waited forever for a byte that had never been written.

  • If the sending test fails, don't wait to receive #134.

Updated uring to 2.14.0

The main problem here was getting the new C header file in the right place. The existing build process ended up with 4 copies of each header, which was pretty confusing. I simplified it a bit and documented the rest:

  • Document and simplify the build process #137.
  • Update to uring 2.14 #138.

Improve uring performance in Eio

I tried using various flags to tell uring that it doesn't need to interrupt us whenever a completion is due, as we have an event loop and will check soon anyway:

  • eio_linux: use newer uring flags for performance #840.

Using IORING_SETUP_COOP_TASKRUN led to a big speed-up on my machine for the HTTP tests (from about 350k requests per second to 450k!). However, it seemed to have no noticeable effect on the CI benchmark machine.

Using IORING_SETUP_DEFER_TASKRUN had a similar effect, but caused one of the tests to hang (my MDX hang-detector came in handy here!). The problem is that if there are no pending completions from the kernel but we do have other work to do, we do that work first. We then never enter the kernel, so the completions are never written. It would probably be easy enough to change the scheduler to enter the kernel anyway from time to time.
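The starvation problem, and the "enter the kernel from time to time" idea, can be shown with a toy model. Everything here is an assumption made for illustration: the sched record, enter_kernel, run and demo are invented names, the "kernel" is just a list of closures, and none of it reflects how Eio's real scheduler is written.

```ocaml
(* Toy model: completions stay "in the kernel" until we enter it. *)
type sched = {
  run_q : (unit -> unit) Queue.t;            (* work that is ready to run *)
  mutable in_kernel : (unit -> unit) list;   (* completions not yet delivered *)
  mutable since_enter : int;                 (* jobs run since last entering *)
}

(* Simulate entering the kernel: deliver all deferred completions. *)
let enter_kernel s =
  List.iter (fun k -> Queue.add k s.run_q) (List.rev s.in_kernel);
  s.in_kernel <- [];
  s.since_enter <- 0

let idle s =
  Queue.is_empty s.run_q && (match s.in_kernel with [] -> true | _ -> false)

(* Run at most [max_steps] jobs. We enter the kernel when the run queue is
   empty, and also after [every] jobs. The default [max_int] means "only
   when the queue is empty", which is the behaviour that hangs. *)
let run ?(every = max_int) ~max_steps s =
  let steps = ref 0 in
  while !steps < max_steps && not (idle s) do
    if Queue.is_empty s.run_q || s.since_enter >= every then enter_kernel s;
    match Queue.take_opt s.run_q with
    | None -> ()
    | Some job ->
      incr steps;
      s.since_enter <- s.since_enter + 1;
      job ()
  done;
  !steps

(* A job that keeps re-queuing itself until a completion sets [finished]. *)
let demo ~every =
  let s = { run_q = Queue.create (); in_kernel = []; since_enter = 0 } in
  let finished = ref false in
  s.in_kernel <- [ (fun () -> finished := true) ];
  let rec spinner () = if not !finished then Queue.add spinner s.run_q in
  Queue.add spinner s.run_q;
  let steps = run ~every ~max_steps:1000 s in
  (!finished, steps)
```

With demo ~every:max_int the run queue is never empty, so the model never enters the kernel and stops only because it reaches the step limit. With a small value such as demo ~every:10 the completion is delivered and the spinner finishes.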

Various minor fixes

  • Use MSG_CMSG_CLOEXEC when receiving file descriptors #132.
  • Warn about ZFS bug when using write_fixed #133.

Reviewed and merged various PRs:

  • Register polling_timeout with GC in ocaml_uring_setup #136.
  • Simplify custom "env" type in example #832.
  • Add Eio_unix.Stdenv.override for updating environments #823.
  • Add send_zc and sendmsg_zc ops #139.

ocaml-tar bug

An ocaml-tar bug was found that could cause it to write files outside of the target directory. Anil Madhavapeddy made a quick fix for the tar-eio package and I reviewed it (though I'm unfamiliar with tar-eio).

Interestingly, while the tar-unix and tar-lwt-unix versions are always vulnerable, the unpatched version is not necessarily unsafe with tar-eio. This is because tar-eio gets its ability to write files from its caller. If the caller only gives it access to the output directory then the bug cannot be exploited.

However, a caller can forget to do that, so ensuring tar-eio does it in all cases is still a good idea.
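For comparison, a library without that kind of sandboxing has to check entry names itself. The function below is a hypothetical example of such a check, not the ocaml-tar fix (which is not described here). It is purely lexical, so it assumes there are no symlinks inside the target directory.

```ocaml
(* Hypothetical sketch: does a relative archive path stay inside the
   directory it is unpacked into? Lexical only; ignores symlinks. *)
let stays_inside (path : string) : bool =
  if path <> "" && path.[0] = '/' then false   (* absolute paths escape *)
  else
    (* [depth] is how far below the target directory we currently are. *)
    let rec walk depth = function
      | [] -> true
      | ("" | ".") :: rest -> walk depth rest            (* no movement *)
      | ".." :: rest -> depth > 0 && walk (depth - 1) rest
      | _ :: rest -> walk (depth + 1) rest
    in
    walk 0 (String.split_on_char '/' path)
```

A check like this rejects names such as "../etc/passwd" or "a/../../x" while accepting "a/./b". Restricting the directory capability passed to the library covers the same cases without relying on every code path remembering to call the check.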

Set up work log

I needed something simple to publish these work updates, without polluting my main blog, so I set up this sub-site for it. I copied the OCaml static site generator I wrote for the main blog and simplified it for this one.