I was thinking about [Bug 3099]... in that while it's easy to get a 2-3x
speedup for the average app using parallel scans, the bounds on that
speedup are wide, and in a worst case it could come out below 1x (very
unlikely, but on constrained HW, e.g. in a container or VM, the chances
are raised).

Better, with less std. deviation, I believe, might be to move all I/O
calls to AIO. It seems that would allow them to be completed at the OS's
discretion, which in the ideal case would mean minimal wasted disk-head
movement. The advantage of AIO is that the OS can coalesce calls at its
leisure, whereas an algorithm up in the app might divide the work fairly
but cannot know how much each underlying request costs in terms of wasted
head movement.
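
To make the idea concrete, here is a rough sketch in Python. It is not
real AIO: it uses a thread pool of plain os.pread() calls as a stand-in,
just to show the shape of "queue every request up front and take results
in whatever order they finish" instead of splitting the file list into
fixed per-worker shares. The names read_file, scan_queued and
max_in_flight are made up for illustration and have nothing to do with
the code in the bug.

```python
import os
from concurrent.futures import ThreadPoolExecutor, as_completed


def read_file(path, block_size=1 << 20):
    """Read one file with positional reads and return the byte count."""
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        while True:
            data = os.pread(fd, block_size, total)
            if not data:
                break
            total += len(data)
    finally:
        os.close(fd)
    return total


def scan_queued(paths, max_in_flight=32):
    """Queue a read for every path up front; collect in completion order.

    Assumption: many requests in flight give the kernel's I/O scheduler
    more room to reorder and merge them than a fixed per-worker split.
    """
    sizes = {}
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        futures = {pool.submit(read_file, p): p for p in paths}
        for fut in as_completed(futures):
            sizes[futures[fut]] = fut.result()
    return sizes
```

Whether this actually beats a hierarchy-based split depends on the
device and the I/O scheduler, so it would need measuring.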

Is that already in there, or in the works? And do you think it would
avoid the worst case, where file scanning is divided up by FS hierarchy
rather than by disk layout?

