[m-users.] Reading the entire directory contents (contribution for the posix library)

Julian Fondren jfondren at minimaltype.com
Mon Oct 14 14:56:50 AEDT 2019


On 2019-10-13 21:57, Julien Fischer wrote:
> In many circumstances you would presumably be interested in what
> kind of file each entry is as well.

And here, Linux has an extended dirent struct that includes file type
information. It depends on both the kernel (its age) and the filesystem, though.
I wrote a symlink scanner recently in Rust, C++, and Mercury, and
since I was only interested in directories and symlinks, it was
useful to skip all the allocations that would've been required for
normal files, especially considering that I would be hitting target
platforms where an occasional cache directory can have *millions* of
files in it.

Even if you can afford to have millions of filenames in memory, you
probably don't want to sort them.

The core of that was:

   :- pred readdir(dirp, string, int, string, readdir_result, io, io).
   :- mode readdir(in, in, in, out, out, di, uo) is det.
   :- pragma foreign_proc("C",
       readdir(Dir::in, Path::in, Pathlen::in, Name::out, Res::out,
           _IO0::di, _IO::uo),
       [promise_pure, will_not_call_mercury],
   "
       int reslen;
       struct dirent *de;
       while (NULL != (de = readdir(Dir))) {
           if (de->d_type == DT_DIR) {
               reslen = strlen(de->d_name);
               if (reslen == 1 && de->d_name[0] == '.') continue;
               if (reslen == 2 && de->d_name[0] == '.' && de->d_name[1] == '.') continue;
               Res = SYM_FT_DIRECTORY;
               break;
           }
           if (de->d_type == DT_LNK) {
               reslen = strlen(de->d_name);
               Res = SYM_FT_SYMLINK;
               break;
           }
           /* skip other types of files */
       }
       if (NULL != de) {
           MR_allocate_aligned_string_msg(Name, reslen + 1 + Pathlen, MR_ALLOC_ID);
           memmove(Name, Path, Pathlen);
           Name[Pathlen] = '/';
           memmove(&Name[Pathlen + 1], de->d_name, reslen + 1);
       } else {
           Res = SYM_FT_EOD;
       }
   ").

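Since d_type is filesystem dependent, an entry can come back as
DT_UNKNOWN, and the loop above would silently skip it. A rough sketch
of a fallback in plain C follows. This is not part of my scanner: the
classify_entry name and the KIND_* codes are made up for illustration,
and it assumes a glibc-style dirent with a d_type field.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* exposes d_type and DT_* on glibc */
#endif
#include <dirent.h>
#include <fcntl.h>
#include <sys/stat.h>

/* Hypothetical result codes for this sketch, not the SYM_FT_* values. */
enum entry_kind { KIND_OTHER, KIND_DIRECTORY, KIND_SYMLINK };

/*
 * Classify one entry of the open directory `dir`.
 * Trust d_type when the filesystem filled it in; on DT_UNKNOWN fall
 * back to fstatat() without following symlinks.
 */
static enum entry_kind classify_entry(DIR *dir, const struct dirent *de)
{
    struct stat st;

    switch (de->d_type) {
    case DT_DIR:     return KIND_DIRECTORY;
    case DT_LNK:     return KIND_SYMLINK;
    case DT_UNKNOWN: break;              /* need to ask the filesystem */
    default:         return KIND_OTHER;  /* regular file, socket, ... */
    }

    if (fstatat(dirfd(dir), de->d_name, &st, AT_SYMLINK_NOFOLLOW) != 0)
        return KIND_OTHER;               /* entry vanished or unreadable */
    if (S_ISDIR(st.st_mode)) return KIND_DIRECTORY;
    if (S_ISLNK(st.st_mode)) return KIND_SYMLINK;
    return KIND_OTHER;
}
```

The extra system call only happens for entries the filesystem did not
type, so the common case stays as cheap as the loop above.
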
On spammed exim spools on VPSes, some naive programs would spend an
hour doing absolutely nothing but calling getdents() over and over
again, because they had code like "get all the files in this directory
and then delete them" -- that "and delete them" was delayed too long
for these programs to be useful.
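
For contrast, a minimal sketch of the streaming alternative in plain
C: unlink each entry as soon as readdir() hands it over, so nothing is
collected or sorted first. The purge_directory name is made up for
this example, and a real spool cleaner would want error reporting.
POSIX leaves it unspecified whether entries added or removed during
the scan are returned, so rescanning until a pass removes nothing
would be the cautious approach.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for dirfd() and unlinkat() */
#endif
#include <dirent.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Unlink the entries of `path` one at a time, as readdir() returns them.
 * Subdirectories are left alone: unlinkat() with flags 0 refuses them.
 * Returns the number of entries removed, or -1 if `path` cannot be opened.
 */
static long purge_directory(const char *path)
{
    DIR *dir = opendir(path);
    struct dirent *de;
    long removed = 0;

    if (dir == NULL)
        return -1;

    while ((de = readdir(dir)) != NULL) {
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;
        /* Delete immediately; no list of names is ever built. */
        if (unlinkat(dirfd(dir), de->d_name, 0) == 0)
            removed++;
    }
    closedir(dir);
    return removed;
}
```

Memory use stays constant no matter how many files the directory
holds, and the deletions start with the first entry read.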

Still, the proposed interface could be useful in something like an
io.scripting library. Something designed to be quick (to use) and
useful in most cases.

