[m-dev.] issues with installing libraries

Zoltan Somogyi zoltan.somogyi at runbox.com
Sat Aug 31 10:32:10 AEST 2024


While I was working (very part-time) on the just-posted DIFF.mli4,
much of my time was spent trying to understand the issues below.
I would like to ask you guys to think about them too, and see what answers,
if any, you can give to the questions below.

Zoltan.

Issue 1:

    We currently install .mh files, which are not grade-specific,
    to *two* sets of non-grade-specific directories. These are

    set 1:   Prefix/lib/mercury/inc
    set 2:   Prefix/lib/mercury/ints and Prefix/lib/mercury/ints/Mercury/mhs

    I believe that set 1 is intended for use by hand-written user code
    that uses Mercury predicates and/or functions exported to C.

    QUESTION Does anyone know the purpose of set 2? What purpose does it serve
    that set 1 cannot satisfy?

Issue 2:

    We currently install .mih and .opt files, which are grade-specific,
    into grade-specific directories. That is fine. However, we *also* install
    them into non-grade-specific directories, which is ... not fine.
    We do so in two separate ways.

    - For .opt files, we install them into a non-grade-specific directory
      for the *current* grade only. This is the grade in which the installed
      directory is compiled.

    - For .mih files, we install them into a non-grade-specific directory
      for *all* libgrades, starting with the current grade. In the usual
      case where the current grade is not the only libgrade, this means that
      each install of a .mih file for a non-first libgrade in the
      non-grade-specific directory will overwrite the install of
      that same .mih file for the previous libgrade, leaving only the
      .mih files of the last libgrade in that non-grade-specific directory.
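
    The overwriting described for .mih files can be illustrated with a
    small simulation. This is only a sketch: the grade names, the module
    name and the directory strings are placeholders chosen for illustration,
    not the actual install layout or the actual install code.

```python
# Simulate installing one module's .mih file for several libgrades.
# All names here (grades, directories, module) are illustrative assumptions.

def install_mihs(libgrades, module="module_x"):
    """Return a dict mapping installed path -> grade whose file ends up there."""
    installed = {}
    for grade in libgrades:
        # Grade-specific copy: each grade has its own path, so no clash.
        installed[f"grade_specific/{grade}/{module}.mih"] = grade
        # Non-grade-specific copy: every grade writes to the SAME path,
        # so each install overwrites the one before it.
        installed[f"non_grade_specific/{module}.mih"] = grade
    return installed

libgrades = ["gradeA", "gradeB", "gradeC"]  # current grade first
result = install_mihs(libgrades)
print(result["non_grade_specific/module_x.mih"])
```

    Only the copy from the last libgrade in the list survives in the
    non-grade-specific location, whichever grade the user actually wants.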

    QUESTION Does anyone know of any reason why the install of these
    grade-specific files in a non-grade-specific directory would be useful?

    QUESTION Does anyone recall any problem that you suspect *could* have
    been caused by getting a .mih or .opt file *for the wrong grade*
    from a non-grade-specific directory?

    For .mih files, I expect such problems to be quite rare, because
    they exist only for MLDS grades targeting C, and I expect that for
    most modules, their .mih files will be either identical or
    mostly-identical for all such grades.

    For .opt files, such problems may be somewhat more frequent, but
    since probably the most frequent symptom of such problems is that
    the compiler does not perform an optimization that it *should*
    be able to do, it would be difficult to say for sure.

Issue 3:

    We can install a library in multiple grades

    - either via mmake (the "lib%.install_grades" target
      in scripts/Mmake.rules, which delegates part of its work
      to two mmake rules in auto-generated .dep files),

    - or via "mmc --make lib%.install" (so to speak).

    These two routes differ in several ways.

    - The set of grades to install is determined differently by the
      two routes. The mmake route installs the libgrades specified by the
      ALL_LIBGRADES make variable, whose value can be specified via
      a library-specific make variable. On the other hand the set of grades
      to install is the same for every library when using mmc --make;
      it will be the set of libgrades found to be installed by
      compiler/check_libgrades.m, if this is not overridden by the
      --libgrade accumulating option.

    - The mmake route installs .trans_opt files (though the code that does
      this also handles .int* files but NOT .opt files, which is strange),
      while the mmc --make route never does.

    I believe there are two related root causes of this, and an unrelated
    third root cause.

    Cause 1 is that there is no central documentation of what files
    should be installed where, and more importantly, *why* they are installed
    in those places. This would require the full documentation of the search
    methods we use to find things in install directories. (See Issue 4.)

    Note: the EXT file I sent to m-rev recently documents what *does*
    get installed where, which can be a starting point for the first half
    of this documentation. However, I have not yet found all the places
    that *set up* search paths, and I suspect that the most important ones
    are not in the Mercury implementation at all, but in individual users'
    Mmakefiles and/or Mercury.options files.

    Cause 2 is that the places in the system that deal with installing files
    do not even have links to each other, so there is no reminder to people
    working on one part (e.g. mmc --make) to update the corresponding parts
    (in e.g. Mmake.rules). I plan to add those links in a future diff.

    Cause 3 is that our requirements changed over time. When we worked
    on termination analysis in the late 1990s, we installed .trans_opt files
    to give users access to its results. By the time Simon implemented
    the initial version of mmc --make in 2002, this became less important,
    so we willingly tolerated the difference in the treatment of .trans_opt
    files.

    QUESTION Ideally, both install routes should install the same files
    in the same directories. In practice, making that happen will take
    a significant amount of work. For each of the differences above,
    can you please say whether you have been impacted by the difference?

Issue 4:

    One constraint on the structure of the install directory (and the
    *main* constraint on the structure of its non-user-facing parts,
    which make up its bulk) is the compiler's set of search methods. Those methods
    need to find the .int, .opt etc files of

    - the modules that are part of the current program, and
    - the modules that are not part of the current program, and are instead
      parts of external libraries.

    For modules that are not part of the current program, both mmake
    and mmc --make will install all grade-specific files into grade-specific
    directories, and all non-grade-specific files into non-grade-specific
    directories (with some exceptions, such as Issue 2 above).

    For modules that are part of the current program, we cannot assume that;
    with --no-use-grade-subdirs, grade-specific files may be in
    non-grade-specific directories, whose names may, or may not,
    have a suffix such as Mercury/int3s (for .int3 files).

    We currently handle such uncertainty in two different ways, depending
    on whether the extension of the file we want to search for contains "max"
    in the name of its extension category in file_names.m.

    Consider searching for module_x.int3. The name of the extension category
    for .int3 files does NOT contain "max". This means that e.g. with
    --use-subdirs and --no-use-grade-subdirs, both module_name_to_file_name
    and module_name_to_search_file_name will return Mercury/int3s/module_x.int3
    when given module_x and .int3 as inputs. If the search_directories
    option (which contains the accumulated strings specified for the
    --search-directory option) contains ["sd1", "sd2", "sd3"], then our
    search will look for

        sd1/Mercury/int3s/module_x.int3
        sd2/Mercury/int3s/module_x.int3
        sd3/Mercury/int3s/module_x.int3
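
    As a rough model of this search (not the compiler's actual code),
    the candidate list is just each search directory joined with the
    search file name. The function name search_candidates is made up
    for this sketch.

```python
import posixpath

def search_candidates(search_dirs, search_file_name):
    """List, in order, the paths that a search over search_dirs would try."""
    return [posixpath.join(d, search_file_name) for d in search_dirs]

# The search file name for module_x.int3 with --use-subdirs and
# --no-use-grade-subdirs, as given in the text above.
candidates = search_candidates(
    ["sd1", "sd2", "sd3"], "Mercury/int3s/module_x.int3")
for path in candidates:
    print(path)
```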

    Consider searching for module_x.opt, whose extension is in the
    ext_cur_ngs_gs_max_ngs category, which DOES contain "max".
    This means that e.g. with --use-grade-subdirs, when given module_x
    and .opt as inputs,

        module_name_to_file_name will return
            Mercury/<grade>/<arch>/Mercury/opts/module_x.opt

        but module_name_to_search_file_name will return
            Mercury/opts/module_x.opt

        so we would search
            sd1/Mercury/opts/module_x.opt
            sd2/Mercury/opts/module_x.opt
            sd3/Mercury/opts/module_x.opt

    I think it is this search strategy that was the motivation
    for the current setup of install directories in a way that causes
    the second "Mercury" in the name of grade-specific directories.
    Specifically, the correspondence between

    Mercury/<grade>/<arch>/Mercury/opts/module_x.opt and
                           Mercury/opts/module_x.opt

    allows InstallDir/Mercury/<grade>/<arch>/Mercury/opts/module_x.opt
    to be found by
    
    - BOTH a search for the full grade-specific name in InstallDir,
    - AND a search for the non-grade-specific name, Mercury/opts/module_x.opt,
      in InstallDir/Mercury/<grade>/<arch>.
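
    The correspondence can be checked with plain path arithmetic. In this
    sketch, <grade> and <arch> are kept as literal placeholders. Note that
    for the two paths to coincide, the directory searched in the second
    case has to include the leading Mercury component.

```python
import posixpath

GRADE = "<grade>"  # placeholder, as in the text
ARCH = "<arch>"    # placeholder, as in the text

full_name = f"Mercury/{GRADE}/{ARCH}/Mercury/opts/module_x.opt"
search_name = "Mercury/opts/module_x.opt"

# Search 1: the full grade-specific name, looked up in InstallDir.
found_1 = posixpath.join("InstallDir", full_name)

# Search 2: the non-grade-specific name, looked up in the
# grade-and-arch-specific subdirectory of InstallDir.
found_2 = posixpath.join(f"InstallDir/Mercury/{GRADE}/{ARCH}", search_name)

same_file = (found_1 == found_2)
print(same_file)
```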

    QUESTION What do you guys think of this theory? Does anyone have an
    alternative theory?

    QUESTION Does anyone know, or have reason to believe, that there is
    still a need to find .opt files using both kinds of searches?
    If there is, or may be, such a need, would it go away if we implemented
    one of the mechanisms in the next question?

    QUESTION Would a special option that says "here is opts@sdN,
    add the directories

            sdN/Mercury/<grade>/<arch>/Mercury/opts
            sdN/Mercury/opts
            sdN

    to the search_directories accumulating string option" be sufficiently
    useful to implement?

    How about an option that says "here is @sdN, add the directories

            sdN/Mercury/<grade>/<arch>/Mercury/ExtDir
            sdN/Mercury/ExtDir
            sdN

    to the search_directories accumulating string option, *but only when
    searching for a file with an extension for which ext_to_dir_path
    in compiler/file_names.m returns ExtDir*"?
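
    To make the second proposal concrete, here is a sketch of the
    expansion it describes. Nothing here exists in the compiler:
    expand_at_option and search_dirs_for are hypothetical names,
    and EXT_DIRS is only a stand-in for ext_to_dir_path, filled in with
    the directory names that appear in this message.

```python
# Stand-in for ext_to_dir_path; hypothetical and deliberately incomplete.
EXT_DIRS = {".opt": "opts", ".int3": "int3s", ".mih": "mihs"}

def expand_at_option(sd, ext_dir, grade="<grade>", arch="<arch>"):
    """Expand one '@sd' argument into the three proposed directories."""
    return [
        f"{sd}/Mercury/{grade}/{arch}/Mercury/{ext_dir}",
        f"{sd}/Mercury/{ext_dir}",
        sd,
    ]

def search_dirs_for(ext, at_dirs):
    """Directories to search for a file with extension ext."""
    ext_dir = EXT_DIRS[ext]
    dirs = []
    for sd in at_dirs:
        dirs.extend(expand_at_option(sd, ext_dir))
    return dirs

for d in search_dirs_for(".opt", ["sd1", "sd2"]):
    print(d)
```

    The first proposal (opts@sdN) corresponds to calling expand_at_option
    with ext_dir fixed to "opts", independent of the extension searched for.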

    Both of these would work both on files in installed libraries *and*
    on files in workspaces.

Issue 5:

    For .mih files, which are in the ext_cur_ngs_gs_max_cur category,
    the results would be

        module_name_to_file_name will return
            Mercury/<grade>/<arch>/Mercury/mihs/module_x.mih

        but module_name_to_search_file_name will return
            module_x.mih

        so we would search
            cid1/module_x.mih
            cid2/module_x.mih
            cid3/module_x.mih

    The difference is that it is mmc that would search for .opt files,
    but the search for .mih files is done by the C compiler, with the
    directories to be searched being accumulated from --c-include-directory
    options (hence cidN above). This is why e.g. compiler/COMP_FLAGS.in
    specifies BOTH --c-include-directory ../library and --c-include-directory
    ../library/Mercury/mihs, and would have to include the grade-specific
    subdir name as well, if we ever wanted mmc to be built with mmc --make
    and --use-grade-subdirs. I presume that programs that do use mmc --make
    and --use-grade-subdirs already *do* have to specify the full pathname
    of the directories that contain the .mih files of a library they want
    to use.
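
    The need for both directories can be seen in a small model of how
    an include search for a bare file name proceeds. This is a sketch
    under simplifying assumptions: find_include is a made-up helper that
    mimics a first-match search over include directories, and the
    directory tree is created in a temporary location.

```python
import os
import tempfile

def find_include(include_dirs, file_name):
    """Return the first directory in include_dirs that holds file_name."""
    for d in include_dirs:
        if os.path.isfile(os.path.join(d, file_name)):
            return d
    return None

with tempfile.TemporaryDirectory() as root:
    library = os.path.join(root, "library")
    mihs = os.path.join(library, "Mercury", "mihs")
    os.makedirs(mihs)
    # Assume the .mih file was put in the subdirectory, not in library itself.
    open(os.path.join(mihs, "module_x.mih"), "w").close()

    # With only the library directory, the bare name is not found.
    only_library = find_include([library], "module_x.mih")
    # With both directories, the second one supplies the file.
    both = find_include([library, mihs], "module_x.mih")
    found_in_mihs = (both == mihs)

print(only_library, found_in_mihs)
```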

    QUESTION Do we give any guidance to Mercury programmers (who are not
    on the Mercury team) about what directory names they need to give to
    --c-include-directory? The user guide's section on that option, 9.11,
    does not give any. We could add some, describing the present complicated
    setup, ... or we could try to simplify the setup, and describe *that*
    setup. Maybe we could add a special option that would effectively add three
    directories to the --c-include-directory accumulating string option.
    If its argument is e.g. mihs@sd1, it would add the directories

            sd1/Mercury/<grade>/<arch>/Mercury/mihs
            sd1/Mercury/mihs
            sd1

    to the accumulating list.

    QUESTION What would people prefer? Some of the above, or the status quo?

Issue 6:

    The directory paths we install grade-specific files in
    include one directory name component that represents the grade,
    and another that represents the target architecture. The latter
    is clearly useful for executables and library files that contain
    machine code, in that it lets us detect and report attempts to use
    e.g. an x86-64 library on an Arm machine. However, not *all*
    grade-specific files contain machine code. For these, the directory
    name component that identifies the target architecture is unnecessary,
    though it is not harmful. It can even be very slightly helpful in that

    - installing a library in e.g. /path/to/install_dir on one architecture,
    - and then installing a slightly different version of that same library,
      also to /path/to/install_dir, for a different architecture,

    will not overwrite the existing, old versions of those grade-specific
    files, and each grade-and-architecture-specific directory will contain
    grade-specific non-machine-code files and grade-specific machine-code
    files that are from the same version and should thus be consistent
    with each other.

    However, this helpfulness is useless, because the second install
    with mmc --make *will* overwrite the old version's installed
    *non*-grade-specific files, leaving them out of sync with the first
    install's grade-specific files. When installing with mmake, nothing
    should be out of sync, because mmake's install rules will *delete*
    the target directories before recreating and populating them.
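
    The out-of-sync outcome can be shown with a toy model of an install
    that never deletes anything first. The paths, grade and architecture
    names, and version labels are illustrative assumptions, and a dict
    stands in for the install directory.

```python
def install(tree, version, grade, arch):
    """Install one version of a library into tree without deleting first."""
    # Non-grade-specific file: same path whatever the grade and arch.
    tree["Mercury/ints/module_x.int"] = version
    # Grade-specific file: the path includes the grade and the arch.
    tree[f"Mercury/{grade}/{arch}/Mercury/opts/module_x.opt"] = version

tree = {}
install(tree, "v1", "gradeA", "archA")  # first install
install(tree, "v2", "gradeA", "archB")  # later install, other architecture

for path in sorted(tree):
    print(path, tree[path])
```

    After the second install, the archA grade-specific file is still
    from v1, while the shared non-grade-specific file is from v2.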

    QUESTION Does anyone remember any occasion where we said or promised
    anything to users about the outcome of any attempt to install to an
    already-existing directory?

    If the answer is "no", which I think it is, then we don't need
    to include the <arch> component in the grade-specific relative path name
    of the grade-specific files that are not themselves architecture-dependent.
    (For the files that contain machine code, we want to keep it
    for the help it gives in detecting attempts to link e.g. a .so file
    with .o files of a different architecture.)

