[m-rev.] for review: speed up find_closest_seq
Julien Fischer
jfischer at opturion.com
Fri Jan 26 13:36:15 AEDT 2024
On Fri, 26 Jan 2024, Zoltan Somogyi wrote:
> For review by Julien, since he reviewed the related earlier diff.
>
> I am in two minds about the effect of this change on the speed
> of find_edit_distance when invoked directly, i.e. not via find_closests_seq.
> On the one hand, it it a slight slowdown, for the reason listed in the
> log message. On the other, find_edit_distance should still be much faster
> than it was two days ago, before my previous diff.
>
> We *could* keep around the old code of find_edit_distance, which
> does not have the overhead of computing the minimum cost in each row.
> However, this would require at least some code duplication, and it would
> not help the compiler, since the compiler *always* invokes find_edit_distance
> through find_closests_seq, not directly. That's why this diff does not do that,
> though I could be persuaded otherwise.
I don't think keeping the old version is worth it.
> Speed up find_closest_seqs.
>
> library/edit_distance.m:
> When find_closest_seqs computes the distance between the query sequence
> and a second or later candidate sequence, it already knows the minimum
> distance it has seen so far, and it knows that it will throw away
> any distance that exceeds this minimum. Allow it to tell find_edit_distance
> to stop computing the distance once it is certain that it will be
> thrown away.
>
> This increases the cost of computing each row slightly (since we now
> compute the minimum cost in each row), but it reduces the number of rows
> we have to compute, and in typical uses cases of find_closest_seqs,
> we expect the second effect to be dominant.
>
> In situations in which the caller invokes find_edit_distance directly,
> not through find_closest_seqs, we get a slight slowdown, since the second
> effect cannot happen.
That's fine.
Julien.
More information about the reviews
mailing list