Directory fingerprints

Introduction

The basic idea is that each directory in a tree (committed or otherwise) has a single scalar value, its fingerprint. If two directories have the same value, the contents of the subtrees under them are necessarily the same.

This is intended to help the use cases described below by allowing them to quickly skip over directories with no relevant changes and to detect when a directory has changed.
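As a rough illustration of the idea, the sketch below computes a fingerprint by hashing a directory's sorted entries together with the fingerprints of its children. Everything in it is an assumption made for the example: the tree model (nested dicts for directories, bytes for file contents), the `fingerprint` helper and the choice of SHA-1 are not part of any existing API, and "necessarily the same" holds only as far as hash collisions can be ignored.

```python
import hashlib


def fingerprint(node):
    """Return a hex digest for a file (bytes) or a directory (dict: name -> node).

    Hypothetical helper: the tree model and hash choice are assumptions.
    """
    if isinstance(node, bytes):
        # Files are hashed by content, with a kind prefix so that a file
        # can never collide with a directory by construction.
        return hashlib.sha1(b"file\0" + node).hexdigest()
    h = hashlib.sha1(b"dir\0")
    # Sort the entries so the result does not depend on dict ordering.
    for name in sorted(node):
        child = fingerprint(node[name])
        h.update(name.encode("utf-8") + b"\0" + child.encode("ascii") + b"\n")
    return h.hexdigest()


tree_a = {"README": b"hello\n", "src": {"main.py": b"print(1)\n"}}
tree_b = {"src": {"main.py": b"print(1)\n"}, "README": b"hello\n"}
tree_c = {"README": b"hello\n", "src": {"main.py": b"print(2)\n"}}

same = fingerprint(tree_a) == fingerprint(tree_b)
changed = fingerprint(tree_a) != fingerprint(tree_c)
```

Because a directory's value includes its children's values, a change to any file changes the fingerprint of every directory above it, up to the root.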

Use-case oriented APIs

Most of this will be hidden behind the Tree interface. This should cover log -v, diff, status, merge (and implicit merge during push, pull, update):

tree.iter_changes(other_tree)
tree.get_file_lines(file_id)   # and get_file, get_file_text
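To show how a comparison might skip unchanged directories, here is a minimal sketch over a toy tree model (nested dicts for directories, bytes for file contents). The free function `iter_changes` below is a hypothetical stand-in, not the Tree method listed above, and it recomputes fingerprints on every call, where a real design would presumably read stored values.

```python
import hashlib


def fingerprint(node):
    """Hash a file (bytes) or a directory (dict: name -> node). Hypothetical."""
    if isinstance(node, bytes):
        return hashlib.sha1(b"file\0" + node).hexdigest()
    h = hashlib.sha1(b"dir\0")
    for name in sorted(node):
        child = fingerprint(node[name])
        h.update(name.encode("utf-8") + b"\0" + child.encode("ascii") + b"\n")
    return h.hexdigest()


def iter_changes(old, new, path=""):
    """Yield paths that differ between two nodes, skipping equal subtrees."""
    if old is not None and new is not None and fingerprint(old) == fingerprint(new):
        return  # Identical subtree: nothing below this point needs visiting.
    if isinstance(old, dict) and isinstance(new, dict):
        # Both sides are directories, so descend into the union of entries.
        for name in sorted(set(old) | set(new)):
            child = f"{path}/{name}" if path else name
            yield from iter_changes(old.get(name), new.get(name), child)
    else:
        # Changed file, added or removed entry, or a kind change:
        # report the path itself without listing anything beneath it.
        yield path


old_tree = {
    "README": b"a",
    "src": {"main.py": b"1", "util.py": b"u"},
    "docs": {"index.txt": b"d"},
}
new_tree = {
    "README": b"a",
    "src": {"main.py": b"2", "util.py": b"u"},
    "docs": {"index.txt": b"d"},
    "NEWS": b"n",
}

changes = list(iter_changes(old_tree, new_tree))
```

In this example the `docs` directory is never descended into, since its fingerprint matches on both sides.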

commit

Commit is similar to iter_changes, but it needs to compare the working tree against all of the parent trees rather than a single other tree. At present this comparison is needed to update the last_modified field. It would be unnecessary if we removed that field (for both files and directories) and did not store per-file graphs, which could speed up commit after a merge.

Verbose commit also displays the merged files, which does require looking at all parents of files that aren't identical to the left-hand parent.

log

Log is interested in two operations: finding the revisions that touched anything inside a directory, and getting the differences between consecutive revisions (possibly filtered to a directory):

find_touching_revisions(branch, file_id) # should be on Branch?
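The first operation could be approximated by comparing a directory's fingerprint across consecutive revisions. The sketch below is only an illustration under several assumptions: history is a simple oldest-first list of (revision id, tree) pairs following one line of ancestry, trees are nested dicts with bytes for file contents, and `subtree`, `fingerprint` and `touching_revisions` are hypothetical names rather than Branch or Tree APIs.

```python
import hashlib


def fingerprint(node):
    """Hash a file (bytes) or a directory (dict: name -> node). Hypothetical."""
    if isinstance(node, bytes):
        return hashlib.sha1(b"file\0" + node).hexdigest()
    h = hashlib.sha1(b"dir\0")
    for name in sorted(node):
        child = fingerprint(node[name])
        h.update(name.encode("utf-8") + b"\0" + child.encode("ascii") + b"\n")
    return h.hexdigest()


def subtree(tree, path):
    """Return the node at a non-empty '/'-separated path, or None if missing."""
    node = tree
    for part in path.split("/"):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def touching_revisions(history, path):
    """Yield ids of revisions whose content under path differs from the previous one."""
    previous = None  # Fingerprint seen in the preceding revision, if any.
    for revision_id, tree in history:
        node = subtree(tree, path)
        current = fingerprint(node) if node is not None else None
        if current != previous:
            # Covers introduction, modification and removal of the path.
            yield revision_id
        previous = current


history = [
    ("r1", {"src": {"a": b"1"}, "doc": {"x": b"1"}}),
    ("r2", {"src": {"a": b"1"}, "doc": {"x": b"2"}}),
    ("r3", {"src": {"a": b"2"}, "doc": {"x": b"2"}}),
    ("r4", {"doc": {"x": b"2"}}),
]

src_revisions = list(touching_revisions(history, "src"))
doc_revisions = list(touching_revisions(history, "doc"))
```

Since this walks a single line of history, it says nothing about which revisions merged a change.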

Log shows the revisions that merged a change. At the moment that information is not included in the per-file graph, and it would also not be visible if the directories were hashed.

Open questions

Conclusions

Design changes

API changes