Breadth First Search vs. Python Method Resolution Operator (MRO) of Classes

In reading the recent changes to class member/method lookup rules in Python 2.2, it struck me interesting that the simple Breadth First Search (BFS) was not used instead of the MRO method currently implemented.  To understand this issue, I would like to explain it very simply; the reader should have some knowledge of Depth First Searches (DFS), BFS and the current MRO, which is a modified DFS with all duplicates removed save the last.

In the above example, we have 4 new-style (derived from object) classes.  In the old style class lookup (assuming object is now an old-style class), to resolve a member reference, the MRO would generate the list [F, B, object, E, C, object, D, object], a DFS traversal of the class hierarchy.  In Python 2.2, this now resolves to:

>>> F.mro() == [F, B, E, C, D, object]

because the DFS list minus all but the last duplicate (e.g. remove all but the last occurrence of object since no other classes are duplicated) results in the above MRO.  A BFS evaluation of the hierarchy on the other hand would result in [F, B, E, object, C, D, object, object].  Even if we could safely ignore duplicate traversals of object as we did in the old-style DFS example, we would still be visiting object too early.  This is because the base class hierarchy should not be based on the depth from F, where B's object is closer than C or D, but rather a reverse traversal from the lowest class to the highest, e.g. [object, D, C, B, E, F].reverse().  Even this would not be a perfect resolution though it is closer to the correct MRO.  The long and the short of it is, for this case at least, MRO is superior to BFS (though in the A, B, C, D diamond hierarchy in Guido's example MRO and BFS perform equivalently with the exception of duplicate traversal).

Another interesting example of Python's MRO evaluation is this apparent bug found by Tim Peters.  In the above example, there is a basic ambiguity in the order of derivation for the bases of X, Y and Z.  However, there is only a couple interpretation that would be of interest.  If the __bases__ are well ordered, you would expect X.__bases__ == (A, B), Y.__bases__ == (A, B) and Z.__bases__ == (X, Y).  This would then give:

>>> Z.mro() == [Z, X, Y, A, B, object]

Since the DFS produces [Z, X, A, object, B, object, Y, A, object, B, object] and we remove all but the last A, B and object.  This result is what we would expect, and by reversing Z.__bases__ we would still get an expected result of [Z, Y, X, A, B, object].  If instead we reverse both X.__bases__ and Y.__bases__, we still get the correct result of [Z, X, Y, B, A, object].  The problem arises when we reverse only one of X.__bases__ or Y.__bases__, but not the other.

Take now for example the definition:

>>> X.__bases__ == (A, B)
>>> Y.__bases__ == (B, A)
>>> Z.__bases__ == (X, Y)
Z.mro() == [Z, X, Y, A, B, object]

Now, there was originally a bug in 2.2 where Z.mro() incorrectly returned [Z, X, Y, B, A, object] since if you apply the simple Python 2.2 MRO rules, you get [Z, X, A, object, B, object, Y, B, object, A, object] and when you remove duplicates for all but last, you get the above incorrectly result.  This was fixed as of Python 2.2 though I'm not sure who contributed the patch to get the correct MRO result.

If BFS was used, one would have [Z, X, Y, A, B, B, A, object, object, object, object] which, ignoring all but the first duplicate since in the MRO search we stop only if a class implements the requested method so if it didn't find the method in that class the first time, it will ignore it on subsequent visits (baring multi-threading issues).  However, BFS only works here because we are working with a balanced class hierarchy, unlike in the first example where BFS fails.  That is why the current Python MRO is superior to BFS.