'A' Longest Common Subsequence (LCS) implementation

Revision 15 vs. Revision 16

Legend:

Unmodified
Added
Removed
  • Content

    r15 r16  
    9393The next thing I decided to do was think about this in terms of the sequence being any object.  As a result I would need to think more about indices and what it meant to be at the end of a set of objects and the size of the object collection.  That was too much to think about so I decided to make an LCS C++ template class that could be used for c++ containers or at least some containers. The code is here ``:snippet:`375``` and test/example code is here ``:snippet:`382```.  
    9494  
    9595Chapter 3  What else?  
    9696---------------------  
    9797  
    98 At this point comes the meat of my problem.  I want to compare two graphs.  I've not read any literature about that (yet) but am considering the following way of doing things.  At each node in the graph implementation I am using.  
     98At this point comes the meat of my problem.  I want to compare two graphs.  I've not read any literature about that (yet) but am considering the following way of doing things.  At each node in the graph there are edges from that node to the nodes below.  What makes a node in one graph equal to a node in the other graph?  The edges must match and the nodes they point to must match.  
     99  
     100Here is a simple graph similar to the graphs I want to compare.  
     101  
     102  
     103.. graph::  
     104  
     105    digraph finite_state_machine {  
     106        rankdir=TB;  
     107        size="6,9"  
     108        node [shape = circle];  
     109        edge [fontsize = "0.8em"];  
     110        N1 [fillcolor = cornflowerblue];  
     111        N21 [fillcolor = red];  
     112        N22 [fillcolor = red];  
     113        N31 [fillcolor = gold];  
     114        N32 [fillcolor = gold];  
     115        N41 [fillcolor = green];  
     116        N1 -> N21 [ label = "p1_1" color = cornflowerblue ];  
     117        N1 -> N21 [ label = "p1_2" color = cornflowerblue ];  
     118        N1 -> N22 [ label = "p1_3" color = cornflowerblue ];  
     119        N1 -> N22 [ label = "p1_4" color = cornflowerblue ];  
     120        N1 -> N22 [ label = "p1_5" color = cornflowerblue ];  
     121        N21 -> N31 [ label = "p2_1" color = gold ];  
     122        N21 -> N32 [ label = "p2_2" color = gold ];  
     123        N32 -> N41 [ label = "p3_1" color = green ];  
     124        N32 -> N41 [ label = "p3_2" color = green ];  
     125        N32 -> N41 [ label = "p3_3" color = green ];  
     126        N32 -> N41 [ label = "p3_4" color = green ];  
     127        N32 -> N41 [ label = "p3_5" color = green ];  
     128    }  
     129  
     130Suppose we have two graphs like the above, G1 and G2.  Functionally we are asking is G1 == G2.  This is the same as asking if G1(N1) == G2(N1).  At this point I make a rule to simplify my life.  If the top nodes match and each edge to the next nodes match and those nodes match (not their children but just the nodes themselves) then I say that G1(N1) IS equal to G2(N1).  I do not insist that ALL descendants match.  Furthermore, I do not insist that p1_1 and p1_2 are in a particular order, just that they match some edge in G2.  For example if G(N1 - p1_1 -> N21) == G(N1 - p1_2 -> N21) then this means that p1_1 and p1_2 are equal in the sense that the edges match.  
     131  
     132In my case, this lead to looking at the longest common subsequence (LCS) of the edges that match.  This would then allow diff-like output in the case where one or more edges DO match but others do not match or are missing.  This would provide more information than my current comparison performs because my current comparison stops upon encountering an edge that is out of sequence without being able to check if maybe the NEXT one would have matched.  
     133  
     134What I am working with and how I do the comparison between the two graphs at the moment is to grab the list of nodes.  For each node, I grab the list of edges and compare them until I fail to find a match.  At that point the whole process stops.  I want to be able to continue to check other nodes and other edges without stopping.  
     135  
     136To proceed, the meaning of a node matching is simple:  the characteristics of the node are compared and if they are equal, we proceed with the next step. [Note to self: For optimization I may decide to keep some of this info around as well.  There are many repeated nodes in the graphs I am comparing and if node N1 has been compared to N1 before, there is no need to compare it's characteristics all over again.]  Next compare the edges. I approach this as an LCS problem.  Each edge_list G1(N1)(p1_1, p1_2, p1_3, etc) will be compared to G2(N1)(p1_1, p1_2, p1_3, etc) the same way we approach the longest common sub-sequence problem.  Then we go on to the next node in our node_list and do the same thing.  
     137  
     138[Note to self: just try making something at CMS and see what happens.  The LCS there will have to allow for comparator (defaults to operator==).  If I use the template I made, what will I be putting?  LCS<edge_list>.  Does edge_list have operator []?  If not then I need to figure out that too...]  
     139