Testing algorithm was very simple:

1. expected = predefined data set (so called test fixture)

2. import data from XML file to DB

3. actual = load imported data from DB

4. assert_equal expected, actual, "all data should be equal"

The problem was in assert_equal, because "default" equal was not I wanted to be.

**Default equal is shallow**, i.e. compares only objects itself without its dependencies.

**I needed deep equality.**

Example 1

title_a = ['Some title', author_a]

author_a = ['Gauss', 'C']

In this example title and author are in relation "has a" - title_a has author_a. This causes "standard" equal to not work correctly. Let for example

author_b = ['Gauss', 'C'].

Authors a and b are equal, but have different identifiers (identities, references), thus if we define

title_b = ['Some title', author_b]

we'll get title_a not equal title_b. To obtain expected equality result I must redefine default equality operator. So far, so obvious...

The

**problem**was, that I had to

**redefine**this

**operator**

**in all**data model

**classes**. Ordinary thing, I thought. I was always doing this way. This is very simple, I have to choose which "fields" compare directly and which "through references". Very simple. And stupid. And

**error prone**. And time consuming. And very

**hard to maintain**. Why, the hell, languages I am working with do not provide such obvious functionality!? Maybe I missed something? Maybe everyone knows how to do this very easily, except me?

I did small

**research**and it appeared that such

**functionality**is

**implemented**in...

**Eiffel**, but only partailly [1]. Java, C++, Ruby and even Smalltalk don't have such language feature.

Maybe it is hard? I thought. My

**first idea**was to

**compare**two directed

**graphs**created from object instances and (some*) references between them. The problem is well known, it is called graph isomorphism. Quickly it appeared, that graph isomorphism (GI) problem has no polynomial solution. Curiously, it is believed that GI is neither P nor NP-complete... There is special complexity class, called GI-complete, for this problem and all problems with polynomial-time Turing reduction. It is belived, although obviously (P=NP?) not proven, that:

P-complete < GI-complete < NP-complete

Anyway it seemed, that although

**solution**is

**possible**, it may be

**not**

**efficient**...

* - more about this in my next post

My

**second idea**was that, comparing to GI problem, we have additional information - "root" vertices we are starting comparison from, and thus the solution may be more efficient than for GI problem. After some googling I found I was right. I found very interesting paper [2] which, besides formal point of view on deep equality, defines

**polynomial algorithm**for the problem, even in the case with circular references! Additionally, algorithm looks quite simple to implement.

My third idea was, that the solution may be even simpler if we assume no circular references. I don't know if this is true, but I would like to check this, because my...

Fourth,

**final idea**, was to

**implement deep equality**solution for

**Ruby**. But

**before**I'll do this

**I will write next post**entitled

**"Deep equality and Aggregation"**i.a. describing how to declare which relations should be taken into consideration by deep_equal operation.

References:

1. Copying and Comparing: Problems and Solutions, P. Grogono &

M. Sakkinen, 2000

2. Deep Equality Revisited, S. Abiteboul & J.V. den Bussche, 1995

## 1 comment:

Interesting. If a reverse transform (DB to XML) was easy to code you could use existing tools for deep XML comparison.

Maybe you wouldn't even have to build a complete XML tree - list of XPath-like expressions should suffice.

Post a Comment