Deep Diff Pizza – Jakub Korab

There is nothing I love more than a proprietary, undocumented API. Call it an unfortunate fact of life, but weird object models that hang together by the skin of their teeth are out there. Most of the time there’s no validation logic to check that they’re semantically or syntactically correct until you send this tangle of objects to a system you’re integrating with. Having been burned badly in the past on this sort of work, I’ve been looking for a way to work effectively in these types of scenarios.

In my most recent case, I had some example code that built up these object graphs for various use cases. The code wasn’t transportable, but it did generate correct inputs. After reverse engineering the graph construction logic into a builder by working from XStream dumps to the console, I was able to build up exploratory integration tests that triggered the various use case scenarios. So far, so good. Now to just remove the need to have the back-end system up.

What I needed was a way to compare a test object graph with a constructed one. Thanks to XStream I already had an XML representation of the expected output, which I could reload into memory as needed, so there’s the test data. The equals() and hashCode() methods on the model were unreliable, so that’s no good. I toyed with the idea of writing a general purpose deep diff mechanism for object graphs, but came to the conclusion that it wasn’t a good idea. Aside from not wanting to get into writing gnarly reflection code, I realised that the problem was more difficult than it seemed. You get into questions like “What is equality?”, “How much change is acceptable?”, “Where do I stop? Primitives, java.util.*?”, and “How can I mark things as being acceptably different?”. That no one had already written this sort of thing was probably a good indication of how hard it is.

Then I realised that the answer was staring me in the face. Just diff the XML representation programmatically! By loading up an expected input from a file next to the unit test, I could then dump the model under test out to another String and… Rock and Roll.
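As a rough sketch of the idea, the comparison can be reduced to "parse both XML dumps and check that the trees match". The code below is not the code from this project: it uses only the JDK's DOM API, it assumes the two dumps are already available as Strings (in practice one would come from a file and the other from XStream), and the class and method names are made up for illustration. It only answers yes or no, where a proper diffing library would also tell you what differs and where.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper: compares two XML dumps of an object graph structurally.
final class XmlDumpComparison {

    // Parses an XML string into a DOM tree with indentation-only text removed.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setIgnoringComments(true);
            factory.setCoalescing(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            stripBlankText(doc.getDocumentElement());
            return doc;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Recursively removes whitespace-only text nodes so pretty-printing does not matter.
    static void stripBlankText(Node node) {
        Node child = node.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling();
            if (child.getNodeType() == Node.TEXT_NODE
                    && child.getNodeValue().trim().isEmpty()) {
                node.removeChild(child);
            } else {
                stripBlankText(child);
            }
            child = next;
        }
    }

    // True when both dumps have the same elements, attributes and text, in the same order.
    static boolean sameXml(String expected, String actual) {
        return parse(expected).getDocumentElement()
                .isEqualNode(parse(actual).getDocumentElement());
    }
}
```

In a unit test, the expected String would be read from the file sitting next to the test and the actual String would be the dump of the model under test.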

XMLUnit has a pretty good diffing utility, which was about 80% of the way there. Some of the data in my model changed over time, so I needed a way to say “ignore these bits of the tree”. There’s a neat little interface in XMLUnit called DifferenceListener that gets notified whenever the mechanism finds a diff. You implement a method that decides whether to report the difference as different, the same, or similar (why you’d want to do that is beyond me – “this is different-ish”). I hacked up my own implementation that took some XPath expressions identifying the nodes in the control graph that ought to be ignored, and voilà.
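To illustrate the “ignore these bits of the tree” idea without reproducing the actual listener, here is a minimal sketch that uses only the JDK. It is a swapped-in technique, not XMLUnit's DifferenceListener: instead of being called back for each difference, it removes every node matched by the given XPath expressions from both trees before comparing them. The class name, method names and sample element names are hypothetical, and it assumes the expressions select elements or text nodes other than the root.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: compares two XML dumps while ignoring XPath-selected nodes.
final class IgnoringXmlComparison {

    // True when the dumps match once every node selected by ignoredXPaths is removed.
    static boolean sameIgnoring(String expected, String actual, String... ignoredXPaths) {
        Document control = parse(expected);
        Document test = parse(actual);
        for (String expression : ignoredXPaths) {
            prune(control, expression);
            prune(test, expression);
        }
        return control.getDocumentElement().isEqualNode(test.getDocumentElement());
    }

    // Detaches each matched node from its parent; attribute matches are skipped.
    static void prune(Document doc, String expression) {
        try {
            NodeList hits = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
            for (int i = hits.getLength() - 1; i >= 0; i--) {
                Node hit = hits.item(i);
                if (hit.getParentNode() != null) {
                    hit.getParentNode().removeChild(hit);
                }
            }
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + expression, e);
        }
    }

    // Parses an XML string and drops whitespace-only text nodes.
    static Document parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            stripBlankText(doc.getDocumentElement());
            return doc;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    static void stripBlankText(Node node) {
        Node child = node.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling();
            if (child.getNodeType() == Node.TEXT_NODE
                    && child.getNodeValue().trim().isEmpty()) {
                node.removeChild(child);
            } else {
                stripBlankText(child);
            }
            child = next;
        }
    }
}
```

For example, two dumps that differ only in a timestamp element would compare as equal with `sameIgnoring(expected, actual, "//created")`. The trade-off against a listener is that pruning also hides a node that is missing entirely on one side, which may or may not be what you want.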

The more I code the more uses I find for XStream. In combination with XMLUnit, it’s like hazelnuts to chocolate. It’s an excellent option next time you need to diff object graphs.

Comments

2 responses to “Deep Diff Pizza”

Martin Harris

November 17, 2009

Nice. I have had to do similar in the past. How does it cope if the order of the records is the only difference? I presume you can ignore that using the DifferenceListener?

Jake

November 18, 2009

There’s a method on Diff called overrideElementQualifier that lets you do that. You register an ElementNameAndTextQualifier and job’s done. I don’t know what the side effects of it are (I always get uncomfortable when I see “and” in a class name) but you should be able to get that from the source.