FB_init

Thursday, April 17, 2008

Delta or difference between DataSets

Problem: determining the delta or difference between System.Data.DataSets.

Suppose you have one original DataSet (or XML that can be transformed to a DataSet) and a user modifies the data, yielding a second DataSet. Suppose you want smaller DataSets containing only what was changed. With these smaller DataSets you can update the database or send data through a web service, touching only what is needed.

The problem with DataSet's Merge and GetChanges methods is that I couldn't get the changes alone. For some reason the DataSet would not recognize my primary keys, and I couldn't find a way to end up with only the data that was touched.
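For context, GetChanges works from the row states that a DataSet tracks while it is edited in place. A second DataSet that is loaded separately (for example from XML) carries no such history relative to the first one. Here is a minimal sketch of the tracked case only, assuming a hypothetical Product table keyed on productid; the table and column names are borrowed from the sample data used later in this post, and the class and method names are made up for illustration:

```csharp
using System;
using System.Data;

public static class GetChangesSketch
{
    // Builds a small DataSet, marks it as the baseline, edits it in place,
    // and returns only the rows whose state is no longer Unchanged.
    public static DataSet BuildAndEdit()
    {
        var ds = new DataSet("VivaOMengo");
        DataTable products = ds.Tables.Add("Product");
        DataColumn id = products.Columns.Add("productid", typeof(string));
        products.Columns.Add("definition", typeof(string));
        products.PrimaryKey = new[] { id };

        products.Rows.Add("Id_1", "AA1");
        products.Rows.Add("Id_2", "AA2");
        ds.AcceptChanges(); // everything so far becomes the unchanged baseline

        products.Rows.Find("Id_2")["definition"] = "AA2Prime"; // row state: Modified
        products.Rows.Add("Id_4", "AA4New");                   // row state: Added

        return ds.GetChanges(); // null when nothing was changed
    }
}
```

This only helps when the edits happen on the same DataSet instance, which is not the situation described above.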

Microsoft's XmlDiff package is close to what we need. It can detect changes and create patches for them. But it reconstructs the whole original data after the patch is applied. It doesn't single out only what was changed.

Approach: use and change the XmlDiff package (available here: http://msdn2.microsoft.com/en-us/library/aa302294.aspx ) to annotate the changes, and XSLT to filter what is needed. XmlDiff will compare two XML documents and annotate differences.

This is how one may use it:


using System.IO;
using System.Text;
using System.Xml;
using Microsoft.XmlDiffPatch;

...

protected static string Delta(string originalXmlRead, string userData)
{
    XmlReader xrOriginal = XmlReader.Create(new StringReader(originalXmlRead));
    XmlReader xrUserGiven = XmlReader.Create(new StringReader(userData));

    // Ignore everything that does not affect the data itself.
    XmlDiff diff = new XmlDiff(
        XmlDiffOptions.IgnoreChildOrder | XmlDiffOptions.IgnoreComments |
        XmlDiffOptions.IgnoreDtd | XmlDiffOptions.IgnoreNamespaces |
        XmlDiffOptions.IgnorePI | XmlDiffOptions.IgnorePrefixes |
        XmlDiffOptions.IgnoreWhitespace | XmlDiffOptions.IgnoreXmlDecl);

    // Write the diffgram; dispose the writer so it is flushed before sb is read.
    StringBuilder sb = new StringBuilder();
    using (XmlWriter diffWriter = XmlWriter.Create(sb))
    {
        diff.Compare(xrOriginal, xrUserGiven, diffWriter);
    }
    string diffg = sb.ToString();

    // Apply the diffgram to a copy of the original document.
    XmlPatch patch = new XmlPatch();
    XmlDocument xmlDelta = new XmlDocument();
    xmlDelta.LoadXml(originalXmlRead);
    patch.Patch(xmlDelta, XmlReader.Create(new StringReader(diffg)));

    return xmlDelta.InnerXml;
}

( It would be more elegant to extend the classes, but I was prototyping. And when I prototype I like to violate all the rules of Object Orientation. If for nothing else, just to prove the importance of OO :) )

- Change XmlDiffPatch's source by adding extra attributes according to the operation.

In XmlPatchOperations.cs:

- change PatchAddXmlFragment's Apply method:

while ( enumerator.MoveNext() )
{
    XmlNode newNode = doc.ImportNode( (XmlNode)enumerator.Current, true );
    // new stuff
    if (newNode.NodeType == XmlNodeType.Element) // -- begin gf (Attributes is null for non-element nodes)
    {
        XmlAttribute newAtt = doc.CreateAttribute("Patch");
        newAtt.Value = "PatchAddXmlFragment";
        newNode.Attributes.Append(newAtt);
    } // -- end gf

    parent.InsertAfter ( ... // old stuff


- change PatchChange's Apply method:

case XmlNodeType.CDATA:
case XmlNodeType.Comment:
    Debug. blah blah blah
    ((XmlCharacterData) blah blah blah
    currentPosition = blah blah

    // new stuff
    if (parent.NodeType == XmlNodeType.Element) // -- begin gf
    {
        XmlAttribute xa = parent.OwnerDocument.CreateAttribute("Patch");
        xa.Value = "PatchChange";
        parent.Attributes.Append(xa);
    } // -- end gf
    break;


- The change to PatchRemove's Apply is left as an exercise to the reader :)
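
Both modifications above do the same thing, so the annotation could be pulled into one small helper. The sketch below is not part of the XmlDiffPatch source; the class and method names are hypothetical. It skips non-element nodes, because XmlNode.Attributes is null for text, comment and CDATA nodes:

```csharp
using System.Xml;

public static class PatchAnnotator
{
    // Hypothetical helper: tags an element with Patch="<operation>".
    // Returns false (and does nothing) for nodes that cannot carry attributes.
    public static bool Mark(XmlNode node, string operation)
    {
        XmlElement element = node as XmlElement;
        if (element == null)
        {
            return false;
        }

        // SetAttribute creates the attribute or overwrites an existing one.
        element.SetAttribute("Patch", operation);
        return true;
    }
}
```

Inside PatchChange's Apply, the call would look like PatchAnnotator.Mark(parent, "PatchChange").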

You'll then end up with some annotated XML, with new attributes pointing to changes, like this:

<VivaOMengo>
  <Product>
    <productid>Id_1</productid>
    <definition>AA1</definition>
  </Product>
  <Product>
    <productid>Id_2</productid>
    <definition Patch="PatchChange">AA2Prime</definition>
  </Product>
  <Product Patch="PatchAddXmlFragment">
    <productid>Id_4</productid>
    <definition>AA4New</definition>
  </Product>
</VivaOMengo>

The 'untouched' XML parts are still there. Let's filter them out. Apply this transformation:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:msxsl="urn:schemas-microsoft-com:xslt"
    exclude-result-prefixes="msxsl">

  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/VivaOMengo">
    <changesplit>
      <changes>
        <VivaOMengo>
          <xsl:apply-templates select="*[*/@Patch='PatchChange']"/>
          <xsl:apply-templates select="*[@Patch='PatchChange']"/>
        </VivaOMengo>
      </changes>
      <additions>
        <VivaOMengo>
          <xsl:apply-templates select="*[*/@Patch='PatchAddXmlFragment']"/>
          <xsl:apply-templates select="*[@Patch='PatchAddXmlFragment']"/>
        </VivaOMengo>
      </additions>
    </changesplit>
  </xsl:template>

  <xsl:template match="Product[*/@Patch]">
    <xsl:copy-of select="."/>
  </xsl:template>

  <xsl:template match="*[@Patch]">
    <xsl:copy-of select="."/>
  </xsl:template>

</xsl:stylesheet>


A couple of things to point out. First, changes and additions are separated. Second, one element is hardcoded in the matching rule match="Product[*/@Patch]". You may find some other XPath expression that is agnostic to your data.
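
One data-agnostic candidate, offered as an assumption rather than something tested against the stylesheet above, is to select every child of the root that either carries a Patch attribute itself or has a child that does. The sketch below tries that expression from C# with XmlDocument.SelectNodes; the class, constant and method names are hypothetical:

```csharp
using System.Xml;

public static class PatchXPathSketch
{
    // Matches any row-level element that was added (own Patch attribute)
    // or changed (a child carries the Patch attribute), whatever its name.
    public const string TouchedRowsXPath = "/*/*[@Patch or */@Patch]";

    public static XmlNodeList TouchedRows(string annotatedXml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(annotatedXml);
        return doc.SelectNodes(TouchedRowsXPath);
    }
}
```

In the stylesheet itself, the corresponding change would presumably be to match *[*/@Patch] instead of Product[*/@Patch]. The rest of this post keeps the stylesheet as written, with Product hardcoded.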

You'll end up with something like this:

<changesplit>
  <changes>
    <VivaOMengo>
      <Product>
        <productid>Id_2</productid>
        <definition Patch="PatchChange">AA2Prime</definition>
      </Product>
    </VivaOMengo>
  </changes>
  <additions>
    <VivaOMengo>
      <Product Patch="PatchAddXmlFragment">
        <productid>Id_4</productid>
        <definition>AA4New</definition>
      </Product>
    </VivaOMengo>
  </additions>
</changesplit>


Note that the XML includes only what was changed! And the XML inside the changes and additions elements is 'DataSet' friendly.

Now we can apply separate XSLT to get changes and additions. This is the one for changes:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:msxsl="urn:schemas-microsoft-com:xslt"
    exclude-result-prefixes="msxsl">

  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="changesplit">
    <xsl:apply-templates select="changes"/>
  </xsl:template>

  <xsl:template match="changes">
    <xsl:copy-of select="*"/>
  </xsl:template>

</xsl:stylesheet>

When applied to the previous XML it yields:

<VivaOMengo>
  <Product>
    <productid>Id_2</productid>
    <definition Patch="PatchChange">AA2Prime</definition>
  </Product>
</VivaOMengo>


And, very similar to the transformation before, here's the one for additions:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:msxsl="urn:schemas-microsoft-com:xslt"
    exclude-result-prefixes="msxsl">

  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="changesplit">
    <xsl:apply-templates select="additions"/>
  </xsl:template>

  <xsl:template match="additions">
    <xsl:copy-of select="*"/>
  </xsl:template>

</xsl:stylesheet>

- This is how you can use it in code.


// get only what is to be updated
// (needs using System.IO; System.Text; System.Xml; System.Xml.Xsl;)
protected string Annotate(string delta)
{
    XslCompiledTransform xslt = new XslCompiledTransform();
    xslt.Load(Properties.Settings.Default.SplitXSLTPath);
    StringBuilder ann = new StringBuilder();
    StringWriter sw = new StringWriter(ann);
    xslt.Transform(XmlReader.Create(new StringReader(delta)), null, sw);
    return ann.ToString();
}

and now we have

string delta = Delta(originalXML, newUserData);
string annData = Annotate(delta);
webServicesClient.UpdateTheDatabase(annData);


On the server side you can then filter the updates and de-serialize the XML string to DataSets.


public void UpdateTheDatabase( string annData )
{
    string xmlDatasetUpdate = FilterUpdateDataset(annData);
    DataSet ds = new DataSet();
    ds.ReadXmlSchema(XmlReader.Create(Properties.Settings.Default.SchemaPath));
    ds.ReadXml(XmlReader.Create( new StringReader(xmlDatasetUpdate)), XmlReadMode.ReadSchema);
    ...
}

private string FilterUpdateDataset(string diffAnnotated)
{
    XslCompiledTransform xslt = new XslCompiledTransform();
    xslt.Load(Properties.Settings.Default.FilterChangesXSLTPath);
    StringBuilder changed = new StringBuilder();
    StringWriter sw = new StringWriter(changed);
    xslt.Transform(XmlReader.Create(new StringReader(diffAnnotated)), null, sw);
    return changed.ToString();
}


The change for additions is analogous. You just have to use the other XSLT in another method similar to FilterUpdateDataset.
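
Since Annotate and FilterUpdateDataset differ only in the stylesheet they load, one way to avoid a third near-copy is a single helper that takes the stylesheet path as a parameter. A sketch, with a hypothetical class and method name:

```csharp
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Xsl;

public static class XsltHelper
{
    // Applies the stylesheet stored at stylesheetPath to inputXml
    // and returns the transformation result as a string.
    public static string ApplyStylesheet(string stylesheetPath, string inputXml)
    {
        XslCompiledTransform xslt = new XslCompiledTransform();
        xslt.Load(stylesheetPath);

        StringBuilder output = new StringBuilder();
        using (XmlReader input = XmlReader.Create(new StringReader(inputXml)))
        using (StringWriter writer = new StringWriter(output))
        {
            xslt.Transform(input, null, writer);
        }
        return output.ToString();
    }
}
```

FilterUpdateDataset(annData) would then become XsltHelper.ApplyStylesheet(Properties.Settings.Default.FilterChangesXSLTPath, annData), and the additions variant would pass the path of the additions stylesheet instead.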


