Json Diffing Discussion

pturmel · October 3, 2024, 3:27pm

That appears to be the version used in most of v8.1.x. I build against v8.1.0, mostly.

Brian_Ray · October 3, 2024, 3:57pm

We should probably publish it to the thirdparty repo instead. @Brian_Ray might know more.

Will dig around today or tomorrow at the latest and report back here.

pturmel · October 3, 2024, 4:24pm

It isn't urgent. Looks like Gson will simply do the right thing with ordered maps when encoding. Will test.

pturmel · October 3, 2024, 8:52pm

Tested. This project script (as jsonUtil):

# Human-friendly and git-friendly (deterministically diffable) JSON encoding.

from java.lang import Throwable
from java.util import TreeMap
from com.inductiveautomation.ignition.common.util import Comparators
from com.inductiveautomation.ignition.common.gson import GsonBuilder

logger = system.util.getLogger(system.util.getProjectName() + '.' + system.reflect.getModulePath())
ciAlnumCmp = Comparators.alphaNumeric(False)

# Deep copy with conversion of maps to key-ordered maps and
# conversion of lists-of-dicts that contain a 'name' key into
# ordered lists.
def ordering(subject):
	if hasattr(subject, 'items'):
		subst = TreeMap(ciAlnumCmp)
		for k, v in subject.items():
			subst[k] = ordering(v)
		return subst
	first = None
	try:
		first = subject[0]
	except:
		pass
	try:
		if first and hasattr(first, 'items'):
			subst = TreeMap(ciAlnumCmp)
			subst.update([(x.get('name', str(i)), ordering(x)) for (i, x) in enumerate(subject)])
			return values()
	except Throwable, t:
		logger.info("Java failure in ordering:", t)
	except Exception, e:
		logger.info("Jython failure in ordering:", system.reflect.asThrowable(e, None))
	return subject
#

#
def orderedJson(source):
	json = system.util.jsonDecode(source)
	ordered = ordering(json)
	gson = GsonBuilder().setPrettyPrinting().create()
	return gson.toJson(ordered)
#

Called from this designer console script:

original = [{'a9': 1, 'A10': 1.1, 'A12':1.2, 'a11': [4, 5, 6]}]
encoded = system.util.jsonEncode(dict(_=original), 2)[6:-1]
print "Using system.util.jsonEncode():"
print encoded
print "Using Gson:"
print jsonUtil.orderedJson(encoded)

Produces this output:

>>> 
Using system.util.jsonEncode():
[{
  "A10": 1.1,
  "A12": 1.2,
  "a11": [
    4,
    5,
    6
  ],
  "a9": 1
}]
Using Gson:
[
  {
    "A10": 1.1,
    "a11": [
      4,
      5,
      6
    ],
    "A12": 1.2,
    "a9": 1
  }
]
>>>

Note that the workaround for pretty-printing JSON lists is not needed.

But not quite right.

michael.flagler · October 3, 2024, 10:20pm

I played with your script some for testing/fun, and I noticed that if I use just encoded = system.util.jsonEncode(original) instead of what you have in your designer console and pass that to the orderedJson function, I get a slightly different result that's closer to your original sort, but using natural ordering for numbers it seems:

[
  {
    "A10": 1.1,
    "A12": 1.2,
    "a9": 1,
    "a11": [
      4,
      5,
      6
    ]
  }
]

pturmel · October 4, 2024, 5:59pm

OK. My ordering() function was buggy. This is better:

# Human-friendly and git-friendly (deterministically diffable) JSON encoding.

from java.util import TreeMap
from com.inductiveautomation.ignition.common.util import Comparators
from com.inductiveautomation.ignition.common.gson import GsonBuilder

ciAlnumCmp = Comparators.alphaNumeric(False)

# Deep copy with conversion of maps to key-ordered maps and
# conversion of lists-of-dicts that contain a 'name' key into
# ordered lists.
# Keys other than "name" may be supplied, or None to disable
# re-ordering lists of dictionaries.
def ordering(subject, listKey='name'):
	if hasattr(subject, 'items'):
		subst = TreeMap(ciAlnumCmp)
		for k, v in subject.items():
			subst[k] = ordering(v)
		return subst
	if hasattr(subject, '__iter__'):
		# Use a generator to exit quickly if any element of the
		# list-like object is *not* a dictionary-like object.
		if listKey and all(hasattr(inner, 'items') for inner in subject):
			reordered = TreeMap(ciAlnumCmp)
			reordered.update([(x.get(listKey, str(i)), ordering(x)) for (i, x) in enumerate(subject)])
			return reordered.values()
		return [ordering(x) for x in subject]
	return subject
#

#
def orderedJson(source):
	json = system.util.jsonDecode(source)
	ordered = ordering(json)
	gson = GsonBuilder().setPrettyPrinting().create()
	return gson.toJson(ordered)
#

def test():
	original = {'a8': 0.9, 'a9': 1, 'A10': 1.1, 'A12':1.2, 'a11': [4, 5, 6]}
	ordered = jsonUtil.ordering(original)
	print "Ordered:"
	print repr(ordered)
	print "Using system.util.jsonEncode():"
	print system.util.jsonEncode(dict(_=ordered), 2)[6:-1]
	print "Using Gson:"
	gson = GsonBuilder().setPrettyPrinting().create()
	print gson.toJson(ordered)
#

Running someScript.test() in the script console yields this:

Ordered:
{u'a8': 0.9, u'a9': 1, u'A10': 1.1, u'a11': [4, 5, 6], u'A12': 1.2}
Using system.util.jsonEncode():
{
  "A10": 1.1,
  "A12": 1.2,
  "a11": [
    4,
    5,
    6
  ],
  "a8": 0.9,
  "a9": 1
}
Using Gson:
{
  "a8": 0.9,
  "a9": 1,
  "A10": 1.1,
  "a11": [
    4,
    5,
    6
  ],
  "A12": 1.2
}
>>>

Much better.

Also lets you disable list re-ordering, or use a different key.

Brian_Ray · October 4, 2024, 10:20pm

Sort of closing the loop. Barely.

We have many versions cached in a Gradle plugin proxy repo in our internal Nexus, as fresh as last month (2.11.0). I'm an unsure if these are from the upstream google/gson, and am unclear why it's in a Gradle plugin proxy repo. Complete mystery.
We have three versions cached in a Maven Central proxy repo in our internal Nexus, as fresh as 2019 (2.8.5). That makes a little more sense as it's a "regular" lib and not a plugin. Provenance: big shrug emoji.
We have two versions published to our hosted "third party" repo in our internal Nexus, as fresh as 2018 (2.8.2.1). Perry believes he may have hand-cranked those up there from a Maven build on his old dev machine.

TL;DR: I don't think we have any pipeline or GH Action publishing this automagically from either the public nor internal GitHub repos.

pturmel · October 5, 2024, 12:24am

My ant builds don't rely on maven or gradle repos, so it isn't urgent, but I would like to start moving to gradle. (Now that signing should work.)

The Gson version that I import in that script is the one packaged as ia-gson-x.y.z.jar in the product. That is 2.8.5 across most of v8.1.x.

nminchin · October 8, 2024, 12:05am

Ok, I finally got some time to test this, and I do have to report that it does suffer from the issues I've mentioned before.

Using these tags:

Original

{
  "tags": [
    {
      "name": "AreaA",
      "tagType": "Folder",
      "tags": [
        {
          "valueSource": "memory",
          "name": "Location",
          "value": 2,
          "tagType": "AtomicTag"
        },
        {
          "valueSource": "memory",
          "name": "Name",
          "value": 3,
          "tagType": "AtomicTag"
        },
        {
          "valueSource": "memory",
          "name": "Detail",
          "value": 1,
          "tagType": "AtomicTag"
        }
      ]
    },
    {
      "name": "AreaB",
      "tagType": "Folder",
      "tags": [
        {
          "valueSource": "memory",
          "name": "Detail",
          "value": 1,
          "tagType": "AtomicTag"
        },
        {
          "valueSource": "memory",
          "name": "Name",
          "value": 3,
          "tagType": "AtomicTag"
        },
        {
          "valueSource": "memory",
          "name": "Location",
          "value": 2,
          "tagType": "AtomicTag"
        }
      ]
    }
  ]
}

New

{
  "tags": [
    {
      "name": "AreaA",
      "tagType": "Folder",
      "tags": [
        {
          "valueSource": "memory",
          "name": "Name",
          "value": 30,
          "tagType": "AtomicTag"
        },
        {
          "valueSource": "memory",
          "name": "Donkey",
          "value": 888,
          "tagType": "AtomicTag"
        },
        {
          "valueSource": "memory",
          "name": "Location",
          "value": 10,
          "tagType": "AtomicTag"
        }
      ]
    },
    {
      "name": "AreaC",
      "tagType": "Folder",
      "tags": [
        {
          "valueSource": "memory",
          "name": "Name",
          "value": 3,
          "tagType": "AtomicTag"
        },
        {
          "valueSource": "memory",
          "name": "Detail",
          "value": 1,
          "tagType": "AtomicTag"
        },
        {
          "valueSource": "memory",
          "name": "Location",
          "value": 2,
          "tagType": "AtomicTag"
        }
      ]
    }
  ]
}

I get these diff results from WinMerge:

and this from diff -U20 ...
EDIT:

Importantly, this is just a super simple example. This obviously gets a lot more complicated, and having it produce diff results like this becomes impossible to read.

On the other hand, using the littletree library which understands the tag json structure and that the "name" keys define the elements of the tag paths, I get:

tagpath	tagtype	change	from	to
.../AreaA/Location.value	AtomicTag	Property modified	2	10
.../AreaA/Name.value	AtomicTag	Property modified	3	30
.../AreaA/Detail	AtomicTag	Tag removed	None	None
.../AreaA/Donkey	AtomicTag	Tag added	None	None
.../AreaB	Folder	Tag removed	None	None
.../AreaC	Folder	Tag added	None	None
.../AreaC/Name	AtomicTag	Tag added	None	None
.../AreaC/Detail	AtomicTag	Tag added	None	None
.../AreaC/Location	AtomicTag	Tag added	None	None

pturmel · October 8, 2024, 12:13am

That diff -U20 looks like it was run against the unordered originals.

nminchin · October 8, 2024, 12:16am

Ah my bad. I originally included the Site folder which obvioulsy produced silly results, then copied the wrong json after re-comparing... let me update...

nminchin · October 8, 2024, 12:19am

The output is definitely much better! Still some issues though e.g. the new "Donkey" tag didn't have a value change from 1 to 888; it was created with value 888. This becomes more significant the more complex the json is

Just for reference, this is how it's displayed in Git:

However, if I change the New's AreaC tags, I then get a bunch of changes that I don't want to see (AreaC was added; tags have not "changed" within it, they were created like that so they shouldn't be flagged as changes from the original, but rather complete and separate additions)

Perhaps the most significant drawback though, is that you don't know what the tagpath of the tag you're comparing is without backtracking up the structure and manually piecing together the folder names. This could be almost impossible for a full tag export, and a significant brain drain when there are many changes.

New v2

{
  "tags": [
    {
      "name": "AreaA",
      "tagType": "Folder",
      "tags": [
        {
          "valueSource": "memory",
          "name": "Name",
          "value": 30,
          "tagType": "AtomicTag"
        },
        {
          "valueSource": "memory",
          "name": "Donkey",
          "value": 888,
          "tagType": "AtomicTag"
        },
        {
          "valueSource": "memory",
          "name": "Location",
          "value": 10,
          "tagType": "AtomicTag"
        }
      ]
    },
    {
      "name": "AreaC",
      "tagType": "Folder",
      "tags": [
        {
          "valueSource": "memory",
          "name": "Name",
          "value": 35,
          "tagType": "AtomicTag"
        },
        {
          "valueSource": "memory",
          "name": "Details",
          "value": 1,
          "tagType": "AtomicTag"
        },
        {
          "valueSource": "memory",
          "name": "Zoo",
          "value": 200,
          "tagType": "AtomicTag"
        }
      ]
    }
  ]
}

pturmel · October 8, 2024, 1:20am

Indeed. This is the reason my recommendation to IA for on-disk config of tags would be to store folders of tags/udt instances in actual filesystem folders. That takes the hierarchy out of JSON where it is too difficult for generic diff to handle. (UDT instances need to be single files, but atomic tags could be consolidated per folder, I think.)

nminchin · October 8, 2024, 2:04am

Hmm, interesting idea. I feel that might bring with it some other issues though. Although I would expect that the existing tag json would still be producable out of the box.

Chris_Bingham · October 29, 2024, 8:39pm

My implementation of a dictionary & list deepSort() recursive function:

*Edit: Keys within dictionaries (and values within lists) are not 'sorted' (alphabetically & numerically ascending) but does appear to organize keys/values of two UDTs in the same order. Hmmm....

def deepSort(obj, ignoreKeys=[]): # Beta
	if isinstance(obj, list):
		result = []
		for value in sorted(obj):
			if hasattr(value, '__iter__'):
				result.append(deepSort(value))
			else:
				result.append(value)
		return result
	elif isinstance(obj, dict):
		result = {}
		for key, value in sorted(obj.items()):
			if not key in ignoreKeys:
				if hasattr(value, '__iter__'):
					result[key] = deepSort(value)
				else:
					result[key] = value
		return result
	else: return sorted(obj)

I wrote this to compare two tag configs (a 'model' UDT vs 'subject' UDT) that exist in two providers. The 'path' parameters were always causing a diff, so I added the ability to ignore specific keys, though decided not to utilize in favor of a less-than-elegant search & replace on a string representation of the config... Actual usage (from script console) resembled:

relTagPath = "_types_/myUDT"
modelProvider = "Provider1"
subjProvider = "Provider2"
ignoreKeys = [] # ['path']

modelTagPath = "[" + modelProvider + "]" + relTagPath
modelCfg = system.tag.getConfiguration(basePath=modelTagPath, recursive=True)[0]
sortedModelCfg = deepSort(modelCfg, ignoreKeys)

subjTagPath = "[" + subjProvider + "]" + relTagPath
subjCfg = system.tag.getConfiguration(basePath=subjTagPath, recursive=True)[0]
sortedSubjCfg = deepSort(subjCfg, ignoreKeys)

replaceString = "[########]" + relTagPath
strSortedModel = str(sortedModelCfg).replace(modelTagPath, replaceString)
strSortedSubj = str(sortedSubjCfg).replace(subjTagPath, replaceString)

print strSortedModel
print strSortedSubj
print (strSortedModel == strSortedSubj)

Still not sure it solves @nminchin's problem...of easily detecting what is different - but appears to recursively sort iterables on the few objects I've tested with.

Chris_Bingham · October 29, 2024, 9:38pm

For an example output, here is a Notepad++ File Compare on your 'Original' vs 'New v2' objects following a deepSort():