Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

allow to customize internal object creator to allow keeping native order during decode #4

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

tst2005
Copy link
Contributor

@tst2005 tst2005 commented Aug 29, 2016

Hello,

A lua simple hash table does not remind the original order.
The only way to remind the write order is to setup a special metatable.
I didn't try to implement this feature directly inside lunajson because I think it is not always usefull.
I made a way to customized it with an external handler.

My test sample with writeorder.lua has a stable result :

DEBUG: writeorder.newtable: rawset      a       aa
DEBUG: writeorder.newtable: rawset      c       cc
DEBUG: writeorder.newtable: rawset      b       bb
a       aa
c       cc
b       bb
1       11
2       22
3       33

Regards,

@grafi-tt
Copy link
Owner

grafi-tt commented Oct 4, 2016

My apologize for the very late reply!

I agree there should be a way to keep the order of objects.
But, allowing a custom object creator seems too general for me.
I think it is enough just having a option to use this ordered table.
If someone need more sophisticated control, then they can use the SAX-style parser.

If you agree, I'd like to include your test sample to the source tree and add the option, and merge it soon.

@grafi-tt grafi-tt added this to the 1.3 milestone Sep 4, 2017
@muhmuhten
Copy link
Contributor

I realise this ticket is two years old at this point, but I had some thoughts.

The specific goal of retaining key order can be achieved with a relatively small change:

diff --git a/decoder.lua b/decoder.lua
index 4478894..2e40aed 100644
--- a/decoder.lua
+++ b/decoder.lua
@@ -403,10 +403,12 @@ local function newdecoder()
 		if rec_depth > 1000 then
 			decode_error('too deeply nested json (> 1000)')
 		end
-		local obj = {}
+		local obj = {[0]=false} -- distinguish empty arrays from objects
 
 		f, pos = find(json, '^[ \n\r\t]*', pos)
 		pos = pos+1
+
+		local i = 0
 		if byte(json, pos) ~= 0x7D then  -- check closing bracket '}' which means the object empty
 			local newpos = pos-1
 
@@ -441,7 +443,8 @@ local function newdecoder()
 					f = dispatcher[byte(json, newpos+1)]
 				end
 				pos = newpos+2
-				obj[key] = f()  -- parse value
+				i = i+2
+				obj[i-1], obj[i] = key, f() -- parse value
 				f, newpos = find(json, '^[ \n\r\t]*,[ \n\r\t]*', pos)
 			until not newpos
 
@@ -452,6 +455,12 @@ local function newdecoder()
 			pos = newpos
 		end
 
+		-- Second pass for kv mapping is empirically faster than doing it
+		-- alongside the pair assignments in the inner loop
+		for j=1,i,2 do
+			obj[obj[j]] = obj[j+1]
+		end
+
 		pos = pos+1
 		rec_depth = rec_depth - 1
 		return obj

Based on informal testing, this imposes a 10~20% performance penalty. There are a few issues to consider, however:

  • The table corresponding to an object looks rather strange: it is a sequence where elements at odd indices are keys, and elements at even indices are values associated with preceding keys. It may seem more reasonable to have {key,value} pairs, but creating a table for each kv-pair is incredibly slow and must be avoided.
  • This approach invalidates checking table[1] to determine whether a table was a nonempty array in JSON. IMO this isn't a sensible thing to do in the first place, since arrays may be empty, but is a common idiom, especially when working with other libraries which don't have arraylen-like functionality.
  • JSON objects keys may have multiplicity. Moreover, different values may be associated with the same key in the same object. (They "SHOULD" not, but in practice sometimes do.) Ignoring this issue would allow us to simplify the second chunk to i=i+1 obj[i], obj[key] =key, f() and omit the third chunk. This reduces the performance penalty slightly. Duplicate keys will still be preserved and need to be considered separately.
  • There's no particularly good place to store the number of kv-pairs: [0] is taken by the arraylen option, and a string key can't be used because those may appear in the actual JSON object. This can be problematic when using nullv=nil, since the presence of null will interrupt the sequence and make # give a strange result. This can be worked around since the keys must never be null, so we can define a helper function as a workaround:
local function kvpairs(t)
	local i = -1
	return function ()
		i = i+2
		if t[i] then
			return t[i], t[i+1]
		end
	end
end
  • The encoder adjustments needed to make this work are left as an exercise to the reader. The resulting table structure is very unnatural to work with in Lua.

@tst2005
Copy link
Contributor Author

tst2005 commented Nov 26, 2018

Hello guys,
Yes, this request is quiet old, but still alive !

I think the ability to remind the keys order is a good feature, but it is only usefull in particular case.
IHMO, due to performance cost, the default should stay at unordered.

Lua still have lot of different interpretor version, lua 5.1/5.2/5.3, luajit, raptorjit, etc.
The pairs's behavior change between versions.

I think lunajson can not provide a way to get the ordered-keys without providing the pairs function to deal with the implementation done.

Finaly I still think that adding an external handler is not a bad choice.

  • if I want a special handler then I also use my own pairs to deal with it.
  • if I don't want a special handler then I use the standard pairs.

If we want make the user's life easier we can add the handler and pairs implementation.
There not only one way to implement the feature.
My first try was done with a lua code made by a thirdparty people (found on lua-users.org). It is not my own code, I don't know his exact LICENSE.

If you really refuse external handling (I can understand that), is it possible to add an argument to toggle this feature ?
I think it can be possible to add some condition in the code to disable the feature and restore performance near the original.

@tst2005
Copy link
Contributor Author

tst2005 commented Nov 26, 2018

@muhmuhten I use your patch and add some if to be able to enable/disable it.
See https://github.com/grafi-tt/lunajson/compare/master...tst2005:keep-order-20181126?diff=unified&expand=1
With this way, we still need a clean way to provide the appropriate pairs (the kvpairs in your sample).

@tst2005
Copy link
Contributor Author

tst2005 commented Nov 28, 2018

Hello,

I made some changes.
I assigned a metatable to the object to provide the custom pairs handler.
the metatable contains the __pairs key with the kvpairs function code.

Only Lua >= 5.2 take care about the mt.__pairs, then I provide a lunajson.pairs to jump to the Lua 5.2 behavior if the current pairs does not follow the correct behavior (checked with a quick self test).
The lunajson.pairs will keep the standard pairs if the behavior is already correct.
I also change the encoder to use this pairs.

We should support to re-encode in the same readed order (not tested yet).

Regards,

@grafi-tt grafi-tt mentioned this pull request Jan 4, 2019
@tst2005 tst2005 mentioned this pull request Jan 9, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants