Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use an immutable/mutable design #45

Open
mdedetrich opened this issue Nov 14, 2017 · 5 comments
Open

Use an immutable/mutable design #45

mdedetrich opened this issue Nov 14, 2017 · 5 comments

Comments

@mdedetrich
Copy link
Owner

mdedetrich commented Nov 14, 2017

After some thinking in regards to the design tradeoffs of ScalaJson, I am exploring the idea of renaming unsafe to mutable and doing some other changes (which mainly means removing the use of exceptions).

Reasons are

  1. immutable datastructures are great for some things (guarantee of no thread issues and improves reasoning via referential transparency) however when it comes to raw performance, they are still much slower when it compared to mutable data structures.
  2. Expanding on point 1, due to the inherit complexity of immutable datastructures for non trivial collections (i.e. not LinkedList) this performance problem is even worse. In particular the issue is with a Map collection (specifically a Map that retains key insertion order). There is currently no known immutable datastructure which satisfies both having key insertion order and effectively constant amortized lookup times (both by key and by index for the map). This is because the tree structure that most immutable datastructures use (for structural sharing) use branching to maintain either order OR lookup by key. Currently all implementations of such a datastructure is just a compound datastructure which maintains a Vector/HashMap underneath. This for obvious reasons is not very performant.
  3. The performance issues above are amplified even more on the Scala.js distribution. The ScalaJson implementation on Scala.js (for unsafe and also for the proposed change to mutable) uses an internal Javascript array for JArray/JObject where as for immutable we have to use Vector. Since browsers have optimized Javascript array for decodes, its really fast. On the other hand, a Scala.js ported immutable Vector is going to be much slower
  4. Most people don't care about thread safety specifically when dealing with Json AST's. People (or libraries) typically don't parse JSON (to an AST) in parallel mainly because of issues with synchronization overhead and also how noisy JSON data is. Also people don't typically alter the JSON AST directly (often people parse from json data straight to a case class, and they work on this case class instead). As an example, for performance Circe internally used a mutable HashMap for Json Objects and only makes an immutable view when you try to read from the AST directly.
  5. It doesn't seem immediately obvious that people will use the ScalaJson JsonAST just for parsing, its most likely either going to be the final output for after parsing the JSON or its not going to be visible at all (i.e. only used internally)

The proposed changes will be

  1. rename unsafe to mutable
  2. remove the use of exceptions in mutable (so at least the interface is as safe as it is in in the standard implementation)
  3. Try and mirror the collections design, i.e. there is going to be a JsonAST in immutable and in mutable package namespaces with the default constructors aliasing to immutable (just like with List and Map in scala collections)
  4. For unsafe Json Object, going to change from using an array of JField to a Map datastructure. On Scala this will be standard mutable HashMap where as on Scala.js this will be Javascripts own Map (if possible). This is so that we have the ideal data structure both for performance and for usage. This does mean we will lose the ability for parsing and retaining duplicate keys (it seems that in Scala land no one relies on this feature. Only Json4s has this ability, and they actually want to change to a Map).

@sirthias @Ichoran @jvican @dwijnand @eed3si9n @travisbrown @gmethvin @farmdawgnation @SethTisue Thoughts?

@farmdawgnation
Copy link

rename unsafe to mutable

👍

remove the use of exceptions in mutable (so at least the interface is as safe as it is in in the standard implementation)

👍

Try and mirror the collections design, i.e. there is going to be a JsonAST in immutable and in mutable package namespaces with the default constructors aliasing to immutable (just like with List and Map in scala collections)

Can we define the "default" constructors in the parent package of mutable and immutable? So I guess at the scalajson level. Reason being that if I import mutable's AST it would be real strange to suddenly get a class from immutable just because I invoked the wrong constructor.

For unsafe Json Object, going to change from using an array of JField to a Map datastructure. On Scala this will be standard mutable HashMap where as on Scala.js this will be Javascripts own Map (if possible). This is so that we have the ideal data structure both for performance and for usage. This does mean we will lose the ability for parsing and retaining duplicate keys (it seems that in Scala land no one relies on this feature. Only Json4s has this ability, and they actually want to change to a Map).

I care more about deterministic key ordering than I do duplicate keys — this is actually the reason that json4s (which has its roots in lift-json) uses the list data structure. Is work still going on to preserve ordering of keys or has that effort been abandoned at this point?

@mdedetrich
Copy link
Owner Author

mdedetrich commented Nov 14, 2017

Can we define the "default" constructors in the parent package of mutable and immutable? So I guess at the scalajson level. Reason being that if I import mutable's AST it would be real strange to suddenly get a class from immutable just because I invoked the wrong constructor.

Yes this is implied in the transition

I care more about deterministic key ordering than I do duplicate keys — this is actually the reason that json4s (which has its roots in lift-json) uses the list data structure. Is work still going on to preserve ordering of keys or has that effort been abandoned at this point?

Yes, for mutable implementation a HashMap has deterministic key ordering (as does the Scala.js Map). For immutable implementation I want to useVectorMap/LinkedMap, i.e. see #20 and https://github.com/mdedetrich/linked-map

@farmdawgnation
Copy link

This all sounds good to me then!

@SethTisue
Copy link
Contributor

SethTisue commented Nov 14, 2017

I haven't followed this story in enough detail to be sure that my feedback is that useful.

yet, this does sound plausible and appealing. +1 to "mutable" being a much clearer name, IMO, than "safe" and "unsafe"

and, as I've said before:

  • +1 to having a ground rule of always preserving key order
  • +1 to not supporting duplicate keys

@gmethvin
Copy link

gmethvin commented Nov 25, 2017

If we are talking about mirroring the collections API, could we have a common API similar to scala.collection.Seq/scala.collection.Map? In other words, can we have an API that allows traversal but not mutation and is implemented by both the mutable and immutable variants? This would make it easier to write code that works efficiently with both the immutable and mutable types, and would be a much better API for interop with other Scala JSON libraries.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants