Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[docs] Compact some JSON fields for search #11438

Merged
merged 12 commits into from
Nov 29, 2021

Conversation

rymiel
Copy link
Contributor

@rymiel rymiel commented Nov 12, 2021

work towards #11427

With these changes so far, according to my testing the docs index.json file for the crystal stdlib goes from 12MB to 8.8MB 9.2MB with no change in functionality, as only nil fields or empty arrays are omitted. These are handled gracefully in javascript using null-aware operators (hopefully supporting internet explorer isn't required)

This barely gets under the 10MB limiting factor brought up in the issue but there is more that could be done.
For example, the json contains the body of every method, which I never found references to in the search, however, it could be used by others tools or displayed in the future, so I didn't remove them (yet), I'd like to hear more opinions on if that should be kept around (or put somewhere separate, not included for the search).
Removing method bodies in particular would knock off over a megabyte

There is definitely more that could be done, but this was the most "non-destructive" thing I could think of

@rymiel rymiel changed the title docs: Compact some JSON fields for search [docs] Compact some JSON fields for search Nov 12, 2021
@oprypin oprypin self-requested a review November 12, 2021 08:58
@oprypin
Copy link
Member

oprypin commented Nov 12, 2021

From a quick look, the changes are indeed not so disruptive.
Maybe the most controversial one is external_name: it will be hard for people to realize that this field can even be present at all, causing incorrect implementations that always just use name. But this is purely hypothetical. I think even that change is fine.

Here's how I needed to adapt my reader for this:

diff --git a/mkdocstrings/handlers/crystal/items.py b/mkdocstrings/handlers/crystal/items.py
index c5e3ef84f..5205032aa 100644
--- a/mkdocstrings/handlers/crystal/items.py
+++ b/mkdocstrings/handlers/crystal/items.py
@@ -49 +49 @@ class DocItem(metaclass=abc.ABCMeta):
-    def doc(self) -> str:
+    def doc(self) -> Optional[str]:
@@ -51 +51 @@ class DocItem(metaclass=abc.ABCMeta):
-        return self.data["doc"]
+        return self.data.get("doc")
@@ -172 +172 @@ class DocType(DocItem):
-        return DocMapping([DocConstant(x, self, self.root) for x in self.data["constants"]])
+        return DocMapping([DocConstant(x, self, self.root) for x in self.data.get("constants", ())])
@@ -178 +178 @@ class DocType(DocItem):
-            [DocInstanceMethod(x, self, self.root) for x in self.data["instance_methods"]]
+            [DocInstanceMethod(x, self, self.root) for x in self.data.get("instance_methods", ())]
@@ -184 +184 @@ class DocType(DocItem):
-        return DocMapping([DocClassMethod(x, self, self.root) for x in self.data["class_methods"]])
+        return DocMapping([DocClassMethod(x, self, self.root) for x in self.data.get("class_methods", ())])
@@ -189 +189 @@ class DocType(DocItem):
-        return DocMapping([DocConstructor(x, self, self.root) for x in self.data["constructors"]])
+        return DocMapping([DocConstructor(x, self, self.root) for x in self.data.get("constructors", ())])
@@ -194 +194 @@ class DocType(DocItem):
-        return DocMapping([DocMacro(x, self, self.root) for x in self.data["macros"]])
+        return DocMapping([DocMacro(x, self, self.root) for x in self.data.get("macros", ())])
@@ -199 +199 @@ class DocType(DocItem):
-        return DocMapping([DocType(x, self, self.root) for x in self.data["types"]])
+        return DocMapping([DocType(x, self, self.root) for x in self.data.get("types", ())])
@@ -204 +204 @@ class DocType(DocItem):
-        if self.data["superclass"] is not None:
+        if self.data.get("superclass") is not None:
@@ -210 +210 @@ class DocType(DocItem):
-        return [DocPath(x, self) for x in self.data["ancestors"]]
+        return [DocPath(x, self) for x in self.data.get("ancestors", ())]
@@ -215 +215 @@ class DocType(DocItem):
-        return [DocPath(x, self) for x in self.data["included_modules"]]
+        return [DocPath(x, self) for x in self.data.get("included_modules", ())]
@@ -220 +220 @@ class DocType(DocItem):
-        return [DocPath(x, self) for x in self.data["extended_modules"]]
+        return [DocPath(x, self) for x in self.data.get("extended_modules", ())]
@@ -225 +225 @@ class DocType(DocItem):
-        return [DocPath(x, self) for x in self.data["subclasses"]]
+        return [DocPath(x, self) for x in self.data.get("subclasses", ())]
@@ -230 +230 @@ class DocType(DocItem):
-        return [DocPath(x, self) for x in self.data["including_types"]]
+        return [DocPath(x, self) for x in self.data.get("including_types", ())]
@@ -321 +321 @@ class DocMethod(DocItem):
-        args = [arg["external_name"] for arg in d["args"]]
+        args = [arg.get("external_name", arg["name"]) for arg in d.get("args", ())]
@@ -345 +345 @@ class DocMethod(DocItem):
-        return self.data["abstract"]
+        return bool(self.data.get("abstract"))
@@ -357 +357 @@ class DocMethod(DocItem):
-            html = self.data["args_string"]
+            html = self.data.get("args_string", "")

@rymiel
Copy link
Contributor Author

rymiel commented Nov 12, 2021

it will be hard for people to realize that this field can even be present at all

An alternative would be to go the other way, i.e. an external_name that is always present and an optional internal_name (which is more in like with how that feature is used anyway). Either way, it doesn't "cut" much from the result so it could be reverted

Copy link
Member

@straight-shoota straight-shoota left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change looks fine and non-destructive. It's still a breaking change in the data format, and I'm not sure if we should just do that. Also, while being a huge and essentially free improvement, there's still way. I'd suggest to discuss further enhancements in #11427.

Ideally, doc and summary should never be nil because we want documentation for everything. But it's still okay to omit the field when it's missing.

src/compiler/crystal/tools/doc/type.cr Outdated Show resolved Hide resolved
src/compiler/crystal/tools/doc/html/js/_search.js Outdated Show resolved Hide resolved
@Blacksmoke16
Copy link
Member

Publishing a https://json-schema.org for it could help implementers have an idea what the data model is without manually testing every scenario.

@straight-shoota
Copy link
Member

@Blacksmoke16 Yeah, the entire index.json is mostly a "just let the doc generator spit out the data because I need that for search and other stuff". It's not so much of a formally architectured data format, as an experimental approach to find out what we even need. Functionally, it serves quite well as it is, but by now it's clear that we need a more refined structure.

@rymiel
Copy link
Contributor Author

rymiel commented Nov 15, 2021

I've included a schema file in json-schema format., it should match the index.json created by the reductions on this branch.

As this is the first json schema i've actually written, I assume it might have some issues, but it does match the source in strict mode

@rymiel rymiel requested a review from oprypin November 15, 2021 18:44
@oprypin
Copy link
Member

oprypin commented Nov 15, 2021

@rymiel Sorry, could you please drop the changes about json-schema from this pull request? I wouldn't be comfortable reviewing that at this time, so let's not block the original, already approved, change.
The rest can come later.

In this particular case, force push is OK.
git reset --hard HEAD~
git push --force

oprypin added a commit to mkdocstrings/crystal that referenced this pull request Nov 15, 2021
Copy link
Member

@sdogruyol sdogruyol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @rymiel 🙏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants