Website & API Doc site generator using DocFx script #206

Merged: 66 commits, Feb 26, 2019

Contributor

@Shazwazza Shazwazza commented May 16, 2017

This is a PR to build both a new website and the API documentation using DocFx.

There are several changes:

  • build script for both the docs and the website
  • updates to the JavaDocToMarkdownConverter (which is the processor to convert the lucene Java docs to a usable md version for docfx ... this is an ongoing update to deal with all of the quirks)
    • Whenever this is run, it will fix/update many of the md converted doc files
  • website files for docfx

To test the website, run websites/site/site.ps1, which builds the site and starts a web server at http://localhost:8080. After making changes to the site, stop the script (Ctrl+C) and re-run it to do an incremental build. To just build the website for deployment, run websites/site/site.ps1 -ServeDocs 0 -Clean 1, which cleans all temp files and compiles the site to a static website at websites/site/_site.

To test the docs, run websites/apidocs/docs.ps1, which builds the site and starts a web server at http://localhost:8080. After making changes to the docs, stop the script (Ctrl+C) and re-run it to do an incremental build. To just build the docs for deployment, run websites/apidocs/docs.ps1 -ServeDocs 0 -Clean 1, which cleans all temp files and compiles the docs to a static website at websites/apidocs/_site.

(In both cases, the build operation takes a few minutes!)
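For anyone wiring these scripts into CI, a minimal sketch of assembling the command lines might look like the following. It only builds the argument list and runs nothing. The `shell` executable name and the helper itself are assumptions for illustration; the script paths and flag names come from the description above.

```python
from typing import List, Optional


def script_command(script: str,
                   serve_docs: Optional[int] = None,
                   clean: Optional[int] = None,
                   shell: str = "powershell") -> List[str]:
    """Build the argument list for one of the build scripts.

    `shell` is an assumption: use "pwsh" where PowerShell Core is installed.
    Flags are appended only when a value is given, so the script's own
    defaults apply otherwise.
    """
    command = [shell, "-File", script]
    if serve_docs is not None:
        command += ["-ServeDocs", str(serve_docs)]
    if clean is not None:
        command += ["-Clean", str(clean)]
    return command


if __name__ == "__main__":
    # Clean build of the API docs; the flag name is taken from the description.
    print(script_command("websites/apidocs/docs.ps1", clean=1))
```

The resulting list can be handed to `subprocess.run(command, check=True)` so that a failed docs build fails the CI step.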

Website tasks to complete:

Docs tasks to complete:

Additional tasks (nice to have):

@synhershko
Contributor

This looks great, thanks for the initiative!

@NightOwl888 I assume many files' code comments are still broken so we will gradually get them fixed so it looks better then

@wwb is there a way we can use our CI to generate the docs for each build (and then as a next step maybe automatically pull them for a static hosting of some kind e.g. github pages)?

@NightOwl888
Contributor

@Shazwazza

Thanks for this!

it's also possible to add markdown articles using docfx but I've removed these for now until we might want them.

We definitely want them. Lucene has HTML documents that they add to each package, and often this is where the best code samples and detailed overview of the API can be found. It would be best if we could add the HTML documents unmodified from Lucene to our repo and have the script convert them to be used in the documentation. Then we just need to copy over the files from the next version and that part of the documentation will be automatic. Here is an example of one of those HTML files:

https://github.com/apache/lucene-solr/blob/releases/lucene-solr/4.8.0/lucene/highlighter/src/java/org/apache/lucene/search/highlight/package.html

If there is a way to automate converting the code samples (preferably both to C# and VB.NET), it would be ideal, but at least that would likely be the only part of the document that needs to change if converting the code samples is not possible.

It occurred to me that we also need to re-map the namespaces, but we should be able to easily automate that part.
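As a rough illustration of how the namespace re-mapping could be automated, here is a minimal sketch. It assumes that every package segment after `org.apache.lucene` is simply capitalized and prefixed with `Lucene.Net`; that rule and the function name are hypothetical, and a real converter would probably need an override table for exceptions.

```python
def java_package_to_namespace(package: str,
                              java_root: str = "org.apache.lucene",
                              dotnet_root: str = "Lucene.Net") -> str:
    """Map a Java package name to a .NET-style namespace.

    Assumption: each segment after the root is capitalized as-is.
    """
    if package == java_root:
        return dotnet_root
    if not package.startswith(java_root + "."):
        raise ValueError(f"{package!r} is not under {java_root!r}")
    segments = package[len(java_root) + 1:].split(".")
    # Capitalize only the first letter so any existing casing is preserved.
    return ".".join([dotnet_root] + [s[:1].upper() + s[1:] for s in segments])


if __name__ == "__main__":
    print(java_package_to_namespace("org.apache.lucene.search.highlight"))
```

Running the same function over the text of each converted package.html would cover the plain package references; links to individual types would need separate handling.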

For the home page, we should also aim to provide the same information as the rest of the Java API docs: https://lucene.apache.org/core/4_8_0/.

@synhershko

Yes, many of the files in Lucene.Net and Lucene.Net.Codecs have not been cleaned up yet. Plus there are some other places where the comments need to be fixed up a bit. I have been doing this bit by bit during the hour-long test runs when I can't really do much else.

We could really use some help with this, as it would take one person the better part of a week to get it all done. If 50 people contributed an hour each, we would be done in an hour ;).

@Shazwazza
Contributor Author

OK cool, I'm happy to update this PR with what I can and let you know what I get done. It probably isn't going to happen super fast, but I can put in some work each week!

@NightOwl888
Contributor

NightOwl888 commented Jun 12, 2017

@Shazwazza

Thanks again. I took a look and documentation generated perfectly. The documentation and code samples look great.

I have done all of the grunt work to update the documentation comments to get rid of nearly all of the compile warnings (at least in Visual Studio).

However, there are a few issues/limitations that I found with the generated documentation, as well as some features that would be nice to build in.

Package Breakdown

The Lucene documentation (https://lucene.apache.org/core/4_8_0/) breaks the API down by package first, and then allows you to drill into types.

I am torn between that approach and putting everything into one "bucket" like we currently have, which is similar to MSDN. The filter makes it easy to find something specific, but it is difficult to tell where the core types are vs the specialized add-ons.

The amount of data that you have to wade through is a bit overwhelming. For example, the navigation initializes with mostly obscure analysis packages in view before more useful namespaces. If we could somehow arrange it so the main namespaces show up at the top level, and allow a drill down to the levels below (or at least have an additional navigation feature that does this), that would seem more appropriate.

.NET Standard vs .NET Framework

The APIs for each framework are similar, but there are places where they diverge. Namely, there are several types that are not supported in .NET Standard and therefore don't exist. One such example is ConcurrentMergeScheduler. If you look at that class in the documentation, there is no indication at all that it doesn't exist in .NET Core.

Ideally, the fix for that would be to generate framework/version specific documents with a "drop down" (or similar) navigation feature that allows switching between available frameworks (just like MSDN). Is this (or a workaround) possible?

Missing Links

Some of the documentation I updated has links that are not being generated in the output even though they show up fine in Intellisense. Here are some problematic files:

/api/Lucene.Net.Codecs.Bloom.html
/api/Lucene.Net.Codecs.Lucene46.Lucene46FieldInfosFormat.html

In the first case, several of the links (such as CodecHeader) are not showing up. In the second case, all of them are showing up except for the one after Attributes. I haven't figured out why this is the case.

But actually this is a symptom of another problem. In Lucene, they are able to change the link text to a code reference, but I haven't worked out what the syntax for that is (if it is even possible). You can see here that the link after Attributes has the text Map<String, String> but it links to the documentation for DataOutput.writeStringStringMap().

I tried the obvious way to create that type of link (<see cref="SomeClass">link text</see>), but that just makes the whole thing disappear. If you have any insight how this could be done I would appreciate it.

HTML pages

I mentioned this before, but after looking at this there are more than 250 HTML pages. So this is a huge amount of missing documentation and most of the code samples are in it. I recall reading that some documentation generators allow you to specify "namespace documentation", and if that is the case with DocFx, perhaps we should use that to solve this.

If you could provide a specification as to what format the "package documentation" needs to be in (i.e. Markdown) and what convention it needs to follow (where the documentation needs to be in order to show up under a specific namespace), I would be happy to put together a tool to convert the existing HTML pages to that format and location.

Viewport Width

Minor complaint. On a large monitor, only about 2/3 of the available width is being utilized. I checked and regular MSDN pages are using roughly 10% more width, and some of the newer pages (example) are using about 25% more of the available width. Is there a way to specify the maximum width be wider?

Token Replacement

In Lucene there are a couple of tokens, such as @lucene.experimental and @lucene.internal that are replaced with text such as WARNING: This API is experimental and might change in incompatible ways in the next release. in the generated output.

Worst case, we could just find and replace in Visual Studio, but it seems better maintenance-wise to use similar functionality if it is available in the doc generator.
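If the generator turns out not to support this, the replacement could be done as a small pre- or post-processing step. The sketch below is one hypothetical way to do it. The `@lucene.experimental` text is the one quoted above; the `@lucene.internal` wording is an assumption and should be checked against the Lucene build files.

```python
REPLACEMENTS = {
    "@lucene.experimental": (
        "WARNING: This API is experimental and might change in "
        "incompatible ways in the next release."
    ),
    # Assumed wording; verify against the Lucene build before relying on it.
    "@lucene.internal": (
        "NOTE: This API is for internal purposes only and might change in "
        "incompatible ways in the next release."
    ),
}


def replace_tokens(text: str, replacements=REPLACEMENTS) -> str:
    """Replace each documentation token with its expanded message."""
    for token, message in replacements.items():
        text = text.replace(token, message)
    return text


if __name__ == "__main__":
    print(replace_tokens("Expert: low-level API. @lucene.experimental"))
```

Keeping the token-to-text table in one place means the wording can be changed later without touching the source comments again.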

@Shazwazza
Contributor Author

Awesome feedback and questions. I'm currently overseas but will see what answers I can provide next week. I know the answers to some, but others will require a bit of investigation. I'll get back to you in about a week.

@NightOwl888
Contributor

@Shazwazza - Added another minor issue to the above list. Any chance you will be able to answer some of these questions soon? In particular, I would like to know if there is a spec that the HTML docs can be converted to (and whether there is a convention we can use for changing the code links within them into the correct hyperlinks). Even if it is imperfect or still incomplete, it would be nice to have some documentation hosted so people using the beta have somewhere more relevant to turn than the Lucene 4.8.0 docs.

@synhershko - Any particular reason you are suggesting Github pages instead of hosting at http://lucenenet.apache.org/docs/3.0.3/Index.html? I think it would be less confusing if users only have to modify the version number in the URL to get to the latest. Although, since most of the new classes are not in the same location as the old, now would be the ideal time to jump to a different host if that is indeed the plan.

Question: For pre-releases should we be releasing new docs on each release in a new versioned location, or updating the existing 4.8.0 version location until it is fully released? Seems the former would be a better option in terms of legacy usage and automation of deployment, but may end up taking up lots of space if we end up with a lot of pre-releases.

@Shazwazza
Contributor Author

Hi all,

Here's some feedback on many of the above questions/comments:

I've pushed some updates to this PR which:

  • Fixed up a couple of cref's
  • Updated to use the latest docfx v2.17.x release (be sure to delete the /tools folder to get the latest version when running the ps1 script). Unfortunately, upgrading to the very latest docfx 2.19.2 causes build errors, and I checked that 2.18.x causes the same build errors. I'm putting together a bug report for this now. It has to do with some YAML parsing errors and things like "quoted scalar" and "orphaned high surrogate" which sound fun ;)
  • Adds some comments to the ps1 script about where other docs live and potentially somehow scraping/automating fetching these docs
  • Update the home page of the docs to mimic https://lucene.apache.org/core/4_8_0/, I've just copied the HTML, then removed the html/body tags and replaced the header tags with Markdown header tags so they are parsed correctly by DocFx to create the side menu. The links throughout this document will need to be updated since they still point to the lucene.apache.org links. I've done this to show that HTML markup works happily side by side and inline with Markdown with DocFx which will probably make it easier to scrape docs.
  • Adds a landing page in the API Docs section which is a copy of this page: https://github.com/apache/lucene-solr/blob/branch_4x/lucene/core/src/java/overview.html (links would need to be fixed), this file is committed to /api/index.md
  • Adds example docs and table of contents: /api/search/highlight which includes the docs taken from the package.html files (i.e. https://github.com/apache/lucene-solr/blob/releases/lucene-solr/4.8.0/lucene/highlighter/src/java/org/apache/lucene/search/highlight/package.html)
  • Updated the header menu to point to custom documentation articles and also the API docs
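The manual conversion described for the home page (dropping the html/body tags and turning header tags into Markdown headers) could be scripted roughly as follows. This is a minimal regex-based sketch, not a full HTML parser: the function name is hypothetical, and everything other than the wrapper and heading tags is left as inline HTML on the assumption that DocFx renders it side by side with Markdown, as noted above.

```python
import re

# Opening and closing <html> / <body> tags, with or without attributes.
_WRAPPER_TAGS = re.compile(r"</?(?:html|body)\b[^>]*>", re.IGNORECASE)
# <h1>..</h1> through <h6>..</h6>; the closing tag must match the opening level.
_HEADING = re.compile(r"<h([1-6])\b[^>]*>(.*?)</h\1\s*>",
                      re.IGNORECASE | re.DOTALL)


def html_to_docfx_markdown(html: str) -> str:
    """Strip html/body wrappers and convert heading tags to Markdown headings."""
    without_wrappers = _WRAPPER_TAGS.sub("", html)

    def to_markdown_heading(match):
        level = int(match.group(1))
        title = " ".join(match.group(2).split())  # collapse internal whitespace
        return "\n" + "#" * level + " " + title + "\n"

    return _HEADING.sub(to_markdown_heading, without_wrappers).strip()


if __name__ == "__main__":
    print(html_to_docfx_markdown(
        "<html><body><h2>Intro</h2><p>Text</p></body></html>"))
```

A `<head>` section, if present, is not removed by this sketch and would need its own rule.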

I cannot figure out why docfx is complaining about System crefs, such as Invalid cref value "T:IDictionary{string, string}". I think it has to do with a missing externalReference config, which is now obsolete, so I'm hoping that the newer docfx version fixes this once I get it building.

If you wish to test this setup without waiting for the entire metadata for all classes to be created, you can update the /docfx.json file metadata/src/files section from "**.csproj" to "**/Lucene.Net.csproj" (which will just generate the API docs for that particular project), or just clear that out entirely if you just want to build the non-api docs for testing.
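The docfx.json tweak described above can also be applied programmatically, for example to produce a faster test configuration. The sketch below assumes that the "metadata" entry is a list of sections, each with a "src" list of objects that carry a "files" list; check this against the actual docfx.json before using it. The helper name is hypothetical.

```python
import copy
import json


def narrow_metadata_sources(config: dict, pattern: str) -> dict:
    """Return a copy of a docfx.json config whose metadata sources scan only `pattern`.

    Assumes metadata -> src -> files is a list/list/list structure.
    The input config is left unmodified.
    """
    narrowed = copy.deepcopy(config)
    for section in narrowed.get("metadata", []):
        for source in section.get("src", []):
            source["files"] = [pattern]
    return narrowed


if __name__ == "__main__":
    # Stand-in for json.load(open("docfx.json")); only the relevant keys are shown.
    config = {"metadata": [{"src": [{"files": ["**.csproj"]}]}]}
    print(json.dumps(narrow_metadata_sources(config, "**/Lucene.Net.csproj"), indent=2))
```

Writing the narrowed config to a separate file keeps the committed docfx.json untouched while testing.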

Currently DocFx does not support the namespace-style documentation that Sandcastle used to support; there's an open issue for that here: dotnet/docfx#952. So for namespace-style documentation such as https://github.com/apache/lucene-solr/blob/releases/lucene-solr/4.8.0/lucene/highlighter/src/java/org/apache/lucene/search/highlight/package.html we would currently have to host these as documentation articles. Currently I've put documentation articles in the /docs folder, but it's possible to have any number of different article folders if required.

As for changing how the namespaces are shown on the left hand side and ordering by more important ones, this could be achieved by modifying the generated /api/toc.yml file after it is built. This file is autogenerated by docfx when it's building the API docs. As far as I can tell one way to do this would be with a custom Post Processor: https://dotnet.github.io/docfx/tutorial/howto_add_a_customized_post_processor.html but OOTB I don't think this is possible with standard configuration.

I'm not really sure what we can do about the .NET Standard vs .NET Framework, there is some mention of this in this issue: dotnet/docfx#1518 which apparently is fixed in this PR dotnet/docfx#1549 . I will just need to figure out exactly what all this means and what the options are.

For token replacement, i think this could also be achieved with a Post Processor in one way or another https://dotnet.github.io/docfx/tutorial/howto_add_a_customized_post_processor.html, though i did see this feature in later release notes: dotnet/docfx#1737

There's quite a lot of docs on docfx here http://dotnet.github.io/docfx/tutorial/docfx_getting_started.html

Hope this answers a few of your questions. I'll keep researching into the new docfx versions, what support it has and why we can't use it currently.

@NightOwl888
Contributor

I have added some comments inline.

I also asked a question about creating code links in DocFx with custom link text that you might find the comments helpful for.

As for changing how the namespaces are shown on the left hand side and ordering by more important ones, this could be achieved by modifying the generated /api/toc.yml file after it is built. This file is autogenerated by docfx when it's building the API docs. As far as I can tell one way to do this would be with a custom Post Processor: https://dotnet.github.io/docfx/tutorial/howto_add_a_customized_post_processor.html but OOTB I don't think this is possible with standard configuration.

Normally when faced with post-build issues such as these I either overwrite the contents of the file by generating it in the Powershell script or use the Powershell script to update the contents of the file, depending on how much of the file I need control over.

But I wasn't referring to the order of them so much as the depth. For example, it would be best if we had a link to Lucene.Net.Analysis in the TOC that when clicked expanded the Lucene.Net.Analysis.Ar, Lucene.Net.Analysis.Bg, etc. instead of having all of the Analysis.Common hierarchy in the initial view that loads.
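To make the "depth" idea concrete, here is a minimal sketch of grouping a flat list of namespaces under their three-segment parents. The dictionaries loosely mirror the name/items shape of TOC entries, but that shape, the `depth` default, and the function name are assumptions for illustration; reading and rewriting the generated /api/toc.yml is not shown.

```python
from typing import Dict, List


def group_namespaces(namespaces: List[str], depth: int = 3) -> List[dict]:
    """Nest namespaces deeper than `depth` segments under their parent prefix."""
    groups: Dict[str, List[str]] = {}
    for namespace in sorted(namespaces):
        parent = ".".join(namespace.split(".")[:depth])
        children = groups.setdefault(parent, [])
        if namespace != parent:
            children.append(namespace)
    return [
        {"name": parent, "items": [{"name": child} for child in children]}
        for parent, children in groups.items()
    ]


if __name__ == "__main__":
    flat = ["Lucene.Net.Analysis.Bg", "Lucene.Net.Analysis",
            "Lucene.Net.Index", "Lucene.Net.Analysis.Ar"]
    for entry in group_namespaces(flat):
        print(entry)
```

With this grouping, Lucene.Net.Analysis.Ar and Lucene.Net.Analysis.Bg become children of Lucene.Net.Analysis instead of appearing in the initial view.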

Anyway, I will wade through the rest of this and get back to you if I have any other questions/comments.

@geobmx540
Contributor

@synhershko - Any particular reason you are suggesting Github pages instead of hosting at http://lucenenet.apache.org/docs/3.0.3/Index.html? I think it would be less confusing if users only have to modify the version number in the URL to get to the latest. Although, since most of the new classes are not in the same location as the old, now would be the ideal time to jump to a different host if that is indeed the plan.

I like having them at lucenenet.apache.org/docs I think that's the right solution.

Question: For pre-releases should we be releasing new docs on each release in a new versioned location, or updating the existing 4.8.0 version location until it is fully released? Seems the former would be a better option in terms of legacy usage and automation of deployment, but may end up taking up lots of space if we end up with a lot of pre-releases.

I'd say cross the space issue when it becomes an issue.

…es filter config, updates to latest docfx version, updates to correct LUCENENET TODO
@Shazwazza
Contributor Author

Some updated info:

  • I've got the docfx build to work with the latest docfx version. The problem was dotnet/docfx#1817 ("ExtractMetadataException - Error extracting metadata for *** While writing a quoted scalar, found an orphaned high surrogate"), which means I've had to exclude the SURROGATE constants from the output docs (but I think that is probably fine). They've requested a fix for this from the Roslyn team.
  • It would seem that it may be possible to get namespace documentation working. I've been trying to research this using a docfx feature called 'overwrite' and there's mention that it works so I've asked for feedback on the topic: Question: Why \apidoc folder is needed  dotnet/docfx#229 (comment). I've got overwrites to work at the class level which might be handy in the future but would be great to see if it can work at the namespace level
  • I've added a WIKI section to show how that can be done. I've copied the HTML content from the current wiki; I only did the home page, getting started and mail group pages as examples and updated the links between them. The html could/should be formatted to markdown (not sure how the source of the current wiki is?) but that is optional but the links between pages should be updated to markdowns since that's much easier to generate the links.
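For readers unfamiliar with the "orphaned high surrogate" error mentioned in the first bullet, the underlying problem can be reproduced in miniature. The sketch below uses Python purely for illustration (DocFx is a .NET tool and its YAML writer is not shown): a string holding a lone surrogate code point, such as a constant defined as U+D800, cannot be encoded as UTF-8, while a complete supplementary character can.

```python
def has_lone_surrogate(text: str) -> bool:
    """Return True if `text` contains a surrogate code point UTF-8 cannot encode."""
    try:
        text.encode("utf-8")
    except UnicodeEncodeError:
        return True
    return False


if __name__ == "__main__":
    print(has_lone_surrogate("\ud800"))      # lone high surrogate
    print(has_lone_surrogate("\U0001F44D"))  # complete supplementary character
```

This is why constants whose value is a bare surrogate are awkward to serialize, and why excluding them from the generated docs sidesteps the error.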

There's still lots for me to look into based on the previous questions. I'll keep trying out things as I find time.

@NightOwl888
Contributor

Much appreciated - keep up the good work 👍

I am almost to the point where I will start documenting the new CLI tool. Originally, I was thinking about making 1 page per command like Microsoft did on their dotnet tool, but there isn't quite enough here for all of that. It would be easier to have 1 document for each of the 4 subcommands and a small section below for each command + 1 overview document describing the tool in general (so 5 pages of docs).

This tool contains all of the index maintenance tasks (checking, fixing, upgrading, splitting, merging, moving segments around, etc.) plus a set of demos that can be run, and source code viewed, or exported. The plan is to put this tool on Chocolatey so it can be easily installed and updated, as well as make it part of the CI release process.

Would building the docs in Markdown and placing them in a subdirectory of tools be appropriate, or would something else work better?

but that is optional but the links between pages should be updated to markdowns since that's much easier to generate the links.

This is one point that is a bit unclear to me. In the past I have tried to make links between pages on GitHub and they didn't always work - I ended up using absolute URLs to avoid the problems (but I don't recall exactly what they were). Do you have a suggestion about the correct way to make relative links between Markdown pages?

@Shazwazza
Contributor Author

@NightOwl888 If you put markdown files in the /apidocs/tools folder that should be fine, and then I can update any "toc" files to point to them. Currently that folder already exists for downloaded 'tools' that help the docfx build process, but I'll move that to a better temporary folder (i.e. 'obj/tools'). The correct way to create links between MD pages can be seen here: Shazwazza@d440348#diff-a121785cab27b808ad3b4d2fbd049bc7R6 and docfx will ensure it's all wired up correctly when it builds. Of course if you want to go up a level it's the standard "../" syntax.
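When links are generated by a script rather than typed by hand, the relative path between two Markdown pages can be computed instead of guessed. The sketch below is a hypothetical helper that emits a standard Markdown link with a relative target, including the "../" form for going up a level; the page paths are invented for the example.

```python
import posixpath


def relative_markdown_link(text: str, from_page: str, to_page: str) -> str:
    """Return a Markdown link from `from_page` to `to_page`.

    Both paths are relative to the same root and use forward slashes.
    """
    start = posixpath.dirname(from_page) or "."
    return f"[{text}]({posixpath.relpath(to_page, start)})"


if __name__ == "__main__":
    print(relative_markdown_link("Getting started",
                                 "wiki/index.md", "wiki/getting-started.md"))
    print(relative_markdown_link("Home", "wiki/sub/page.md", "wiki/index.md"))
```

Using `posixpath` rather than `os.path` keeps forward slashes in the links even when the script runs on Windows.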

@NightOwl888
Contributor

Actually, I was referring to the new "tools" folder under "src" (to keep the docs near the source code the same as they would be by converting the HTML pages from Java).

BTW - There is some discussion about the WIKI happening on the dev mailing list. If you are not already subscribed, you should subscribe to stay looped in on this.

@Shazwazza
Contributor Author

Ah i see, we can include any md files from anywhere in the solution so wherever you want to put them will work just fine :)

I'm on the list so all good, just haven't had a chance to reply quite yet, will do soon

@Shazwazza Shazwazza changed the title API Doc site generator using DocFx script Website & API Doc site generator using DocFx script Jan 17, 2019
@laimis laimis requested review from laimis and removed request for laimis February 26, 2019 14:11
@laimis
Contributor

laimis commented Feb 26, 2019

@Shazwazza I will go ahead and merge this beast. Again, thank you for all your efforts and time getting this together. Well done!

@laimis laimis merged commit 0d56d20 into apache:master Feb 26, 2019
@Shazwazza Shazwazza deleted the docfx-apidocs branch March 4, 2019 04:08
@NightOwl888
Contributor

@Shazwazza

Looks great. However, on first pass I was unable to find the docs for the lucene-cli tool. Did they not get included, or are they just hard to find? There probably should be a link to this from the home page, as it contains the demos in both executable and exportable form.

There are also some updates to those docs because I have now set up the deployment so the tool can be installed using dotnet.exe and run by simply typing lucene <command> after installation.

@Shazwazza
Contributor Author

Hi @NightOwl888 welcome back :)

Yes these are hard to find, the docs site isn't really finished or "live" yet, it's sort of pseudo live. I haven't had time to get this into the correct state but maybe soon i can.

Currently, from the new website if you go to the Documentation tab: https://lucenenet.apache.org/docs.html

This links to the different API doc sites, the first one there is the new 4.8 docs, but this is hosted on my own azure account still since it's still the demo site (but better than nothing for now). The menu on this doc site is broken, so it's a hamburger menu for the desktop site, if you click that, you'll see them https://lucenenetdocs.azurewebsites.net/cli/index.html

I should try to find some time to get the docs site running correctly and then we should get it hosted properly too.

NightOwl888 added a commit to NightOwl888/lucenenet that referenced this pull request Jul 9, 2019
@NightOwl888
Contributor

@Shazwazza

I am working on rolling another beta and would like to try getting the docs updated to reflect the install instructions for the lucene-cli tool and the new components in the Lucene.Net.ICU project (which is now feature-complete). Note that the latter contains everything from Lucene's analyzers-icu from the docs, but we moved it to a new library because we didn't want the popular Lucene.Net.Analysis.Common project to depend on the huge ICU dependencies. In Java, the features were readily built into the JDK, but they don't exist in .NET so we had to improvise. Anyway, you can now remove the footnote that it is not done. It would probably be easier to understand if we updated the names on the home page to reflect the package names, not sure how difficult that would be.

I'd like to try to get the doc building functionality hooked into the release build. Maybe we won't have it automated to the point of doc site deployment, but it would be nice to at least get it to the point where running a build produces the docs and main website as build artifacts so they can be manually downloaded and deployed.

Ideally, we would parameterize the version number and base URLs that the docs use and pass them into a command to generate the docs. Those parameters would be put into azure-pipelines.yml where we can change them if website locations change, etc.
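A minimal sketch of what that parameterization could look like on the script side is shown below. The variable names `DOCS_VERSION` and `DOCS_BASE_URL` and both helper functions are hypothetical; the URL layout follows the existing http://lucenenet.apache.org/docs/3.0.3/Index.html pattern mentioned earlier in this thread.

```python
from typing import Mapping


def docs_url(base_url: str, version: str, page: str = "") -> str:
    """Join base URL, version and optional page with single slashes."""
    return "/".join(part.strip("/") for part in (base_url, version, page) if part)


def settings_from_env(env: Mapping[str, str]) -> dict:
    """Read the (hypothetical) pipeline variables that drive the docs build."""
    return {"version": env["DOCS_VERSION"], "base_url": env["DOCS_BASE_URL"]}


if __name__ == "__main__":
    settings = settings_from_env({
        "DOCS_VERSION": "3.0.3",
        "DOCS_BASE_URL": "http://lucenenet.apache.org/docs/",
    })
    print(docs_url(settings["base_url"], settings["version"], "Index.html"))
```

In a pipeline, the mapping passed to `settings_from_env` would be `os.environ`, with the values defined once in the pipeline configuration.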

If you don't have time right now, we don't necessarily have to do this before the release (the docs could be manually generated and synced up afterward). But if you have time to work on this, it would be great if we could get it partially done.

Thanks again for all your hard work!

@Shazwazza
Contributor Author

@NightOwl888 Sorry ran outta time today, i have this pinned in my inbox and will get back to you on monday with hopefully some updates too. Cheers!

@NightOwl888
Contributor

@Shazwazza

No problem.

Just out of curiosity, how long does it take to generate docs? We could probably do it on a dedicated server in parallel with one or more of the other jobs so it doesn't add any time to the overall run, but we have a 1 hour cap on time anything can run on a single server. Also, it seems the servers take about 2x the time to do anything I run locally.

@Shazwazza Shazwazza mentioned this pull request Aug 12, 2019
@Shazwazza
Contributor Author

Hi @NightOwl888 have pushed a PR here with notes, can discuss the above stuff there #229


7 participants