Human-readable documentation / explicit use cases missing #256
Strong +1
I have also found, for the reference graph variation API, that the Avro docs have become very bloated with documentation. Richard On 12 Mar 2015, at 20:05, Frank Austin Nothaft [email protected] wrote:
The Wellcome Trust Sanger Institute is operated by Genome Research
I think we should use the wiki on the GitHub page. On Fri, Mar 13, 2015 at 9:17 AM, Richard Durbin [email protected]
@ekg
+1, I support this. Perhaps we can discuss it on an upcoming DWG call. Stephen On 13 Mar 2015, at 09:35, Erik Garrison [email protected] wrote:
+1, I’ll put it on the agenda for the 25th. We need a person or people to take ownership of this to make it happen.
++1. This is extremely important! Back in May of last year it took me a month to get completely up to speed, using the issues to follow the reasoning behind the API, though at the time we were changing our approaches more frequently.
+1 on documentation in git separate from avdl files. It's vitally important that full, normative documentation be This is the only approach I have seen in decades of software There have been many systems developed to generate integrated Mark Richard Durbin [email protected] writes:
+1, I concur with Mark on this. Technical documentation that stays close to the definitions in the code (Javadoc, R, etc.) definitely streamlines the integration and implementation process. https://readthedocs.org/ could provide that kind of integration with what we are doing.
This will not produce as pretty a result but will solve the We can also add a gitdoc -> Sphinx pipeline in the future, they Benedict Paten [email protected] writes:
-0, I opened this issue with "human-readable" for a reason, see http://stevelosh.com/blog/2013/09/teach-dont-tell/#the-reference
I agree that if we're not in a stable enough state, or no one is willing to take charge of human-readable docs, then at least we should have up-to-date auto-generated docs. Specifically, right now if someone asks me "How do I ask a Beacon v0.2 for variants that say 'AAGA' at hg19, chr17:3857?", neither they, nor many of the GitHub contributors, nor anyone on any other task team has any idea what the answer is, since all they can see is the Avro schema and a lot of meandering conversation across 12 different issues. I think the answer should be self-evident and explicitly recorded in our GitHub conversations (if we are sticking with conversations on GitHub). I'm not certain about the status of other
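As an illustration of the kind of answer a cookbook entry could record, here is a minimal client-side sketch. The endpoint URL and the parameter names (`chrom`, `pos`, `allele`, `ref`) are hypothetical placeholders assumed for this example; they are not taken from the Beacon v0.2 schema.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; not a real Beacon deployment.
BEACON_URL = "https://beacon.example.org/query"


def build_beacon_query(chrom: str, pos: int, allele: str, ref: str) -> str:
    """Build a GET URL asking whether a Beacon has seen `allele` at chrom:pos.

    The parameter names are illustrative placeholders, not the Beacon v0.2
    schema, and whether `pos` is 0- or 1-based is exactly what real
    documentation would have to pin down.
    """
    params = {"chrom": chrom, "pos": pos, "allele": allele, "ref": ref}
    return f"{BEACON_URL}?{urlencode(params)}"


print(build_beacon_query("17", 3857, "AAGA", "hg19"))
# https://beacon.example.org/query?chrom=17&pos=3857&allele=AAGA&ref=hg19
```

Even this toy version forces the questions the schema alone leaves open: how the chromosome is named, which coordinate base is used, and how the assembly is identified.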
@nouyang-curoverse, I agree with you, and I think that is why @adamnovak created the following: I think we need to take a step back and write a concise rough doc, with some diagrams, showing where all the pieces fit together. I think that would, to an extent, keep the different projects in sync. I have seen implementations become so divergent that they could not properly talk to each other, and a significant rewrite was required. I understand these are API definitions, but it might be good to create a summary report of all the projects that describes the design, implementation and motivation of each. It would include the exact implementation state each is currently in, with next-step action items, and it would be helpful to update it periodically. At that stage, careful discussion and implementation of integration would help ensure that all the API parts communicate and integrate smoothly with each other. This is how API teams in industry interface with other key members and project teams on large, complex projects when implementing a complete set of products or a framework, throughout the software development lifecycle. It keeps all parts in sync and on track so that no surprises creep up, which does require periodically revisiting all issues, where they stand, and how they fit with the stated and defined goals. Having said that (the report will synchronize the projects), I still think that API-specific docs will ensure implementations are built from specific descriptions that list the definitions, assumptions and properties, including examples that can be turned into test cases. That only follows once what I described above has been thoroughly vetted, periodically. Paul
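One low-cost way to make documented examples double as test cases is Python's standard `doctest` module, which runs the examples embedded in docstrings. In this minimal sketch, the function `normalize_reference_name` and its "strip a leading chr" convention are hypothetical and serve only to show the mechanism, not a GA4GH rule.

```python
import doctest


def normalize_reference_name(name: str) -> str:
    """Strip a leading 'chr' so that 'chr17' and '17' compare equal.

    This convention is a hypothetical example, not a GA4GH rule.

    >>> normalize_reference_name("chr17")
    '17'
    >>> normalize_reference_name("17")
    '17'
    """
    return name[3:] if name.startswith("chr") else name


if __name__ == "__main__":
    # Executes every docstring example in this module and reports the results.
    print(doctest.testmod())
```

Because the examples are executed, the documentation fails loudly as soon as it drifts from the implementation.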
I agree with @diekhans about having the docs in the repo, but also agree with @ekg that it should look like a wiki. However, I don't think there's a huge gap between the two. If you just create a At Berkeley, we use pandoc to generate documentation. We keep the documentation in the project git repo, and we then package it up with each release (and optionally with each build).
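A repo-plus-pandoc workflow like this could be driven by a small release script that renders the same plaintext sources to several formats. This is a minimal sketch, assuming Markdown files in a hypothetical `doc/` folder and that pandoc is on the PATH (PDF output additionally needs a LaTeX engine); the paths and helper names are illustrative, not the Berkeley setup.

```python
from __future__ import annotations

import shutil
import subprocess
from pathlib import Path


def pandoc_commands(sources: list[str], out_stem: str) -> list[list[str]]:
    """Return pandoc invocations rendering the same sources to HTML and PDF."""
    cmds = []
    for ext in ("html", "pdf"):
        # -s produces a standalone document; --toc adds a table of contents.
        # pandoc concatenates multiple input files in the order given.
        cmds.append(["pandoc", "-s", "--toc", *sources, "-o", f"{out_stem}.{ext}"])
    return cmds


def build_docs(doc_dir: str = "doc", out_stem: str = "build/manual") -> None:
    """Render every Markdown file in doc_dir (hypothetical layout) to HTML and PDF."""
    sources = sorted(str(p) for p in Path(doc_dir).glob("*.md"))
    if not sources or shutil.which("pandoc") is None:
        raise RuntimeError("need Markdown sources and pandoc on PATH")
    Path(out_stem).parent.mkdir(parents=True, exist_ok=True)
    for cmd in pandoc_commands(sources, out_stem):
        subprocess.run(cmd, check=True)
```

Calling `build_docs()` from a release or CI step keeps the docs versioned with the code, and PDF becomes just one output of plaintext sources.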
#242 isn't what I would call human-readable documentation. Those are generated API docs, which are insufficient for understanding how a large system interacts. There need to be higher-level, human-written, human-readable docs. I don't think that API docs are bad (I find them quite useful), but standalone API docs are insufficient.
Slight digression regarding my own effort to condense the schema docs: At one point I tried dumping the Avro schema into something that could be automatically diagrammed (dot, Gviz, d3, who cares how). Couldn't make it work and gave up after an hour or two. But if the dynamically generated schema figures/doc came after the human-readable rationale for why the schema exists as such, that might address both problems simultaneously. I think a post from Kenna Shaw provoked my own effort; someone very smart who didn't have much time to spend reading AVDLs. If the API built on these schemas is to be adopted, individuals in that sort of position will need a technically correct but concise and persuasive explanation of why they should care. Figures similar to the UML diagrams previously posted, but dynamically generated, could serve as the exploded parts diagram, reserving the text of the docs for relevant discussion of why things are as they are, and what problems they solve. "To retrieve a proper MHC reference structure under the current representation, we do XYZ. This is inefficient and incomplete. With the GA4GH API, the same query returns ABC as an instance of type Foo, obviating the problem and allowing users to ask far more clinically relevant questions of the data (see figure 123 for specifics of Foo)" Basically a vignette, in Bioconductor terms. That aspect of BioC -- all code MUST have a substantial example of its application that is successfully run at build time every night -- is perhaps the best user-facing advertisement for that particular project, not least because google then indexes the generated vignettes every night. Some of the vignettes have more citations than the refereed journal articles describing the same software. If people are voluntarily going to adopt the API as service providers, it would help if its documentation revealed what problems (unsolved by current approaches) are addressed by the GA4GH schemas. 
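For the "dump the Avro schema into something diagrammable" idea, a minimal sketch could walk the JSON form of an Avro protocol (for example, what `avro-tools idl` produces from an `.avdl` file) and emit Graphviz DOT edges from each record to the named types its fields reference. The tiny protocol below is hypothetical, and the sketch ignores namespaces and does not descend into inline nested record definitions.

```python
import json

# A tiny hypothetical protocol in Avro's JSON form; record names are illustrative.
PROTOCOL_JSON = """
{"protocol": "Example", "namespace": "org.example",
 "types": [
  {"type": "record", "name": "Call", "fields": [
     {"name": "callSetId", "type": "string"}]},
  {"type": "record", "name": "Variant", "fields": [
     {"name": "id", "type": "string"},
     {"name": "calls", "type": {"type": "array", "items": "Call"}},
     {"name": "info", "type": {"type": "map", "values": "string"}}]}
 ]}
"""

PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}


def referenced_names(t):
    """Yield the named (non-primitive) types that a field type refers to."""
    if isinstance(t, str):
        if t not in PRIMITIVES:
            yield t
    elif isinstance(t, list):  # a union: inspect every branch
        for branch in t:
            yield from referenced_names(branch)
    elif isinstance(t, dict):
        kind = t.get("type")
        if kind == "array":
            yield from referenced_names(t["items"])
        elif kind == "map":
            yield from referenced_names(t["values"])
        elif kind in ("record", "enum", "fixed") and "name" in t:
            yield t["name"]  # inline named definition (its fields are not walked)
        else:
            yield from referenced_names(kind)  # e.g. {"type": "string"}


def to_dot(protocol: dict) -> str:
    """Render record-to-type references as a Graphviz DOT digraph."""
    lines = ["digraph schema {"]
    for t in protocol.get("types", []):
        if t.get("type") != "record":
            continue
        for field in t.get("fields", []):
            for target in referenced_names(field["type"]):
                lines.append(f'  "{t["name"]}" -> "{target}" [label="{field["name"]}"];')
    lines.append("}")
    return "\n".join(lines)


print(to_dot(json.loads(PROTOCOL_JSON)))
```

The DOT text can then be rendered with Graphviz (for example `dot -Tsvg`). The figure is regenerated from the schema, and the prose stays free for the rationale.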
Haussler's epic tome, for example, lives in a Git pull request comment. That is probably not optimal for widespread adoption... JMHO --t
By human-readable documentation, I guess we mean a users' manual or perhaps a cookbook. I agree we need that. As to the format, I am also -1 on a wiki. I'd prefer something that is more explicitly versioned and can be converted to a well-formatted PDF book. Pandoc seems good, though I haven't used it before.
+1 Tim, +1 Heng on the PDF and cookbook. Reading PDFs brings me happiness :)
+1 to cookbook. Re: pulling out "use case examples, discussion", @pgrosu and @richarddurbin: I believe MFiume and I plan on doing this for Beacon sometime next week. I'll check back in when that's done; perhaps it can serve as a template for the other APIs, and after that we can work on achieving API sync across the task teams. P.S. Can someone please add a Beacon label, or tell me who can assign labels to issues? I still can't assign labels, it seems, and we're in sore need of more issue categorization. Aside: I personally find more happiness reading well-designed websites than PDFs. PDFs are hard to search, hard to copy-paste from, and hard to keep open in multiple "tab" views, and they take a long time to load, among other issues. The Atmel PDFs [1] are well-formatted, but leave something to be desired from the beginner's perspective. So I'm happy as long as the source files for our human-readable cookbook are in some plaintext doc format and PDF is just one output.
@nouyang-curoverse, sounds good; I look forward to it. Having taken both FPGA and OS courses, I don't think this manual looks so bad, but it is for a different domain. It's all about context. Sometimes I find reading papers (as PDFs) in machine learning or genetics relaxing, but some good beer with a great soccer game on TV will always be more fun :) ~p
In many ways, the documentation is far more important than the Paul Grosu [email protected] writes:
@adamnovak I guess I still don't understand #242. It seems to be a lot of syntax for generating SVG files automatically plus a random documentation file (for graphs), instead of actual human-readable SVG files and a skeleton of doc topics that would facilitate discussion. Perhaps adding a "Purpose of this /doc Folder" section would be useful.
I attempted to find out how to implement the "current" Beacon API. I genuinely am unable to work it out. There is an Avro schema, but no documentation that explains how I should use it, nor any examples. As someone looking into Beacon for the first time, I'd be at a total loss as to what I should do.
I guess we should just use standard techniques for communicating these. For On Wed, Jul 15, 2015 at 2:10 PM, Ben Hutton [email protected]
@ekg Beacon is the simplest of the APIs that GA4GH has specified. Yet the average developer on the street probably wouldn't know where to start. I feel this is a bad position to be in. I'd be happy to help out with documentation, but I don't actually know myself!
Since Beacon is currently completely independent of the DWG API, Right now, it's just confusing to everyone. Erik Garrison [email protected] writes:
Sounds reasonable to me. Who can make that happen? |
We are working on this. On Wed, Jul 15, 2015 at 7:36 AM, Ben Hutton [email protected]
I'll reiterate what I said long ago: if the foundational avro schema (or I tried, and failed, to generate accurate figures from the schemas, because If it can't be solved gracefully, maybe there is a deeper issue involved. best, --t On Wed, Jul 15, 2015 at 7:29 AM, Mark Diekhans [email protected]
@benedictpaten OK. Who is "we", and is there an opportunity for others to help? There doesn't seem to be much visibility into what is and isn't happening. As a side note, I'm starting to feel the new GA4GH technical website is a step backwards in terms of usefulness.
@benhutton: does this help? On Thu, Jul 16, 2015 at 11:00 AM, Ben Hutton [email protected]
"We" is UCSC; a preliminary version of the documentation is here: http://hgwdev.cse.ucsc.edu/~jeltje/build/html/introduction.html The documentation in the schema is also being updated. We will We are waiting on the linear branch to be accepted before Once we have that in place, we will be cornering the task teams Ben Hutton [email protected] writes:
@diekhans OK, thank you. That makes sense. @maximilianh A bit, but I guess I'm coming from the false assumption that people will be writing APIs into existing systems as opposed to creating standalone systems. We (Decipher) would want to make it part of our existing code base, not run a new process. It is still useful to be able to view an implementation, though, so thanks for the link! =]
@benhutton: my assumption was that people have usually apache running On Thu, Jul 16, 2015 at 1:59 PM, Ben Hutton [email protected]
hey @maximilianh I am @benhutton, but I am NOT the person you are trying to talk to. I think you're trying to talk to @Relequestual, a different Ben Hutton (nice to meet you!)
sorry @benhutton (was replying by email where usernames are not shown), tagging the other ben hutton now @Relequestual |
haha classic. @maximilianh Maybe that's a safe assumption for most cases (I don't know, I'm still newish to the biology field), but that isn't the case for us (Decipher). We have variants in a database: not ALL of a patient's variants, only the ones of interest, deposited by various projects around the world.
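Answering a Beacon-style yes/no question need not require a separate service; it can be a query inside an existing code base. A minimal sketch, assuming a hypothetical `variants(chrom, pos, alt)` table. The table, columns and function name are illustrative and come from neither Decipher nor the Beacon schema.

```python
import sqlite3


def beacon_lookup(conn: sqlite3.Connection, chrom: str, pos: int, allele: str) -> bool:
    """Answer "has this allele been seen at chrom:pos?" from an existing table."""
    row = conn.execute(
        "SELECT 1 FROM variants WHERE chrom = ? AND pos = ? AND alt = ? LIMIT 1",
        (chrom, pos, allele),
    ).fetchone()
    return row is not None


# Demo with an in-memory stand-in for an existing variants database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE variants (chrom TEXT, pos INTEGER, alt TEXT)")
conn.executemany(
    "INSERT INTO variants VALUES (?, ?, ?)",
    [("17", 3857, "AAGA"), ("1", 1000, "T")],
)
print(beacon_lookup(conn, "17", 3857, "AAGA"))  # True
print(beacon_lookup(conn, "17", 3857, "G"))     # False
```

Because such a table holds only the variants of interest, a `False` here means "not in this collection", not "absent in the patient"; that distinction is something written documentation would need to spell out.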
The mass of auto-generated API doc, http://ga4gh.org/documentation/api/v0.5.1/ga4gh_api.html#/, while infinitely better than nothing, is now hindering distilling / reaching clarity and consensus about some simple issues and edge cases (in my opinion).
Can we move to include more human-readable documentation along with the Avro schemas, or at least in our Github Issues discussions?
I think this will resolve a lot of confusion being generated because people are submitting solutions/pull-requests that resolve specific use cases. Without making the use cases we have in mind explicit to each other, the fact that our solutions are conflicting and need a step back to resolve may not be readily apparent.
Additionally, explicit use cases will easily allow us to write end-user oriented documentation in the future (where here the end-users are actually developers for various institutions / software packages).