What's the citation model for DTS Resource ? #101

PonteIneptique · 2018-05-17T15:07:31Z

In our discussion, we talked about having tei:refsDecl in the metadata of a Resource in the Collection API.
Basically, my examples covered it this way:

{
    "@id" : "urn:cts:latinLit:phi1103.phi001.lascivaroma-lat1",
    "@type": "Resource",
    "...": "...",
    "tei:refsDecl": [
        {
            "tei:matchPattern":  "(\\w+)",
            "tei:replacementPattern": "#xpath(/tei:TEI/tei:text/tei:body/tei:div/tei:div[@n='$1'])",
            "@type": "poem"
        },
        {
            "tei:matchPattern":  "(\\w+)\\.(\\w+)",
            "tei:replacementPattern": "#xpath(/tei:TEI/tei:text/tei:body/tei:div/tei:div[@n='$1']//tei:l[@n='$2'])",
            "@type": "line"
        }
    ]
}

This would result in an object:

- that can be cited by "poem" or by "line" (because of the type)
- whose passages should always be matched by one of the matchPattern values
- whose replacementPattern should lead to the beginning (in the case of milestones?) or to the container(s) of the element
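
To make the intended behaviour concrete, here is a minimal sketch of how a client or server might resolve a passage reference against the two entries above. It assumes the usual TEI cRefPattern convention that `$1`, `$2`, ... stand for the captured groups, that entries are tried in order, and that the whole reference must match. The names `REFS_DECL` and `resolve` are hypothetical and not part of any DTS draft.

```python
import re

# Hypothetical in-memory copy of the two "tei:refsDecl" entries from the example.
REFS_DECL = [
    {
        "tei:matchPattern": r"(\w+)",
        "tei:replacementPattern": "#xpath(/tei:TEI/tei:text/tei:body/tei:div/tei:div[@n='$1'])",
        "@type": "poem",
    },
    {
        "tei:matchPattern": r"(\w+)\.(\w+)",
        "tei:replacementPattern": "#xpath(/tei:TEI/tei:text/tei:body/tei:div/tei:div[@n='$1']//tei:l[@n='$2'])",
        "@type": "line",
    },
]


def resolve(reference, refs_decl=REFS_DECL):
    """Return (type, expanded replacement pattern) for the first entry whose
    matchPattern matches the whole reference, or None if nothing matches."""
    for entry in refs_decl:
        match = re.fullmatch(entry["tei:matchPattern"], reference)
        if match is None:
            continue
        expanded = entry["tei:replacementPattern"]
        # Substitute the highest group index first so "$1" never clobbers "$10".
        for index in range(len(match.groups()), 0, -1):
            expanded = expanded.replace(f"${index}", match.group(index))
        return entry["@type"], expanded
    return None


print(resolve("1"))
print(resolve("1.2"))
print(resolve("1.2.3"))  # no pattern covers three components
```

Note that nothing in the patterns themselves says that "line" sits below "poem"; that relationship only exists in the reader's head.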

I am opening this issue because we skipped over this quickly in our talks and agreed on it only in general terms.

I have the following questions:

1. Should we clarify the level of citation? For example, the second citation is at depth 2 (lines within a poem). This is something we cannot capture simply by expressing a match pattern. In the CapiTainS draft guidelines, I used the attribute tei:corresp for that, but maybe we should use something like dts:depth?
2. Should we make tei:replacementPattern optional? Or is there anything in there that we feel goes too far? (Noting that at least the citation structure is important for CTS compatibility.)
3. I also think we should move @type to tei:type in the examples.
4. We could also make things more complicated (but more straightforward) and allow people to build a "graph" of the citation system (property names were chosen to be expressive for the example):

[{
   "dts:citation_id": "1",
   "tei:matchPattern":  "(\\w+)",
   "tei:replacementPattern": "#xpath(/tei:TEI/tei:text/tei:body/tei:div/tei:div[@n='$1'])",
   "@type": "poem"
},
{
   "dts:citation_id": "2",
   "dts:citation_parent": "1",
   "tei:matchPattern":  "(\\w+)\\.(\\w+)",
   "tei:replacementPattern": "#xpath(/tei:TEI/tei:text/tei:body/tei:div/tei:div[@n='$1']//tei:l[@n='$2'])",
   "@type": "line"
}]

Note that while some of this might seem like too much, each point is a partial response to a real problem a parser faces in understanding the structure of the text...
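
As a rough illustration of what the parent links buy a parser, the sketch below derives the depth of each citation entry from `dts:citation_parent`. The function name `citation_depths` is hypothetical, the match and replacement patterns are left out for brevity, and the third entry (a "paragraph" with no parent) is an invented sibling added only to show that list position alone would not give the right depth.

```python
def citation_depths(entries):
    """Map each dts:citation_id to its depth (1 for entries without a parent)."""
    parents = {e["dts:citation_id"]: e.get("dts:citation_parent") for e in entries}
    depths = {}

    def depth_of(citation_id, seen=()):
        if citation_id in depths:
            return depths[citation_id]
        if citation_id in seen:
            raise ValueError(f"cycle in citation graph at {citation_id!r}")
        parent = parents[citation_id]
        if parent is None:
            depths[citation_id] = 1
        else:
            depths[citation_id] = depth_of(parent, seen + (citation_id,)) + 1
        return depths[citation_id]

    for citation_id in parents:
        depth_of(citation_id)
    return depths


ENTRIES = [
    {"dts:citation_id": "1", "@type": "poem"},
    {"dts:citation_id": "2", "dts:citation_parent": "1", "@type": "line"},
    # Hypothetical extra entry, not in the example above.
    {"dts:citation_id": "3", "@type": "paragraph"},
]

print(citation_depths(ENTRIES))
```

With explicit parents, entry "3" comes out at depth 1 even though it is third in the list.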

PonteIneptique · 2018-05-17T15:51:20Z

I'd actually add that I would recommend changing the namespace to "http://www.tei-c.org/ns/1.0#" instead of "http://www.tei-c.org/ns/1.0"; otherwise prefix expansion produces "http://www.tei-c.org/ns/1.0matchPattern", and that's awful.

Or "http://www.tei-c.org/ns/1.0/" btw. Both are good to me.
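
A minimal sketch of the problem, assuming prefix expansion is plain concatenation of the namespace IRI and the local name (which is what the output quoted above suggests). The helper `expand` is hypothetical and is not a full JSON-LD processor.

```python
def expand(term, context):
    """Naive prefix expansion: replace 'prefix:' with the IRI bound to that prefix."""
    prefix, _, local_name = term.partition(":")
    return context[prefix] + local_name


for namespace in (
    "http://www.tei-c.org/ns/1.0",
    "http://www.tei-c.org/ns/1.0#",
    "http://www.tei-c.org/ns/1.0/",
):
    print(expand("tei:matchPattern", {"tei": namespace}))
```

Only the last two namespaces keep the local name separable from the namespace part.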

balmas · 2018-05-17T20:08:41Z

1. I think it might be good to be explicit about level, but the @corresp attribute doesn't really seem appropriate for that to me. Using dts:level might be better, if it validates. But I think we could also infer this from the position in the refsDecl list. Either way would probably be OK.
2. If tei:replacementPattern is optional, then the matchPattern seems a little meaningless to me.
3. Agree (assuming it validates).
4. I don't understand where this would be declared.

PonteIneptique · 2018-05-18T06:41:01Z

I updated number 4 so that it's clearer.

My answer here is mostly targeted at your point 1: we actually can't infer it, because people may have a complex citation tree rather than a citation "line". Most CTS texts have a wonderful "book->poem->line" structure, but what about:

book
- poem
  - stanza
    - line
- paragraph
  - segment

Here, the CTS model would most probably fail. You can have different match patterns (let's say poems are numbered while paragraphs match [a-zA-Z]+ as a regexp), and you would still not be able to infer the level of the citation. We could do so for a wonderful CTS object converted to DTS, because the dot . means hierarchy; but for any other text with a complex tree, we would be powerless to understand the relationship between citation nodes.

emmamorlock · 2018-05-18T09:03:20Z

My 2 cents:

1. Don't you think a "paragraph" could contain anything and not just [a-zA-Z]+?
2. In question 4, isn't the example less a graph than a straightforward hierarchy (declared via "dts:citation_parent")?
3. What I have is:
- div
  - ab with mixed content and two types of milestones:
    - lb (with @n)
    - milestones (with @unit and @corresp)
NB: the @corresp is essential to establish a relation with the corresponding abstract textual units that are declared in msContents/msItem...

PonteIneptique · 2018-05-18T09:25:12Z

Quick answers:

1. That was only an example to show that passages could be numbers for lines while letters could be paragraph identifiers. I was just showing that we might have this kind of diversity.
2. Technically, a hierarchy is a graph, but I don't think this is the question. Yes, I definitely gave a simple example in question 4, but #101 (comment) shows that we might have more complex ones.
3. Noted. Unfortunately, I have not seen any attribute in TEI that could cover the depth or type of a citation scheme, and this is also an issue for the future CapiTainS guidelines.

hcayless · 2018-05-18T12:23:52Z

I'm confused about what this is meant to achieve (possibly I just haven't had enough coffee yet). Canonical References in TEI allow you to construct a custom URI referencing system, which is fine and good. But I'm missing the point of them here. Shouldn't the Reference API just tell you what sorts of references you can have? Why should the Collection API bother telling you how they're constructed?

balmas · 2018-05-18T13:34:01Z

@hcayless's comment makes me realize I misunderstood the point of this issue. I thought we were talking about the TEI refsDecl structure ... I clearly had either not enough or too much coffee myself at that point :-)

To respond to Hugh's point, I could see the DTS API making this information available being useful for purposes of a chain of provenance or reproducibility.

To reframe my answers to the above in the correct context:

1. dts:depth makes the most sense to me here, in the context of the DTS API.
2. If I am correct that the point of this is reproducibility, then I think replacementPattern should be present.
3. Does using tei:type make too many assumptions about the textual markup? What if the citation doesn't correspond to something that was identified that way?
4. The graph approach is tempting, but I'm a little worried it would increase the complexity of implementation.

PonteIneptique · 2018-05-18T14:22:46Z

The issue with the Reference API is that it throws references at you. But, for example, one of the very common things I do with CTS APIs is: Retrieve Text Metadata -> Retrieve all References at Deepest Level (thanks to Text Metadata) -> Retrieve passages based on the last results.

Right now, our system cannot provide this kind of workflow because we do not have a space to state how the references of the text are structured.

hcayless · 2018-05-18T15:06:55Z

Ok. I see the point of that use case, but I don't see yet how having the Collection API give you TEI cRefPatterns helps. Maybe I'm being dense. I see the problem, but I don't see how this is a solution.

Wouldn't it be better to come up with some declarative representation of the available levels and how citations to them are constructed? Put another way, I see the point of the matchPattern, but not the replacementPattern. As a client, I don't care how you're getting the chunk of text I want, and I wouldn't care unless I wanted to grab the document and do it myself.

What about something like:

{
    "@id" : "urn:cts:latinLit:phi1103.phi001.lascivaroma-lat1",
    "@type": "Resource",
    "...": "...",
    "dts:citeStructure": [
        {
            "dts:citePattern":  "(\\w+)",
            "dts:level": 1,
            "label": "poem"
        },
        {
            "dts:citePattern":  "(\\w+)\\.(\\w+)",
            "dts:level": 2,
            "label": "line"
        }
    ]
}

Seems like IRI templates might be better for this than regex patterns though...
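
For what it is worth, here is a small sketch of how a client could use such a structure: find the deepest level up front, then classify references it receives. It assumes each `dts:citePattern` must match the whole reference; `deepest_level` and `describe` are hypothetical helper names, not proposed API.

```python
import re

# The "dts:citeStructure" array from the example, as Python data.
CITE_STRUCTURE = [
    {"dts:citePattern": "(\\w+)", "dts:level": 1, "label": "poem"},
    {"dts:citePattern": "(\\w+)\\.(\\w+)", "dts:level": 2, "label": "line"},
]


def deepest_level(cite_structure):
    """Return the largest dts:level declared for the resource."""
    return max(entry["dts:level"] for entry in cite_structure)


def describe(reference, cite_structure):
    """Return (level, label) for the entry whose citePattern matches the
    whole reference, or None if no entry matches."""
    for entry in cite_structure:
        if re.fullmatch(entry["dts:citePattern"], reference):
            return entry["dts:level"], entry["label"]
    return None


print(deepest_level(CITE_STRUCTURE))
print(describe("3.14", CITE_STRUCTURE))
```

The client never needs to know how the server locates the passage; it only needs the shape and level of each kind of reference.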

PonteIneptique · 2018-05-18T15:26:54Z

I'd be totally for it. It's just that we talked about it being based on cRefPattern, but your proposed structure is good to me.

hcayless · 2018-05-21T12:42:11Z

I probably failed to properly think through the implications when we talked about it, but now I think it's better to just tell the client how citations are constructed than to give it implementation details it can't really use.

PonteIneptique · 2018-05-21T12:44:09Z

I am not completely certain about the use of the match pattern and replacement pattern (whatever the namespace or implementation is). On the other hand, having information about the "citation graph" structure and metadata about it seems important to me as well :) I think we have an agreement here, right?

jonathanrobie · 2018-06-07T13:22:34Z

How about a URI template along these lines:

 {
  "tei:replacementPattern": "#xpath(/tei:TEI/tei:text/tei:body/tei:div/tei:div[@n='{&n}'])"
 }

PonteIneptique · 2018-06-07T13:24:54Z

Recursion and a graph description of the citation scheme:

{
    "@id" : "urn:cts:latinLit:phi1103.phi001.lascivaroma-lat1",
    "@type": "Resource",
    "...": "...",
    "dts:citeStructure": [
        {
            "label": "poem",
            "dts:citeStructure": [
                {
                    "label": "line"
                }
            ]
        }
    ]
}
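
A quick sketch of how a client could read this nested form: walk the tree and recover the level and label path of every node. The function `walk` is a hypothetical helper, and `COMPLEX` is my own re-encoding of the book / poem / stanza / line versus book / paragraph / segment tree from earlier in this thread, given only as an assumption about how it might look in this format.

```python
def walk(cite_structure, level=1, path=()):
    """Yield (level, path of labels) for every node of a nested dts:citeStructure."""
    for node in cite_structure:
        node_path = path + (node["label"],)
        yield level, node_path
        yield from walk(node.get("dts:citeStructure", []), level + 1, node_path)


SIMPLE = [{"label": "poem", "dts:citeStructure": [{"label": "line"}]}]

# Assumed encoding of the mixed tree discussed earlier in the thread.
COMPLEX = [{
    "label": "book",
    "dts:citeStructure": [
        {"label": "poem", "dts:citeStructure": [
            {"label": "stanza", "dts:citeStructure": [{"label": "line"}]},
        ]},
        {"label": "paragraph", "dts:citeStructure": [{"label": "segment"}]},
    ],
}]

for level, path in walk(COMPLEX):
    print(level, " > ".join(path))
```

Here "poem" and "paragraph" both come out at level 2 without any dts:level property, because the nesting itself carries the depth.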

jonathanrobie · 2018-06-07T13:41:51Z

I need to understand the requirements and use case better. If I am in a client, what is the sequence of steps I take when I encounter this data, and what do I want to do with it? I assume we have to be able to handle any kind of reference the same way, supporting CTS and other references that may be quite different.

Are you looking for a way to describe the citation structure for a given resource? What do you want the client to do with it?

A set of use cases written down in this issue would be helpful.

PonteIneptique · 2018-06-07T13:47:50Z

    "dts:citeStructure": [
        {
            "dts:level": 1,
            "label": ["poem", "section"]
        },
        {
            "dts:level": 2,
            "label": ["line", "paragraph"]
        }
    ]

PonteIneptique · 2018-06-07T14:14:34Z

Three simple use cases:

- As a presenting app, I want to be able to make general decisions about how the text should be shown to the client depending on its structure, i.e. if a text is book-poem|chapter-line|paragraph, I want to show the text by poem|chapter, so at level 2.
- As a collection curator, I want to be able to specify the structure of my text (which is just another piece of metadata).
- As a corpus researcher, I want to be able to know where narrative cuts may occur, i.e. where co-occurrence of words is irrelevant at passage boundaries (the last word of poem 1 is not a relevant co-occurrence of the first word of poem 2).
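
To illustrate the third use case, here is a toy sketch that counts adjacent word pairs without ever crossing a passage boundary. The passages and tokens are placeholders, and `adjacent_pairs` is a hypothetical helper; a real study would fetch passages at the level announced in the resource metadata.

```python
from collections import Counter


def adjacent_pairs(passages):
    """Count pairs of neighbouring words, treating each passage separately
    so that no pair spans two passages."""
    counts = Counter()
    for words in passages:
        counts.update(zip(words, words[1:]))
    return counts


# Two placeholder "poems"; "c" ends the first and "d" starts the second.
POEMS = [["a", "b", "c"], ["d", "e"]]

pairs = adjacent_pairs(POEMS)
print(pairs[("b", "c")])
print(pairs[("c", "d")])  # never counted: the pair would cross a poem boundary
```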

mromanello · 2018-06-08T10:06:08Z

I'd like to add a further use case, coming from a citation matching perspective. It derives directly from what I'm doing with the CTS API via CapiTainS resolvers to build HuCit, a knowledge base of classical texts and citable text units.

As a citation matching system, I want to retrieve information about text structures from a DTS collection. Knowing how many hierarchical levels a given text has, and what these are, is useful information that can be exploited when resolving ambiguous references.

I give a concrete example of this use case at p. 108 of my PhD dissertation:

hcayless · 2018-06-21T12:31:52Z

I still have some misgivings about this. The example I mentioned in our last meeting was Ovid's Tristia, where you have a general structure of book, poem, line, but Book 2 is a single, almost 600-line poem. You'll note if you go to Book 2 in Perseus, that it doesn't bother to chunk it the way it does (e.g.) the Aeneid Book 1 (despite their similar length).

I understand wanting to tell a client what the levels are, but I'd want to be able to do that in a useful way. As an API client, If I was deciding how to chunk things, I could certainly do it on Book / poem for most of the Tristia, but I'd want to (maybe) do it on Book / 20-30 lines for Book 2.
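
A rough sketch of that client-side decision, under invented assumptions: poems are kept whole unless they exceed a threshold, in which case they are split into fixed-size runs of lines. The threshold of 100 lines, the chunk size of 25, the reference strings, and the names `chunk_lines` and `chunk_plan` are all hypothetical.

```python
def chunk_lines(line_refs, size=25):
    """Split a list of line references into consecutive runs of at most `size`."""
    return [line_refs[i:i + size] for i in range(0, len(line_refs), size)]


def chunk_plan(poems, max_lines=100, size=25):
    """poems maps a poem reference to its list of line references.
    Short poems stay whole; long ones are split into runs of lines."""
    plan = {}
    for poem_ref, lines in poems.items():
        plan[poem_ref] = [lines] if len(lines) <= max_lines else chunk_lines(lines, size)
    return plan


# Placeholder data: a short poem and a single long poem of 600 lines.
POEMS = {
    "1.1": [f"1.1.{n}" for n in range(1, 11)],
    "2.1": [f"2.1.{n}" for n in range(1, 601)],
}

plan = chunk_plan(POEMS)
print(len(plan["1.1"]), len(plan["2.1"]))
```

The point is only that the client needs both the declared levels and the actual sizes to chunk sensibly; the declared structure alone would not reveal that one "poem" is far longer than the others.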

PonteIneptique · 2018-06-21T12:46:22Z

This is becoming more and more complicated, right? :)
One option would be to allow several schemes to be declared:

    "dts:citeStructure": [
        {"@value": ["book", "poem", "line"]},
        {"@value": ["book", "line"]}
    ]

But it would definitely start to make things complicated if you have more than - say - 3 or 4 different schemes. Again, if we want to have full details, maybe this would be up to the Navigation endpoint ?

PonteIneptique · 2018-07-19T15:02:19Z

Option I back for next week is #101 (comment)

PonteIneptique · 2018-07-26T14:44:02Z

Action item: do a pull request based on that comment, with citeDepth on top of it?

PonteIneptique · 2018-08-02T15:12:42Z

Fixed in #104

PonteIneptique added the Implementation Detail and Collection Endpoint labels May 17, 2018

PonteIneptique added a commit to Capitains/MyCapytain that referenced this issue May 17, 2018

Starting implementation of DTS Citation parsing

7c48754

Opened an issue regarding this in distributed-text-services/specifications#101

hcayless mentioned this issue Jun 26, 2018

Citation Model and TEI #110

Closed

hcayless mentioned this issue Jul 20, 2018

Table of Contents Data #112

Closed

PonteIneptique closed this as completed Aug 2, 2018
