Implement new guidelines for dates #575

Open
1 task done
nschneid opened this issue Feb 8, 2025 · 8 comments

Comments

@nschneid
Contributor

nschneid commented Feb 8, 2025

https://universaldependencies.org/en/dep/nmod-unmarked.html#dates

Queries:

@nschneid
Contributor Author

nschneid commented Feb 8, 2025

Grew rule to deal with M+D+Y dates:

% change date to head, month/year as modifiers
rule r {
  pattern {
    M.lemma=/.*(January|February|March|April|May|June|July|August|September|October|November|December).*/;
    D.lemma=/^\d\d?$/;
    e: M -[nummod]-> D ; 
    e2: M -[E:nummod]-> D ; 
    y: M -[nummod]-> Y; 
    y2: M -[E:nummod]-> Y; 
  }
  commands { 
    del_edge e;
    del_edge e2;
    del_edge y;
    del_edge y2;
    
    add_edge D -[nmod:unmarked]-> M;
    add_edge D -[E:nmod:unmarked]-> M;
    add_edge D -[nmod:unmarked]-> Y;
    add_edge D -[E:nmod:unmarked]-> Y;
    
    shift_in M ==> D;
    shift_in Y ==> D;
    
    shift_out M ==> D;
    shift_out Y ==> D;
  }
}
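
For readers less familiar with Grew, here is a rough Python model of what the rule above does on a toy graph. It is an illustration, not Grew itself: edges are assumed to be (head, relation, dependent) triples with the `E:` prefix marking enhanced edges, as in the rule, and the shift commands are assumed to move only edges whose other end lies outside the matched nodes. The function name and node ids are hypothetical.

```python
import re

# Mirrors the month and day regexes used in the Grew rule.
MONTH_RE = re.compile(r".*(January|February|March|April|May|June|July|August"
                      r"|September|October|November|December).*")
DAY_RE = re.compile(r"^\d\d?$")


def restructure_mdy(lemmas, edges):
    """Hypothetical helper: make the day the head of a Month + Day + Year date.

    lemmas: dict mapping node id -> lemma.
    edges:  set of (head, relation, dependent) triples; relations starting
            with "E:" stand for enhanced edges, as in the Grew rule.
    Returns a new edge set and leaves the input untouched.
    """
    edges = set(edges)
    for m, lemma in lemmas.items():
        if not MONTH_RE.fullmatch(lemma):
            continue
        # Dependents attached to the month by both a basic and an enhanced nummod.
        nummods = sorted(d for (h, r, d) in edges
                         if h == m and r == "nummod" and (m, "E:nummod", d) in edges)
        days = [d for d in nummods if DAY_RE.match(lemmas[d])]
        if not days:
            continue
        day = days[0]
        years = [d for d in nummods if d != day]
        if not years:
            continue
        year = years[0]
        # del_edge e, e2, y, y2
        for dep in (day, year):
            edges -= {(m, "nummod", dep), (m, "E:nummod", dep)}
        # add_edge D -[nmod:unmarked]-> M and D -[nmod:unmarked]-> Y (basic + enhanced)
        for dep in (m, year):
            edges |= {(day, "nmod:unmarked", dep), (day, "E:nmod:unmarked", dep)}
        # shift_in / shift_out: redirect the other edges of M and Y to D.
        # Assumption: only edges whose other end is outside the matched nodes move.
        matched = {m, day, year}
        shifted = set()
        for (h, r, d) in edges:
            if d in (m, year) and h not in matched:
                d = day
            if h in (m, year) and d not in matched:
                h = day
            shifted.add((h, r, d))
        edges = shifted
    return edges
```

On a toy graph for "met on December 25 2004" (lemmas meet, on, December, 25, 2004), the `obl` edge and the `case` dependent end up on "25", which then heads both "December" and "2004" via `nmod:unmarked`.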

@nschneid
Contributor Author

nschneid commented Feb 9, 2025

How to handle "5:00-8:30 pm"? Currently "pm" is the head. Intuitively, "pm" modifies both parts of the range (it distributes; cf. "fresh apples and pears" = fresh apples and fresh pears). With coordination we do this in edeps. Should we do it with range-PPs as well? @amir-zeldes

@nschneid
Contributor Author

nschneid commented Feb 9, 2025

Another issue is what to do about "Date, Time" collocations. I had been thinking that the time should be the head, as it is more specific. But I am seeing some of these with "on", which is compatible with the date, whereas the time by itself would license "at".

  • rescheduled for Friday, Sept. 17 at 9 am
    • I take it the date + time combination is a single phrase marked by "for", and the date heads the time. As the day of the week occurs first, that stands for the date: nmod(Friday, 9)
  • Peter Percell <[email protected]> on 04/30/2001 11:16:58 AM
    • "on", and sort of an implicit "at" before the time, so again I think it makes sense to treat the date as head of the time
  • WHEN: JUNE 12, 2005, 2:00 P.M. TO 2:00 A.M.
    • "on", not "in" or "at", can be inserted before "June" (this is part of the argument for why the date and not the month is the head)
  • the 11/2 8:30 a.m. meeting
    • attributive position, so no preposition can be inserted before the date, but again it feels like date-refined-with-time
  • 01/26/2001 11:36 AM
    • a lot of "sentences" of this form occur in the Enron emails dataset, so we need a convention
  • Date: Thu, 11 Mar 2004 02:39:27 -0800
  • Posted by Hidden Nook to Hidden Nook at 2/7/2005 01:09:32 AM
    • here it is "at", suggesting the time is the head. This is the only instance of "at" + Date + Time I'm finding, though (and it is probably computer-generated).
  • Game A: Saturday, May 12 @ 7:30PM
    • explicit preposition on the time. Headed by the date, or sister to the date?

nschneid added a commit that referenced this issue Feb 10, 2025
nschneid added a commit that referenced this issue Feb 10, 2025
nschneid added a commit that referenced this issue Feb 10, 2025
nschneid added a commit that referenced this issue Feb 10, 2025
nschneid added a commit that referenced this issue Feb 10, 2025
…es where month was attaching to date as compound
@amir-zeldes
Contributor

> Should we do it with range-PPs as well?

I don't mind the multiple edeps, why not? In basic deps, formally the head is the first number, so that's where pm should attach IMO.

> 01/26/2001 11:36 AM

If that's the only linguistic material, you could also consider `list` for these cases (for the rest, I agree that viewing it as a phrase with subordination makes more sense).

> explicit preposition on the time. Headed by the date, or sister to the date?

This particular case feels like a sister, but admittedly all of these are corner cases for treating them as a kind of implicit subordination.

@nschneid
Contributor Author

Yeah, I guess the ones in machine-readable formats should be `list`.

nschneid added a commit that referenced this issue Feb 12, 2025
@nschneid
Contributor Author

Main rules used for the above (I also did manual cleanup of some noisy cases):

% "a.m."/"p.m." as modifiers
rule r1 {
  pattern {
    M[lemma="a.m."|"p.m."];
    e: M -[nummod]-> T ; 
    e2: M -[E:nummod]-> T ; 
  }
  commands { 
    del_edge e;
    del_edge e2;
    
    add_edge T -[nmod:unmarked]-> M;
    add_edge T -[E:nmod:unmarked]-> M;
    
    shift_in M ==> T;
    
    shift_out M ==> T;
  }
}

% in a range of times, time zone modifiers should attach to the first item in basic deps and to both items in enhanced deps
rule r2 {
  pattern {
    M1[lemma="a.m."|"p.m."];
    M2[lemma="a.m."|"p.m."];
    e1: T-[nmod:unmarked]->M1;
    e1e: T-[E:nmod:unmarked]->M1;
    e2: T2-[nmod:unmarked]->M2;
    e2e: T2-[E:nmod:unmarked]->M2;
    e2x: T2-[nmod:unmarked]->X;
    T-[nmod]->T2; % e.g. 3:00 p.m. - 4:00 p.m. EST
  }
  commands { 
    del_edge e2x;
    % keep the enhanced edge
    
    add_edge T -[nmod:unmarked]-> X;
    add_edge T -[E:nmod:unmarked]-> X;
  }
}

% a.m./p.m. after a range of times should attach to the first item in basic deps and to both items in enhanced deps
rule r3 {
  pattern {
    M2[lemma="a.m."|"p.m."];
    e2: T2-[nmod:unmarked]->M2;
    e2e: T2-[E:nmod:unmarked]->M2;
    T-[nmod]->T2; % e.g. 3:00 - 4:00 p.m.
    T2-[case]->P; P[lemma="to"|"-"];
  } without {
    M1[lemma="a.m."|"p.m."]; T-[nmod:unmarked]->M1;
  }
  commands { 
    del_edge e2;
    % keep the enhanced edge
    
    add_edge T -[nmod:unmarked]-> M2;
    add_edge T -[E:nmod:unmarked]-> M2;
  }
}
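
A similar Python sketch of r3 shows the effect discussed above for "5:00-8:30 pm": in basic deps the a.m./p.m. moves to the first time of the range, while the enhanced graph keeps it on both times. The same toy encoding is assumed (triples, `E:` prefix for enhanced edges), the "pm" token is assumed to be lemmatized as "p.m.", and the function name is hypothetical.

```python
AMPM = {"a.m.", "p.m."}
RANGE_MARKERS = {"to", "-"}


def attach_ampm_to_range_start(lemmas, edges):
    """Hypothetical model of rule r3: in 'T - T2 p.m.', attach p.m. to T in basic deps.

    lemmas: dict node id -> lemma.
    edges:  set of (head, relation, dependent) triples, "E:" marking enhanced edges.
    """
    edges = set(edges)
    for t, t2 in sorted((h, d) for (h, r, d) in edges if r == "nmod"):
        # T2 must be marked as the end of a range by "to" or "-".
        if not any(h == t2 and r == "case" and lemmas[d] in RANGE_MARKERS
                   for (h, r, d) in edges):
            continue
        # `without` clause: skip if T already has its own a.m./p.m.
        if any(h == t and r == "nmod:unmarked" and lemmas[d] in AMPM
               for (h, r, d) in edges):
            continue
        candidates = sorted(d for (h, r, d) in edges
                            if h == t2 and r == "nmod:unmarked" and lemmas[d] in AMPM
                            and (t2, "E:nmod:unmarked", d) in edges)
        if not candidates:
            continue
        m2 = candidates[0]
        # Basic deps: the modifier moves from T2 to the first time T.
        edges.discard((t2, "nmod:unmarked", m2))
        # Enhanced deps: add T -> M2 and keep T2 -> M2, so the modifier distributes.
        edges |= {(t, "nmod:unmarked", m2), (t, "E:nmod:unmarked", m2)}
    return edges
```

Because the `without` clause then matches, running the function a second time leaves the graph unchanged.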

% machine-readable date+time collocations should be connected as `list`
rule r4 {
    pattern {
      D[upos=NUM]; D.lemma=/\d\d\/\d\d\/\d\d(\d\d)?/;
      T[upos=NUM]; T.lemma=/.*:.*/;
      D < T;
      e: D-[nmod:unmarked]-> T;
      e2: D-[E:nmod:unmarked]->T;
    }
    commands { 
      del_edge e;
      del_edge e2;
      add_edge D-[list]->T;
      add_edge D-[E:list]->T;
    }
}
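
A Python sketch of r4's matching conditions follows. The regexes are copied from the rule. Applying them with `re.fullmatch` is an assumption about how Grew anchors lemma regexes, and the token/edge encoding and function name are hypothetical. Grew's `D < T` (D immediately precedes T) is modeled as consecutive word positions.

```python
import re

# Regexes copied from rule r4, applied here with full-match semantics (assumption).
DATE_RE = re.compile(r"\d\d/\d\d/\d\d(\d\d)?")
TIME_RE = re.compile(r".*:.*")


def relabel_datetime_as_list(tokens, edges):
    """Hypothetical model of rule r4: relabel a machine-readable date -> time edge as `list`.

    tokens: dict node id -> (lemma, upos), ids being consecutive word positions.
    edges:  set of (head, relation, dependent) triples, "E:" marking enhanced edges.
    """
    edges = set(edges)
    for (d, rel, t) in sorted(edges):
        if rel != "nmod:unmarked" or (d, "E:nmod:unmarked", t) not in edges:
            continue
        d_lemma, d_upos = tokens[d]
        t_lemma, t_upos = tokens[t]
        if (d_upos == t_upos == "NUM" and t == d + 1  # D < T
                and DATE_RE.fullmatch(d_lemma) and TIME_RE.fullmatch(t_lemma)):
            edges -= {(d, "nmod:unmarked", t), (d, "E:nmod:unmarked", t)}
            edges |= {(d, "list", t), (d, "E:list", t)}
    return edges
```

As written, the date pattern requires a two-digit month and day, so a lemma like "2/7/2005" would not be relabeled.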

@nschneid
Contributor Author

I think I'm done with the dates/times in EWT! If anyone wants to do some spot checks: https://universal.grew.fr/?custom=67ac098e07e79

@nschneid
Contributor Author

nschneid commented Feb 12, 2025

How to tag ordinal numerals in dates ("January 3rd")?

→ moved to #267
