Temporal Processing

Document Dating (Time-stamping)

Document Dating is the problem of automatically predicting the date of a document based on its content. The date of a document, also referred to as the Document Creation Time (DCT), is at the core of many important tasks, such as information retrieval, temporal reasoning, text summarization, event detection, and analysis of historical text.

For example, in the following document, the correct creation year is 1999. This can be inferred from the presence of the terms "1995" and "Four years after".

Swiss adopted that form of taxation in 1995. The concession was approved by the govt last September. Four years after, the IOC….
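
The inference in this example can be sketched as a toy heuristic: take an explicit year from the text and add a relative offset such as "Four years after". The function name `guess_year`, the `NUMBER_WORDS` table, and the regular expressions are illustrative assumptions, not the method of any system listed below.

```python
import re

# Hypothetical lookup for spelled-out offsets; real systems use far richer models.
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}

def guess_year(text):
    """Toy heuristic: latest explicit year plus any 'N years after' offset."""
    years = [int(y) for y in re.findall(r"\b(1[89]\d{2}|20\d{2})\b", text)]
    if not years:
        return None
    offset = 0
    match = re.search(r"\b(\w+) years? (?:after|later)\b", text, re.IGNORECASE)
    if match:
        offset = NUMBER_WORDS.get(match.group(1).lower(), 0)
    return max(years) + offset

doc = ("Swiss adopted that form of taxation in 1995. The concession was "
       "approved by the govt last September. Four years after, the IOC")
print(guess_year(doc))  # 1999
```

A heuristic like this breaks as soon as a document mentions several unrelated years, which is why the systems compared below learn from many temporal cues rather than a single rule.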

Datasets

| Datasets | # Docs | Start Year | End Year |
| --- | --- | --- | --- |
| APW | 675k | 1995 | 2010 |
| NYT | 647k | 1987 | 1996 |

Comparison on year level granularity:

| Model | APW Dataset | NYT Dataset | Paper/Source |
| --- | --- | --- | --- |
| NeuralDater (Vashishth et al., 2018) | 64.1 | 58.9 | Document Dating using Graph Convolution Networks |
| Chambers (2012) | 52.5 | 42.3 | Labeling Documents with Timestamps: Learning from their Time Expressions |
| BurstySimDater (Kotsakos et al., 2014) | 45.9 | 38.5 | A Burstiness-aware Approach for Document Dating |

Temporal Information Extraction

Temporal information extraction is the identification of chunks/tokens corresponding to temporal intervals, and the extraction and determination of the temporal relations between them. The entities extracted may be temporal expressions (timexes), eventualities (events), or auxiliary signals that support the interpretation of an entity or relation. Relations may be temporal links (tlinks), describing the order of events and times; subordinate links (slinks), describing modality and other subordinative activity; or aspectual links (alinks), describing the various influences aspectuality has on event structure.

The markup scheme used for temporal information extraction is described in the ISO-TimeML standard and at www.timeml.org.

<?xml version="1.0" ?>

<TimeML xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="http://timeml.org/timeMLdocs/TimeML_1.2.1.xsd">
<TEXT>


 PRI20001020.2000.0127 
 NEWS STORY 
 <TIMEX3 tid="t0" type="TIME" value="2000-10-20T20:02:07.85">10/20/2000 20:02:07.85</TIMEX3> 


 The Navy has changed its account of the attack on the USS Cole in Yemen.
 Officials <TIMEX3 tid="t1" type="DATE" value="PRESENT_REF" temporalFunction="true" anchorTimeID="t0">now</TIMEX3> say the ship was hit <TIMEX3 tid="t2" type="DURATION" value="PT2H">nearly two hours </TIMEX3>after it had docked.
 Initially the Navy said the explosion occurred while several boats were helping
 the ship to tie up. The change raises new questions about how the attackers
 were able to get past the Navy security.


 <TIMEX3 tid="t3" type="TIME" value="2000-10-20T20:02:28.05">10/20/2000 20:02:28.05</TIMEX3> 



<TLINK timeID="t2" relatedToTime="t0" relType="BEFORE"/>
</TEXT>
</TimeML>
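
As a minimal sketch, annotations like these can be read with Python's standard `xml.etree.ElementTree` module. `SAMPLE` is an abbreviated copy of the document above with the schema attributes omitted, and `read_timeml` is a hypothetical helper name chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Abbreviated version of the document above (schema attributes omitted).
SAMPLE = """<TimeML>
<TEXT>
<TIMEX3 tid="t0" type="TIME" value="2000-10-20T20:02:07.85">10/20/2000 20:02:07.85</TIMEX3>
Officials <TIMEX3 tid="t1" type="DATE" value="PRESENT_REF" temporalFunction="true" anchorTimeID="t0">now</TIMEX3> say the ship was hit <TIMEX3 tid="t2" type="DURATION" value="PT2H">nearly two hours </TIMEX3>after it had docked.
<TLINK timeID="t2" relatedToTime="t0" relType="BEFORE"/>
</TEXT>
</TimeML>"""

def read_timeml(xml_text):
    """Return (timexes, tlinks) as lists of plain dictionaries."""
    root = ET.fromstring(xml_text)
    timexes = [
        {"tid": t.get("tid"), "type": t.get("type"),
         "value": t.get("value"), "text": (t.text or "").strip()}
        for t in root.iter("TIMEX3")
    ]
    tlinks = [dict(link.attrib) for link in root.iter("TLINK")]
    return timexes, tlinks

timexes, tlinks = read_timeml(SAMPLE)
for timex in timexes:
    print(timex["tid"], timex["type"], timex["value"], repr(timex["text"]))
print(tlinks)
```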

To avoid leaking knowledge about temporal structure, train, dev and test splits must be made at document level for temporal information extraction.
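
One way to enforce this, sketched under the assumption that every training instance carries the identifier of its source document, is to partition the document identifiers first and only then collect instances. The function name `split_by_document`, the split fractions, and the toy `instances` list are all hypothetical.

```python
import random

def split_by_document(doc_ids, dev_frac=0.1, test_frac=0.1, seed=0):
    """Assign whole documents to train/dev/test so no document spans two splits."""
    ids = sorted(set(doc_ids))
    random.Random(seed).shuffle(ids)
    n_test = int(len(ids) * test_frac)
    n_dev = int(len(ids) * dev_frac)
    return {
        "test": set(ids[:n_test]),
        "dev": set(ids[n_test:n_test + n_dev]),
        "train": set(ids[n_test + n_dev:]),
    }

# Hypothetical instances: (document id, relation instance) pairs.
instances = [(f"doc{i % 20}", f"tlink{i}") for i in range(100)]
splits = split_by_document(doc_id for doc_id, _ in instances)
train = [inst for inst in instances if inst[0] in splits["train"]]
```

Splitting at the instance level instead would let relations from the same document appear in both train and test, which is the leak the sentence above warns about.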

TimeBank

TimeBank, based on the TIMEX3 standard embedded in ISO-TimeML, is a benchmark corpus containing 64K tokens of English newswire, annotated for all aspects of ISO-TimeML, including temporal expressions. TimeBank is freely distributed by the LDC: TimeBank 1.2

Evaluation covers entity chunking, attribute annotation, and temporal relation accuracy, typically measured with F1, although this metric is not sensitive to inconsistencies or to free wins from interval logic induction over the whole set.
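
The "free wins" caveat can be illustrated with a small sketch. It assumes relations are scored as exact (source, target, relation) triples and considers only the transitivity of BEFORE; the function names and the toy link sets are hypothetical, and official scorers differ in detail.

```python
from itertools import product

def f1_score(gold, predicted):
    """Micro-averaged F1 over sets of (source, target, relation) triples."""
    true_positives = len(gold & predicted)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)

def before_closure(links):
    """Add BEFORE links implied by transitivity (a<b and b<c gives a<c)."""
    closed = set(links)
    changed = True
    while changed:
        changed = False
        for (a, b, r1), (c, d, r2) in product(list(closed), repeat=2):
            if r1 == r2 == "BEFORE" and b == c and (a, d, "BEFORE") not in closed:
                closed.add((a, d, "BEFORE"))
                changed = True
    return closed

gold = {("e1", "e2", "BEFORE"), ("e2", "e3", "BEFORE"), ("e1", "e3", "BEFORE")}
predicted = {("e1", "e2", "BEFORE"), ("e2", "e3", "BEFORE")}

print(round(f1_score(gold, predicted), 3))                  # 0.8
print(round(f1_score(gold, before_closure(predicted)), 3))  # 1.0
```

The system made no new decisions between the two scores; the third link follows from the first two, so plain F1 can reward or penalise a system depending on whether inference is applied before scoring.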

| Model | F1 score | Paper / Source |
| --- | --- | --- |
| Catena | 0.511 | CATENA: CAusal and TEmporal relation extraction from NAtural language texts |
| CAEVO | 0.507 | Dense Event Ordering with a Multi-Pass Architecture |

TempEval-3

The TempEval-3 corpus accompanied the TempEval-3 shared task at SemEval 2013. Evaluation uses a timeline-based metric, temporal awareness, to assess temporal relation structure. The corpus is newer and somewhat more varied than TimeBank, though markedly smaller. Data: TempEval-3 data

| Model | Temporal awareness | Paper / Source |
| --- | --- | --- |
| Ning et al. | 67.2 | A Structured Learning Approach to Temporal Relation Extraction |
| ClearTK | 30.98 | Cleartk-timeml: A minimalist approach to tempeval 2013 |

Timex normalisation

Temporal expression normalisation is the grounding of a lexicalisation of a time to a calendar date or other formal temporal representation.

Example: <TIMEX3 tid="t0" type="TIME" value="2000-10-18T21:01:00.65">10/18/2000 21:01:00.65</TIMEX3> Dozens of Palestinians were wounded in scattered clashes in the West Bank and Gaza Strip, <TIMEX3 tid="t1" type="DATE" value="2000-10-18" temporalFunction="true" anchorTimeID="t0">Wednesday</TIMEX3>, despite the Sharm el-Sheikh truce accord.

Chuck Rich reports on entertainment <TIMEX3 tid="t11" type="SET" value="XXXX-WXX-7">every Saturday</TIMEX3>
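
As a rough sketch of what grounding involves, the snippet below resolves a bare weekday name against an anchor date, as in the "Wednesday" example above. Looking backwards from the anchor is an assumption made here for simplicity (real normalisers also use tense and context), and `normalise_weekday` is a hypothetical name.

```python
from datetime import date, timedelta

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]

def normalise_weekday(expression, anchor):
    """Resolve a bare weekday name to the most recent such day on or before anchor."""
    target = WEEKDAYS.index(expression.lower())
    days_back = (anchor.weekday() - target) % 7
    return (anchor - timedelta(days=days_back)).isoformat()

dct = date(2000, 10, 18)  # date part of the t0 anchor in the first example above
print(normalise_weekday("Wednesday", dct))  # 2000-10-18
print(normalise_weekday("Monday", dct))     # 2000-10-16
```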

TimeBank

TimeBank, based on the TIMEX3 standard embedded in ISO-TimeML, is a benchmark corpus containing 64K tokens of English newswire, annotated for all aspects of ISO-TimeML, including temporal expressions. TimeBank is freely distributed by the LDC: TimeBank 1.2

| Model | F1 score | Paper / Source |
| --- | --- | --- |
| TIMEN | 0.89 | TIMEN: An Open Temporal Expression Normalisation Resource |
| HeidelTime | 0.876 | A baseline temporal tagger for all languages |

PNT

The Parsing Time Normalizations corpus, in SCATE format, allows the representation of a wider variety of time expressions than previous approaches. This corpus was released with SemEval 2018 Task 6.

| Model | F1 score | Paper / Source |
| --- | --- | --- |
| Laparra et al. 2018 | 0.764 | From Characters to Time Intervals: New Paradigms for Evaluation and Neural Parsing of Time Normalizations |
| HeidelTime | 0.74 | A baseline temporal tagger for all languages |
| Chrono | 0.70 | Chrono at SemEval-2018 task 6: A system for normalizing temporal expressions |