Diplomacy Betrayal Dataset
==========================
This dataset contains a collection of interaction sequences between allies in
online Diplomacy [1] games. A sequence consists of consecutive game seasons
during which the two players exchange messages and help each other in the game.
Half of the sequences end with betrayal, while the other half are part of
lasting friendships.
URL: http://vene.ro/betrayal
License: Open Data Commons Attribution (ODC-By 1.0)
Summary: http://opendatacommons.org/licenses/by/summary/
See LICENSE.txt for more details.
Authors: Vlad Niculae
         Srijan Kumar
         Jordan Boyd-Graber
         Cristian Danescu-Niculescu-Mizil
Version: 1.0 (02/23/2016)
The dataset is further described in our paper:
Vlad Niculae, Srijan Kumar, Jordan Boyd-Graber
and Cristian Danescu-Niculescu-Mizil.
Linguistic harbingers of betrayal: A case study on an online strategy game.
In: Proceedings of ACL, 2015.
Description
-----------
Diplomacy [1] is a popular and engaging strategic board game that is often
played online [2, 3]. It relies heavily on communication between the
players. Because of its military domination setting, Diplomacy is a
well-suited environment for studying naturally occurring betrayal and
deception.
From a collection of Diplomacy game logs, we identified and extracted *ongoing,
established, and reciprocal* friendships: relationships that contain at least
two consecutive and reciprocated acts of support that span at least three
seasons in game time, with no more than five seasons passing between two acts
of friendship.
We then identified 250 *betrayals*: the subset of friendships described above
that are followed by at least two attacks. To match each betrayal, we selected
a friendship that is not followed by any offensive action, but is otherwise
nearly identical (in terms of length and relative time within the game).
The current dataset consists of these selected betrayals and friendships only.
Each relationship contains a sequence of seasons. Within each season, we
provide features extracted from the messages sent by each player. More
details about the message representation are available in the "Data format"
section of this README.
Files
-----
* diplomacy_data.json - a JSON file containing the dataset;
* imbalance_plot.py - a sample Python script to reproduce one of the
  images from the website (http://vene.ro/betrayal/imba.png);
* README.txt - this readme;
* LICENSE.txt - full text of the ODC-BY-1.0 license.
If redistributing any subset of the dataset, please include the README.txt
and LICENSE.txt files.
Data format
-----------
The dataset is a UTF-8 encoded JSON file:
>>> import json
>>> from io import open  # only needed on Python 2; harmless on Python 3
>>> with open("diplomacy_data.json", "r", encoding="utf-8") as f:
...     diplomacy = json.load(f)
...
It is structured as a list of dictionaries, one for each of the 500 sequences.
>>> len(diplomacy)
500
This is an example of one such entry, with the fields explained:
>>> entry = diplomacy[0]
>>> entry
{
    'idx': 0,            # unique identifier of the dataset entry
    'game': 74,          # unique identifier of the game it comes from
    'betrayal': True,    # whether the friendship ended in betrayal
    'people': u'AT',     # the countries represented by the two players
                         # (in this case, Austria and Turkey)
    'seasons': ...
}
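As a quick sanity check on the top-level fields, the entries can be tallied by
their 'betrayal' flag. The following is a minimal sketch, not part of the
distributed scripts: the helper name count_by_label and the two-entry sample
list are made up for illustration, and only the field names come from the
entry shown above. On the full file, the description above suggests 250
sequences of each kind.

```python
from collections import Counter


def count_by_label(entries):
    """Count sequences that ended in betrayal (True) versus those that did not (False)."""
    return Counter(bool(entry["betrayal"]) for entry in entries)


# Hand-made stand-in for the list loaded from diplomacy_data.json.
# The values below are illustrative, not taken from the dataset.
sample = [
    {"idx": 0, "game": 74, "betrayal": True, "people": "AT", "seasons": []},
    {"idx": 1, "game": 74, "betrayal": False, "people": "EF", "seasons": []},
]

counts = count_by_label(sample)
print(counts[True], counts[False])
```

With the real data, the same call would be count_by_label(diplomacy).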
The 'seasons' field is again a list of dictionaries, one for each game season
in the friendship sequence. In the example below, there are 8 seasons, each
identified by the game year. Decimal notation is used to denote the season in
each year. For example, 1906.0 is the spring of 1906 and 1906.5 is the fall of
1906. Each season is also marked with the interaction the two players have
at the end of the discussion, recorded separately for each player: whether
that player supported the other ('support'), attacked the other ('attack'),
or took no explicit military action toward the other (null in the JSON file,
None once loaded in Python).
>>> seasons = entry['seasons']
>>> len(seasons)
8
>>> seasons[0]
{
    'season': 1906.5,            # fall of the year 1906 (game time)
    'interaction': {
        'victim': u'support',    # the victim supported the betrayer
        'betrayer': u'support'   # the betrayer supported the victim
    },
    'messages': {
        'victim': ...,
        'betrayer': ...
    }
}
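The decimal season notation can be unpacked into a year and a season name.
This is a small sketch under the convention stated above (.0 for spring, .5
for fall); the function name describe_season is hypothetical, and any other
fractional part is reported as "unknown" rather than guessed.

```python
def describe_season(season):
    """Split a decimal season such as 1906.5 into (year, season name)."""
    year = int(season)
    fraction = season - year
    if fraction == 0.0:
        name = "spring"
    elif fraction == 0.5:
        name = "fall"
    else:
        # The README only documents .0 and .5, so nothing else is assumed.
        name = "unknown"
    return year, name


print(describe_season(1906.0))
print(describe_season(1906.5))
```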
The ['messages']['victim'] and ['messages']['betrayer'] fields are lists of
features of each message sent by the victim to the betrayer, and by the
betrayer to the victim, respectively:
>>> msgs = seasons[0]['messages']['betrayer']
>>> len(msgs)
6
>>> msgs[0]
{
    "n_words": 146,             # number of words in the message
    "n_sentences": 9,           # number of sentences in the message
    "n_requests": 7,            # number of request sentences
    "politeness": 0.8320,       # politeness of the requests (from 0 to 1)
                                # (using the Stanford Politeness
                                # Classifier available at [4])
    "sentiment": {
        "positive": 1,          # no. sentences with positive sentiment
        "neutral": 3,           #  "      "      neutral sentiment
        "negative": 5           #  "      "      negative sentiment
    },                          # (using Stanford Sentiment Analysis [5])
    "lexicon_words": {          # words and phrases matching several
        "disc_expansion": [     # linguistic and psycholinguistic lexicons
            "until",            # (see below for details)
            "yet",
            "instead"
        ],
        "premise": [
            "for",
            "for"
        ],
        ...
    },
    "frequent_words": [         # frequent words in the message
        "more",                 # (occurring in at least 50 messages
        "let",                  # and 5 friendships overall)
        "keep",
        ...
    ]
}
The words in each list are in random order. The order of messages within a
season is also randomized. This measure is in place to protect the privacy of
the players and of their conversations.
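Because message order within a season is randomized, per-season aggregates are
a natural way to use these features. The sketch below averages the
'politeness' score of each player's messages for every season of one entry.
The helper names (mean_politeness, politeness_by_season) and the sample entry
are invented for illustration. Whether every message carries a numeric
'politeness' value is not stated here, so messages without one are skipped as
a precaution.

```python
def mean_politeness(messages):
    """Average the 'politeness' score over messages that carry a numeric one."""
    scores = [
        m["politeness"]
        for m in messages
        if isinstance(m.get("politeness"), (int, float))
    ]
    return sum(scores) / len(scores) if scores else None


def politeness_by_season(entry):
    """Return (season, victim mean, betrayer mean) for each season of an entry."""
    rows = []
    for season in entry["seasons"]:
        msgs = season["messages"]
        rows.append((
            season["season"],
            mean_politeness(msgs["victim"]),
            mean_politeness(msgs["betrayer"]),
        ))
    return rows


# Hand-made stand-in containing only the fields this sketch reads.
sample_entry = {
    "seasons": [
        {
            "season": 1906.5,
            "messages": {
                "victim": [{"politeness": 0.5}, {"politeness": 0.25}],
                "betrayer": [],
            },
        }
    ]
}

rows = politeness_by_season(sample_entry)
print(rows)
```

A season in which a player sent no scored messages yields None for that
player instead of a misleading zero.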
The lexicons used to construct the "lexicon_words" field are:
* 'claim', 'premise': Argumentation structure markers [6]
* 'allsubj': Subjective markers [7]
* 'disc_*': Discourse markers from the Penn Discourse Treebank (PDTB) [8].
  Includes 'disc_comparison', 'disc_expansion', 'disc_contingency',
  'disc_temporal_future' and 'disc_temporal_rest' (we manually split
  'temporal' from the PDTB into 'temporal_future' and 'temporal_rest' to
  capture planning).
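One simple way to use the "lexicon_words" field is to count how many matches
each lexicon category has across a list of messages. The following is a
minimal sketch; the function name lexicon_counts is hypothetical, and the
sample message reuses only the words shown in the example message earlier in
this section.

```python
from collections import Counter


def lexicon_counts(messages):
    """Total the number of lexicon matches per category over a list of messages."""
    totals = Counter()
    for message in messages:
        # Treat a missing "lexicon_words" field as having no matches.
        for category, words in message.get("lexicon_words", {}).items():
            totals[category] += len(words)
    return totals


# Hand-made stand-in for the messages one player sent in a season.
sample_messages = [
    {
        "lexicon_words": {
            "disc_expansion": ["until", "yet", "instead"],
            "premise": ["for", "for"],
        }
    },
    {"lexicon_words": {"premise": ["for"]}},
]

totals = lexicon_counts(sample_messages)
print(dict(totals))
```

Repeated words count once per occurrence, matching the way "for" appears
twice under 'premise' in the example message.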
References
----------
[1] https://en.wikipedia.org/wiki/Diplomacy_%28game%29
[2] http://www.floc.net/dpjudge/
[3] http://usak.asciiking.com/
[4] http://politeness.mpi-sws.org/
[5] http://nlp.stanford.edu/sentiment/
[6] C. Stab and I. Gurevych. Identifying Argumentative Discourse Structures in
Persuasive Essays. In: Proceedings of EMNLP, 2014.
https://www.ukp.tu-darmstadt.de/data/argumentation-mining/
[7] E. Riloff and J. Wiebe. Learning extraction patterns for subjective
expressions. In: Proceedings of EMNLP, 2003.
http://www.anthology.aclweb.org/W/W03/W03-1014.pdf
[8] https://www.seas.upenn.edu/~pdtb/