Publications
Abstract:
We present Chirpy Cardinal, an open-domain social chatbot. Aiming to be both informative and conversational, our bot chats with users in an authentic, emotionally intelligent way. By integrating controlled neural generation with scaffolded, hand-written dialogue, we let both the user and bot take turns driving the conversation, producing an engaging and socially fluent experience. Deployed in the fourth iteration of the Alexa Prize Socialbot Grand Challenge, Chirpy Cardinal handled thousands of conversations per day, placing second out of nine bots with an average user rating of 3.58/5.
Abstract:
Current spoken dialogue systems initiate their turns after a long period of silence (700-1000ms), which leads to little real-time feedback, sluggish responses, and an overall stilted conversational flow. Humans typically respond within 200ms and successfully predicting initiation points in advance would allow spoken dialogue agents to do the same. In this work, we predict the lead-time to initiation using prosodic features from a pre-trained speech representation model (wav2vec 1.0) operating on user audio and word features from a pre-trained language model (GPT-2) operating on incremental transcriptions. To evaluate errors, we propose two metrics w.r.t. predicted and true lead times. We train and evaluate the models on the Switchboard Corpus and find that our method outperforms features from prior work on both metrics and vastly outperforms the common approach of waiting for 700ms of silence.
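The fusion of the two feature streams can be sketched with toy numbers. Everything here is an illustrative stand-in: dimensions, pooling, and weights are invented, whereas the actual model consumes learned wav2vec and GPT-2 representations.

```python
def mean_pool(frames):
    """Average a sequence of feature vectors over time."""
    dim = len(frames[0])
    return [sum(f[i] for f in frames) / len(frames) for i in range(dim)]

def fuse(prosody_frames, word_states):
    """Concatenate pooled prosodic (wav2vec-style) and lexical (GPT-2-style) features."""
    return mean_pool(prosody_frames) + mean_pool(word_states)

def predict_lead_time_ms(features, weights, bias):
    """A linear regression head predicting lead time (ms) to turn initiation."""
    return sum(f * w for f, w in zip(features, weights)) + bias

# Toy example: 3 audio frames of dim 4 and 2 token states of dim 2
# (the real representations are hundreds of dimensions wide).
prosody = [[0.1, 0.2, 0.0, 0.4], [0.3, 0.2, 0.0, 0.0], [0.2, 0.2, 0.0, 0.2]]
words = [[1.0, 0.0], [0.0, 1.0]]
feats = fuse(prosody, words)
lead_ms = predict_lead_time_ms(feats, [10.0] * len(feats), 200.0)
```

A trained head of this shape can emit a prediction every time a new frame or partial transcript arrives, which is what allows responding faster than a fixed silence threshold.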
Abstract:
Many text generation systems benefit from retrieving passages from a textual knowledge corpus (e.g., Wikipedia) and using them to generate the output. For open-ended generation tasks, like generating informative utterances in conversations, many varied passages z are relevant to the context x but few are relevant to the observed next utterance y (label). For such tasks, existing methods (that jointly train the retriever and generator) underperform: during training the top-k context-relevant retrieved passages might not contain the label-relevant passage and the generator may hence not learn a preference to ground its generated output in them. We propose using an additional guide-retriever that also conditions on the observed label y and “in hindsight” retrieves label-relevant passages during training. We maximize the evidence lower bound (ELBo) to jointly train the guide-retriever Q(z|x,y) with the standard retriever P_η(z|x) and the generator P_θ(y|x,z) and find that ELBo has better inductive biases than prior work. For informative conversations from the Wizard of Wikipedia dataset, with our posterior-guided training, the retriever finds passages with higher relevance in the top-10 (23% relative improvement), the generator’s responses are more grounded in the retrieved passage (19% relative improvement) and the end-to-end system produces better overall output (6.4% relative improvement).
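For a discrete passage set the objective can be written down exactly. A minimal sketch with a three-passage toy distribution (all probabilities are made up for illustration):

```python
import math

def elbo(q, p_ret, p_gen):
    """ELBo for retrieval-augmented generation over a discrete passage set.

    q[z]     : guide-retriever posterior Q(z|x,y), which conditions on the label y
    p_ret[z] : standard retriever prior P_eta(z|x)
    p_gen[z] : generator likelihood P_theta(y|x,z)

    ELBo = E_{z~Q}[log P_theta(y|x,z)] - KL(Q || P_eta)
    """
    expected_ll = sum(q[z] * math.log(p_gen[z]) for z in q)
    kl = sum(q[z] * math.log(q[z] / p_ret[z]) for z in q if q[z] > 0)
    return expected_ll - kl

# Three candidate passages; the guide puts most mass on the label-relevant one.
q = {"z1": 0.8, "z2": 0.1, "z3": 0.1}
p_ret = {"z1": 0.3, "z2": 0.4, "z3": 0.3}
p_gen = {"z1": 0.05, "z2": 0.001, "z3": 0.001}
bound = elbo(q, p_ret, p_gen)

# The ELBo lower-bounds the marginal log-likelihood log sum_z P_eta(z) P_theta(y|z).
log_marginal = math.log(sum(p_ret[z] * p_gen[z] for z in q))
```

Maximizing the first term trains the generator on passages the guide believes are label-relevant, while the KL term pulls the standard retriever toward the guide so it can find those passages at test time, when y is unavailable.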
Abstract:
In this paper, we present the second iteration of Chirpy Cardinal, an open-domain dialogue agent developed for the Alexa Prize SGC4 competition. Building on the success of the SGC3 Chirpy, we focus on improving conversational flexibility, initiative, and coherence. We introduce a variety of methods for controllable neural generation, ranging from prefix-based neural decoding over a symbolic scaffolding, to pure neural modules, to a novel hybrid infilling-based method that combines the best of both worlds. Additionally, we enhance previous news, music and movies modules with new APIs, as well as make major improvements in entity linking, topical transitions, and latency. Finally, we expand the variety of responses via new modules that focus on personal issues, sports, food, and even extraterrestrial life! These components come together to create a refreshed Chirpy Cardinal that is able to initiate conversations filled with interesting facts, engaging topics, and heartfelt responses.
Abstract:
Many existing chatbots do not effectively support mixed initiative, forcing their users to either respond passively or lead constantly. We seek to improve this experience by introducing new mechanisms to encourage user initiative in social chatbot conversations. Since user initiative in this setting is distinct from initiative in human-human or task-oriented dialogue, we first propose a new definition that accounts for the unique behaviors users take in this context. Drawing from linguistics, we propose three mechanisms to promote user initiative: back-channeling, personal disclosure, and replacing questions with statements. We show that simple automatic metrics of utterance length, number of noun phrases, and diversity of user responses correlate with human judgement of initiative. Finally, we use these metrics to suggest that these strategies do result in statistically significant increases in user initiative, where frequent, but not excessive, back-channeling is the most effective strategy.
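Two of the automatic metrics named above are trivial to compute; a toy sketch (noun-phrase counting additionally requires a parser, so only utterance length and a distinct-unigram diversity measure are shown, and the example utterances are invented):

```python
def utterance_length(utt):
    """Length in whitespace tokens: a proxy for how much content the user volunteers."""
    return len(utt.split())

def distinct_1(utterances):
    """Distinct-1 diversity: unique unigrams / total unigrams across user responses."""
    tokens = [t.lower() for u in utterances for t in u.split()]
    return len(set(tokens)) / len(tokens) if tokens else 0.0

replies = ["yeah", "i love hiking in the mountains every summer", "yeah"]
lengths = [utterance_length(u) for u in replies]
diversity = distinct_1(replies)
```

Under this kind of proxy, a user who volunteers longer and more varied responses scores as taking more initiative than one who answers passively.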
Abstract:
The goal of this work is to build a dialogue agent that can weave new factual content into conversations as naturally as humans. We draw insights from linguistic principles of conversational analysis and annotate human-human conversations from the Switchboard Dialog Act Corpus, examining how humans apply strategies for acknowledgement, transition, detail selection and presentation. However, when current chatbots (explicitly provided with new factual content) introduce facts in a conversation, their generated responses do not acknowledge the prior turns. This is because, while current methods are trained with two contexts, new factual content and conversational history, we show that their generated responses are not simultaneously specific to both the contexts and in particular, lack specificity w.r.t. conversational history. We propose using pointwise conditional mutual information (pcmi) to measure specificity w.r.t. conversational history. We show that responses that have a higher pcmi_h are judged by human evaluators to be better at acknowledgement 74% of the time. To show its utility in improving overall quality, we compare baseline responses that maximize pointwise mutual information (Max. PMI) with our alternative responses (Fused-PCMI) that trade off pmi for pcmi_h and find that human evaluators prefer Fused-PCMI 60% of the time.
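The pcmi_h quantity itself is simple: it compares how likely the response is with and without the conversational history in the conditioning context. A sketch with hypothetical log-probabilities, as might come from a conditional language model:

```python
import math

def pcmi_h(logp_y_given_history_and_fact, logp_y_given_fact):
    """Pointwise conditional mutual information of response y with history h, given fact f:
    pcmi_h = log p(y | h, f) - log p(y | f).
    Positive values mean the response becomes more likely once the history is
    known, i.e. it is specific to (acknowledges) the conversational history."""
    return logp_y_given_history_and_fact - logp_y_given_fact

# Toy log-probabilities (invented numbers for illustration):
acknowledging = pcmi_h(math.log(0.02), math.log(0.002))  # 10x likelier given history
generic = pcmi_h(math.log(0.01), math.log(0.01))         # history changes nothing
```

A fact-dumping response that ignores the prior turn scores near zero, while one that builds on the user's last utterance scores high, which is what makes the quantity usable for reranking candidate responses.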
Abstract:
We present Chirpy Cardinal, an open-domain dialogue agent, as a research platform for the 2019 Alexa Prize competition. Building an open-domain socialbot that talks to real people is challenging – such a system must meet multiple user expectations such as broad world knowledge, conversational style, and emotional connection. Our socialbot engages users on their terms – prioritizing their interests, feelings and autonomy. As a result, our socialbot provides a responsive, personalized user experience, capable of talking knowledgeably about a wide variety of topics, as well as chatting empathetically about ordinary life. Neural generation plays a key role in achieving these goals, providing the backbone for our conversational and emotional tone. At the end of the competition, Chirpy Cardinal progressed to the finals with an average rating of 3.6/5.0, a median conversation duration of 2 minutes 16 seconds, and a 90th percentile duration of over 12 minutes.
Abstract:
Knowledge base population (KBP) systems take in a large document corpus and extract entities and their relations. Thus far, KBP evaluation has relied on judgements on the pooled predictions of existing systems. We show that this evaluation is problematic: when a new system predicts a previously unseen relation, it is penalized even if it is correct. This leads to significant bias against new systems, which counterproductively discourages innovation in the field. Our first contribution is a new importance-sampling based evaluation which corrects for this bias by annotating a new system’s predictions on-demand via crowdsourcing. We show this eliminates bias and reduces variance using data from the 2015 TAC KBP task. Our second contribution is an implementation of our method made publicly available as an online KBP evaluation service. We pilot the service by testing diverse state-of-the-art systems on the TAC KBP 2016 corpus and obtain accurate scores in a cost effective manner.
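The core estimator is ordinary importance sampling over a system's predictions. A self-contained toy sketch with a simulated crowd annotator (the proposal distribution, sample size, and oracle are all illustrative; the real service batches crowdsourced judgements):

```python
import random

def is_precision_estimate(predictions, proposal, n_samples, annotate, seed=0):
    """Importance-sampling estimate of a system's precision.

    predictions : list of prediction ids (uniform target distribution)
    proposal    : dict id -> sampling probability (sums to 1); can up-weight
                  predictions that are cheap or informative to annotate
    annotate    : callable id -> 1.0 if correct else 0.0 (a crowd judgement)
    """
    rng = random.Random(seed)
    ids = list(proposal)
    weights = [proposal[i] for i in ids]
    p_target = 1.0 / len(predictions)
    total = 0.0
    for _ in range(n_samples):
        (i,) = rng.choices(ids, weights=weights)
        total += (p_target / proposal[i]) * annotate(i)
    return total / n_samples

# Toy system: predictions 0..9, the even ones correct (true precision 0.5).
preds = list(range(10))
uniform = {i: 0.1 for i in preds}
est = is_precision_estimate(preds, uniform, 2000,
                            lambda i: 1.0 if i % 2 == 0 else 0.0)
```

Because only sampled predictions are annotated, the cost of scoring a new system scales with the sample size rather than with the size of its output, and the importance weights keep the estimate unbiased even when novel relations were never in the original pool.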
Abstract:
We describe Stanford’s entries in the TAC KBP 2017 Cold Start Knowledge Base Population and Slot Filling challenges. Our biggest contribution is an entirely new Spanish entity detection and relation extraction system for the cross-lingual relation extraction tracks. This new Spanish system is simple: it uses CRF-based entity recognition supplemented by gazetteers, followed by several rule-based relation extractors, some using syntactic structure. We make further improvements to our systems for other languages, including improved named entity recognition, a new neural relation extractor, and better support for nested mentions and discussion forum documents. We also experimented with data fusion with entity linking systems from entrants in the TAC KBP Entity Discovery and Linking challenge. Under the official 2017 macro-averaged MAP all hops score measure, Stanford’s 2017 English, Chinese, Spanish and cross-lingual submissions achieved overall scores of 0.202, 0.124, 0.123, and 0.073, respectively. Under the macro-averaged LDC-MEAN all hops F1 measure used in previous years, the corresponding scores were 0.254, 0.188, 0.186, and 0.117, respectively.
Abstract:
Networks are a fundamental tool for modeling complex systems in a variety of domains including social and communication networks as well as biology and neuroscience. Small subgraph patterns in networks, called network motifs, are crucial to understanding the structure and function of these systems. However, the role of network motifs in temporal networks, which contain many timestamped links between the nodes, is not yet well understood. Here we develop a notion of a temporal network motif as an elementary unit of temporal networks and provide a general methodology for counting such motifs. We define temporal network motifs as induced subgraphs on sequences of temporal edges, design fast algorithms for counting temporal motifs, and prove their runtime complexity. Our fast algorithms achieve up to 56.5x speedup compared to a baseline method. Furthermore, we use our algorithms to count temporal motifs in a variety of networks. Results show that networks from different domains have significantly different motif counts, whereas networks from the same domain tend to have similar motif counts. We also find that different motifs occur at different time scales, which provides further insights into structure and function of temporal networks.
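As a concrete illustration of what a temporal motif count is, here is a deliberately naive quadratic counter for one 2-edge motif (the paper's algorithms are far faster; this only states the definition, and the edge list is invented):

```python
from itertools import combinations

def count_2edge_motifs(edges, delta):
    """Count the temporal 2-edge motif (u -> v, then v -> w), where the second
    edge leaves v and follows the first within delta time units.
    'edges' is a list of (src, dst, timestamp) sorted by timestamp."""
    count = 0
    for (u, v, t1), (x, w, t2) in combinations(edges, 2):
        if x == v and 0 < t2 - t1 <= delta:
            count += 1
    return count

# Toy temporal network: four timestamped directed edges.
edges = [("a", "b", 1), ("b", "c", 2), ("b", "d", 9), ("c", "a", 3)]
n = count_2edge_motifs(edges, delta=5)
```

Here ("a","b",1) followed by ("b","c",2) and ("b","c",2) followed by ("c","a",3) match, while ("a","b",1) followed by ("b","d",9) falls outside the window, showing how the time scale delta changes which patterns count as motifs.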
Abstract:
We describe Stanford’s entries in the TAC KBP 2016 Cold Start Slot Filling and Knowledge Base Population challenge. Our biggest contribution is an entirely new Chinese entity detection and relation extraction system for the new Chinese and cross-lingual relation extraction tracks. This new system consists of several rule-based relation extractors and a distantly supervised extractor. We also analyze errors produced by our existing mature English KBP system, which leads to several fixes, notably improvements to our patterns-based extractor and neural network model, and support for nested mentions and inferred relations. Stanford’s 2016 English, Chinese and cross-lingual submissions achieved an overall (macro-averaged LDC-MEAN) F1 of 22.0, 14.2, and 11.2 respectively on the 2016 evaluation data, performing well above the median entries, at 7.5, 13.2 and 8.3 respectively.
Abstract:
Good websites should be easy to navigate via hyperlinks, yet maintaining a link structure of high quality is difficult. Identifying pairs of pages that should be linked may be hard for human editors, especially if the site is large and changes are frequent. Further, given a set of useful link candidates, the task of incorporating them into the site can be expensive, since it typically involves humans editing pages. In light of these challenges, it is desirable to develop data-driven methods for partly automating the link placement task. Here we develop an approach for automatically finding useful hyperlinks to add to a website. We show that passively collected server logs, beyond telling us which existing links are useful, also contain implicit signals indicating which nonexistent links would be useful if they were to be introduced. We leverage these signals to model the future usefulness of as yet nonexistent links. Based on our model, we define the problem of link placement under budget constraints and propose an efficient algorithm for solving it. We demonstrate the effectiveness of our approach by evaluating it on Wikipedia, a large website for which we have access to both server logs (used for finding useful new links) and the complete revision history (used as ground truth). As our method is based exclusively on standard server logs, it may also be applied to any other website, as we demonstrate with the example of the biomedical research site Simtk.
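A highly simplified sketch of budgeted link placement: a greedy pass over model-scored candidate links with a per-page budget. The candidate scores are invented, and the paper's actual objective also models diminishing returns across links on the same page, which this sketch ignores by treating values as independent:

```python
def place_links(candidates, budget_per_page):
    """Greedy link placement under a per-page budget.

    candidates : list of (src_page, dst_page, estimated_clicks), where
                 estimated_clicks is a model's prediction of how often the
                 link would be used (e.g. mined from server-log signals)
    Returns the chosen (src, dst) links, taking the highest-value candidates
    first while adding at most budget_per_page links to any single page.
    """
    chosen, used = [], {}
    for src, dst, value in sorted(candidates, key=lambda c: -c[2]):
        if used.get(src, 0) < budget_per_page:
            chosen.append((src, dst))
            used[src] = used.get(src, 0) + 1
    return chosen

cands = [("A", "X", 90), ("A", "Y", 80), ("A", "Z", 70), ("B", "X", 60)]
links = place_links(cands, budget_per_page=2)
```

With a budget of two links per page, page A receives only its two highest-value candidates, which is the kind of constraint that keeps automated linking from cluttering pages.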
Abstract:
Here we propose a novel approach to identifying missing links in Wikipedia. We build on the fact that the ultimate purpose of Wikipedia links is to aid navigation. Rather than merely suggesting new links that are in tune with the structure of existing links, our method finds missing links that would immediately enhance Wikipedia’s navigability. We leverage data sets of navigation paths collected through a Wikipedia-based human-computation game in which users must find a short path from a start to a target article by only clicking links encountered along the way. We harness human navigational traces to identify a set of candidates for missing links and then rank these candidates. Experiments show that our procedure identifies missing links of high quality.
Abstract:
Word Sense Disambiguation is a difficult problem to solve in the unsupervised setting. This is because in this setting inference becomes more dependent on the interplay between different senses in the context due to the unavailability of learning resources. Using two basic ideas, sense dependency and selective dependency, we model the WSD problem as a Maximum A Posteriori (MAP) inference query on a Markov Random Field (MRF) built using WordNet and the Link Parser or the Stanford Parser. To the best of our knowledge this combination of dependency and MRF is novel, and our graph-based unsupervised WSD system beats state-of-the-art systems on the SensEval-2, SensEval-3 and SemEval-2007 English all-words datasets while being over 35 times faster.
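For intuition, MAP inference on a toy two-word MRF can be done by exhaustive search. The potentials below are invented for illustration; the actual system derives them from WordNet relations (sense dependency) and parser links (selective dependency), and uses proper inference rather than enumeration:

```python
from itertools import product

def map_inference(senses, unary, pairwise, edges):
    """Brute-force MAP assignment on a tiny MRF for word sense disambiguation.

    senses   : dict word -> list of candidate senses
    unary    : dict (word, sense) -> score
    pairwise : dict (sense_a, sense_b) -> compatibility score
    edges    : word pairs linked by the parser
    Exhaustive search is only feasible for toy graphs like this one.
    """
    words = list(senses)
    best, best_score = None, float("-inf")
    for assignment in product(*(senses[w] for w in words)):
        a = dict(zip(words, assignment))
        score = sum(unary[(w, a[w])] for w in words)
        score += sum(pairwise.get((a[u], a[v]), 0.0) for u, v in edges)
        if score > best_score:
            best, best_score = a, score
    return best

senses = {"bank": ["bank.finance", "bank.river"],
          "deposit": ["deposit.money", "deposit.sediment"]}
unary = {("bank", "bank.finance"): 0.1, ("bank", "bank.river"): 0.2,
         ("deposit", "deposit.money"): 0.1, ("deposit", "deposit.sediment"): 0.1}
pairwise = {("bank.finance", "deposit.money"): 1.0,
            ("bank.river", "deposit.sediment"): 0.3}
best = map_inference(senses, unary, pairwise, [("bank", "deposit")])
```

Even though "bank.river" has the higher unary score, the strong pairwise compatibility with "deposit.money" pulls the MAP assignment to the financial senses, which is exactly the interplay between senses that the unsupervised setting relies on.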