Ashwin Paranjape

I'm the Founding AI Lead at Samaya AI; we build general-purpose agents that help domain experts reason over large sets of knowledge-intensive documents and perform complex workflows. I help define and execute our company's AI strategy, which starts from research ideas that are 6-12 months ahead of the curve and culminates in valuable applications for our users. I'm broadly interested in reasoning, retrieval, agentic systems and RL. We're a fun team of competent and kind people and we are always looking for amazing people (like you). Send me an email if you're interested.

Prior to Samaya, I graduated from Stanford University with a PhD in CS. I was advised by Prof. Christopher Manning in the NLP group. My research focused broadly on open-domain dialogue systems and using retrieval to generate language. More specifically, I created informative dialogue systems based on an understanding of how humans talk informatively, by training neural retrievers to find passages containing world knowledge and then paraphrasing them into conversational utterances. I also like to think about speech interfaces of the future, where humans and virtual agents take turns more naturally: with backchannels, interjections and clarification questions.

In 2020, I co-led Stanford's team in the Alexa Prize Socialbot competition. It was great fun leading a team of 10 people with weekly sprints, 1-1s, research discussions and a reading group. That was our first year participating, and our bot, Chirpy Cardinal, which we built from the ground up, placed 2nd out of 10 teams! All of our code is open-sourced and here's a live demo. All the real-life issues that couldn't be fixed with existing methods became my research agenda. I was a research mentor for the next iteration and our team went on to publish 5 peer-reviewed conference papers and 2 technical articles (find them here).

Before the PhD, I got a master's degree in CS (AI) at Stanford, broadening my understanding with a variety of courses and conducting research with Prof. Jure Leskovec, Robert West and Austin Benson. For my bachelor's degree, I was at the Indian Institute of Technology Bombay, where I majored in Computer Science and Engineering with a minor degree in Electrical Engineering.

Presented at ICLR 2022

Selected Publications

Promptriever: Instruction-Trained Retrievers Can Be Prompted Like Language Models
Orion Weller, Benjamin Van Durme, Dawn Lawrie, Ashwin Paranjape, Yuhao Zhang, Jack Hessel
ICLR 2025
[paper]

Abstract:

Instruction-tuned language models (LM) are able to respond to imperative commands, providing a more natural user interface compared to their base counterparts. In this work, we present Promptriever, the first retrieval model able to be prompted like an LM. To train Promptriever, we curate and release a new instance-level instruction training set from MS MARCO, spanning nearly 500k instances. Promptriever not only achieves strong performance on standard retrieval tasks, but also follows instructions. We observe: (1) large gains (reaching SoTA) on following detailed relevance instructions (+14.3 p-MRR / +3.1 nDCG on FollowIR), (2) significantly increased robustness to lexical choices/phrasing in the query+instruction (+12.9 Robustness@10 on InstructIR), and (3) the ability to perform hyper-parameter search via prompting to reliably improve retrieval performance (+1.4 average increase on BEIR). Promptriever demonstrates that retrieval models can be controlled with prompts on a per-query basis, setting the stage for future work aligning LM prompting techniques with information retrieval.

Lost in the middle: How language models use long contexts
Nelson F Liu, Kevin Lin, John Hewitt, Ashwin Paranjape, Michele Bevilacqua, Fabio Petroni, Percy Liang...
TACL 2024
[paper]

Abstract:

While recent language models have the ability to take long contexts as input, relatively little is known about how well they use longer context. We analyze the performance of language models on two tasks that require identifying relevant information in their input contexts: multi-document question answering and key-value retrieval. We find that performance can degrade significantly when changing the position of relevant information, indicating that current language models do not robustly make use of information in long input contexts. In particular, we observe that performance is often highest when relevant information occurs at the beginning or end of the input context, and significantly degrades when models must access relevant information in the middle of long contexts, even for explicitly long-context models. Our analysis provides a better understanding of how language models use their input context and provides new evaluation protocols for future long-context language models.

Neural Generation Meets Real People: Building a Social, Informative Open-Domain Dialogue Agent
Ethan A. Chi, Ashwin Paranjape, Abigail See, Caleb Chiam, Trenton Chang, Kathleen Kenealy, Swee Kiat...
SIGDIAL 2022
[paper] [talk]

Abstract:

We present Chirpy Cardinal, an open-domain social chatbot. Aiming to be both informative and conversational, our bot chats with users in an authentic, emotionally intelligent way. By integrating controlled neural generation with scaffolded, hand-written dialogue, we let both the user and bot take turns driving the conversation, producing an engaging and socially fluent experience. Deployed in the fourth iteration of the Alexa Prize Socialbot Grand Challenge, Chirpy Cardinal handled thousands of conversations per day, placing second out of nine bots with an average user rating of 3.58/5.

When can I Speak? Predicting initiation points for spoken dialogue agents
Siyan Li, Ashwin Paranjape, Christopher Manning
SIGDIAL 2022
[paper] [talk] [code]

Abstract:

Current spoken dialogue systems initiate their turns after a long period of silence (700-1000ms), which leads to little real-time feedback, sluggish responses, and an overall stilted conversational flow. Humans typically respond within 200ms and successfully predicting initiation points in advance would allow spoken dialogue agents to do the same. In this work, we predict the lead-time to initiation using prosodic features from a pre-trained speech representation model (wav2vec 1.0) operating on user audio and word features from a pre-trained language model (GPT-2) operating on incremental transcriptions. To evaluate errors, we propose two metrics w.r.t. predicted and true lead times. We train and evaluate the models on the Switchboard Corpus and find that our method outperforms features from prior work on both metrics and vastly outperforms the common approach of waiting for 700ms of silence.

Hindsight: Posterior-guided training of retrievers for improved open-ended generation
Ashwin Paranjape, Omar Khattab, Christopher Potts, Matei Zaharia, Christopher D. Manning
ICLR 2022
[paper]

Abstract:

Many text generation systems benefit from retrieving passages from a textual knowledge corpus (e.g., Wikipedia) and using them to generate the output. For open-ended generation tasks, like generating informative utterances in conversations, many varied passages z are relevant to the context x but few are relevant to the observed next utterance y (label). For such tasks, existing methods (that jointly train the retriever and generator) underperform: during training the top-k context-relevant retrieved passages might not contain the label-relevant passage and the generator may hence not learn a preference to ground its generated output in them. We propose using an additional guide-retriever that also conditions on the observed label y and “in hindsight” retrieves label-relevant passages during training. We maximize the evidence lower bound (ELBo) to jointly train the guide-retriever Q(z|x,y) with the standard retriever P_η(z|x) and the generator P_θ(y|x,z) and find that ELBo has better inductive biases than prior work. For informative conversations from the Wizard of Wikipedia dataset, with our posterior-guided training, the retriever finds passages with higher relevance in the top-10 (23% relative improvement), the generator’s responses are more grounded in the retrieved passage (19% relative improvement) and the end-to-end system produces better overall output (6.4% relative improvement).

Human-like informative conversations via conditional mutual information
Ashwin Paranjape and Christopher D. Manning
North American Chapter of the Association for Computational Linguistics (NAACL), 2021
[paper]

Abstract:

The goal of this work is to build a dialogue agent that can weave new factual content into conversations as naturally as humans. We draw insights from linguistic principles of conversational analysis and annotate human-human conversations from the Switchboard Dialog Act Corpus, examining how humans apply strategies for acknowledgement, transition, detail selection and presentation. However, when current chatbots (explicitly provided with new factual content) introduce facts in a conversation, their generated responses do not acknowledge the prior turns. This is because, while current methods are trained with two contexts, new factual content and conversational history, we show that their generated responses are not simultaneously specific to both the contexts and in particular, lack specificity w.r.t. conversational history. We propose using pointwise conditional mutual information (pcmi) to measure specificity w.r.t. conversational history. We show that responses that have a higher pcmi_h are judged by human evaluators to be better at acknowledgement 74% of the time. To show its utility in improving overall quality, we compare baseline responses that maximize pointwise mutual information (Max. PMI) with our alternative responses (Fused-PCMI) that trade off pmi for pcmi_h and find that human evaluators prefer Fused-PCMI 60% of the time.

Neural Generation Meets Real People: Towards Emotionally Engaging Mixed-Initiative Conversations
Ashwin Paranjape*, Abigail See*, Kathleen Kenealy, Haojun Li, Amelia Hardy, Peng Qi, Kaushik Ram Sadagopan,...
Alexa Prize Proceedings 2020
[paper] [supplementary]

Abstract:

We present Chirpy Cardinal, an open-domain dialogue agent, as a research platform for the 2019 Alexa Prize competition. Building an open-domain socialbot that talks to real people is challenging – such a system must meet multiple user expectations such as broad world knowledge, conversational style, and emotional connection. Our socialbot engages users on their terms – prioritizing their interests, feelings and autonomy. As a result, our socialbot provides a responsive, personalized user experience, capable of talking knowledgeably about a wide variety of topics, as well as chatting empathetically about ordinary life. Neural generation plays a key role in achieving these goals, providing the backbone for our conversational and emotional tone. At the end of the competition, Chirpy Cardinal progressed to the finals with an average rating of 3.6/5.0, a median conversation duration of 2 minutes 16 seconds, and a 90th percentile duration of over 12 minutes.

Motifs in Temporal Networks
Ashwin Paranjape*, Austin Benson*, Jure Leskovec
Tenth ACM International Conference on Web Search and Data Mining (WSDM), 2017
[paper] [poster] [code] [data]

Abstract:

Networks are a fundamental tool for modeling complex systems in a variety of domains including social and communication networks as well as biology and neuroscience. Small subgraph patterns in networks, called network motifs, are crucial to understanding the structure and function of these systems. However, the role of network motifs in temporal networks, which contain many timestamped links between the nodes, is not yet well understood. Here we develop a notion of a temporal network motif as an elementary unit of temporal networks and provide a general methodology for counting such motifs. We define temporal network motifs as induced subgraphs on sequences of temporal edges, design fast algorithms for counting temporal motifs, and prove their runtime complexity. Our fast algorithms achieve up to 56.5x speedup compared to a baseline method. Furthermore, we use our algorithms to count temporal motifs in a variety of networks. Results show that networks from different domains have significantly different motif counts, whereas networks from the same domain tend to have similar motif counts. We also find that different motifs occur at different time scales, which provides further insights into structure and function of temporal networks.

* denotes equal contribution