Author: Jaimie Patterson
Image: Keyboard keys spelling out "SAY NO."

Interactive search systems can miss crucial cues like the words “not” and “don’t,” leading to sometimes dangerous misunderstandings—but Johns Hopkins computer scientists may have found a solution. Called Rank1, their new information retrieval system is trained to “think” through user queries and available documents before it responds, improving the quality—and quantity—of the information it presents to users.

“Search systems often can’t tell the difference between a regular sentence—‘I like ice cream’—and a negated one—‘I don’t like ice cream,’” explains Orion Weller, a computer science PhD student and first author of the Rank1 study.
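The failure Weller describes can be seen with a toy bag-of-words scorer (a simplified stand-in for keyword matching, not any system's actual ranking code): adding "don't" barely changes a document's word overlap with a query, so the negated sentence scores the same as the original despite meaning the opposite.

```python
# Toy illustration of keyword matching's blindness to negation.
# A bag-of-words overlap scorer -- not real search-engine code.

def keyword_overlap(query: str, document: str) -> float:
    """Score a document by the fraction of query words it contains."""
    q_words = set(query.lower().split())
    d_words = set(document.lower().split())
    return len(q_words & d_words) / len(q_words)

query = "does the author like ice cream"
positive = "I like ice cream"
negated = "I don't like ice cream"

# Both documents match the same three query words ("like", "ice",
# "cream"), so they receive identical scores: 3/6 = 0.5 each.
print(keyword_overlap(query, positive))  # 0.5
print(keyword_overlap(query, negated))   # 0.5
```

A system that only counts matching words has no way to prefer one of these documents over the other, which is exactly the gap a reasoning-based reranker aims to close.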

For instance, a famous Google search error once omitted the words “do not” from the answer to a medical query, resulting in dangerous advice for people experiencing seizures.

“Our model is the first document-ranking model trained to ‘think’ over the user’s query and the documents it can choose from before giving its final answer,” Weller says. “So it’s really good at following the meaning of what you want, rather than doing simple keyword matching like Google Search—which potentially leaves off key words like ‘no’ or ‘don’t.’”

Using a simple but powerful technique called distillation, the researchers took examples of a DeepSeek-based reasoning language model's “thought process” and trained Rank1 to imitate these reasoning chains.
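In outline, this kind of reasoning distillation pairs a teacher model's chain-of-thought with the inputs that produced it, then fine-tunes a smaller student on those pairs. The sketch below is a hypothetical illustration of that data-preparation step, not the actual Rank1 training code; the prompt format, the `<think>` tags, and the helper function are all assumptions for demonstration, with a canned string standing in for a DeepSeek-style reasoning trace.

```python
# Hypothetical sketch of building one distillation training example:
# a (query, document) input paired with a teacher's reasoning chain
# and relevance verdict, formatted as a prompt/completion pair for
# supervised fine-tuning of a student reranker.

def build_distillation_example(query, document, reasoning, label):
    """Format one teacher trace as a fine-tuning record."""
    prompt = (
        f"Query: {query}\n"
        f"Document: {document}\n"
        "Think step by step, then answer 'relevant' or 'not relevant'."
    )
    # The student learns to emit the reasoning before its verdict,
    # so at inference time it "thinks" before judging relevance.
    completion = f"<think>{reasoning}</think> {label}"
    return {"prompt": prompt, "completion": completion}

example = build_distillation_example(
    query="what should I do during a seizure",
    document="Do not hold the person down or put anything in their mouth.",
    reasoning=(
        "The query asks about actions during a seizure. The document "
        "gives safety guidance phrased as what NOT to do, which still "
        "directly answers the query, so it is relevant."
    ),
    label="relevant",
)
print(sorted(example.keys()))  # ['completion', 'prompt']
```

Training the student to reproduce the full completion, reasoning included, is what distinguishes this from ordinary relevance-label fine-tuning.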

As a result, Rank1 is much better at understanding negation, following instructions, and even finding additional sources that other models may have missed, the researchers say.

“I think one of the coolest things is that Rank1 finds different documents than any other retrieval system,” Weller says. “Because Rank1 thinks before it decides if a document is relevant, it finds many more documents that other models don’t.”

Weller worked on the project with a larger team including his advisors Benjamin Van Durme, an associate professor of computer and cognitive science, and Dawn Lawrie, a senior research scientist at the Human Language Technology Center of Excellence (HLTCOE); Kathryn Ricci, a graduate student in the university’s Center for Language and Speech Processing; and HLTCOE researchers Eugene Yang and Andrew Yates.

“As language models and information retrieval systems become more intertwined and regularly used, understanding and improving these failure cases is crucial for both companies and users,” Weller says.

This work builds directly on a previous study in which Weller and his advisors gathered real-world examples of negation and created the first evaluation dataset designed to test information retrieval systems’ performance in these cases. Crucially, they found that no existing information retrieval systems correctly understand the nuances of negation, often ignoring it altogether and choosing to use negated documents just as often as non-negated ones when answering a user query.

A recent reproduction paper by researchers from the University of Amsterdam confirms that this is still a problem, although both groups note that using larger language models can help. But the time and computing power required to run these bigger models made them infeasible for everyday searches—that is, until Rank1 came along.

Despite Rank1's smaller size, the researchers demonstrated that it works just as well as larger, slower, and more expensive information retrieval models. The Hopkins team plans to continue testing Rank1 on other tasks, languages, and setups to see how it might best be used in the future.

In the meantime, Weller advises that search engine and chatbot users continue to rephrase negated queries as positive ones—for example, rewriting “What should I not do…?” into “What should I do…?”—especially when using interactive information retrieval systems that rely on smaller language models than Google’s or OpenAI’s products do.

“We hope our analysis will spur increased attention to the problem of negation in information retrieval and encourage model developers to perform additional training and evaluation,” he says.

Weller is supported by the National Science Foundation Graduate Research Fellowship Program.