Predicting Article Impact and Building a Better AI

Once a month, the J&J Professional Development Committee chooses an industry article to discuss with the greater J&J team. The topic can range from industry-spanning threads such as Open Access to specific matters such as the careful use of inclusive language in review. In this space, J&J employees—from new hires to executive leaders—can pick each other’s brains and discuss the evolving industry from many points of view. Here are some of the highlights from this May’s discussion of “Frosty reception for algorithm that predicts research papers’ impact.”

On June 16, J&J Editorial’s Professional Development Committee hosted an internal Professional Development Knowledge Exchange discussion on the fast-approaching use of AI in peer review. With a massive volume of papers being written, edited, rejected, and published daily, researchers and funders frequently struggle to find and fund the most impactful and useful science.

A published paper can sit in the depths of online indexes or on the shelves of university libraries for years without serious consideration from the greater scientific community. Take, for instance, the work of Hungarian researcher Katalin Karikó, whose rejected papers from the 1990s mentioned the use of synthetic mRNA strands to produce disease-fighting proteins in the human body. Almost 30 years later, she is a senior vice president at BioNTech, a company that worked closely with Pfizer to produce a recent vaccine you may have heard of (and hopefully received).

It’s clear there are flaws in our current paper evaluation and promotion methods. Challenges include innate bias, economic constraints, the inability to trust novel ideas, and the current metrics themselves, which favor established scientists and prestige journals. What if there were an unbiased way to determine a paper’s scientific importance without manually sifting through an entire body of literature?

Enter the AI revealed in Nature Biotechnology this May.

The AI itself is fairly simple in principle. After training on 1.7 million existing papers, the mathematical model calculates the “impact” of a new manuscript using a combination of 29 metrics. These include common measures such as the number of unique researchers citing a paper and changes in the authors’ h-indices (a measure of productivity), along with other author metrics. Lead researcher and co-author James Weis, a computational biologist at MIT, mentioned that the new model will work with more criteria than previous models and aims to “…use data-driven methods to help uncover ‘hidden gem’ research, which would go on to be impactful, but which may not benefit from the high out-of-the-box citation counts that are typical of well-known, highly established groups.”
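For readers unfamiliar with the h-index, here is a minimal sketch of how that one metric is conventionally calculated: an author has an h-index of h when h of their papers have each been cited at least h times. This is only an illustration of the standard definition, not code from the Nature Biotechnology model; the function name and the citation counts are made up for the example.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)  # most-cited paper first
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the top `rank` papers all have at least `rank` citations
        else:
            break
    return h


# Hypothetical citation counts for one author at two points in time.
before = h_index([10, 8, 5, 4, 3])
after = h_index([10, 8, 5, 5, 5, 3])
change = after - before  # the kind of "change in h-index" a model could track
print(before, after, change)
```

Note that an author needs both more papers and more citations per paper to raise this number, which is one reason the metric tends to favor established researchers.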

In short, the AI uses more metrics than previous models to find exceptional papers that wouldn’t necessarily get attention under the current citation-focused method, with the goal of guiding funders toward the most “impactful” lines of research.

Employees from J&J Editorial and members of the J&J Professional Development Committee got together to discuss the article and the use of this type of AI in the scholarly publishing industry in a conversation moderated by Elena Durvas, Managing Editor.

Limitations and reservations

The J&J Professional Development Knowledge Exchange discussion began with a concern about forms of AI that use this “preexisting materials” training model. If the AI is learning from the results of human review, is it reliable?

Sarah Mills commented, “I am skeptical… computers per se might not be biased, but the algorithms are created by humans, so any human bias that went into creating them is part of the program.” This point was echoed and expanded on by multiple co-workers, like Joanna Helms: “There are actually many documented instances of predictive algorithms repeating or even amplifying human biases, since humans program and set the constraints of algorithms. So, the risk here is that the biases that already exist in determination of research impact…”

The point was made loud and clear: any AI tool that learns from humans risks reproducing, or even exacerbating, the biases of those humans. The measurement itself was also a topic of much discussion. How does one really calculate “impact”, or can such a thing be done? SJ Griffin gave their idea of an impactful paper, saying, “I think my idea of ‘impactful’ would be the opposite of what this algorithm would measure: novelty, discovery, new inventions, etc.” The term “impactful” is open to many interpretations, which raises the question of whether funders should put their money toward something that falls under such a vague umbrella. The issue was summed up well by Margaret Silvers: “I also see mainly the disadvantages, unfortunately, and not only from the algorithm but from the framing; I think it’s very difficult to know what the long-term impacts or usefulness of any kind of research can be, and that it’s a poor idea to fund something only because it’s promised to be ‘impactful’…”

This vague framing led to a number of difficult questions, like “…if the pandemic hadn’t been a thing, would mRNA vaccines be getting the same amount of hype and interest right now?” Could the AI target “breakthrough” discoveries and recognize their value before their introduction to mainstream science? Would that same AI also be able to recognize the value of science that added to the body of learning but was not considered groundbreaking?

More evolution than revolution

The creators of the AI are aware of the potential shortcomings: “Our work should be understood as part of a broader scientific-analysis toolkit, to be used in combination with human expertise and intuition to ensure we are indeed broadening the scope of research.” The discussion group also saw a number of advantages to such a tool, provided it is implemented correctly.

Comparing this AI to the current system of impact, Sarah Mills said, “[The] authors of the algorithm aren’t necessarily saying that they’ve found the answer, but are trying to improve upon outdated existing metrics…” Citation metrics aside, Joanna Helms spoke from personal experience about the difficulties of publishing when you are new to a field, mentioning, “The only experience with this that I personally have is in humanities research, but I have often thought that PhD students are at a major disadvantage with regards to producing high-impact research, because it can take 5-7 years (in the field I trained in) to complete your research and have it approved by your committee. By that point, the hot topics will have substantially changed.”

If the AI works correctly alongside human review, we could see an uptick in papers coming from smaller institutions, early-career researchers, and developing nations. Ideally, this algorithm would drive funders to the next Katalin Karikó at the same time that Open Access drives faster and wider dissemination of high-impact research.

In the end, the J&J discussion group felt cautious optimism about the future of this technology. While the developers do need to find ways to remove human biases and to be more specific about the term “impactful”, the eventual outcome, i.e., the efficient and unbiased allocation of resources to the most beneficial research, is one worth striving for.

Let us know what you think about this new model and stay tuned for next month’s J&J Professional Development Knowledge Exchange!

Article written and edited by J&J Communications Assistant Wyatt Miller and J&J Managing Editor Kellyanna Bussell. This article is based on an internal discussion by our Professional Development Committee and included contributions from Elena Durvas, Joanna Helms, SJ Griffin, Sarah Mills, and Margaret Silvers.