“As the scientific community continues to evolve, it is essential to leverage the latest technologies to improve and streamline the peer-review process. One such technology that shows great promise is artificial intelligence (AI). AI-based peer review has the potential to make the process more efficient, accurate, and impartial, ultimately leading to better quality research.”
I suspect many of you were not fooled into thinking that was me who wrote that statement. A well-known AI tool wrote those words after I prompted it to discuss using AI in the peer review process. More and more, we are hearing stories about how researchers may use these tools when reviewing others’ applications, and even writing their own applications.
Even if AI tools may have “great promise,” do we allow their use?
Reviewers are trusted and required to maintain confidentiality throughout the application review process. Thus, using AI to assist in peer review would involve a breach of confidentiality. In a recently released guide notice, we explain that NIH scientific peer reviewers are prohibited from using natural language processors, large language models, or other generative AI technologies for analyzing and formulating peer review critiques for grant applications and R&D contract proposals.
Reviewers have long been required to certify and sign an agreement that says they will not share applications, proposals, or meeting materials with anyone who has not been officially designated to participate in the peer review process. Yes, this also means websites, apps, or other AI platforms too. As part of standard pre-meeting certifications, all NIH peer reviewers and members of NIH advisory councils will be required to certify a modified Security, Confidentiality and Nondisclosure Agreement signifying that they fully understand and will comply with the confidential nature of the review process, including complete abstention from artificial intelligence tools in analyzing and critiquing NIH grant applications and contract proposals.
In other words, grant application and contract proposal materials and other privileged information cannot be shared or disseminated through any means or entity. Let’s explore this issue further with some hypothetical examples.
After agreeing to serve, Dr. Hal was assigned several grant applications to review. Hal has had a lot of experience writing grant applications and knows how much effort and time they require. Even with that in mind, they were daunted when trying to give each of these applications their full attention and an appropriate review.
So, they turned to AI. They rationalized it would provide an unbiased assessment of the research proposed because it should be able to pull from numerous references and resources fairly quickly and distill the relevant information. And, to top it off, Hal even found a platform that was trained on publicly available biomedical research publications and NIH-funded grants.
Not seeing a problem, Hal fed the relevant information from the applications into the AI. Moments later, it gave an assessment, which Hal used as a starting point for their critique.
Here is another scenario:
Dr. Smith has just finished reading what seems to be way too many grant applications. Exhausted as they may be as an NIH peer reviewer, their job is not done until those critiques are also written. Tired, a bit hungry, and ready to just get home, they wonder if any of these new AI chat bots might be able to help. They rationalized it would just be used to create a first draft, and then they would go back and clean up the review critique before submitting.
Smith copied the abstract, specific aims, and research strategy sections of the applications. They uploaded them to one of the AI systems that is publicly available and widely used by many people for numerous different reasons.
A few minutes later, Ta-Da! There was some narrative that could be used for the first draft. Getting those initial critique drafts going saved hours of time.
To be clear, neither of these situations is allowed. Everybody involved with the NIH peer review process shares responsibility for maintaining and upholding the integrity of review. A breach of confidentiality, such as those described above, could lead to terminating a peer reviewer’s service, referring them for government-wide suspension or debarment, and, depending on the severity, possibly pursuing criminal or civil actions. NIH Guide Notice NOT-OD-22-044, our Integrity and Confidentiality in NIH Peer Review page, and this NIH All About Grants podcast explain more.
Ensuring confidentiality means that scientists will feel comfortable sharing their candid, well-designed, and thorough research ideas with us. Generative AI tools need to be fed substantial, privileged, and detailed information to develop a peer reviewer critique of a specific grant application. Moreover, there is no guarantee of where AI tools send, save, view, or use grant application, contract proposal, or critique data at any time. Thus, using them absolutely violates our peer review confidentiality expectations.
NIH peer reviewers are selected and invited to review applications and proposals specifically for their individual expertise and opinion. The data that generative AI tools are trained on are limited to what exists, what has been widely published, and what opinions have been written for posterity. Biases are built into these data; with this process, the originality of thought that NIH values is lost and homogenized, and the output may even constitute plagiarism.
We take this issue seriously. Applicants are trusting us to protect their proprietary, sensitive, and confidential ideas from being given to others who do not have a need to know. In order to maintain this trust and keep the research process moving forward, reviewers are not allowed to share these applications with anybody or any entity.
Circling back to the beginning for a moment, we wanted to say a few words about using AI in writing one’s application. We do not know, or ask, who wrote an application. It could have come from the principal investigator, a postdoc, a professional grant writer, or the wider research team. If you use an AI tool to help write your application, you also do so at your own risk.
This is because when we receive a grant application, it is our understanding that it is the original idea proposed by the institution and their affiliated research team. Using AI tools may introduce several concerns related to research misconduct, like including plagiarized text from someone else’s work or fabricated citations. If we identify plagiarized, falsified, or fabricated information in a grant write-up, we will take appropriate actions to address the non-compliance. In my example above, we ran the AI-generated text through a well-known online tool which did not detect any plagiarism. Though we included it here for illustrative purposes, you should always be mindful about these concerns when putting together your application.
Using AI to judge novel ideas and hypotheses should obviously be a no-no. Please be aware that even if any of these trained AI models provides a very intellectual-sounding output, it is in many cases complete nonsense. Multiple breaches occur even if only an abstract from a grant is posted into an AI chatbot. I don’t want my proposal to become feed for the training algorithms. Not to mention, science is driven by addressing the unknown and assigning relevance and significance to the question. The human brain is trained for this. These AI chatbots are not.
That is a very poor excuse for disallowing such tools. From what I understand, most of the current GPT tools do not store the information that is fed into them beyond very short-term memory, so the rationale for invoking a confidentiality clause is not clear. Is the cloud-based nature of AI tools the rationale? Is storing the proposal on a Google Drive a breach of confidentiality? What if, in a few months, a locally run pre-trained GPT-4 model with the same functionality can run on your laptop? There are better reasons to prohibit this for now, but this is a poor argument.
Although I don’t advocate using “AI” (machine learning, natural language models) to review, I am curious whether your revised agreement wording deals explicitly with the possibility of a natural language model that does not require uploading or sharing any text.
Even when you write in MS Word and it automatically generates sentence completion suggestions or grammar corrections, you are using NL models of a kind. At least in principle, one could download an NL model trained on public data and then input text strictly locally to generate summaries, etc., without ever sharing or uploading content, which does not seem to entail a breach of confidentiality. A blurrier case might be federated AI, in which model updates (weight gradients) are shared with the central AI (ML) system for training without sharing any actual data. Proponents claim this protects privacy, but it’s hard to predict what future algorithms might be able to reconstruct.
Thanks for this cautionary advice. One could easily imagine turning to AI-generated text in order to prepare a review, especially when overworked. It is important to realize that doing so constitutes a breach of confidentiality.
You write: “Reviewers have long been required to certify and sign an agreement that says they will not share applications, proposals, or meeting materials with anyone who has not been officially designated to participate in the peer review process. Yes, this also means websites, apps, or other AI platforms too.”
Please note that Microsoft Word is an app. Thus, if a reviewer types their critique into Microsoft Word, strictly speaking, they are sharing meeting materials (assuming the critique is a meeting material) with something not officially designated to participate in the review process. I realize this is a rather strict interpretation, but I am trying to encourage you to refine your language. This concern becomes more plausible when you consider that Microsoft is embedding AI large language models (LLMs) into Word. What guarantees does Microsoft provide that the text that users enter into Word does not find its way into LLMs?
I could not agree more with this. I have been on study sections for 14 years and still review grants. Despite all the hard work, fatigue, and stress, this is a rewarding experience as long as it builds on our own expertise. As reviewers, we will also need tools to detect the use of AI in the grant applications we are reviewing.
According to OpenAI’s privacy policy, the conversations are not stored or made public. If the conversation is not public or stored, how does this violate the reviewer policy?
Please expand on this aspect, since it is a legal concern.
OpenAI-like tools are here to stay; we are already changing class structure and student evaluation. Would a strategy for using them in a safe and ethical way be a better approach?
Thank you for the question and comment. We have a short list of FAQs (https://grants.nih.gov/faqs#/use-of-generative-ai-in-peer-review.htm) and will continue to add to them based on questions such as yours. We’re still thinking through all the issues, and the technology and its use are evolving. Current and future policy will be driven by the need to maintain confidentiality in the peer review process and to protect the intellectual property of investigators.
This should be obvious. Not only is confidential information being disclosed, but we as reviewers need to check the veracity of data from preliminary studies or prior publications. Biomedicine has been made keenly aware of how much research is not reproducible. Our charge is to add rigor to the review process. AI will certainly save us time, but at the cost of quality.
What is the policy on using a search engine like Google? When doing searches that juxtapose unusual terms, to see if anyone else has explored the connection in papers or abstracts, I have sometimes wondered whether I might be disclosing the kernel of the idea or concept to those maintaining the search engine. I have not worried too much about this, as it would likely not be statistically significant to the search engine, but if the connected terms came from a grant application, would you apply the same concern about breach of confidentiality? Does one need to be thoughtful about the strategy for such a search, using only public information when doing research related to a grant application? Perhaps search engines already use some AI elements.
How does this apply to using tools like Grammarly, which are AI tools that integrate into one’s computer operating system to improve clarity of writing? Similarly, does writing a critique in Google Docs or the online versions of Word, which many of us use for simplicity of backup and access to our work between home and office, count as sharing confidential information with a third party? What about Dropbox? This policy somehow doesn’t seem well thought out.
Hurray! I don’t want to be judged by a language model, no matter how sophisticated. As one who has delved into machine learning, I know that AI will tend to score poorly any language that is unexpected given the bulk of the language in its training set. The more novel the proposal, the less it would resemble the training set.
Request for clarification: if you run a local copy of an LLM that doesn’t require uploading data to a website, nor uses any grant data you feed it to refine a publicly available model, is it OK to use the LLM to aid with peer review? In this case, no data would be transmitted to third parties, so it would not be expected to breach confidentiality.
If using a local LLM/AI is a breach of confidentiality, how does opening a grant application in Acrobat, Word or some other app (that often has hidden reporting to the company enabled, and/or can run macros) comply with that same policy?
Good question. The concern is with sharing confidential information – use of tools that would require uploading information to a website is not allowed. We have some FAQs posted (https://grants.nih.gov/faqs#/use-of-generative-ai-in-peer-review.htm) and we’ll continue to add to them as we receive questions like this.
After reading this column on the use of AI in peer review, I asked myself whether NIH should consider reforming its peer review system through study sections, where reviewers are currently asked to review “what seems to be way too many grant applications” in too little time. Less can be more. And the exhausted reviewer would not be tempted to use AI tools to get the reviews finalized.
Thank you for this important advice. I believe it is also worth emphasizing that at least one well-known AI platform, when addressing a hypothetical research question, will generate completely false references written in a form that looks perfectly correct. In our own examination of this technology earlier this year, when AI platforms became publicly available, we learned that the citations offered did not exist. This is extremely worrisome and suggests that AI can be a very dangerous tool to use, whether exploring ideas or writing or reviewing applications.