Discussion

Views on the use and future impact of generative AI in peer review remain polarized among physical sciences researchers.  

When it comes to predicting the future impact of generative AI tools on peer review, we asked the same question as in our 2024 survey. Fewer respondents were neutral in 2025 than in 2024, and more foresaw a positive impact (29% in 2024 vs 41% in 2025). It is worth noting that the sample size for this survey was substantially smaller than last year's, and the populations are different, so direct comparisons should be made with caution. Nevertheless, views appear to remain polarized, with a smaller proportion of people 'on the fence' about the impact of AI in peer review. This may be explained by greater awareness of AI in 2025 than in 2024, or by genuinely diverging viewpoints.

Gender and career stage divides

When we analyzed responses by gender, we found that women tended to have more negative views than men about the impact of AI on the future of peer review. This finding is consistent with other studies showing a gender gap in perceptions and usage of generative AI, including a 2024 study by Møgelvang and colleagues, who surveyed 2,692 students in Norway. They found that men used generative AI chatbots more frequently and across a wider range of tasks, while women expressed greater concern about critical thinking and trust in AI. The gender gap in perceptions of AI is further supported by a meta-analysis by Otis and colleagues, who analyzed data from 18 studies covering over 140,000 individuals worldwide. They found that men use generative AI more than women across nearly all regions, sectors and occupations, even when access to the technology is equal.

A 2024 study by Oxford University Press of 2,345 academic researchers found distinct clusters of attitudes toward AI, ranging from enthusiastic adopters to those fundamentally opposed to its use. Researchers in the humanities were much more likely to be labelled 'Challengers': researchers who expressed strong reservations about AI's role in scholarly work. It is notable that the humanities tend to have a higher proportion of female researchers than other fields. These overlapping patterns suggest that both disciplinary culture and gender may influence attitudes toward AI.

Similarly, when we analyzed the responses by career level, we found that more junior researchers tended to have a more positive view of the impact of generative AI than their more senior colleagues. A 2025 survey by Mohammadi and colleagues showed similar findings: in their sample of approximately 2,000 researchers, PhD students were the most likely to report using generative AI tools in their academic work, with usage declining among more senior researchers.

This generational difference may be due to a gap in perceptions of technology in general, with younger generations of researchers more likely to be ‘digital natives’ than their more senior colleagues. Junior researchers may also have limited experience of the peer review process and may lack a full understanding of what a good peer review report looks like or the impact it can have on improving a manuscript.  

Publishers considering building or implementing tools for AI-assisted peer review should take the gender and seniority gap in perceptions and use of AI seriously. If AI tools are designed or adopted without accounting for these differences, they risk reinforcing existing inequities in the system.   

Reviewer community’s use of AI

When respondents were asked to share their thoughts about the impact of AI in peer review via a free-text box, the comments were varied. Many respondents saw generative AI as a tool with the potential to support reviewers and increase efficiency, but a high proportion also expressed concerns about AI reducing the depth and integrity of the peer review process, emphasizing the importance of expert judgment and ethical considerations.

Several comments seemed to suggest that generative AI could be used for logical reasoning and technical analysis, rather than simply to generate and edit text. For example: “I do not use AI for peer review. Only reading, checking reference content and reproducibility. Then analysing methods and results.” Comments such as this seem to be based on misconceptions of how generative AI works. Current, consumer-accessible LLMs are not capable of high-level logical reasoning; instead, they work by predicting the next most likely word in a sequence, producing text that can give the appearance of logical reasoning or analysis. When designing policies around the use of generative AI in peer review, publishers should be clear and transparent about the true capabilities of consumer-accessible AI tools.
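To make this distinction concrete, the deliberately simplified sketch below illustrates what 'predicting the next most likely word' means. It is a toy bigram word-frequency model in Python, not how production LLMs are actually built (they use neural networks trained over sub-word tokens on vast corpora); the point is only that the continuation is chosen statistically from patterns in previously seen text, rather than derived by explicit logical analysis of a manuscript.

```python
# Toy illustration of next-word prediction. This is NOT how real LLMs work
# internally; it only shows that output is a statistical continuation of the
# input rather than the product of explicit reasoning.
from collections import Counter, defaultdict

corpus = (
    "the reviewer read the manuscript and the reviewer wrote a report "
    "the editor read the report and the editor made a decision"
).split()

# Count, for each word, which words follow it and how often (a bigram model).
next_word_counts = defaultdict(Counter)
for current, following in zip(corpus, corpus[1:]):
    next_word_counts[current][following] += 1

def predict_next(word: str) -> str:
    """Return the most frequently observed next word in the toy corpus."""
    counts = next_word_counts.get(word)
    return counts.most_common(1)[0][0] if counts else "<unknown>"

print(predict_next("the"))      # 'reviewer' - the most frequent continuation of 'the'
print(predict_next("editor"))   # 'read' - first of the equally frequent continuations
```

Real systems are vastly more sophisticated, but the underlying principle of producing a statistically plausible continuation is the same, which is why fluent output should not be mistaken for genuine technical analysis.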

Around one third (32%) of respondents reported using generative AI when acting as a peer reviewer. This is higher than the 7 to 17% of AI-written or AI-augmented reports identified by Liang and colleagues in their analysis of peer review reports submitted to AI conferences. It is also higher than the 12% of researchers in Wiley’s 2024 global survey who said that they used AI for “assistance in peer reviewing articles from other researchers”. The elevated rate in our survey may reflect an AI adoption pattern specific to the physical sciences. It may also be a result of timing: our data were collected in 2025, by which time AI adoption had increased drastically.

When asked how exactly they had used generative AI, around half (48%) selected more than one answer, indicating that a sizeable proportion of researchers are already using generative AI in the peer review process in several ways. The most common use reported by respondents was writing a review and then putting it into an AI tool to improve flow and grammar. This highlights an important distinction: fully AI-generated peer reviews, where the ‘reviewer’ has not actually read or critiqued the manuscript, are very different from AI-augmented reviews that a human has written and then edited for flow and grammar.

Many academic publishers have introduced increasingly nuanced policies that reflect the delicate balance between efficiency, innovation and research integrity. Most publisher policies prohibit the use of consumer-facing generative AI tools to conduct analysis or evaluation of submitted manuscripts, or to upload all or part of submitted manuscripts to AI tools, due to the associated privacy, confidentiality and integrity concerns. On the other hand, many do permit the use of AI to improve the clarity of reviews, provided the reviewer discloses this AI use and accepts full accountability for the review content.  

The second most cited way of using AI tools in peer review was to digest or summarize an article under review, which suggests that reviewers are regularly putting manuscripts, in part or in their entirety, into LLM chatbots. This poses challenges around copyright and confidentiality and is something that all parties – publishers, reviewers and authors – should consider. In fact, one of the main themes among responses to the free-text question, “What (if any) do you think are the ethical issues surrounding use of AI in peer review?”, was concern about the risks to confidentiality and data privacy that come with reviewers uploading unpublished manuscripts to AI tools. Respondents also noted concerns around accountability: human reviewers are accountable for their comments, but AI is not. Another concern was that AI outputs might be biased because the data the tools are trained on are biased. Some respondents were worried that more original, creative or esoteric work would be automatically dismissed by AI. Of the respondents who felt that AI does have a place in the peer review process, most felt that it could be used ethically for grammar and language checking, but that it should not be a replacement for human expertise.

While 32% of respondents said they had used generative AI in some form when acting as a peer reviewer, 57% of respondents (195 of 345) said they agreed or strongly agreed with the statement “I would be unhappy if peer reviewers used AI to write their peer review reports on a manuscript that I had co-authored” and 42% said they either agreed or strongly agreed with the statement “I would be unhappy if peer reviewers used AI to augment their peer review reports on a manuscript that I had co-authored”. 

This highlights the balancing act that publishers face: navigating the differing expectations of reviewers and authors, even though they are often the same individuals with different ‘hats’ on. The data suggest that researchers may be more comfortable using AI when reviewing others’ work than having it used to assess their own. 

Recognizing AI-generated reviews

The question, “What (if any) do you think are the hallmarks of AI-generated peer review reports?”, yielded a range of interesting responses. Many respondents stated that AI-generated peer review reports use generic language and lack the depth of subject knowledge that an expert reviewer can provide. Others cited a mechanical or unnatural tone, excessive or unusual punctuation or formatting, verbose and overly elaborate language, and heavy use of dashes. These hallmarks match those identified in studies of AI-generated peer review reports, such as that by Zhu and colleagues.

Given that AI tools and their outputs are changing rapidly, it is difficult for publishers and concerned authors to stay ahead of the specific hallmarks of AI-generated reviews. Nevertheless, well-designed studies comparing AI-generated reviews with those written by human experts would be welcome. In particular, comparing the scientific utility of AI-generated and expert-written reviews, and testing whether authors and editors can reliably tell them apart, would be illuminating.