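Scholarship Reduced to AI Fodder: The Knowledge Crisis of the Digital Age
A veteran of the academic publishing industry, writing anonymously, recently published an article in the British magazine The Critic revealing how the "digital-first" strategy is systematically eroding the value of knowledge. Research that scholars have laboured over is being redefined as "content" to be bundled, sold, and even licensed to artificial intelligence (AI) companies as training data. This not only threatens the integrity and security of knowledge but has also triggered a profound crisis in the industry.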

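Article analysis
Before going further, we need to set out some core background for 川透社 readers. Academic publishing is the cornerstone of knowledge production and dissemination, and its traditional gold standard has been rigorous peer review and high-quality print journals and books. Over the past two to three decades, however, the field has undergone two "digital revolutions". The first digitised print content for preservation and retrieval, as with the birth of JSTOR (a world-renowned digital library of academic journals). The second, which is the focus of this article, is the "digital-first" strategy, under which scholarly work is published first, or even exclusively, in digital form, with print reduced to a secondary by-product. This shift, combined with generative AI's appetite for massive amounts of text data, makes up the profound crisis the article sets out to expose.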
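Theme, genre, and structure
The theme of the article is very clear:
It criticises how the "digital-first" and commercialisation strategies of academic publishing systematically downgrade rigorous scholarship into "content" that can be bundled, sold, arbitrarily altered or deleted, and even used to train AI, thereby fundamentally eroding the value and security of knowledge production.
In terms of genre, this is not hard news reporting a breaking event but a tightly argued, strongly opinionated in-depth commentary or opinion piece. It does not aim for neutrality; rather, through analysis and argument, it warns readers of a systemic, industry-wide crisis.
Its narrative follows a classic argumentative structure that dissects the problem step by step. First, it uses Oxford University Press (OUP) changing its logo as a hook, symbolically revealing the industry's "digital-first" turn. Next, it details the concrete consequences of that turn: declining print quality, scholarship bundled and sold as "products", and quick, short journal articles crowding out time-consuming monographs. It then examines the driving force behind this, namely the chain of funding and interests formed by universities, libraries, and researchers under "publish-or-perish" pressure. After that, the article raises the stakes, pointing to the fragility of digital platforms (for example, the hacking of the British Library) and to academic presses that, for short-term gain, license scholars' work to AI companies as training data, which ultimately leads to scholarship being dismantled and misused. The article closes on an open question, expressing deep concern about the future of scholarly work.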
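Source, bias, and stance
The source is the British magazine The Critic, and the author is an industry insider who asked to remain anonymous. That anonymity adds considerably to the article's critical force and internal tension. The article's stance is unambiguously critical, and it takes a negative, pessimistic view of the current state of academic publishing. It does not simply pick a side on the left-right political spectrum; its position is a professional and ethical one, grounded in defending the dignity of knowledge and scholarly tradition. It stands clearly on the side of scholars and of knowledge itself, opposing corporate behaviour and short-sighted profit models that fully commodify them.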
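Cross-cultural and international communication features
The presses (OUP, CUP, Pearson), platforms (JSTOR), and events (the hacking of the British Library) mentioned in the article are concentrated in the Anglo-American world. This in itself reflects the dominance of the English-speaking world in global academic publishing, which resembles the "media imperialism" discussed in Chapter 4 of our textbook. Chinese readers unfamiliar with these institutions may find the original hard to follow if they read it directly. At the same time, "publish-or-perish" pressure exists in Chinese academia too, and is arguably even more intense, so this point is likely to resonate strongly with readers. The compilation therefore needs to handle both the cultural distance and the common ground with care.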
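Compilation strategy
The original is tightly reasoned and forcefully argued, but it is long and some of its cases are unfamiliar to Chinese readers. A full translation alone may lack focus, while an abridged translation could easily break its complete chain of argument. I therefore recommend a single-article compilation. This approach lets us preserve the core argument and critical stance while optimising the structure and selecting and supplementing information where appropriate, so that the piece better suits the reading habits and background knowledge of 川透社 readers.
The recommended compilation steps are as follows:
- Rework the headline and lead: the original headline, "Cooking the books", is an English pun that cannot be translated directly. We need a new Chinese headline, based on the article's core content, that both sums up the main point and draws readers in. For example: "Scholarship Reduced to AI Fodder: The Knowledge Crisis of the Digital Age" or "An Anonymous Warning: When Your Paper Is Bundled and Sold to AI". The lead also needs rewriting; it can state the core crisis outright to capture readers quickly.
- Adjust the narrative structure: the original's line of argument can be reorganised where appropriate. One option is to move the currently topical issue of AI training data to the front as the point of entry, then trace back to the "digital-first" strategy, business models, and systemic problems behind it, making the piece more timely and more striking.
- Trim and focus: the many press names mentioned in the original can be cut down, keeping one or two typical cases. Relatively specialised concepts such as open access can be explained in plainer language or simplified in the compilation.
- Add necessary annotations: platforms such as JSTOR, which are household names in Western academia but not yet widely known to the general public in China, can be explained on first mention with a short translator's note in parentheses, such as "(a world-renowned digital library of academic journals)". This is consistent with the principle of adding background information sparingly in a single-article compilation.
- Maintain the critical style: the compilation must accurately convey the original's calm but penetrating critical tone, using precise, forceful Chinese wording to reproduce the author's distress and concern at the erosion of scholarly value.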
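The symbol of this transformation can perhaps be traced to 2021, when Oxford University Press (OUP) replaced the classical logo it had used for several centuries. The move signalled its commitment to "digital-first" and foreshadowed the fundamental shift of the whole academic publishing world from physical books to digital products.
Over the past decade, major presses including Pearson and Cambridge University Press (CUP) have adopted "digital-first" strategies, under which scholarly work is published first, or even exclusively, in digital form. The direct consequence is a marked decline in the quality of physical books: embossed covers, dust jackets, and sewn bindings are increasingly rare, replaced by print-on-demand copies with misaligned margins and coarse paper. Books, once the vessels of knowledge, now look more like crude replicas of digital texts.
The deeper impact is that scholarship has been redefined as tradable "content". Presses bundle ebooks and journals into "scholarship portfolios" and market them by quantity rather than quality. Under this commercial logic, the independent value of each paper and each monograph is diluted. They become material for filling vast databases, and their primary significance is no longer their contribution to knowledge but their profit potential.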
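Profit-driven "draining the pond to catch the fish"
Behind this shift lies the funding predicament of an increasingly corporatised higher education system. University budgets are cut, so libraries can only prioritise the most cost-effective bundled databases, while researchers are trapped under "publish-or-perish" pressure, with their jobs and funding often dependent on the number of papers they publish.
This environment strongly incentivises presses to pursue high-turnover, short-form, timely journal content, while monographs that take years and distil a scholar's deep expertise are regarded as "dead weight". In the drive to maximise revenue, the editorial rigour of presses has inevitably declined.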
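Fragile digital "vaults" and AI's "feast"
Yet the digital platforms in which so much hope has been placed are themselves extremely fragile. The article notes that in 2022 the British Library's online catalogue system was hacked, leaving search services for its 170 million items down for months. When presses concentrate all their scholarship on their own closed digital platforms, any technical failure or cyberattack could result in the permanent loss of knowledge.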

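Most alarming of all, faced with the rise of generative AI, academic publishers have not been wary of its threat to intellectual property but have instead treated it as a new source of profit growth. Several giants, including Wiley, Taylor & Francis, and Oxford University Press, have already licensed their digital content for training large language models (LLMs) without their authors' consent.
This practice is thoroughly short-sighted. When AI trawls this scholarship, it treats it not as rigorous knowledge production but merely as undifferentiated data sets. Scholars' research is dismantled and reconstituted to respond to users' search prompts, and its original context and the effort behind it are erased entirely. The article's anonymous author points out sharply that when presses sell scholarship to AI companies, they concede their own low view of that work: its value lies in profit, not in its contribution to scholarship.
At the end of the article, the anonymous author poses weighty questions: when the scholarly world is flooded with high-turnover, low-quality "content", should we concede the breakdown of the traditional writer-publisher-reader relationship? Driven by commercial interests, does serious scholarly work still have a future in publishing? That is a question with no clear answer. [End of article]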
Original news text
Cooking the books
Anonymous: Scholarly journals and books were once the academic gold standard. Today, digital-first texts can be changed, deleted, bundled or sold to train AI bots without the author's consent
IN 2021, OXFORD UNIVERSITY PRESS (OUP) CHANGED its logo. In a move widely critiqued on Twitter-now-X and at any number of academic conferences, the world's largest university press replaced its distinctive sixteenth-century folio stamp with a plain serif rendering of the company's name. The spoked icon — which replaces, in the new logo, the first "O" of Oxford — is — apparently — reminiscent of the turning pages of a book whilst representing the company's "journey of digital transformation".
Indeed, this new logo is more than just another casualty in the minimalist epidemic sweeping the branding world. Rather, it is a self-conscious marker of OUP's digital-first publishing strategy and indicative of the shift within academic publishing from print to digital production.
THERE HAVE BEEN TWO DIGITAL REVOLUTIONS IN ACADEMIC publishing. The first came in the 1990s and focused on preservation. JSTOR pioneered the digitisation of print journals, and this practice was adopted by academic publishers throughout the 1990s and 2000s. Notably, digital content in this context followed the pattern of printed books and journals, which continued to be published physically before being uploaded to whichever online platform the publisher chose.
In the last decade, however, digital publishing has taken on a distinctly new edge. Pearson made headlines in 2019 by announcing their digital-first publishing strategy, and other major publishers including Cambridge University Press (CUP), Springer Nature, Yale University Press, OUP, and Taylor & Francis have followed suit. Digital-first publication sees scholarly materials published initially or exclusively in digital formats. Where previously, digital publication replicated physical works, the digital-first strategy firmly places the physical production, presentation, and preservation of scholarly works in second seat.
This changing prioritisation is clear in the declining quality of printed texts. Though many presses retain some form of print production for books, there is a trend towards print-on-demand. This has seen a general elimination of embossed covers, dust jackets, and sewn bindings. Misaligned margins, low paper grain, and fuzzy text reproduction now mark many print-on-demand books. They are becoming what ebooks used to be: poor facsimiles of the primary published material (now digital texts). The vast majority of journals are no longer printed at all.
This isn't just a matter of aesthetics. The shift from physical to digital production ultimately revalues and devalues the material published. Scholarly output is redefined as "content" to be packaged and sold, rather than as discrete knowledge productions with which to engage. Academic publishers bundle ebooks and ejournals into collections and present these as "scholarship portfolios", "online academic products", "digital scholarship collections", and so on. The products sold by a given academic press are now these tranches of publications, rather than individual publications themselves. Purchase and subscription models are constructed to discourage title-by-title sales, and marketing materials highlight the quantity of texts included in any product, rather than the titles which make up that quantity. Books and journals are approached as constituent parts of these tranches - each a useful addition to a collection, but not individually valuable to either publisher or buyer.
AT ITS ROOT, THIS IS A FUNDING ISSUE. AS ACADEMIC PUBLISHING has, like the whole higher education enterprise, become more corporate, strategy is determined increasingly by funding sources and profit potential. On the other side of the market stall, libraries are primarily budget-motivated: as their budgets are slashed by university administrators, their priority is to cost-effectively acquire relevant material. And at the head of the supply chain are researchers, who operate in a publish-or-perish environment where job funding is often reliant upon publication quantity.
A further factor in the funding quagmire is the altruistic notion of open access publishing, which both relies upon digital-only channels and carries the implicit notion that digital publishing is essentially free (because there are no physical production costs). But, there is no such thing as free research; there is always a cost for publication. Open access publishing flips that cost from customers, who pay to read, to researchers and research bodies which pay to publish.
Monographs and edited volumes are, in this market, dead weight. They take time to commission, write and edit. They are not regularly updated and so render few upselling opportunities; they are niche and thus difficult to package into larger products. As they often represent years of work and don't rest on the very edge of the scholarly curve, they don't register as high-impact publications, limiting open access funding and library budget allocations.
Short-form scholarship, on the other hand, is low-cost and almost endlessly saleable. Journals compile reactive research, which registers high for relevance. Journals that are not published open access lend themselves favourably to subscription selling, while journal collections cover broad subject areas whilst retaining specificity on a journal-level, prices can be increased each year, and back editions can be packaged up in tranches and resold as archives.
THIS FUNDING LANDSCAPE SHAPES A PUBLISHING CULTURE in which presses are motivated to publish as much short-form content as possible in order to maximise profit via external funding and subscription sales. This has led to an inevitable decline in editorial rigour. Though the scandals of obviously AI-generated articles published by high-reputation academic journals are amusing, they serve as a stark caution against a publishing culture which values scholarship as content, rather than as knowledge production.
In order to maximise revenue and manage the vast and rapidly-updating digital collections which result from the prioritisation of high-turnover digital content, academic publishers have invested in creating exclusive digital publishing infrastructures. Cambridge Core, Oxford Academic, and Pearson Collections, among others, are gated digital platforms on which these individual presses host their own publications. Most academic presses also have licensing agreements for select products with other for-profit digital publishing platforms such as JSTOR and EBSCOhost.
In this way, digital-first publishing has radically changed the function of libraries in the academic sphere. While for millennia, libraries have acted as repositories of knowledge - through the books and journals they owned and loaned — they are now gateways through which digital content can be accessed. In moving to digital-first publication, presses retain almost complete ownership and control over the work they publish. This poses a risk to the reliable and lasting access to scholarship which physical publications used to ensure. Publishers can handle, and mishandle, the scholarship they retain as they wish. This freedom, along with an incautious embrace of fragile digital publishing platforms betrays again the devaluation of scholarship brought by digital-first strategies.
DIGITALLY-PUBLISHED WORKS CAN BE EDITED and erased less traceably than printed texts. In theory, digital object identifiers (DOIs) should mark out and store official versions of record for ejournals articles and ebooks. However, despite calls for standardised archives, the predominance of press-specific publishing platforms means there are few shared systems for DOI cataloguing.
Publishers remove digital works from their online platforms regularly as a result of various factors (author request, discredited content, or rights conflicts, for example). Such erasure disrupts research in a way that print discontinuations do not. Once a piece of exclusive content is removed or altered on a publisher's site, even if the original exists in a DOI archive, there is no way for a researcher to access it. Content exclusively published online can vanish from the research landscape in a way that printed texts, which have a habit of turning up in obscure libraries, rarely do.
Though the risk of digital erasure by publishers themselves is concerning, the risk posed by fragile digital publishing systems is more substantial. In 2022, the British Library's online catalogues were taken hostage by the Rhysida hacker group, wreaking near-total havoc on the Library's ability to function. The 170 million items held by the Library, along with their digitised texts, were inaccessible for months as their computerised catalogue was corrupted. At the time, much was written in the library sector about the risks hackers pose to digital collections, but little has been done by academic publishers themselves. At the publishing house for which I work, senior leaders issued a concise response to questions raised about the stability of our digital publication platform: in short, we would just have to hope our security was stronger than that of the British Library.
BUT IT DOESN'T EVEN TAKE MALEVOLENT HACKING TO DISRUPT digital publishing. A brief glance through any research technology or library forum reveals how regularly publishing platforms malfunction. Predominant among the digital gremlins are access faults — login failures, multi-factor authentication glitches or regional incompatibilities. Librarians are every day battling for their researchers to have access to the scholarship for which they have paid.
The predominance of exclusive digital publishing platforms increases the threat posed by both hackers and gremlins. As publishers seek to gatekeep access to their publications, there is an increased likelihood that if a publishing platform crashes, the works held on it could be lost entirely. A commitment to digital-first publishing, without an equal commitment to ensuring the work published is secure and accessible, ultimately underscores the issue of devaluation. If one article goes missing, if one text is inaccessible, if one platform is down for a day, no matter: it's just content, there will be more.
Nowhere is this lack of regard for the security and value of the scholarship they publish clearer than in the academic publishing establishment's response to generative AI. Presses have overwhelmingly taken the opportunity to rake in more profits by licensing the use of their digital texts to train large language models (LLMs). Wiley sold access to "select, previously published content" to train LLMs without its authors' consent, as did Taylor & Francis and OUP. CUP took a different approach, issuing new rights agreements for AI licensing specifically and offering a 20 per cent royalty to authors who opted in. CUP conceded that AI trawling of its publications was inevitable and suggested that in opting in to the licensing agreement, authors could at least expect revenue and regulation. Indeed, the trawling of pirated works has already resulted in several lawsuits, highlighting the risks posed both by AI and insecure digital platforms. Significantly, CUP responded negatively to a UK government proposal that AI companies be given total access to digitally-published work unless authors opted out. As the company is clearly not against the use of its material in training LLMs, it must be concluded that this response was rooted in the loss of profit such a move would represent.
THIS LICENSING-FOR-PROFIT APPROACH IS ENTIRELY short-sighted: once AI has trawled published content and can provide it upon request, there will be an inevitable usage and subsequent profit drop-off, as researchers won't need to visit the paywalled publishing platforms directly. But, more significantly, it is another area in which profit is placed above pedagogy, and in which digital publishing corrupts the nature of scholarly works. AI trawling neither discriminates between scholarly works nor treats them as anything beyond data sets. Scholarship is dismantled and reconstituted by LLMs to respond to search prompts, obscuring the context and effort of its authors. As academic publishers license their digital publications for AI data-gathering, they concede their low view of the work they publish: its value is in profit generation, not in its contribution to scholarship or the breadth and depth of its analysis.
Where does this leave us? Digital-first publishing has redefined scholarly output as content to be repackaged, sold, and erased while AI is gripping the minds and essays of our undergraduates. Meanwhile, money continues to pivot the scholarly world towards high-turnover, low-quality publications. Should we concede the breakdown of the writer-publisher-reader relationship? Is there any scope for a change of direction?
There will be a future for academic publishers, so long as there is money to be made. Whether there is a future for scholarly work within academic publishing is another question entirely — one to which there is no clear answer.