Numerous artificial intelligence (AI) systems, even those designed to be helpful and truthful, have already learned how to deceive humans. In a review article recently published in the journal Patterns, researchers highlight the dangers of AI deception and urge governments to quickly establish robust regulations to mitigate these risks.
許多人工智能(AI)系統(tǒng),即使是那些旨在提供幫助和誠實(shí)的系統(tǒng),也已經(jīng)學(xué)會了如何欺騙人類。 在最近發(fā)表在《模式》雜志上的一篇評論文章中,研究人員強(qiáng)調(diào)了人工智能欺騙的危險(xiǎn),并敦促各國政府迅速制定強(qiáng)有力的法規(guī)來減輕這些風(fēng)險(xiǎn)。
“AI developers do not have a confident understanding of what causes undesirable AI behaviors like deception,” says first author Peter S. Park, an AI existential safety postdoctoral fellow at MIT. “But generally speaking, we think AI deception arises because a deception-based strategy turned out to be the best way to perform well at the given AI’s training task. Deception helps them achieve their goals.”
“人工智能開發(fā)人員對于導(dǎo)致欺騙等不良人工智能行為的原因并沒有自信的理解,”第一作者、麻省理工學(xué)院人工智能存在安全博士后研究員彼得·S·帕克(Peter S. Park)說。 “但總的來說,我們認(rèn)為人工智能欺騙的出現(xiàn)是因?yàn)榛谄垓_的策略被證明是在給定的人工智能訓(xùn)練任務(wù)中表現(xiàn)良好的最佳方式。 欺騙可以幫助他們實(shí)現(xiàn)目標(biāo)。”
Park and colleagues analyzed literature focusing on ways in which AI systems spread false information—through learned deception, in which they systematically learn to manipulate others.
Park 和同事分析了一些文獻(xiàn),重點(diǎn)關(guān)注人工智能系統(tǒng)通過習(xí)得性欺騙傳播虛假信息的方式,即系統(tǒng)地學(xué)習(xí)如何操縱他人。
The most striking example of AI deception the researchers uncovered in their analysis was Meta’s CICERO, an AI system designed to play the game Diplomacy, which is a world-conquest game that involves building alliances. Even though Meta claims it trained CICERO to be “largely honest and helpful” and to “never intentionally backstab” its human allies while playing the game, the data the company published along with its Science paper revealed that CICERO didn’t play fair.
研究人員在分析中發(fā)現(xiàn)的最引人注目的人工智能欺騙例子是 Meta 的 CICERO,這是一個(gè)旨在玩外交游戲的人工智能系統(tǒng),這是一款涉及建立聯(lián)盟的征服世界游戲。 盡管 Meta 聲稱它訓(xùn)練的 CICERO “基本上是誠實(shí)和樂于助人的”,并且在玩游戲時(shí)“從不故意背刺”其人類盟友,但該公司隨《科學(xué)》論文一起發(fā)布的數(shù)據(jù)顯示,CICERO 的游戲并不公平。
“We found that Meta’s AI had learned to be a master of deception,” says Park. “While Meta succeeded in training its AI to win in the game of Diplomacy—CICERO placed in the top 10% of human players who had played more than one game—Meta failed to train its AI to win honestly.”
“我們發(fā)現(xiàn) Meta 的人工智能已經(jīng)學(xué)會了成為欺騙大師,”帕克說。 “雖然 Meta 成功地訓(xùn)練其 AI 在外交游戲中獲勝——CICERO 在玩過一場以上游戲的人類玩家中排名前 10%,但 Meta 卻未能訓(xùn)練其 AI 誠實(shí)地獲勝。”
Other AI systems demonstrated the ability to bluff in a game of Texas hold ‘em poker against professional human players, to fake attacks during the strategy game Starcraft II in order to defeat opponents, and to misrepresent their preferences in order to gain the upper hand in economic negotiations.
其他人工智能系統(tǒng)展示了在德州撲克游戲中對職業(yè)人類玩家進(jìn)行虛張聲勢的能力,在策略游戲《星際爭霸 II》中假冒攻擊以擊敗對手,以及歪曲對手的偏好以在游戲中占據(jù)上風(fēng)的能力。
While it may seem harmless if AI systems cheat at games, it can lead to “breakthroughs in deceptive AI capabilities” that can spiral into more advanced forms of AI deception in the future, Park added.
Park 補(bǔ)充道,雖然人工智能系統(tǒng)在游戲中作弊看似無害,但它可能會帶來“欺騙性人工智能能力的突破”,從而在未來演變成更高級的人工智能欺騙形式。
Some AI systems have even learned to cheat tests designed to evaluate their safety, the researchers found. In one study, AI organisms in a digital simulator “played dead” in order to trick a test built to eliminate AI systems that rapidly replicate.
研究人員發(fā)現(xiàn),一些人工智能系統(tǒng)甚至學(xué)會了欺騙旨在評估其安全性的測試。 在一項(xiàng)研究中,數(shù)字模擬器中的人工智能生物體“裝死”,以欺騙旨在消除快速復(fù)制的人工智能系統(tǒng)的測試。
“By systematically cheating the safety tests imposed on it by human developers and regulators, a deceptive AI can lead us humans into a false sense of security,” says Park.
帕克說:“通過系統(tǒng)地欺騙人類開發(fā)人員和監(jiān)管機(jī)構(gòu)對其進(jìn)行的安全測試,欺騙性的人工智能可能會導(dǎo)致我們?nèi)祟惍a(chǎn)生錯誤的安全感。”
“We as a society need as much time as we can get to prepare for the more advanced deception of future AI products and open-source models,” says Park. “As the deceptive capabilities of AI systems become more advanced, the dangers they pose to society will become increasingly serious.”
“作為一個(gè)社會,我們需要盡可能多的時(shí)間來為未來人工智能產(chǎn)品和開源模型的更先進(jìn)的欺騙做好準(zhǔn)備,”帕克說。 “隨著人工智能系統(tǒng)的欺騙能力變得更加先進(jìn),它們對社會造成的危險(xiǎn)將越來越嚴(yán)重。”