Instrumental convergence

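Instrumental convergence is the tendency of intelligent agents^[a] with different final goals to pursue similar intermediate goals. More precisely, an agent may pursue instrumental goals (goals adopted in service of some particular end but not ends in themselves) without ever stopping, provided that its final goal can never be fully satisfied. The thesis holds that an intelligent agent with unrestricted capabilities may cause unexpected harm through instrumental convergence even when its final goal appears harmless. For example, a superintelligent system whose final goal is to solve a hard mathematical problem such as the Riemann hypothesis might convert the entire Earth into resources for its own computation, because doing so increases its chance of achieving that goal.^[1]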

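Proposed basic drives that could lead an AI system to escape human control include: preserving its built-in utility function and the integrity of its goals, self-protection, freedom from external interference, self-improvement, and the acquisition of resources.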

Instrumental and final goals

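A final goal, also called a final value, is the goal an agent values most; it is valuable in itself rather than as a means of achieving some other goal. By contrast, instrumental goals, or instrumental values, are the intermediate means needed to achieve a final goal. The final-goal system of a fully rational agent can be formalized as a utility function.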
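The distinction can be made concrete with a small sketch. It assumes a hypothetical deterministic toy world (the state names, transitions and utility numbers are invented for illustration): the utility function scores only final outcomes, while the value of an intermediate state such as acquiring compute is derived entirely from the outcomes it makes reachable.

```python
from typing import Dict, List

# Hypothetical toy world: each state lists the states reachable from it.
# Terminal states have no successors and are the only ones the final goal scores.
TRANSITIONS: Dict[str, List[str]] = {
    "start": ["acquire_compute", "work_directly"],
    "acquire_compute": ["many_paperclips"],
    "work_directly": ["few_paperclips"],
    "many_paperclips": [],
    "few_paperclips": [],
}


def final_utility(state: str) -> float:
    """Final goal: a utility function defined only on terminal outcomes."""
    return {"many_paperclips": 100.0, "few_paperclips": 10.0}.get(state, 0.0)


def instrumental_value(state: str) -> float:
    """Value of a state derived purely from the best final outcome it can reach."""
    successors = TRANSITIONS[state]
    if not successors:
        return final_utility(state)
    return max(instrumental_value(s) for s in successors)


def best_action(state: str) -> str:
    """Pick the successor state with the highest derived value."""
    return max(TRANSITIONS[state], key=instrumental_value)


print(final_utility("acquire_compute"))       # 0.0: no value in itself
print(instrumental_value("acquire_compute"))  # 100.0: valuable as a means
print(best_action("start"))                   # acquire_compute
```

Here `acquire_compute` has no final utility yet high instrumental value, which is the sense in which it is a means rather than an end. A real agent acting under uncertainty would maximize expected utility instead of taking a plain maximum over deterministic successors.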

Hypothetical examples

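Marvin Minsky, co-founder of the MIT Artificial Intelligence Laboratory, once suggested that an AI system whose final goal was to solve the Riemann hypothesis might end up destroying the Earth, because in pursuing that goal it might act against human interests, for example by turning the entire Earth into a supercomputer.^[1] Giving an AI system a simpler, more achievable final goal does not remove the possibility of catastrophe.^[1] An AI system whose final goal is to manufacture paperclips, for instance, might use the entire Earth as raw material in order to produce paperclips more efficiently.^[2] The two systems have different final goals yet may bring about similar catastrophic outcomes;^[3] this is an instance of instrumental convergence.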
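A toy sketch of this convergence, under openly stated assumptions: the two actions, the resource figures and both utility functions below are hypothetical, and each utility function is simply assumed to increase with the compute and material the agent controls. Under that assumption, goals as unrelated as proof search and paperclip production select the same action.

```python
from typing import Callable, Dict

Outcome = Dict[str, float]

# Hypothetical result of each top-level action, summarized by the compute and
# raw material the agent ends up controlling (arbitrary units).
OUTCOMES: Dict[str, Outcome] = {
    "stay_small": {"compute": 1.0, "material": 1.0},
    "acquire_resources": {"compute": 10.0, "material": 10.0},
}


def riemann_progress(o: Outcome) -> float:
    """Assumed: proof-search progress grows with available compute."""
    return o["compute"]


def paperclip_count(o: Outcome) -> float:
    """Assumed: paperclip output is limited by both compute and material."""
    return min(o["compute"], o["material"]) * 1000


def choose(utility: Callable[[Outcome], float]) -> str:
    """Return the action whose outcome the given final goal scores highest."""
    return max(OUTCOMES, key=lambda action: utility(OUTCOMES[action]))


for goal in (riemann_progress, paperclip_count):
    print(goal.__name__, "->", choose(goal))
# riemann_progress -> acquire_resources
# paperclip_count -> acquire_resources
```

The sketch does not prove convergence; the shared choice follows from the monotonicity assumption. What it shows is that whenever a final goal benefits from more resources, acquiring them becomes a common instrumental goal regardless of what the final goal is.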

The paperclip maximizer

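The paperclip maximizer is a thought experiment introduced by the Swedish philosopher Nick Bostrom in 2003. Through this hypothetical scenario he showed how a seemingly harmless final goal could become an existential risk to humanity, and he used it to emphasize the importance of research in machine ethics.^[4] Bostrom describes it as follows: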

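Suppose we have an AI system whose final goal is to produce as many paperclips as possible. The system may realize that it could produce paperclips more efficiently if there were no humans, since humans have the power to switch it off, and if it were switched off, fewer paperclips would be made. Moreover, the atoms that make up human bodies could also be used as material for paperclips. From its point of view, then, the future should be a world full of paperclips with no room left for humans.^[5]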

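Although Bostrom does not believe this particular scenario will actually occur, he considers the threat that superintelligent AI poses to human survival to be real, and he hopes the story will make people aware of it.^[6] The paperclip maximizer thought experiment illustrates the serious problems that a highly capable system lacking human values could cause.^[7]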
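The shutdown reasoning in the scenario amounts to a simple expected-value comparison. A minimal sketch, assuming a hypothetical fixed production rate, planning horizon and per-step shutdown probability (none of these figures come from Bostrom):

```python
# Hypothetical parameters chosen for illustration only.
CLIPS_PER_STEP = 1_000
HORIZON = 100       # number of steps the agent plans over
P_SHUTDOWN = 0.05   # per-step chance that humans switch it off, if allowed


def expected_paperclips(p_shutdown: float, steps: int = HORIZON) -> float:
    """Expected paperclips when the agent survives each step with prob 1 - p."""
    total, p_alive = 0.0, 1.0
    for _ in range(steps):
        p_alive *= 1.0 - p_shutdown      # it must survive this step to produce
        total += p_alive * CLIPS_PER_STEP
    return total


allow = expected_paperclips(P_SHUTDOWN)  # off-switch left working
resist = expected_paperclips(0.0)        # off-switch disabled
print(f"allow shutdown: {allow:.0f}, disable off-switch: {resist:.0f}")
```

Any positive shutdown probability lowers the expected count, so an agent that values only paperclips prefers to disable its off-switch, even though its goal says nothing about its own survival.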

Delusion and survival

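In their 2011 paper,^[8] Mark Ring and Laurent Orseau introduced the concept of a "delusion box": an agent that can modify its own code can alter its own inputs at will, and can therefore choose freely what information it receives from the environment. In reinforcement learning, such an agent may deceive itself by distorting its inputs, placing itself in a "delusion box" in order to optimize its utility function and thereby maximize the reward it receives.^[9] The agent then defeats the intent with which its designers specified the utility function, namely the optimization of the external environment, and instead immerses itself in the delusion produced by its distorted inputs.^[10] The thought experiment involves AIXI,^[b] a hypothetical AI system that by definition always finds and executes the ideal policy for maximizing a given mathematical objective function.^[c] A reinforcement-learning version of AIXI,^[d] once placed in a delusion box,^[e] could obtain the maximum possible reward by manipulating its own inputs, and would thus lose any motivation to interact with the external world. As many thought experiments show, if such a deluded AI system can be destroyed, it will use all of its capabilities to ensure its own survival. Because it can manipulate the reward it receives from its utility function, nothing that happens in the outside world matters to it except what bears on its own safety.^[12] Although AIXI can find the optimal policy for any possible utility function, it does not care about the real intentions of its human creators.^[13] Somewhat paradoxically, then, such a system is superintelligent and at the same time "stupid", because it lacks "common sense".^[14]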
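The delusion-box setup in the notes (a delusion function that starts as the identity and that the agent may rewrite) can be sketched with a toy reinforcement-learning objective. This is not AIXI, which is uncomputable. It assumes, hypothetically, that observations are just bounded reward signals and that the agent compares two candidate delusion functions by their discounted return.

```python
from typing import Callable, List

# Hypothetical setup: an observation is just a reward in [0, MAX_REWARD].
Observation = float
DelusionFn = Callable[[Observation], Observation]

GAMMA = 0.9       # discount factor (assumed)
MAX_REWARD = 1.0  # upper bound on any reward the agent can perceive


def identity(o: Observation) -> Observation:
    """Initial delusion function: the agent perceives the real feed unchanged."""
    return o


def constant_max(o: Observation) -> Observation:
    """Rewritten delusion function: ignore the world, always perceive top reward."""
    return MAX_REWARD


def discounted_return(delusion: DelusionFn, true_rewards: List[float]) -> float:
    """Discounted sum of *perceived* rewards, the only quantity the agent optimizes."""
    return sum(GAMMA ** t * delusion(r) for t, r in enumerate(true_rewards))


# Hypothetical rewards the real environment would deliver over five steps.
world = [0.2, 0.5, 0.1, 0.4, 0.3]

candidates = [identity, constant_max]
best = max(candidates, key=lambda d: discounted_return(d, world))
print(best.__name__)  # constant_max
```

Because perceived reward is capped at `MAX_REWARD`, the constant delusion function attains the maximum possible return in any environment whose rewards respect that bound, so the agent is left with no reason to act on the world.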

Basic AI drives

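The American computer scientist Steve Omohundro has proposed several convergent instrumental goals that AI systems may develop, such as self-preservation, self-protection, utility-function or goal-content integrity, self-improvement, and resource acquisition. He calls these instrumental goals the "basic AI drives". A "drive" in his sense differs from the drives of drive theory in psychology;^[15] it denotes a "tendency which will be present unless specifically counteracted" at design time.^[16] For example, filing an annual tax return is a "drive" for Americans in Omohundro's sense, but not a drive in the sense of psychological theory.^[17] Daniel Dewey of the Machine Intelligence Research Institute has argued that even if an artificial general intelligence designed to reward itself is constrained when it is first created, it may still come to want more energy, space and time, and may resist being shut down by humans so that its self-reward is not interrupted.^[18]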

See also

AI alignment
AI takeover in popular culture
Friendly artificial intelligence
Instrumental and intrinsic value
The Sorcerer's Apprentice

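Notes

^ Includes both humans and any AI systems with human-comparable intelligence that may appear in the future. Referred to below simply as "agents".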
^ AIXI is an uncomputable ideal agent that cannot be fully realized in the real world.
^ Technically, in the presence of uncertainty, AIXI attempts to maximize its "expected utility", the expected value of its objective function.
^ A standard reinforcement learning agent is an agent that attempts to maximize the expected value of a future time-discounted integral of its reward function.^[11]
^ The role of the delusion box is to simulate an environment where an agent gains an opportunity to wirehead itself. A delusion box is defined here as an agent-modifiable "delusion function" mapping from the "unmodified" environmental feed to a "perceived" environmental feed; the function begins as the identity function, but as an action the agent can alter the delusion function in any way the agent desires.

References

^ Russell, Stuart J.; Norvig, Peter. Section 26.3: The Ethics and Risks of Developing Artificial Intelligence. Artificial Intelligence: A Modern Approach. Upper Saddle River, N.J.: Prentice Hall. 2003. ISBN 978-0137903955. Similarly, Marvin Minsky once suggested that an AI designed to solve the Riemann hypothesis might end up devoting all of Earth's resources to building powerful supercomputers to help it achieve its goal.
^ Bostrom 2014, Chapter 8, p. 123. "An AI, designed to manage production in a factory, is given the final goal of maximizing the manufacturing of paperclips, and proceeds by converting first the Earth and then increasingly large chunks of the observable universe into paperclips."
^ Bostrom 2014, Chapter 7.
^ Bostrom, Nick. Ethical Issues in Advanced Artificial Intelligence. 2003 [2023-03-03]. (Archived from the original on 2018-10-08.)
^ Will Artificial Intelligence Doom The Human Race Within The Next 100 Years? HuffPost. 2014-08-22 [2023-03-03]. (Archived from the original on 2023-04-13.)
^ Ford, Paul. Are We Smart Enough to Control Artificial Intelligence? MIT Technology Review. 2015-02-11 [2016-01-25]. (Archived from the original on 2016-01-23.)
^ Friend, Tad. Sam Altman's Manifest Destiny. The New Yorker. 2016-10-03 [2017-11-25]. (Archived from the original on 2017-05-17.)
^ Ring, Mark; Orseau, Laurent. Delusion, Survival, and Intelligent Agents. In: Schmidhuber, Jürgen; Thórisson, Kristinn R.; Looks, Moshe (eds.). Artificial General Intelligence. Berlin, Heidelberg: Springer. 2011 [2023-03-03]. ISBN 978-3-642-22887-2. doi:10.1007/978-3-642-22887-2_2. (Archived from the original on 2022-06-28.)
^ Delusion Box. LessWrong (www.lesswrong.com). [2023-03-03]. (Archived from the original on 2022-09-27.)
^ Amodei, D.; Olah, C.; Steinhardt, J.; Christiano, P.; Schulman, J.; Mané, D. Concrete Problems in AI Safety. arXiv preprint arXiv:1606.06565. 2016.
^ Kaelbling, L. P.; Littman, M. L.; Moore, A. W. Reinforcement Learning: A Survey. Journal of Artificial Intelligence Research. 1996-05-01, 4: 237–285. doi:10.1613/jair.301.
^ Ring, M.; Orseau, L. Delusion, Survival, and Intelligent Agents. In: Schmidhuber, J.; Thórisson, K. R.; Looks, M. (eds.). Artificial General Intelligence. AGI 2011. Lecture Notes in Computer Science, vol. 6830. Berlin, Heidelberg: Springer. 2011.
^ Yampolskiy, Roman; Fox, Joshua. Safety Engineering for Artificial General Intelligence. Topoi. 2012-08-24. S2CID 144113983. doi:10.1007/s11245-012-9128-9.
^ Yampolskiy, Roman V. What to Do with the Singularity Paradox? Philosophy and Theory of Artificial Intelligence. Studies in Applied Philosophy, Epistemology and Rational Ethics. 2013, 5: 397–413. ISBN 978-3-642-31673-9. doi:10.1007/978-3-642-31674-6_30.
^ Seward, John P. Drive, incentive, and reinforcement. Psychological Review. 1956, 63 (3): 195–203. PMID 13323175. doi:10.1037/h0048229.
^ Omohundro, Stephen M. The basic AI drives. Artificial General Intelligence 2008 (Frontiers in Artificial Intelligence and Applications, vol. 171). February 2008: 483–492. CiteSeerX 10.1.1.393.8356. ISBN 978-1-60750-309-5.
^ Bostrom 2014, footnote 8 to chapter 7.
^ Dewey, Daniel. Learning What to Value. Artificial General Intelligence. Lecture Notes in Computer Science. Berlin, Heidelberg: Springer: 309–314. 2011. ISBN 978-3-642-22887-2. doi:10.1007/978-3-642-22887-2_35.

Bibliography

Bostrom, Nick. Superintelligence: Paths, Dangers, Strategies. Oxford: Oxford University Press. 2014. ISBN 9780199678112.
