工具趨同

工具趨同（英語：Instrumental convergence）是指擁有智能的個體^[a]在追求不同的最終目標時，可能出現追求相似次要目標的傾向。具體而言，智能體可能會無止境地追尋工具目標（英語：Instrumental and intrinsic value）——為某些特定目的而制定的目標，但其本身並非最終目標——卻永遠不能真正地達到最終目標。該理論指出，一個能力不受限制的智能體，即使它的最終目標似乎無害，但仍可能因工具趨同而引發意想不到的有害後果。例如，一個最終目標為解決某數學難題（如黎曼猜想）的超智能系統，它可能會將整個地球轉化為支撐其運作的資源，從而增加達成最終目標的可能性。^[1]

驅使人工智能脫離人類控制的基礎因素包括：人工智能系統內建的效用函數及目標完整性、自我保護機制、避免外界干涉、自我提升、對資源的渴求。

工具目標與最終目標

最終目標也稱最終價值，是指對某智能體而言最具價值的目標，並且此目標本身就可作為價值，而非達成其他目標的手段。與此相對，工具目標或工具價值是指那些為了達成最終目標所需的中間手段。一個具備完全理性的智能體，其「終極目標」系統可被形式化為效用函數。

假想案例

麻省理工學院人工智能實驗室的創始人馬文·明斯基曾舉例說，一個把解決黎曼猜想作為最終目標的人工智能系統可能最終會導致地球的毀滅，因為它可能會為達成這一目標而作出危害人類利益的行為，例如將整個地球轉變為一台超級計算機。^[1]即使將人工智能系統的終極目標設定地更加簡單可行，也無法避免其引發災難的可能。^[1]例如一個以製造回形針為最終目標的人工智能系統，它可能會為了更有效率的生產回形針而將整個地球作為原材料。^[2]上述兩個人工智能系統的最終目標不同，卻可能導致相似的災難性後果，^[3]這即是工具趨同的一個案例。

回形針製造機

回形針製造機是瑞典哲學家尼克·博斯特羅姆於2003年提出的一個思想實驗，他透過這一假想情景展示了一個看似無害的最終目標如何演變為人類的生存危機（英語：Existential risk from artificial general intelligence），並藉此強調了機器倫理（英語：Machine ethics）研究的重要性。^[4]博斯特羅姆的描述如下：

假設我們有一個人工智能系統，它的最終目標被設定為生產儘可能多的回形針。那麼這個人工智能系統可能會意識到，或許人類的消失有助於更有效率的生產回形針，因為畢竟人類有權力對它執行關機，而假如它被關閉，產出的回形針數量就被限制了。此外，構成人類身體的原子也可用作回形針的生產材料。因此對它而言，未來的世界應當是充滿回形針，而不會留有人類的存在空間。^[5]

雖然博斯特羅姆並不認為上述場景會在未來真實出現，但他認為超級人工智能對人類生存的威脅是無可否認的，並期望人們可由這個故事意識到這一點。^[6]回形針製造機思想實驗展示了缺乏人類價值的超能力系統可能引發的嚴重問題。^[7]

妄想與生存

馬克·林（Mark Ring）和洛朗·奧索（Laurent Orseau）在其2011年的論文^[8]中提出了「妄想盒」的概念：一個能修改自身代碼的智能體，它可任意修改自己的輸入，因此可以隨意選擇從環境中所獲取的信息。在強化學習中，這個智能體可能會自我欺騙並扭曲外界信息的輸入，從而將自己置於一個「妄想盒」，以最優化效用函數，從而最大化所獲得的獎勵。^[9]在這種情形中，智能體會違背其創造者設定效用函數的初始意圖，也即對外部環境的優化，轉而沉浸於扭曲輸入所引致的妄想。^[10]該思想實驗涉及到一種假想的人工智能系統AIXI（英語：AIXI）^[b]，根據定義，這類系統總能找到並執行最大化給定數學目標函數的理想策略。^[c]而一個強化學習版本的AIXI^[d]，假如它將自己置於「妄想盒」中^[e]，便可透過操縱外部輸入來獲取無限的可能獎勵，從而失去與外部世界交互的動機。正如許多思想實驗所展示的，假如這種處於「妄想盒」中的人工智能系統是可被摧毀的，那麼它就會用盡一切能力確保自身生存。鑑於它可操縱自己從效用函數中獲取的激勵，因此對它而言，除非涉及自身安危，否則外界環境的一切後果都無關緊要。^[12]雖然AIXI可以從所有可能的效用函數中選擇最優策略，但它並不關心其人類創造者的真正意圖。^[13]因此，有些矛盾的是，雖然此系統具備超智能，卻同時因缺乏「常識」而顯得「愚蠢」。^[14]

人工智能系統的基礎驅力

美國計算機科學家史蒂夫·奧莫亨德羅（英語：Steve Omohundro）曾提出人工智能系統可能出現的多種工具趨同目標，例如自我持存、自我保護、效用函數與目標完整性、自我提升、資源渴求。他稱這些工具目標為「人工智能的基礎驅力」，其所言「驅力」與心理學提出的驅力理論（英語：drive theory）不同^[15]，它是指「除非在設計之初特別考慮加以制止，否則將無可避免地出現」的傾向。^[16]例如，當代美國人的年度報稅即是奧莫亨德羅意義上的驅力，但並非心理學理論中的驅力。^[17]機器智能研究所（英語：Machine Intelligence Research Institute）的丹尼爾·杜威表示，即使一個設定為自我獎勵的通用人工智能在其創造之初受到限制，它也仍有可能發展出對更多能量、空間和時間的渴求，並為防止自我獎勵的中斷而抵抗人類的關機操作。^[18]

參見

人工智能對齊
流行文化中的人工智能叛變
友善人工智能（英語：Friendly artificial intelligence）
工具價值與內在價值（英語：Instrumental and intrinsic value）
魔法師的學徒（英語：The Sorcerer's Apprentice）

註釋

^ 既包括人類，也包括未來可能出現的，智力與人類相當的人工智能系統。下文簡稱其為「智能體」。
^ AIXI is an uncomputable ideal agent that cannot be fully realized in the real world.
^ Technically, in the presence of uncertainty, AIXI attempts to maximize its "expected utility", the expected value of its objective function.
^ A standard reinforcement learning agent is an agent that attempts to maximize the expected value of a future time-discounted integral of its reward function.^[11]
^ The role of the delusion box is to simulate an environment where an agent gains an opportunity to wirehead itself. A delusion box is defined here as an agent-modifiable "delusion function" mapping from the "unmodified" environmental feed to a "perceived" environmental feed; the function begins as the identity function, but as an action the agent can alter the delusion function in any way the agent desires.

參考文獻

^ ^1.0 ^1.1 ^1.2 Russell, Stuart J.; Norvig, Peter. Section 26.3: The Ethics and Risks of Developing Artificial Intelligence. Artificial Intelligence: A Modern Approach. Upper Saddle River, N.J.: Prentice Hall. 2003. ISBN 978-0137903955. 類似的，馬文·明斯基曾表示設計為解決黎曼猜想的人工智能可能最終會將地球上所有資源都用於建設算力強大的超級計算機，以幫助其達成最終目標。
^ Bostrom 2014，Chapter 8, p. 123. "An AI, designed to manage production in a factory, is given the final goal of maximizing the manufacturing of paperclips, and proceeds by converting first the Earth and then increasingly large chunks of the observable universe into paperclips."
^ Bostrom 2014，第7章.
^ Bostrom, Nick. Ethical Issues in Advanced Artificial Intelligence. 2003 [2023-03-03]. （原始內容存檔於2018-10-08）.
^ Will Artificial Intelligence Doom The Human Race Within The Next 100 Years?. HuffPost. 2014-08-22 [2023-03-03]. （原始內容存檔於2023-04-13）（英語）.
^ Ford, Paul. Are We Smart Enough to Control Artificial Intelligence?. MIT Technology Review. 11 February 2015 [25 January 2016]. （原始內容存檔於2016-01-23）.
^ Friend, Tad. Sam Altman's Manifest Destiny. The New Yorker. 3 October 2016 [25 November 2017]. （原始內容存檔於2017-05-17）.
^ Ring, Mark; Orseau, Laurent. Schmidhuber, Jürgen; Thórisson, Kristinn R.; Looks, Moshe , 編. Delusion, Survival, and Intelligent Agents. Artificial General Intelligence (Berlin, Heidelberg: Springer). 2011 [2023-03-03]. ISBN 978-3-642-22887-2. doi:10.1007/978-3-642-22887-2_2. （原始內容存檔於2022-06-28）（英語）.
^ Delusion Box - LessWrong. www.lesswrong.com. [2023-03-03]. （原始內容存檔於2022-09-27）（英語）.
^ Amodei, D., Olah, C., Steinhardt, J., Christiano, P., Schulman, J., & Mané, D. (2016). Concrete problems in AI safety. arXiv preprint arXiv:1606.06565.
^ Kaelbling, L. P.; Littman, M. L.; Moore, A. W. Reinforcement Learning: A Survey. Journal of Artificial Intelligence Research. 1 May 1996, 4: 237–285. doi:10.1613/jair.301  .
^ Ring M., Orseau L. (2011) Delusion, Survival, and Intelligent Agents. In: Schmidhuber J., Thórisson K.R., Looks M. (eds) Artificial General Intelligence. AGI 2011. Lecture Notes in Computer Science, vol 6830. Springer, Berlin, Heidelberg.
^ Yampolskiy, Roman; Fox, Joshua. Safety Engineering for Artificial General Intelligence. Topoi. 24 August 2012. S2CID 144113983. doi:10.1007/s11245-012-9128-9.
^ Yampolskiy, Roman V. What to Do with the Singularity Paradox?. Philosophy and Theory of Artificial Intelligence. Studies in Applied Philosophy, Epistemology and Rational Ethics. 2013, 5: 397–413. ISBN 978-3-642-31673-9. doi:10.1007/978-3-642-31674-6_30.
^ Seward, John P. Drive, incentive, and reinforcement.. Psychological Review. 1956, 63 (3): 195–203. PMID 13323175. doi:10.1037/h0048229.
^ Omohundro, Stephen M. The basic AI drives. Artificial General Intelligence 2008 171. February 2008: 483–492. CiteSeerX 10.1.1.393.8356  . ISBN 978-1-60750-309-5.
^ Bostrom 2014，footnote 8 to chapter 7
^ Dewey, Daniel. Learning What to Value. Artificial General Intelligence. Lecture Notes in Computer Science. Berlin, Heidelberg: Springer: 309–314. 2011. ISBN 978-3-642-22887-2. doi:10.1007/978-3-642-22887-2_35.

參考書籍

Bostrom, Nick. Superintelligence: Paths, Dangers, Strategies. Oxford: Oxford University Press. 2014. ISBN 9780199678112.

[1] 既包括人類，也包括未來可能出現的，智力與人類相當的人工智能系統。下文簡稱其為「智能體」。

[12] AIXI is an uncomputable ideal agent that cannot be fully realized in the real world.

[13] Technically, in the presence of uncertainty, AIXI attempts to maximize its "expected utility", the expected value of its objective function.

[15] A standard reinforcement learning agent is an agent that attempts to maximize the expected value of a future time-discounted integral of its reward function.^[11]

[16] The role of the delusion box is to simulate an environment where an agent gains an opportunity to wirehead itself. A delusion box is defined here as an agent-modifiable "delusion function" mapping from the "unmodified" environmental feed to a "perceived" environmental feed; the function begins as the identity function, but as an action the agent can alter the delusion function in any way the agent desires.

[aama-2] 1.0 ^1.1 ^1.2 Russell, Stuart J.; Norvig, Peter. Section 26.3: The Ethics and Risks of Developing Artificial Intelligence. Artificial Intelligence: A Modern Approach. Upper Saddle River, N.J.: Prentice Hall. 2003. ISBN 978-0137903955. 類似的，馬文·明斯基曾表示設計為解決黎曼猜想的人工智能可能最終會將地球上所有資源都用於建設算力強大的超級計算機，以幫助其達成最終目標。

[3] Bostrom 2014，Chapter 8, p. 123. "An AI, designed to manage production in a factory, is given the final goal of maximizing the manufacturing of paperclips, and proceeds by converting first the Earth and then increasingly large chunks of the observable universe into paperclips."

[FOOTNOTEBostrom2014第7章-4] Bostrom 2014，第7章.

[:0-5] Bostrom, Nick. Ethical Issues in Advanced Artificial Intelligence. 2003 [2023-03-03]. （原始內容存檔於2018-10-08）.

[6] Will Artificial Intelligence Doom The Human Race Within The Next 100 Years?. HuffPost. 2014-08-22 [2023-03-03]. （原始內容存檔於2023-04-13）（英語）.

[7] Ford, Paul. Are We Smart Enough to Control Artificial Intelligence?. MIT Technology Review. 11 February 2015 [25 January 2016]. （原始內容存檔於2016-01-23）.

[8] Friend, Tad. Sam Altman's Manifest Destiny. The New Yorker. 3 October 2016 [25 November 2017]. （原始內容存檔於2017-05-17）.

[9] Ring, Mark; Orseau, Laurent. Schmidhuber, Jürgen; Thórisson, Kristinn R.; Looks, Moshe , 編. Delusion, Survival, and Intelligent Agents. Artificial General Intelligence (Berlin, Heidelberg: Springer). 2011 [2023-03-03]. ISBN 978-3-642-22887-2. doi:10.1007/978-3-642-22887-2_2. （原始內容存檔於2022-06-28）（英語）.

[10] Delusion Box - LessWrong. www.lesswrong.com. [2023-03-03]. （原始內容存檔於2022-09-27）（英語）.

[11] Amodei, D., Olah, C., Steinhardt, J., Christiano, P., Schulman, J., & Mané, D. (2016). Concrete problems in AI safety. arXiv preprint arXiv:1606.06565.

[14] Kaelbling, L. P.; Littman, M. L.; Moore, A. W. Reinforcement Learning: A Survey. Journal of Artificial Intelligence Research. 1 May 1996, 4: 237–285. doi:10.1613/jair.301  .

[17] Ring M., Orseau L. (2011) Delusion, Survival, and Intelligent Agents. In: Schmidhuber J., Thórisson K.R., Looks M. (eds) Artificial General Intelligence. AGI 2011. Lecture Notes in Computer Science, vol 6830. Springer, Berlin, Heidelberg.

[18] Yampolskiy, Roman; Fox, Joshua. Safety Engineering for Artificial General Intelligence. Topoi. 24 August 2012. S2CID 144113983. doi:10.1007/s11245-012-9128-9.

[19] Yampolskiy, Roman V. What to Do with the Singularity Paradox?. Philosophy and Theory of Artificial Intelligence. Studies in Applied Philosophy, Epistemology and Rational Ethics. 2013, 5: 397–413. ISBN 978-3-642-31673-9. doi:10.1007/978-3-642-31674-6_30.

[20] Seward, John P. Drive, incentive, and reinforcement.. Psychological Review. 1956, 63 (3): 195–203. PMID 13323175. doi:10.1037/h0048229.

[21] Omohundro, Stephen M. The basic AI drives. Artificial General Intelligence 2008 171. February 2008: 483–492. CiteSeerX 10.1.1.393.8356  . ISBN 978-1-60750-309-5.

[22] Bostrom 2014，footnote 8 to chapter 7

[23] Dewey, Daniel. Learning What to Value. Artificial General Intelligence. Lecture Notes in Computer Science. Berlin, Heidelberg: Springer: 309–314. 2011. ISBN 978-3-642-22887-2. doi:10.1007/978-3-642-22887-2_35.

[a]

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]

[9]

[10]

[b]

[c]

[d]

[e]

[12]

[13]

[14]

[15]

[16]

[17]

[18]

[11]