Imitative learning

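Imitative learning is a type of social learning in which an individual acquires new behaviors through imitation.^[1] Imitation aids communication, social interaction, and the ability to modulate one's own emotions to account for the emotions of others, and is "essential for healthy sensorimotor development and social functioning".^[1] Both humans and animals are able to match their own behavior to behavior they observe in others.^[1] Imitative learning plays an important role in human social and cultural development.^[2] Imitative learning differs from observational learning in that it requires duplicating the behavior exhibited by the model, whereas observational learning can also occur when the learner observes an unwanted behavior and its subsequent consequences and, as a result, learns to avoid that behavior.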

Imitative learning in animals

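At the most basic level, research by A. L. Saggerson, David N. George, and R. C. Honey showed that pigeons could learn a basic procedure for obtaining a reward by watching a demonstrator pigeon.^[3] The demonstrator pigeon was trained to peck a panel in response to one stimulus (for example, a red light) and to hop on the panel in response to a second stimulus (for example, a green light). Once the demonstrator had become proficient at the task, learner pigeons were placed in a video-monitored observation chamber. After the second observed trial, the learner pigeons were placed individually in the demonstrator's box and given the same test. The learner pigeons performed the task well, which suggests that they had formed response-outcome associations while observing. The researchers noted, however, that an alternative interpretation of these results is that the learner pigeons had instead acquired outcome-response associations that guided their behavior. Further testing is needed to establish whether this interpretation is valid.

Chesler carried out a similar study, comparing kittens that had watched their mother press a lever to obtain food with kittens that had not.^[4] A flickering light served as the stimulus, after which the kitten had to press the lever to receive a food reward. The study tested the responses of three groups of kittens: one that observed their mother perform the task before attempting it, one that observed a strange female perform it, and a control group that had no demonstrator and had to complete the task by trial and error. Kittens that observed their mother perform the task acquired the lever-pressing response faster than kittens that observed a strange female. Kittens that attempted the task without observing a demonstrator never acquired the response. This result suggests that the kittens learned by imitating a model. The study also speculates that the advantage of imitative learning over trial and error may be due to the kittens' social and biological response to their mother (a type of learning bias).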

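Whether animals are capable of true imitation is a much-debated topic. For an action to count as an instance of imitative learning, the animal must observe and reproduce the specific pattern of movements produced by the model. Some researchers have presented evidence that true imitation does not occur in non-primates and that the observational learning they display relies on less cognitively complex mechanisms, such as stimulus enhancement.^[5] ^[6] Chimpanzees, in turn, are more prone to learning by emulation than by true imitation. The exception is enculturated chimpanzees, which are raised by humans much as children are. In a study by Buttelmann et al., enculturated chimpanzees were found to behave similarly to young children and to imitate even actions that were not instrumental to achieving the desired goal.^[7] In other studies of true imitation, enculturated chimpanzees even imitated a model's behavior some time after first observing it (deferred imitation).^[8] ^[9]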

Imitative learning in humans

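Imitative learning has been well documented in humans, who are often used as a comparison group in studies of imitative learning in primates.^[8] ^[9] A study by Horner and Whiten compared the actions of (non-enculturated) chimpanzees with those of human children and found that the children over-imitated, copying actions beyond what was necessary.^[10] In the study, children aged 3 to 4 and chimpanzees were shown a sequence of three actions for opening an opaque puzzle box containing a reward. Two of the actions were necessary to open the box and one was not, although the subjects did not know this. A demonstrator performed all three actions to open the box, after which both the chimpanzees and the children attempted the task. Both the children and the chimpanzees copied all three actions and received the reward inside the box. The next phase of the study used a transparent box instead of the opaque one. Because this box was transparent, it could clearly be seen that one of the three actions was not necessary to obtain the reward. The chimpanzees omitted the unnecessary action and performed only the two actions needed to achieve the goal. The young children imitated all three actions, even though they could have selectively ignored the irrelevant one.

One explanation for this is that humans follow conventions. A study by Clegg and Legare tested this by showing young children a method for making a necklace.^[11] In the demonstrations, the model added a step that was not necessary to achieve the end goal of completing the necklace. In one demonstration, the model used a language cue to tell the children that the necklace-making was instrumental, for example, "I am going to make a necklace. Let's watch what I am doing. I am going to make a necklace."^[12] In the other demonstration, the model used a language cue implying that the necklace was being made according to convention, for example, "I always do it this way. Everyone always does it this way. Let's watch what I am doing. Everyone always does it this way."^[12] In the conventional condition, children copied the model with higher fidelity, including the unnecessary step. In the instrumental condition, they did not copy the unnecessary step. The study suggests that children discern when to imitate and treat convention as a salient reason to copy behavior in order to conform.

Taking cues for proper behavior from the actions of others, rather than using independent judgment, is called a conformity bias. Recent research suggests that humans are subject to other biases as well when choosing whose behavior to imitate. Humans imitate individuals they deem successful in the field in which they also wish to succeed (success bias), as well as respected, prestigious individuals from whom others preferentially learn (prestige bias).^[13] In a study by Chudek et al., attentional cues were used to indicate to children that a particular model was prestigious.^[14] In an experiment in which two models played with a toy in different ways, prestige was indicated by two bystanders watching the prestigious model for 10 seconds. The study found that children picked up on the cue signaling prestige and preferentially imitated the prestigious model. The study suggests that such biases help humans pick up direct and indirect cues that an individual possesses knowledge worth learning.

These cues can also lead humans to imitate harmful behaviors. Copycat suicides occur when a person attempting suicide copies the method of a suicide attempt they have heard about or seen in the media, and suicide attempts rise significantly after celebrity suicides (see Werther effect). Suicides can spread through social networks like an epidemic as large groups of people imitate the behavior of a model or group of models (see Blue Whale Challenge).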

Imitation learning in robotics

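Imitation learning can be used in robotics as an alternative to traditional reinforcement learning. Traditional reinforcement learning algorithms typically start from random actions and try to discover on their own the sequence of actions that achieves a given goal. This approach can struggle in robotics, however, because the reward signal is often extremely sparse (for example, the robot may only ever receive a success or failure outcome, with nothing in between). If success requires the robot to carry out a complex sequence of actions, a reinforcement learning algorithm may fail to make progress during training and remain stuck in a low-reward region.^[15] Imitation learning can be used to create a set of successful examples for the reinforcement learning algorithm to learn from. This involves having a human researcher manually pilot the robot and recording the actions taken. These successful examples steer the reinforcement learning algorithm in the right direction better than purely random actions do.^[16]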
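The idea of seeding a reinforcement learner with recorded demonstrations can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the cited works: the toy corridor environment, the scripted "always move right" demonstration standing in for a human piloting the robot, the tabular Q-learning update, and all function names and parameter values. It compares how often purely random exploration ever reaches the sparse reward with what a greedy policy does after Q-learning updates are run only over the recorded demonstration.

```python
import random

# Hypothetical toy task: a corridor of cells 0..19 with the goal in the last cell.
N_STATES = 20
GOAL = N_STATES - 1
HORIZON = 40           # maximum number of steps per episode
LEFT, RIGHT = 0, 1

def step(state, action):
    """Move one cell; the reward is 1.0 only on reaching the goal (sparse)."""
    nxt = max(0, state - 1) if action == LEFT else min(GOAL, state + 1)
    done = nxt == GOAL
    return nxt, (1.0 if done else 0.0), done

def random_success_rate(episodes, rng):
    """Fraction of purely random episodes that ever reach the goal."""
    wins = 0
    for _ in range(episodes):
        state = 0
        for _t in range(HORIZON):
            state, _reward, done = step(state, rng.choice((LEFT, RIGHT)))
            if done:
                wins += 1
                break
    return wins / episodes

def record_demonstration():
    """Stand-in for a human piloting the robot: always move right."""
    state, transitions, done = 0, [], False
    while not done:
        nxt, reward, done = step(state, RIGHT)
        transitions.append((state, RIGHT, reward, nxt, done))
        state = nxt
    return transitions

def q_from_demonstrations(transitions, sweeps=50, alpha=0.5, gamma=0.9):
    """Apply Q-learning updates to the recorded transitions only."""
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(sweeps):
        for state, action, reward, nxt, done in transitions:
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
    return q

def greedy_rollout(q):
    """Follow the greedy policy; return True if the goal is reached."""
    state = 0
    for _t in range(HORIZON):
        action = RIGHT if q[state][RIGHT] > q[state][LEFT] else LEFT
        state, _reward, done = step(state, action)
        if done:
            return True
    return False

rng = random.Random(0)
baseline = random_success_rate(2000, rng)
q_table = q_from_demonstrations(record_demonstration())
seeded_success = greedy_rollout(q_table)
print(f"random exploration success rate: {baseline:.3f}")
print(f"greedy policy after demonstration seeding reaches goal: {seeded_success}")
```

In this sketch the demonstration places the reward inside the learner's data from the start, so value estimates propagate back along the demonstrated path without the learner having to stumble on the goal by chance. A real robotic system would typically continue training with its own experience afterwards and would use function approximation rather than a table.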

References

[1] Ganos, C.; Ogrzal, T.; Schnitzler, A.; Münchau, A. The pathophysiology of echopraxia/echolalia: relevance to Gilles de la Tourette syndrome. Mov. Disord. September 2012, 27 (10): 1222–9. PMID 22807284. S2CID 22422642. doi:10.1002/mds.25103.
[2] Heyes, C. Grist and mills: on the cultural origins of cultural learning. Philos Trans R Soc Lond B Biol Sci. Aug 5, 2012, 367 (1599): 2181–91. PMC 3385685. PMID 22734061. doi:10.1098/rstb.2012.0120.
[3] Saggerson, A. L.; George, David N.; Honey, R. C. Imitative Learning of Stimulus-Response and Response-Outcome Associations in Pigeons. Journal of Experimental Psychology: Animal Behavior Processes. 2005, 31 (3): 289–300. PMID 16045384. doi:10.1037/0097-7403.31.3.289.
[4] Chesler, P. Maternal Influence in Learning by Observation in Kittens. Science. 1969, 166 (3907): 901–903. Bibcode:1969Sci...166..901C. ISSN 0036-8075. PMID 5345208. S2CID 683297. doi:10.1126/science.166.3907.901.
[5] Byrne, Richard W.; Russon, Anne E. Learning by imitation: A hierarchical approach. Behavioral and Brain Sciences. 1998, 21 (5): 667–684. ISSN 0140-525X. PMID 10097023. S2CID 988905. doi:10.1017/S0140525X98001745.
[6] Zentall, Thomas R. Imitation: definitions, evidence, and mechanisms. Animal Cognition. 2006, 9 (4): 335–353. ISSN 1435-9448. PMID 17024510. S2CID 16183221. doi:10.1007/s10071-006-0039-2.
[7] Buttelmann, David; Carpenter, Malinda; Call, Josep; Tomasello, Michael. Enculturated chimpanzees imitate rationally. Developmental Science. 2007, 10 (4): F31–F38. ISSN 1467-7687. PMID 17552931. doi:10.1111/j.1467-7687.2007.00630.x.
[8] Bjorklund, David F.; Yunger, Jennifer L.; Bering, Jesse M.; Ragan, Patricia. The generalization of deferred imitation in enculturated chimpanzees (Pan troglodytes). Animal Cognition. 2002, 5 (1): 49–58. ISSN 1435-9448. PMID 11957402. S2CID 11537264. doi:10.1007/s10071-001-0124-5.
[9] Tomasello, Michael; Savage-Rumbaugh, Sue; Kruger, Ann Cale. Imitative Learning of Actions on Objects by Children, Chimpanzees, and Enculturated Chimpanzees. Child Development. 1993, 64 (6): 1688–1705. ISSN 0009-3920. JSTOR 1131463. PMID 8112113. doi:10.2307/1131463.
[10] Horner, Victoria; Whiten, Andrew. Causal knowledge and imitation/emulation switching in chimpanzees (Pan troglodytes) and children (Homo sapiens). Animal Cognition. 2005, 8 (3): 164–181. ISSN 1435-9448. PMID 15549502. S2CID 1949770. doi:10.1007/s10071-004-0239-6.
[11] Clegg, Jennifer M.; Legare, Cristine H. Instrumental and Conventional Interpretations of Behavior Are Associated With Distinct Outcomes in Early Childhood. Child Development. 2015-12-19, 87 (2): 527–542. ISSN 0009-3920. PMID 26682522. doi:10.1111/cdev.12472.
[12] Clegg, Jennifer M.; Legare, Cristine H. Instrumental and Conventional Interpretations of Behavior Are Associated With Distinct Outcomes in Early Childhood. Child Development. 2015-12-19, 87 (2): 527–542. ISSN 0009-3920. PMID 26682522. doi:10.1111/cdev.12472.
[13] Henrich, J.; Broesch, J. On the nature of cultural transmission networks: evidence from Fijian villages for adaptive learning biases. Philosophical Transactions of the Royal Society B: Biological Sciences. 2011, 366 (1567): 1139–1148. ISSN 0962-8436. PMC 3049092. PMID 21357236. doi:10.1098/rstb.2010.0323.
[14] Chudek, Maciej; Heller, Sarah; Birch, Susan; Henrich, Joseph. Prestige-biased cultural learning: bystander's differential attention to potential models influences children's learning. Evolution and Human Behavior. 2012, 33 (1): 46–56. doi:10.1016/j.evolhumbehav.2011.05.005.
[15] Xuezhi, Niu. Optimal Gait Control of Soft Quadruped Robot by Model-based Reinforcement Learning. Stockholm, Sweden: Department of Machine Design, KTH Royal Institute of Technology. 2023.
[16] Tianhao Zhang; Zoe McCarthy. Deep Imitation Learning for Complex Manipulation Tasks from Virtual Reality Teleoperation. 2018-03-06. arXiv:1710.04615v2 [cs.LG].

