Web archiving

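Web archiving is the practice of collecting websites from the World Wide Web and preserving them in one place so that future researchers, historians, and the public can use them. Because many websites shut down and disappear, their content is lost unless it is preserved in time.^[1] Since the size and number of websites are enormous, web crawlers are usually used to fetch and store site content automatically. The Wayback Machine is one of the websites that performs web archiving. National libraries, national archives, and various other organizations have also begun preserving culturally significant web content.^[2]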
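The crawler-based collection described above can be illustrated with a minimal sketch. It is not how any particular archive is implemented: `archive_site`, `LinkExtractor`, and the injected `fetch` callable are hypothetical names chosen for this example, and the in-memory `FAKE_SITE` stands in for real HTTP requests so the sketch runs without network access. Production archiving crawlers also handle politeness, deduplication, and non-HTML resources, none of which is shown here.

```python
from collections import deque
from datetime import datetime, timezone
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href values of <a> tags in an HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def archive_site(seed_url, fetch, max_pages=100):
    """Breadth-first crawl from seed_url, staying on the same host.

    fetch is a hypothetical callable mapping a URL to its HTML text,
    or to None when the page cannot be retrieved.
    Returns {url: (capture_time_utc_iso, html)}.
    """
    host = urlparse(seed_url).netloc
    archive = {}
    queue = deque([seed_url])
    seen = {seed_url}
    while queue and len(archive) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # page is gone or unreachable; nothing to store
        # Store the snapshot together with its capture time.
        archive[url] = (datetime.now(timezone.utc).isoformat(), html)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop any #fragment part.
            link, _fragment = urldefrag(urljoin(url, href))
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return archive


# Assumed stand-in for a live website: URL -> HTML.
FAKE_SITE = {
    "http://example.org/": '<a href="/about">About</a> <a href="http://other.example/">Ext</a>',
    "http://example.org/about": '<a href="/">Home</a> <a href="/missing">Gone</a>',
}

snapshots = archive_site("http://example.org/", FAKE_SITE.get)
print(sorted(snapshots))
```

Passing `fetch` as a parameter keeps the crawl logic separate from the transport, so the same function could be pointed at a real HTTP client. The external link and the missing page are both skipped, leaving only the two reachable pages on the seed host in the archive.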

History and development

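The Internet Archive, a non-profit organization founded by Brewster Kahle in 1996, was the world's first large-scale web archiving project.^[3] In 2001 it released the Wayback Machine, its own search engine for viewing archived web content.^[3] As of 2018, the Internet Archive had stored 40 PB of data.^[4]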

See also

List of Web archiving initiatives
Archive.is
Archive Team
Internet Archive
Megalodon (website)
WebCite
Web scraping
Wayback Machine

References

[1] 早期互联网历史存档内容为何如此之少？ [Why is so little of the early internet's history archived?]. BBC. Retrieved 2022-02-21. (Archived from the original on 2022-03-25.)
[2] Truman, Gail. Web Archiving Environmental Scan. Harvard Library Report. 2016. Retrieved 2022-02-21. (Archived from the original on 2019-12-08.)
[3] Toyoda, M.; Kitsuregawa, M. The History of Web Archiving. Proceedings of the IEEE. May 2012, 100 (Special Centennial Issue): 1441–1443. ISSN 0018-9219. doi:10.1109/JPROC.2012.2189920.
[4] Inside Wayback Machine, the internet's time capsule. The Hustle. 2018-09-28. Retrieved 2020-07-21. (Archived from the original on 2018-10-02.)
