Web archiving

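Web archiving is the practice of collecting World Wide Web sites and preserving them in an archive so that future researchers, historians, and the public can use them. Because many websites shut down and disappear, their content is lost unless it is saved in time.^[1] Because of the enormous size and number of websites, web crawlers are typically used to capture and save site content automatically. The Wayback Machine is one of the sites that performs web archiving. National libraries, national archives, and various other organizations have also begun preserving culturally significant Web content.^[2]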
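
The crawler-based capture described above can be illustrated with a minimal sketch. It is not how any particular archive is implemented: the `crawl` function, the `LinkExtractor` class, the `max_pages` limit, and the caller-supplied `fetch` callable are all hypothetical names introduced for illustration. The sketch assumes a single site is being archived and omits concerns a production crawler would need, such as `robots.txt` handling and rate limiting.

```python
from collections import deque
from datetime import datetime, timezone
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed_url, fetch, max_pages=100):
    """Breadth-first crawl of one site, returning {url: snapshot}.

    `fetch` is a hypothetical callable supplied by the caller: it takes a
    URL and returns the page's HTML as a string, or None if unavailable.
    """
    site = urlparse(seed_url).netloc
    queue = deque([seed_url])
    seen = {seed_url}
    archive = {}

    while queue and len(archive) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # page is gone or unreachable; nothing to save

        # Store the capture together with the time it was taken.
        archive[url] = {
            "captured_at": datetime.now(timezone.utc).isoformat(),
            "html": html,
        }

        # Queue links that stay on the same host and were not seen before.
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            link, _fragment = urldefrag(urljoin(url, href))
            if urlparse(link).netloc == site and link not in seen:
                seen.add(link)
                queue.append(link)

    return archive
```

Passing `fetch` in as a parameter keeps the traversal logic separate from network access, so the same function can be run against a live site or against an in-memory dictionary of pages.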

History and development

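The Internet Archive, a nonprofit organization founded by Brewster Kahle in 1996, was the world's first large-scale web archiving project.^[3] In 2001 the Internet Archive released the Wayback Machine, its own search engine for viewing archived Web content.^[3] As of 2018, the Internet Archive had stored 40 PB of data.^[4]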

See also

List of Web archiving initiatives
Archive.is
Archive Team
Internet Archive
Megalodon (website)
WebCite
Web scraping
Wayback Machine

References

[1] 早期互联网历史存档内容为何如此之少？ [Why is so little of the early Internet's history archived?]. BBC. [Retrieved 2022-02-21]. (Archived from the original on 2022-03-25).
[2] Truman, Gail. Web Archiving Environmental Scan. Harvard Library Report. 2016 [Retrieved 2022-02-21]. (Archived from the original on 2019-12-08).
[3] Toyoda, M.; Kitsuregawa, M. The History of Web Archiving. Proceedings of the IEEE. May 2012, 100 (Special Centennial Issue): 1441–1443. ISSN 0018-9219. doi:10.1109/JPROC.2012.2189920.
[4] Inside Wayback Machine, the internet's time capsule. The Hustle. 2018-09-28 [Retrieved 2020-07-21]. (Archived from the original on 2018-10-02).
