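HTTP compression

HTTP compression is a capability that can be built into web servers and web clients to improve transfer speed and bandwidth utilization.^[1]

HTTP data is compressed before it is sent from the server: compliant browsers announce to the server which methods they support before downloading the appropriate format, and browsers that do not support a compression method download uncompressed data. The most common compression schemes include brotli, gzip and Deflate; the full list of available schemes is maintained by the IANA.^[2] In addition, third parties may develop new methods and include them in their own products. One example is Google's Shared Dictionary Compression for HTTP (SDCH) scheme, which was implemented in the Google Chrome browser and used on Google's servers.

There are two different ways compression can be done in HTTP. At a lower level, a Transfer-Encoding header field may indicate that the payload of an HTTP message is compressed. At a higher level, a Content-Encoding header field may indicate that a resource being transferred, cached, or otherwise referenced is compressed. Compression using Content-Encoding is more widely supported than compression using Transfer-Encoding, and some browsers do not advertise support for Transfer-Encoding compression in order to avoid triggering bugs in servers.^[3]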

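Compression scheme negotiation

In most cases (excluding SDCH), negotiation is done in two steps, described in RFC 2616 (since superseded by RFC 9110):

1. The web client advertises the compression schemes it supports by including a list of tokens in the header of the HTTP request. For Content-Encoding, the list is carried in a field called Accept-Encoding; for Transfer-Encoding, the field is called TE.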

GET /encrypted-area HTTP/1.1
Host: www.example.com
Accept-Encoding: gzip, deflate

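2. If the server supports one or more compression schemes, the outgoing data may be compressed with one or more methods supported by both parties. If this is the case, the server adds a Content-Encoding or Transfer-Encoding field to the HTTP response naming the schemes used, separated by commas.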

HTTP/1.1 200 OK
Date: Tue, 27 Feb 2018 06:03:16 GMT
Server: Apache/1.3.3.7 (Unix)  (Red-Hat/Linux)
Last-Modified: Wed, 08 Jan 2003 23:11:55 GMT
Accept-Ranges: bytes
Content-Length: 438
Connection: close
Content-Type: text/html; charset=UTF-8
Content-Encoding: gzip

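The web server is under no obligation to use any compression method; this depends on the internal settings of the web server and may also depend on the internal architecture of the website.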
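
The two negotiation steps above can be sketched from the server's point of view. The following is a minimal illustration using only the Python standard library, not a description of any particular server: the names `SERVER_SUPPORTED`, `parse_accept_encoding`, `choose_encoding` and `encode_body` are hypothetical, and the selection policy (server preference order, with `identity` as the unconditional fallback) is a simplifying assumption.

```python
import gzip
import zlib

# Codings this hypothetical server can produce, in its own order of preference.
SERVER_SUPPORTED = ("gzip", "deflate")


def parse_accept_encoding(header: str) -> dict:
    """Parse an Accept-Encoding value into a {token: q-value} mapping."""
    prefs = {}
    for item in header.split(","):
        token, _, params = item.strip().partition(";")
        token = token.strip().lower()
        if not token:
            continue
        q = 1.0  # a token without an explicit weight is fully acceptable
        for param in params.split(";"):
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not acceptable"
        prefs[token] = q
    return prefs


def choose_encoding(header: str) -> str:
    """Return the first server-supported coding the client accepts, else identity."""
    prefs = parse_accept_encoding(header)
    wildcard = prefs.get("*", 0.0)
    for coding in SERVER_SUPPORTED:
        if prefs.get(coding, wildcard) > 0:
            return coding
    return "identity"


def encode_body(body: bytes, accept_encoding: str):
    """Compress the body if possible and return (response headers, body)."""
    coding = choose_encoding(accept_encoding)
    if coding == "gzip":
        body = gzip.compress(body)
    elif coding == "deflate":
        body = zlib.compress(body)  # zlib wrapper (RFC 1950) around deflate data
    headers = {"Content-Length": str(len(body))}
    if coding != "identity":
        headers["Content-Encoding"] = coding
    return headers, body
```

For the request shown in step 1 (`Accept-Encoding: gzip, deflate`), this sketch would answer with `Content-Encoding: gzip`, as in the response shown in step 2. A client that sends no Accept-Encoding field, or lists only codings the server does not produce, receives the body unchanged.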

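In the case of SDCH, a dictionary negotiation is also required, which may involve additional steps such as downloading a suitable dictionary from an external server.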

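Content-Encoding tokens

The official list of tokens available to servers and clients is maintained by IANA,^[4] and it includes:

compress – the method of the UNIX "compress" program (historic; not recommended for most applications, which should use gzip or deflate instead)
deflate – compression based on the deflate algorithm (described in RFC 1951), wrapped in the zlib data format (RFC 1950)
exi – W3C Efficient XML Interchange
gzip – GNU zip format (described in RFC 1952). As of March 2011, this was the method most widely supported by applications.^[5]
identity – no transformation is applied. This is the default value for content coding.
pack200-gzip – Network Transfer Format for Java Archives^[6]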

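In addition, a number of unofficial or non-standardized tokens have been used by some servers or clients:

br – Brotli, an open-source compression algorithm designed specifically for HTTP content encoding (since specified in RFC 7932); implemented in Mozilla Firefox 44, with a Chromium implementation in preparation at the time of writing.
bzip2 – compression based on the free bzip2 format, supported by lighttpd.^[7]
lzma – compression based on raw LZMA, available in Opera 20 and in elinks via a compile-time option.^[8]
peerdist^[9] – Microsoft Peer Content Caching and Retrieval
sdch^[10]^[11] – Google Shared Dictionary Compression for HTTP, based on VCDIFF (RFC 3284); supported natively in recent versions of Google Chrome, Chromium and Android, and supported by Google's websites.
xpress – Microsoft compression protocol used for Windows Store application updates (Windows 8 and later). LZ77-based compression, optionally using Huffman encoding.^[12]
xz – LZMA2-based content compression, supported in Firefox through an unofficial patch;^[13] fully implemented in mget since 2013-12-31.^[14]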
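
On the receiving side, a client has to map each token in a Content-Encoding field back to a decoder. The sketch below is an illustration under stated assumptions: `decode_body` is a hypothetical helper, and it handles only the tokens that the Python standard library can decode directly (gzip, deflate and identity). Other tokens from the lists above would need additional libraries and are rejected here.

```python
import gzip
import zlib


def decode_body(body: bytes, content_encoding: str) -> bytes:
    """Undo the codings named in a Content-Encoding value (stdlib codings only)."""
    codings = [c.strip().lower() for c in content_encoding.split(",") if c.strip()]
    # Codings are listed in the order they were applied, so undo them in reverse.
    for coding in reversed(codings):
        if coding == "gzip":
            body = gzip.decompress(body)
        elif coding == "deflate":
            body = zlib.decompress(body)  # expects the zlib-wrapped form
        elif coding == "identity":
            pass  # no transformation was applied
        else:
            raise ValueError(f"unsupported content coding: {coding}")
    return body
```

A body labelled `Content-Encoding: deflate, gzip` was first deflated and then gzipped, so the function removes the gzip layer first. Raising an error on unknown tokens is one possible design choice; a client could also choose to hand the undecoded bytes to the caller.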

Servers that support HTTP compression

SAP NetWeaver
Microsoft IIS: built in, or through third-party modules
Apache HTTP Server, via mod_deflate (which, despite its name, supports only gzip^[15]^[16])
Hiawatha HTTP server: serves pre-compressed files^[17]
Cherokee HTTP server: on-the-fly gzip and deflate compression
Oracle iPlanet Web Server
Zeus Web Server
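lighttpd, via mod_compress and the newer mod_deflate (1.5.x)
nginx – built in
Applications based on Tornado, if "compress_response" is set to True in the application settings (for versions prior to 4.0, set "gzip" to True)
Jetty Server – built into the default static content serving and available through servlet filter configuration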
GeoServer
Apache Tomcat
IBM Websphere
AOLserver
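Ruby Rack, via the Rack::Deflater middleware
Varnish – built in. Also works with ESI

Compression in HTTP can also be implemented with server-side scripting languages such as PHP, or with programming languages such as Java.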
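
As an illustration of compression done in application code rather than in the web server, the following sketch wraps a Python WSGI application. It is a simplified example, not a production middleware: `accepts_gzip` and `gzip_middleware` are hypothetical names, the whole response body is buffered in memory before compression, and only gzip is offered.

```python
import gzip


def accepts_gzip(header: str) -> bool:
    """Return True if the Accept-Encoding value lists gzip with a non-zero weight."""
    for item in header.split(","):
        token, _, params = item.partition(";")
        if token.strip().lower() == "gzip":
            weight = params.replace(" ", "").lower()
            return weight not in ("q=0", "q=0.0", "q=0.00", "q=0.000")
    return False


def gzip_middleware(app):
    """Wrap a WSGI app so that its responses are gzip-compressed when allowed."""

    def wrapped(environ, start_response):
        captured = {}

        def capture(status, headers, exc_info=None):
            # Hold the status and headers back until the body has been processed.
            captured["status"] = status
            captured["headers"] = list(headers)

        body = b"".join(app(environ, capture))
        headers = captured["headers"]
        if accepts_gzip(environ.get("HTTP_ACCEPT_ENCODING", "")):
            body = gzip.compress(body)
            # The original length no longer applies to the compressed body.
            headers = [(k, v) for k, v in headers if k.lower() != "content-length"]
            headers.append(("Content-Encoding", "gzip"))
            # Tell caches that the response varies with the request's Accept-Encoding.
            headers.append(("Vary", "Accept-Encoding"))
            headers.append(("Content-Length", str(len(body))))
        start_response(captured["status"], headers)
        return [body]

    return wrapped
```

An application would be wrapped once at start-up, for example `application = gzip_middleware(application)`. Requests without a suitable Accept-Encoding field pass through unchanged.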

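Problems preventing the use of HTTP compression

A 2009 article by Google engineers Arvind Jain and Jason Glasgow states that more than 99 person-years are wasted every day^[18] through increased page load time when users do not receive compressed content. This can happen when anti-virus software inspects connections and forces content to be uncompressed; when proxy servers are used (and web servers forgo compression for the sake of compatibility); when servers are misconfigured; and when browsers hit bugs and stop using compression. Internet Explorer 6 falls back to HTTP 1.0 (without features such as compression or pipelining) when behind a proxy, a common configuration in corporate environments, which made it the mainstream browser most prone to falling back to uncompressed HTTP.^[18]

Another problem found when deploying HTTP compression at large scale stems from the definition of the deflate encoding: HTTP 1.1 defines deflate as data compressed with deflate (RFC 1951) inside a zlib-formatted stream (RFC 1950), whereas Microsoft server and client products historically implemented it as a "raw" deflate stream,^[19] which makes its deployment unreliable.^[20]^[21] For this reason, some software, including the Apache HTTP Server, implements only gzip encoding.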
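
A receiver can work around the two interpretations of deflate by trying both. The sketch below shows one common approach using Python's zlib module; the function name `inflate_tolerant` is hypothetical, and the fallback order is an assumption rather than behavior prescribed by any specification.

```python
import zlib


def inflate_tolerant(data: bytes) -> bytes:
    """Decode a "deflate" body whether or not it carries the zlib wrapper."""
    try:
        # Standard interpretation: deflate data inside a zlib stream (RFC 1950).
        return zlib.decompress(data)
    except zlib.error:
        # Fallback: a raw deflate stream (RFC 1951) with no header or checksum.
        # A negative wbits value tells zlib not to expect the wrapper.
        return zlib.decompress(data, -zlib.MAX_WBITS)
```

The zlib-wrapped form is tried first because its header and checksum make a false match on raw data unlikely, though not strictly impossible.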

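Security implications

In 2012, a general attack against the use of data compression, called CRIME, was announced. The CRIME attack could work against a large number of protocols, including but not limited to TLS and application-layer protocols such as SPDY or HTTP. Only the exploits against TLS and SPDY were demonstrated, and these have been largely mitigated in browsers and servers. The CRIME exploit against HTTP compression has not been comprehensively mitigated, even though the authors of CRIME have warned that this vulnerability may be more widespread than SPDY and TLS compression.

In 2013, a new instance of the CRIME attack against HTTP compression, called BREACH, was published. A BREACH attack can extract login tokens, email addresses or other sensitive information from TLS-encrypted web traffic in as little as 30 seconds (depending on the number of bytes to be extracted), provided the attacker can trick the victim into visiting a malicious web link.^[22] All versions of TLS and SSL are affected by BREACH, regardless of the encryption algorithm or cipher used.^[23] Earlier instances of CRIME can be mitigated by turning off TLS compression or SPDY header compression. BREACH, by contrast, exploits HTTP compression, which cannot realistically be turned off, since virtually all web servers rely on it to improve data transmission speeds for users.^[22]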
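
The underlying weakness in both attacks is that the size of compressed output depends on how much of the input repeats. The following toy example illustrates only that size effect; it is not the published attack. The page template, the token value and the function name `response_length` are made up for illustration.

```python
import zlib

# Made-up secret that the server embeds in every page (illustrative only).
SECRET = "csrf_token=8f3a9c1e7b2d4a60"


def response_length(reflected: str) -> int:
    """Compressed size of a page containing the secret plus attacker-chosen text."""
    page = f"<input value='{SECRET}'><p>You searched for: {reflected}</p>"
    return len(zlib.compress(page.encode()))


# Text that repeats the secret compresses better than text that does not.
matching = response_length("csrf_token=8f3a9c1e7b2d4a60")
non_matching = response_length("csrf_token=x0Qm5Zk2Wp7Lr9Tv")
print(matching < non_matching)
```

Because encryption does not hide the length of a message, an observer who can influence part of a compressed response and watch its size can test guesses about the rest of it.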

References

[1] Using HTTP Compression (IIS 6.0). Microsoft Corporation. Retrieved 9 February 2010. Archived from the original on 2011-12-14.
[2] RFC 2616, Section 3.5: "The Internet Assigned Numbers Authority (IANA) acts as a registry for content-coding value tokens."
[3] 'RFC2616 "Transfer-Encoding: gzip, chunked" not handled properly', Chromium Issue 94730.
[4] Hypertext Transfer Protocol Parameters - HTTP Content Coding Registry. IANA. Retrieved 18 April 2014. Archived from the original on 2016-05-16.
[5] Compression Tests: Results. Verve Studios, Co. Retrieved 19 July 2012. Archived from the original on 2012-03-21.
[6] JSR 200: Network Transfer Format for Java Archives. The Java Community Process Program. Retrieved 2016-05-15. Archived from the original on 2016-05-06.
[7] ModCompress - Lighttpd. lighty labs. Retrieved 18 April 2014. Archived from the original on 2016-05-10.
[8] elinks LZMA decompression. Retrieved 2016-05-15. Archived from the original on 2016-04-18.
[9] [MS-PCCRTP]: Peer Content Caching and Retrieval: Hypertext Transfer Protocol (HTTP) Extensions. Microsoft. Retrieved 19 April 2014. Archived from the original on 2012-03-20.
[10] Butler, Jon; Wei-Hsin Lee; McQuade, Bryan; Mixter, Kenneth. A Proposal for Shared Dictionary Compression Over HTTP (PDF). Google. Retrieved 2016-05-15. Archived (PDF) from the original on 2016-04-15.
[11] SDCH Mailing List. Google Groups. Retrieved 2016-05-15. Archived from the original on 2013-03-02.
[12] [MS-XCA]: Xpress Compression Algorithm. Retrieved 29 August 2015. Archived from the original on 2016-05-17.
[13] LZMA2 Compression - MozillaWiki. Retrieved 18 April 2014. Archived from the original on 2016-03-04.
[14] mget GitHub project page. Retrieved May 2014. Archived from the original on 2017-04-09.
[15] HOWTO: Use Apache mod_deflate To Compress Web Content (Accept-Encoding: gzip). Mark S. Kolich. Retrieved 23 March 2011. Archived from the original on 2011-08-20.
[16] mod_deflate - Apache HTTP Server Version 2.4 - Supported Encodings. Retrieved 2016-05-15. Archived from the original on 2016-05-07.
[17] Extra part of Hiawatha webserver's manual. Retrieved 2016-05-15. Archived from the original on 2016-03-22.
[18] Use compression to make the web faster. Google Developers. Retrieved 22 May 2013. Archived from the original on 2014-06-25.
[19] deflate - Why are major web sites using gzip?. Stack Overflow. Retrieved 18 April 2014. Archived from the original on 2016-04-12.
[20] Compression Tests: About. Verve Studios. Retrieved 18 April 2014. Archived from the original on 2015-01-02.
[21] Lose the wait: HTTP Compression. Zoompf Web Performance. Retrieved 18 April 2014. Archived from the original on 2016-04-04.
[22] Goodin, Dan. Gone in 30 seconds: New attack plucks secrets from HTTPS-protected pages. Ars Technica. Condé Nast. 1 August 2013. Retrieved 2 August 2013. Archived from the original on 2014-07-01.
[23] Leyden, John. Step into the BREACH: New attack developed to read encrypted web data. The Register. 2 August 2013. Retrieved 2 August 2013. Archived from the original on 2016-04-30.

External links

RFC 2616: Hypertext Transfer Protocol – HTTP/1.1
HTTP Content-Coding Values by Internet Assigned Numbers Authority
Compression with lighttpd
Coding Horror: HTTP Compression on IIS 6.0
15 Seconds: Web Site Compression at the Wayback Machine (archived 2011-07-16)
HTTP Compression: resource page by the founder of VIGOS AG, Constantin Rack
Using HTTP Compression by Martin Brown of Server Watch
Using HTTP Compression in PHP
Dynamic and static HTTP compression with Apache httpd
Browser HTTP Compression Test
Check HTTP compression
Check Gzip Compression Online
