Topic: [Discussion] Some questions about setting up compressed transfer and cache control in Apache -- 铁手
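I want to achieve two goals. First, compress non-image, non-video content in transit whenever the client's browser supports it. Second, make as much use of the browser cache as possible. Content that normally doesn't change, such as images, can be cached for a long time. For files that are modified now and then, such as .css and .js files, I want the browser to check with the server on every page visit: if nothing has changed, the server returns HTTP/1.1 304 Not Modified; if something has, the browser downloads the new version and caches it. Some other content must never be cached. Proxy caches also need to be taken into account.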
I'd like to do as much of this as possible in Apache httpd, and fill in the rest in the application code where necessary.
Compression:
<IfModule mod_deflate.c>
AddOutputFilterByType DEFLATE text/html text/plain text/xml text/css text/javascript application/javascript application/x-javascript
</IfModule>
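To pin down what the directive above is meant to do, here is a small Python model of the intended decision: compress only the listed text types, and only when the request's Accept-Encoding allows gzip. It is a sketch of the policy, not of how mod_deflate works internally; the function names are made up for illustration, and wildcard (`*`) codings are deliberately ignored. By this policy an image/jpeg response should never be compressed.

```python
# Hypothetical model of the intended compression policy (not mod_deflate internals).
COMPRESSIBLE_TYPES = {
    "text/html", "text/plain", "text/xml", "text/css",
    "text/javascript", "application/javascript", "application/x-javascript",
}


def accepts_gzip(accept_encoding: str) -> bool:
    """True if Accept-Encoding lists gzip with a q-value above zero."""
    for item in accept_encoding.split(","):
        parts = [p.strip() for p in item.split(";")]
        if parts[0].lower() != "gzip":
            continue
        q = 1.0  # no q parameter means q=1
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if q > 0:
            return True
    return False


def should_compress(content_type: str, accept_encoding: str) -> bool:
    """Compress only the listed text types, and only for clients that accept gzip."""
    # Drop parameters such as "; charset=utf-8" before comparing the MIME type.
    mime = content_type.split(";", 1)[0].strip().lower()
    return mime in COMPRESSIBLE_TYPES and accepts_gzip(accept_encoding)
```

For example, `should_compress("text/css; charset=utf-8", "gzip, deflate")` is true, while `should_compress("image/jpeg", "gzip, deflate")` is false.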
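Since most images are already compressed, there's no point compressing them again in transit. Static images behave correctly: checking the headers confirms they are not compressed a second time. But for dynamically generated images, for example ones sent with PHP readfile, the headers show Content-Encoding: gzip together with Content-Type: image/jpeg. image/jpeg isn't in the list above, so by rights these shouldn't be compressed.
What do I need to do so that dynamically generated images aren't compressed a second time?
Caching: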
<IfModule mod_expires.c>
ExpiresActive On
ExpiresDefault A864000
ExpiresByType text/html A300
</IfModule>
This gives most content a long enough cache lifetime: 864000 seconds (10 days) by default, and 300 seconds for HTML.
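`A864000` means "864000 seconds after the access time", i.e. 10 days, and `A300` is 5 minutes. mod_expires expresses this as both an Expires header and a Cache-Control max-age. The Python sketch below (the helper name is hypothetical) just does that arithmetic for a given access time:

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def expires_headers(access_time: datetime, seconds: int) -> dict:
    """Headers corresponding to an 'A<seconds>' rule: access time plus seconds."""
    expires = access_time + timedelta(seconds=seconds)
    return {
        # HTTP dates are in GMT; format_datetime(usegmt=True) needs an aware UTC datetime.
        "Expires": format_datetime(expires, usegmt=True),
        "Cache-Control": f"max-age={seconds}",
    }


access = datetime(2012, 11, 10, 21, 4, 40, tzinfo=timezone.utc)
print(expires_headers(access, 864000))  # ExpiresDefault A864000: 10 days later
print(expires_headers(access, 300))     # ExpiresByType text/html A300: 5 minutes later
```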
<IfModule mod_headers.c>
Header set Vary Accept-Encoding
<FilesMatch "\.(ico|gif|jpg|png|js|css)$">
Header append Cache-Control "public, must-revalidate"
</FilesMatch>
</IfModule>
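In particular, for .js and .css files I want the browser to check with the server on every page visit to see whether the content has changed. The settings above don't seem to work: a modified .js file was not picked up. Update: I later tried no-cache instead, and that seems to work. The browser now asks the server every time whether the file has changed, instead of just using its local copy.
What exactly is the difference between must-revalidate and no-cache? According to the documentation, with no-cache the cached copy may only be used after confirming with the server. Taken literally, that sounds like what must-revalidate should do. Could it be that must-revalidate checks the client's cached copy, e.g. whether it has expired, while no-cache checks with the server, e.g. whether the cached copy still matches the origin content?
How would you optimize the settings above, and why?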
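The difference is about *when* the check happens. must-revalidate only applies once a cached response has become stale; while it is still fresh, the cache may use it without contacting the server. no-cache (without field names) requires a successful revalidation before every reuse. With the configuration above, .js and .css are not text/html, so ExpiresDefault A864000 gives them a 10-day max-age; assuming that max-age reaches the browser along with "public, must-revalidate", the browser may keep using the old .js for 10 days, which matches what was observed. The simplified Python model below (hypothetical function; heuristic freshness and request-side directives are ignored) shows the decision:

```python
def cache_action(directives: dict, age: int) -> str:
    """Simplified HTTP/1.1 decision for a stored response.

    directives maps Cache-Control directive names to their argument (None if none).
    age is the current age of the cached copy in seconds. Responses without
    max-age are treated as stale (heuristic freshness is not modelled).
    """
    # Bare no-cache: never reuse without a successful revalidation, however fresh.
    if "no-cache" in directives and directives["no-cache"] is None:
        return "revalidate"
    max_age = directives.get("max-age")
    if max_age is not None and age < int(max_age):
        # Fresh: must-revalidate has no effect yet, so the copy is used as-is.
        return "use cache"
    if "must-revalidate" in directives:
        # Stale + must-revalidate: the copy may not be served without checking.
        return "revalidate"
    # Stale without must-revalidate: normally revalidated, but a cache
    # configured to return stale responses is allowed to do so.
    return "revalidate (stale copy allowed if cache is configured for it)"
```

With `{"max-age": "864000", "public": None, "must-revalidate": None}` and an age of one hour the result is "use cache"; adding `"no-cache": None` turns it into "revalidate". `max-age=0` combined with must-revalidate also yields "revalidate", because the copy is stale immediately.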
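must-revalidate and no-cache are not Apache-specific; they are defined by HTTP/1.1.
must-revalidate comes up in Chapter 14 (header field definitions), Section 14.8 (Authorization); its full definition is in Section 14.9.4: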
A user agent that wishes to authenticate itself with a server--
usually, but not necessarily, after receiving a 401 response--does
so by including an Authorization request-header field with the
request. The Authorization field value consists of credentials
containing the authentication information of the user agent for
the realm of the resource being requested.
Authorization = "Authorization" ":" credentials
HTTP access authentication is described in "HTTP Authentication:
Basic and Digest Access Authentication" [43]. If a request is
authenticated and a realm specified, the same credentials SHOULD
be valid for all other requests within this realm (assuming that
the authentication scheme itself does not require otherwise, such
as credentials that vary according to a challenge value or using
synchronized clocks).
When a shared cache (see section 13.7) receives a request
containing an Authorization field, it MUST NOT return the
corresponding response as a reply to any other request, unless one
of the following specific exceptions holds:
1. If the response includes the "s-maxage" cache-control
directive, the cache MAY use that response in replying to a
subsequent request. But (if the specified maximum age has
passed) a proxy cache MUST first revalidate it with the origin
server, using the request-headers from the new request to allow
the origin server to authenticate the new request. (This is the
defined behavior for s-maxage.) If the response includes
"s-maxage=0", the proxy MUST always revalidate it before re-using
it.
2. If the response includes the "must-revalidate" cache-control
directive, the cache MAY use that response in replying to a
subsequent request. But if the response is stale, all caches
MUST first revalidate it with the origin server, using the
request-headers from the new request to allow the origin server
to authenticate the new request.
3. If the response includes the "public" cache-control directive,
it MAY be returned in reply to any subsequent request.
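The three exceptions quoted above can be read as a small decision procedure for a shared (proxy) cache that received a request carrying an Authorization header. A rough Python rendering, with a made-up function name, and with staleness passed in rather than computed:

```python
def shared_cache_reuse(has_authorization: bool, directives: dict, is_stale: bool) -> str:
    """Can a shared cache reuse a response to a request that carried Authorization?

    directives maps Cache-Control directive names to arguments (None if none);
    is_stale says whether the applicable max-age / s-maxage has passed.
    """
    if not has_authorization:
        return "normal rules"  # the Authorization restriction does not apply
    s_maxage = directives.get("s-maxage")
    if s_maxage is not None:
        # Exception 1: reusable, but revalidate once s-maxage has passed (always if 0).
        return "revalidate" if is_stale or int(s_maxage) == 0 else "reuse"
    if "must-revalidate" in directives:
        # Exception 2: reusable while fresh; every cache must revalidate once stale.
        return "revalidate" if is_stale else "reuse"
    if "public" in directives:
        # Exception 3: may be returned in reply to any subsequent request.
        return "reuse"
    return "do not reuse"  # default: never hand it to another request
```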
no-cache is defined in Section 14.9.1 (What is Cacheable):
By default, a response is cacheable if the requirements of the request method, request header fields, and the response status indicate that it is cacheable. Section 13.4 summarizes these defaults for cacheability. The following Cache-Control response directives allow an origin server to override the default cacheability of a response:
public
Indicates that the response MAY be cached by any cache, even if it would normally be non-cacheable or cacheable only within a non-shared cache. (See also Authorization, section 14.8, for additional details.)
private
Indicates that all or part of the response message is intended for a single user and MUST NOT be cached by a shared cache. This allows an origin server to state that the specified parts of the response are intended for only one user and are not a valid response for requests by other users. A private (non-shared) cache MAY cache the response.
Note: This usage of the word private only controls where the response may be cached, and cannot ensure the privacy of the message content.
no-cache
If the no-cache directive does not specify a field-name, then a cache MUST NOT use the response to satisfy a subsequent request without successful revalidation with the origin server. This allows an origin server to prevent caching even by caches that have been configured to return stale responses to client requests.
If the no-cache directive does specify one or more field-names, then a cache MAY use the response to satisfy a subsequent request, subject to any other restrictions on caching. However, the specified field-name(s) MUST NOT be sent in the response to a subsequent request without successful revalidation with the origin server. This allows an origin server to prevent the re-use of certain header fields in a response, while still allowing caching of the rest of the response.
Note: Most HTTP/1.0 caches will not recognize or obey this directive.
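To tell which form of no-cache a response carries (bare, or with field names), the header has to be parsed with quoted strings in mind, since a field-name list can itself contain commas. A minimal Python parser, assuming well-formed input without escaped quotes (the function names are hypothetical):

```python
def split_directives(value: str) -> list:
    """Split a Cache-Control value on commas that are not inside double quotes."""
    parts, current, in_quotes = [], [], False
    for ch in value:
        if ch == '"':
            in_quotes = not in_quotes
        if ch == "," and not in_quotes:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def parse_cache_control(value: str) -> dict:
    """Map each directive name (lower-cased) to its argument, or None if it has none."""
    directives = {}
    for part in split_directives(value):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives


def no_cache_fields(directives: dict):
    """None if no-cache is absent, [] for bare no-cache (whole response must be
    revalidated), otherwise the header fields that must not be reused unvalidated."""
    if "no-cache" not in directives:
        return None
    arg = directives["no-cache"]
    if arg is None:
        return []
    return [f.strip() for f in arg.split(",") if f.strip()]
```

For `public, max-age=0, no-cache="Set-Cookie, Set-Cookie2"` the parser keeps the quoted list intact, and `no_cache_fields` returns the two header names rather than treating the whole response as no-cache.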
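For more, see the full HTTP/1.1 specification (RFC 2616) on w3.org.
Note: a server does not necessarily implement everything HTTP/1.1 defines; Apache's actual behavior has to be tested to be sure.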
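On the first question: I tested it, and that is indeed what happens. With compression enabled in Apache, the configuration above compresses the output of PHP pages, so generated images can end up compressed as well.
In the comments on the readfile documentation, someone suggested adding a statement to the PHP page that calls readfile. I tested it and it works.
PHP source: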
<?php
// The remote file is actually a PNG; image/png would be the accurate Content-Type.
header('Content-Type: image/jpeg');
readfile("http://img3.cache.netease.com/www/logo/logo_png.png");
?>
Test result:
HTTP/1.1 200 OK
Date: Sat, 10 Nov 2012 21:04:40 GMT
Server: Apache
Vary: Accept-Encoding
Content-Encoding: gzip
Content-Length: 20
Connection: close
Content-Type: image/jpeg
Test after adding that statement:
HTTP/1.1 200 OK
Date: Sat, 10 Nov 2012 21:06:51 GMT
Server: Apache
Connection: close
Content-Type: image/jpeg
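On the second question: Apache sends Last-Modified for static files by default, which lets browsers cache them and revalidate with a conditional request; see the relevant documentation.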
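For static files the round trip works like this: the first response carries Last-Modified, the browser later sends that date back as If-Modified-Since, and the server answers 304 Not Modified if the file hasn't changed since. A simplified Python sketch of the server side (hypothetical function; ETag and If-None-Match, which Apache also uses, are left out):

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional, Tuple


def conditional_get(file_mtime: datetime, if_modified_since: Optional[str]) -> Tuple[int, dict]:
    """Return (status, headers) for a GET on a file last modified at file_mtime (aware, UTC)."""
    # HTTP dates have one-second resolution, so compare at that granularity.
    last_modified = file_mtime.replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}
    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # an unparseable date is ignored, so a full 200 is sent
        if since is not None and since.tzinfo is not None and last_modified <= since:
            return 304, headers  # the browser's copy is still current
    return 200, headers  # the full body would be sent here (not modelled)


mtime = datetime(2012, 11, 10, 21, 0, 0, tzinfo=timezone.utc)
print(conditional_get(mtime, "Sat, 10 Nov 2012 21:04:40 GMT")[0])  # unchanged since then: 304
```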
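The second problem doesn't seem that easy after all. I'll leave it for now; at worst I'll change the version number in the file name every time a file changes.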
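Renaming files whenever they change can be automated by deriving the name from the file's content. Every change then produces a new URL, so the file itself can be given a long expiry, while only the HTML that references it needs a short one (like the A300 for text/html above). A Python sketch; the `name.<hash>.ext` scheme and the helper are assumptions for illustration, and a build step (not shown) would still have to write the file under that name and update the references:

```python
import hashlib
from pathlib import PurePosixPath


def versioned_name(path: str, content: bytes, length: int = 8) -> str:
    """Insert a short content hash before the extension: css/site.css -> css/site.<hash>.css."""
    digest = hashlib.sha256(content).hexdigest()[:length]
    p = PurePosixPath(path)
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))


print(versioned_name("css/site.css", b"body { color: black; }"))
print(versioned_name("css/site.css", b"body { color: red; }"))  # different content, different name
```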
must-revalidate needs to be combined with max-age=0 to get the effect of no-cache.
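Putting the goals from the original question together, one possible mapping from file type to Cache-Control value is sketched below in Python. The extension lists mirror the FilesMatch pattern above; the `/private/` prefix for never-cached content is a hypothetical example, and the 10-day and 5-minute lifetimes are taken from the mod_expires settings in the question:

```python
from pathlib import PurePosixPath

LONG_CACHE = {".ico", ".gif", ".jpg", ".jpeg", ".png"}
REVALIDATE = {".js", ".css"}


def cache_control_for(url_path: str) -> str:
    """Pick a Cache-Control value for a URL path (hypothetical policy)."""
    if url_path.startswith("/private/"):
        # Hypothetical area that must never be stored by any cache.
        return "no-store"
    suffix = PurePosixPath(url_path).suffix.lower()
    if suffix in LONG_CACHE:
        # Rarely changing images: browsers and proxies may keep them for 10 days.
        return "public, max-age=864000"
    if suffix in REVALIDATE:
        # May be stored, but checked with the server (304 if unchanged) on every use.
        return "no-cache, max-age=0"
    # Everything else, e.g. HTML: short lifetime, as with ExpiresByType text/html A300.
    return "max-age=300"
```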
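Options like these certainly consume resources, and whether the bandwidth saved is worth the CPU and memory used is hard for an outsider to judge. If you describe the concrete problem you're trying to solve, people may be able to give more suitable advice. As a user, my impression is that the site's performance is generally fine; as the old saying goes, "ain't broke don't fix it".
Unless there is a serious problem that needs a quick fix, we always establish a benchmark before making improvements like this; otherwise it's hard to evaluate the effect: "did it get a little better, or a little worse?"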
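Going from new user to someone familiar with the site, I personally feel its usability has a lot of problems, in two areas: basic usage, and the concepts specific to user roles here, such as points and 乐善, the two kinds of ranks (官阶), spending 通宝, and so on. Many users have been here a long time and still don't quite understand them, and there is no clear written help. Could we get people to write some explanations and collect them into the help pages? Posts like eraser's were very helpful to me.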