标签为 "gzip" 的存档

curl使用via的header时,不支持gzip

curl -s -H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8' -H 'Accept-encoding: gzip,deflate' "http://m.news.haosou.com/" -H "Via: curl"  | gunzip -

使用这条命令去抓取页面, 不传递Via的header时,是能正常执行的,当带上via时,返回内容不再gzip,直接是正常文本。
查询了下相关资料,原来via参数在RFC里是有规范的,不是随便指定的。

见:http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.45


The Via general-header field MUST be used by gateways and proxies to indicate the intermediate protocols and recipients between the user agent and the server on requests, and between the origin server and the client on responses. It is analogous to the "Received" field of RFC 822 [9] and is intended to be used for tracking message forwards, avoiding request loops, and identifying the protocol capabilities of all senders along the request/response chain.

Via = "Via" ":" 1#( received-protocol received-by [ comment ] )
received-protocol = [ protocol-name "/" ] protocol-version
protocol-name = token
protocol-version = token
received-by = ( host [ ":" port ] ) | pseudonym
pseudonym = token
The received-protocol indicates the protocol version of the message received by the server or client along each segment of the request/response chain. The received-protocol version is appended to the Via field value when the message is forwarded so that information about the protocol capabilities of upstream applications remains visible to all recipients.

The protocol-name is optional if and only if it would be "HTTP". The received-by field is normally the host and optional port number of a recipient server or client that subsequently forwarded the message. However, if the real host is considered to be sensitive information, it MAY be replaced by a pseudonym. If the port is not given, it MAY be assumed to be the default port of the received-protocol.

Multiple Via field values represents each proxy or gateway that has forwarded the message. Each recipient MUST append its information such that the end result is ordered according to the sequence of forwarding applications.

Comments MAY be used in the Via header field to identify the software of the recipient proxy or gateway, analogous to the User-Agent and Server header fields. However, all comments in the Via field are optional and MAY be removed by any recipient prior to forwarding the message.

For example, a request message could be sent from an HTTP/1.0 user agent to an internal proxy code-named "fred", which uses HTTP/1.1 to forward the request to a public proxy at nowhere.com, which completes the request by forwarding it to the origin server at www.ics.uci.edu. The request received by www.ics.uci.edu would then have the following Via header field:

Via: 1.0 fred, 1.1 nowhere.com (Apache/1.1)
Proxies and gateways used as a portal through a network firewall SHOULD NOT, by default, forward the names and ports of hosts within the firewall region. This information SHOULD only be propagated if explicitly enabled. If not enabled, the received-by host of any host behind the firewall SHOULD be replaced by an appropriate pseudonym for that host.

For organizations that have strong privacy requirements for hiding internal structures, a proxy MAY combine an ordered subsequence of Via header field entries with identical received-protocol values into a single such entry. For example,

Via: 1.0 ricky, 1.1 ethel, 1.1 fred, 1.0 lucy
could be collapsed to
Via: 1.0 ricky, 1.1 mertz, 1.0 lucy
Applications SHOULD NOT combine multiple entries unless they are all under the same organizational control and the hosts have already been replaced by pseudonyms. Applications MUST NOT combine entries which have different received-protocol values.

Nginx Gzip 压缩配置 IE6禁用

gzip(GNU-ZIP)是一种压缩技术。经过gzip压缩后页面大小可以变为原来的30%甚至更小,这样,用户浏览页面的时候速度会块得多。gzip的压缩页面需要浏览器和服务器双方都支持,实际上就是服务器端压缩,传到浏览器后浏览器解压并解析。

Nginx的压缩输出有一组gzip压缩指令来实现。相关指令位于http{….}两个大括号之间。

gzip on;
//该指令用于开启或关闭gzip模块(on/off)

gzip_min_length 1k;
//设置允许压缩的页面最小字节数,页面字节数从header头得content-length中进行获取。默认值是0,不管页面多大都压缩。建议设置成大于1k的字节数,小于1k可能会越压越大。

gzip_buffers 4 16k;
//设置系统获取几个单位的缓存用于存储gzip的压缩结果数据流。4 16k代表以16k为单位,安装原始数据大小以16k为单位的4倍申请内存。

gzip_http_version 1.1;
//识别http的协议版本(1.0/1.1)

gzip_comp_level 2;
//gzip压缩比,1压缩比最小处理速度最快,9压缩比最大但处理速度最慢(传输快但比较消耗cpu)

gzip_types text/plain application/x-javascript text/css application/xml
//匹配mime类型进行压缩,无论是否指定,”text/html”类型总是会被压缩的。
gzip_vary on;
//和http头有关系,加个vary头,给代理服务器用的,有的浏览器支持压缩,有的不支持,所以避免浪费不支持的也压缩,所以根据客户端的HTTP头来判断,是否需要压缩.

同时由于IE6不支持gizp解压缩,所以在IE6下要关闭gzip压缩功能。使用
gzip_disable “MSIE [1-6].”;

nginx 配置gzip段如下:
gzip on;
gzip_min_length 1k;
gzip_buffers 16 64k;
gzip_http_version 1.1;
gzip_comp_level 2;
gzip_types text/plain application/x-javascript text/css application/xml;
gzip_vary on;
gzip_disable “MSIE [1-6].”;

ThinkPHP 3.1自动开启gzip ,ie6下载文件时文件名与下载头不一致问题的解决

今天同事越到一个奇怪的问题,在下载头中指定了文件名,但是在ie6下,下载时却无法按给定的文件名给出保存,保存文件的名字为站点名称,在查阅一些资料后,确认为是gzip的问题,ie6不支持gzip,所以出现这个问题,但是在想关闭gzip的时候,发现关闭nginx的gzip压缩后,还是会有压缩头输出,后来才发现是ThinkPHP3.1版本新带的功能,自带“页面压缩输出支持”。
使用一个配置变量可以手动关闭:

‘OUTPUT_ENCODE’=>false

Linux下打包压缩解包解压缩目录文件-tar/gzip等

.tar
解包: tar xvf FileName.tar
打包:tar cvf FileName.tar DirName
(注:tar是打包,不是压缩!)
———————————————
.gz
解压1:gunzip FileName.gz
解压2:gzip -d FileName.gz
压缩:gzip FileName
.tar.gz
解压:tar zxvf FileName.tar.gz
压缩:tar zcvf FileName.tar.gz DirName
———————————————
.bz2
解压1:bzip2 -d FileName.bz2
解压2:bunzip2 FileName.bz2
压缩: bzip2 -z FileName
.tar.bz2
解压:tar jxvf FileName.tar.bz2
压缩:tar jcvf FileName.tar.bz2 DirName
———————————————
.bz
解压1:bzip2 -d FileName.bz
解压2:bunzip2 FileName.bz
压缩:未知
.tar.bz
解压:tar jxvf FileName.tar.bz
压缩:未知
———————————————
Read more…