A guide to PNG optimization #
Background #
The PNG file format #
Portable Network Graphics (PNG) is a format for storing compressed raster graphics. Its compression engine is based on the Deflate method [RFC1951], designed by PKWare and originally used in PKZIP.
The PNG format is defined by the PNG Specification. This specification was developed by an ad-hoc group named the PNG Development Group, and it is both an International Standard (published under the formal name ISO/IEC 15948) and a W3C Recommendation.
PNG was originally designed as a superior, patent-free replacement for GIF. The result is a modern, extensible, reliable image format, capable of handling an impressive number of image types (from 1-bit black-and-white images up to 48-bit RGB images with a full 16-bit alpha channel), powered by a stronger lossless compression engine (typically 5-25% better than GIF).
Unlike other lossless compression schemes, PNG compression does not depend solely on the statistics of the input: it can vary within a wide range, depending on the compressor's implementation. A good PNG encoder must make informed decisions about the factors that affect the output size. The purpose of this article is to provide information about these factors, and to give advice on implementing efficient PNG encoders.
PNG compression #
PNG compression works in a pipeline fashion.
In the first stage, the image pixels are passed through a lossless arithmetic transformation called delta filtering (or simply filtering), and are sent further as a sequence of (filtered) bytes. Filtering does not compress or otherwise reduce the size of the data, but it makes the data more compressible.
In the second stage, the filtered byte sequence is passed through the Ziv-Lempel algorithm (LZ77), and the resulting LZ77 codes are further compressed by the Huffman algorithm in the third and final stage. The combination of the last two stages is known as Deflate compression, a widely-used, patent-free algorithm for universal, lossless data compression. The maximum size of the LZ77 sliding window in Deflate is 32768 bytes, and the length of an LZ77 match can range from 3 to 258 bytes.
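The Deflate stage described above can be exercised directly with any zlib binding; the sketch below uses Python's standard zlib module (an illustrative choice, not something any of the PNG tools discussed here require), with a negative wbits value selecting a raw Deflate stream and the maximum 32768-byte sliding window:

```python
import zlib

data = b"PNG filtering makes data more compressible. " * 100

# Raw Deflate stream: wbits=-15 selects the maximum 32768-byte
# (2^15) LZ77 sliding window and omits the zlib header/trailer.
comp = zlib.compressobj(level=9, method=zlib.DEFLATED, wbits=-15)
stream = comp.compress(data) + comp.flush()

# The transformation is lossless: inflating recovers the input exactly.
assert zlib.decompress(stream, wbits=-15) == data
print(len(data), "->", len(stream), "bytes")
```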
A complete description of PNG compression is beyond the scope of this guide. The PNG Specification describes the format completely, and provides a comprehensive list of references to the underlying technologies.
Factors that affect the PNG file size #
Like any other compression scheme, PNG compression depends on the statistics of the input data. In addition, it depends on the following PNG-specific parameters:
- The PNG image type
- The PNG delta filters
- The strategy of searching for LZ77 matches
- The size of the Huffman buffers inside the Deflate encoder
Depending on how an implementation chooses these parameters, PNG compression can vary within a wide range. Searching for the optimal configuration is computationally infeasible, but heuristics can be used to select a satisfactory configuration. Improving these heuristics constitutes an interesting research topic.
The PNG image type #
The type of a PNG image is defined in the IHDR image header. An image has a certain bit depth (up to 16 bits per sample) and a specific color type (ranging from grayscale to RGB+Alpha). If two PNG files of different types represent exactly the same image, each of them can be regarded as a lossless transformation of the other. A lossless transformation that reduces the uncompressed stream is called an image reduction. In most cases, an image reduction also reduces the compressed stream (which is what actually interests us), as an indirect effect of shrinking the compressor's input.
The possible image reductions are:
- Bit depth reduction
The bit depth can be reduced to the smallest value that still accommodates all samples. For example, if all the sample values in a 16-bit image have the form (256+1)*n (e.g. #0000, #2323, #FFFF), the bit depth can be reduced to 8, and the new sample values become n (e.g. #00, #23, #FF).
- Color type reduction
- If an RGB image contains 256 or fewer distinct colors, it can be re-encoded as a palette image.
- If an RGB or palette image contains only gray pixels, it can be re-encoded as grayscale.
Color type reduction may, in turn, enable bit depth reduction.
- Palette reduction
If the palette contains redundant entries (i.e. duplicate entries indicating the same RGB value) or sterile entries (i.e. entries with no correspondent in the actual pixel data), these entries can be removed.
Palette reduction may, in turn, enable bit depth reduction.
- Alpha channel reduction
If all the pixels in a grayscale+alpha or RGB+alpha image are fully opaque (i.e. all alpha components are equal to 2^bitdepth-1), or if the transparency information can be stored entirely in a (cheaper) tRNS chunk, the alpha channel can be stripped.
In a few cases, however, certain image type reductions do not necessarily lead to a reduction of the compressed stream. The PNG-Tech site contains experimental analyses of these possibilities; for example, see the article 8 bits per pixel in palette images.
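As a minimal sketch of the bit depth reduction described above (the helper names are our own, not part of any PNG library): a 16-bit sample can be reduced exactly when it has the form (256+1)*n, i.e. when its high and low bytes are equal.

```python
def reducible_to_8(samples16):
    """True if every 16-bit sample has the form 257*n,
    i.e. its high and low bytes are equal (#2323 -> #23)."""
    return all(s % 257 == 0 for s in samples16)

def reduce_to_8(samples16):
    """Map each 16-bit sample 257*n to the 8-bit sample n."""
    return [s // 257 for s in samples16]

# #0000, #2323 and #FFFF all have the form 257*n ...
assert reducible_to_8([0x0000, 0x2323, 0xFFFF])
assert reduce_to_8([0x0000, 0x2323, 0xFFFF]) == [0x00, 0x23, 0xFF]
# ... but #1234 does not, so such an image must stay at 16 bits.
assert not reducible_to_8([0x1234])
```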
Interlacing, useful for faster progressive rendering, is another component of the PNG image type that affects compression. In an interlaced stream, the samples corresponding to neighboring pixels are stored far apart, so the data exhibits less correlation and is less compressible. Unlike JPEG, where progressive encoding may slightly improve compression, PNG interlacing degrades compression significantly.
The PNG delta filters #
The role of filtering can be illustrated by the following example. Consider the sequence 2, 3, 4, 5, 6, 7, 8, 9. Although it has plenty of redundancy, this sequence can be compressed neither by a Ziv-Lempel compressor nor by a Huffman compressor. However, if a simple and reversible transformation is applied, replacing each value with the numerical difference between it and the value to its left, the sequence becomes 2, 1, 1, 1, 1, 1, 1, 1, which is highly compressible.
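The effect is easy to measure with any Deflate implementation; here is a small sketch using Python's zlib (an illustrative assumption), comparing a smooth ramp against its left-difference transform:

```python
import zlib

# A smooth "gradient" scanline: 0, 1, 2, ..., 255, with no repeats
# for LZ77 to match and a flat symbol distribution for Huffman.
raw = bytes(range(256))

# Left (Sub) filtering: replace each byte with the difference
# from its left neighbor, modulo 256.
filtered = bytes([raw[0]] + [(raw[i] - raw[i - 1]) % 256
                             for i in range(1, 256)])

# The filtered row (one 0 followed by 255 ones) compresses far better.
assert len(zlib.compress(filtered, 9)) < len(zlib.compress(raw, 9))
```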
The PNG format employs five types of filters: None, Left, Up, Average, and Paeth. The first filter leaves the original data intact, while the other four subtract from each pixel a value computed from its left, upper, and/or upper-left neighbors.
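For illustration, the Paeth predictor can be sketched as follows; the selection rule follows the PNG Specification, while the function name is our own:

```python
def paeth_predictor(a, b, c):
    """Return the neighbor (left a, upper b, upper-left c) closest
    to the initial estimate p = a + b - c, per the PNG Specification."""
    p = a + b - c
    pa, pb, pc = abs(p - a), abs(p - b), abs(p - c)
    if pa <= pb and pa <= pc:
        return a
    if pb <= pc:
        return b
    return c

# On a smooth gradient the left neighbor wins ...
assert paeth_predictor(1, 2, 3) == 1
# ... while here the upper-left corner is the best estimate.
assert paeth_predictor(10, 20, 15) == 15
```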
A particular filter is assigned to each row, and is applied to all the pixels in that row. An image can therefore be delta-filtered in a huge number of possible configurations (5^height), and each configuration leads to a different compressed output. Two different filter configurations can yield compressed sizes that differ by a significant factor, so a careful choice of filters is essential.
A single filter can be applied to all rows, or different filters can be applied to different rows. In the former case, the filtering process is fixed; in the latter, it is adaptive.
Although an exhaustive search is infeasible, the PNG Specification suggests a heuristic filtering strategy:
- If the image type is Palette, or the bit depth is smaller than 8, do not filter the image (i.e. use fixed filtering with the filter None).
- Otherwise, if the image type is Grayscale or RGB (with or without Alpha) and the bit depth is not smaller than 8, use adaptive filtering as follows: apply all five filters independently to each row, and select for each row the filter that produces the smallest sum of absolute values.
Cases where the heuristic above is not ideal are shown on the PNG-Tech site; for example, see Brute-force vs. heuristic filtering.
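The adaptive heuristic can be sketched as follows, assuming 8-bit grayscale (one byte per pixel) for simplicity; the function names are our own, and real encoders operate on whole images rather than single rows:

```python
def filter_row(cur, prev):
    """Apply the five PNG filters to one scanline (1 byte per pixel)
    and return (filter_id, filtered_row) for the filter that minimizes
    the sum of absolute values, filtered bytes taken as signed."""
    def paeth(a, b, c):
        p = a + b - c
        pa, pb, pc = abs(p - a), abs(p - b), abs(p - c)
        return a if pa <= pb and pa <= pc else (b if pb <= pc else c)

    candidates = []
    for fid in range(5):  # 0=None, 1=Left, 2=Up, 3=Average, 4=Paeth
        out = []
        for i, x in enumerate(cur):
            a = cur[i - 1] if i > 0 else 0    # left neighbor
            b = prev[i]                       # upper neighbor
            c = prev[i - 1] if i > 0 else 0   # upper-left neighbor
            pred = (0, a, b, (a + b) // 2, paeth(a, b, c))[fid]
            out.append((x - pred) % 256)
        # Sum of absolute values, bytes interpreted as signed (-128..127).
        score = sum(v if v < 128 else 256 - v for v in out)
        candidates.append((score, fid, bytes(out)))
    _, fid, out = min(candidates)
    return fid, out

fid, _ = filter_row(bytes(range(10, 90, 8)), bytes(10))
assert fid == 1  # the Left filter wins on this smooth gradient row
```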
The strategy of searching for LZ77 matches #
The Ziv-Lempel algorithm works under the assumption that contiguous sequences appear repeatedly in the input stream. If the sequence to be encoded matches one or more sequences already present in the sliding history window, the encoder sends an LZ77 pair (distance, length) that points to the closest match. In most LZ77 incarnations, including Deflate, smaller distance codes are encoded more concisely.
In Deflate, in particular, the regular (non-matched) symbols, and the match lengths, are sent to the same Huffman coder, while the match distances are sent to a separate Huffman coder. If the LZ77 matches fall between the accepted boundaries (i.e. they are not shorter than 3 and not longer than 258), a greedy strategy will accept them as a replacement for the symbols to which they correspond.
The greedy strategy is preferable when compressing text files, or many types of binary files, but it may be suboptimal when compressing filtered data, such as the byte strings that come from a PNG filter. Filtered data consist mostly of small values with a pseudo-random distribution. Therefore, in certain situations, it may be desirable to favor the encoding of individual symbols, even if matches that may replace these symbols exist.
The zlib Reference Library is a reference implementation of Deflate, which is further used by the PNG Reference Library. By default, zlib selects the greedy strategy, but the user is able to specify his or her custom preference via the strategy parameter. This parameter can take one of the following values:
- Z_DEFAULT_STRATEGY = 0, the default greedy search strategy.
- Z_FILTERED = 1, a strategy in which the matches are accepted only if their length is 6 or bigger.
- Z_HUFFMAN_ONLY = 2, a fast strategy in which the Ziv-Lempel algorithm is entirely bypassed, and all the symbols from the input are encoded directly by the Huffman coder.
- Z_RLE = 3 (appeared in the zlib-1.2.x series), a fast strategy in which the LZ77 algorithm is essentially reduced to the Run-Length Encoding algorithm. In other words, the matches are accepted only if their distance is 1. For example, the 10-symbol sequence “aaaaaaaaaa” can be LZ77-encoded as [‘a’, (distance=1, length=9)]; by removing distance=1 from the picture, this encoding can be regarded as a peculiar run-length encoding (which differs from the classic RLE by using length=9 instead of count=10). The strategy parameter affects only the compression ratio. It does not affect the correctness of the compressed output, even if it is set to an inappropriate value.
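With Python's zlib binding (an illustrative choice), the strategies above are selected through the strategy parameter of compressobj; note that every strategy produces a correct stream, as stated above:

```python
import zlib

# Pseudo-random small values, similar in spirit to PNG-filtered output.
data = bytes((i * 37 + i * i * 11) % 5 for i in range(4096))

for name in ("Z_DEFAULT_STRATEGY", "Z_FILTERED",
             "Z_HUFFMAN_ONLY", "Z_RLE"):
    comp = zlib.compressobj(level=9, strategy=getattr(zlib, name))
    out = comp.compress(data) + comp.flush()
    # Every strategy yields a valid stream; only the size differs.
    assert zlib.decompress(out) == data
    print(f"{name:>20}: {len(out)} bytes")
```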
It has been experimentally observed that the LZ77 search is occasionally capable of producing smaller PNG files when it is less exhaustive. The reason for this behavior lies in the same category of “strategic searches” discussed here. Unfortunately, there is no known method of anticipating which search level (from the fastest and least exhaustive, to the slowest and most exhaustive) is better, other than assuming that “the most exhaustive is better in most cases”.
Unfortunately, even a “filtered” strategy does not always produce better results than a “greedy” strategy on filtered input, and the only known method to obtain the best combination is by multiple trials. Experiments and measurements can, again, be found on the PNG-Tech site; for example, see the original Z_RLE strategy proposal.
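A multiple-trials search can be sketched as follows (a simplified illustration using Python's zlib; real optimizers also vary the PNG filters and other parameters):

```python
import zlib

def smallest_deflate(data):
    """Try several (strategy, level) combinations and keep the
    smallest correct output -- the 'multiple trials' approach."""
    best = None
    for strategy in (zlib.Z_DEFAULT_STRATEGY, zlib.Z_FILTERED, zlib.Z_RLE):
        for level in (6, 9):
            comp = zlib.compressobj(level=level, strategy=strategy)
            out = comp.compress(data) + comp.flush()
            assert zlib.decompress(out) == data  # always a valid stream
            if best is None or len(out) < len(best):
                best = out
    return best

sample = bytes((i // 7) % 3 for i in range(10000))
print(len(smallest_deflate(sample)), "bytes after 6 trials")
```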
The size of the Huffman buffers #
As mentioned earlier, the entropy encoder inside the Deflate method is the static Huffman algorithm. The output of LZ77 is fed into a buffer which is occasionally flushed by sending a static Huffman tree, followed by all the Huffman codes, to the output of Deflate. After this, both the buffer and the Huffman tree are reset, waiting for the subsequent LZ77 codes to come and refill the buffer.
The Deflate specification refers to dynamic Huffman codes. However, this is a misnomer, in which the term dynamic is used in contrast to the fixed Huffman codes. The fixed Huffman codes are simply built according to a predefined Huffman tree, without regard to the actual symbol frequencies. The dynamic Huffman codes referred to by the Deflate specification are NOT built by the dynamic Huffman algorithm, as defined, for example, by Faller, Gallager and Knuth (the FGK algorithm), or by Vitter (the V algorithm). The predefined Huffman tree was introduced in PKZIP as a fast compression alternative, but it produces poor results even on text, and it is almost useless in PNG compression. Still, a PNG stream that contains codes built from the fixed (predefined) Huffman tree is a valid stream, and a compliant PNG reader must decode it correctly.
It is desirable to establish the buffer boundaries so that sequences conforming to the same probability model fit in the same Huffman buffer. Methods for approximating these boundaries exist, but they are not used in the mainstream Deflate implementation(s). Instead, the buffers are flushed when a limit (typically, 16k LZ77 codes) is reached. This is, however, a fast approach, and the results are satisfactory.
The size of Huffman buffers is indirectly determined by the encoder’s memory (usage) level. For this reason, certain memory levels might be good for certain types of images.
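In zlib-based encoders, the memory level is exposed as the memLevel parameter of compressobj (1 to 9, with 8 the default); a small sketch comparing output sizes, without assuming which level wins:

```python
import zlib

data = bytes((i * 13) % 251 for i in range(1 << 16))

for mem_level in (1, 8, 9):
    comp = zlib.compressobj(level=9, memLevel=mem_level)
    out = comp.compress(data) + comp.flush()
    # memLevel changes the internal buffering, never the correctness.
    assert zlib.decompress(out) == data
    print(f"memLevel={mem_level}: {len(out)} bytes")
```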
PNG (lossless) optimization programs #
A multitude of PNG encoding programs is listed at http://www.libpng.org/pub/png/pngapps.html. Their performance varies as much as the range of possible compression ratios; the good encoders at least apply the filtering heuristics described briefly in the PNG Specification and illustrated above. Some programs gain extra compression by discarding some of the data in the input images (which makes these programs lossy!).
This section contains the small list of PNG optimization programs that show a particular concern towards obtaining a file size as small as possible. They work by performing repeated compression trials, applying various parameter sets, and selecting the parameter set that yields the smallest compressed output.
pngrewrite by Jason Summers, available at http://www.pobox.com/~jason1/pngrewrite, is an open-source program that performs lossless image reductions. It works best in conjunction with pngcrush (see below); the user should run pngcrush after pngrewrite.
pngcrush by Glenn Randers-Pehrson, available at http://pmt.sourceforge.net/pngcrush, is an open-source program that iterates over PNG filters and zlib (Deflate) parameters, compresses the image repeatedly using each parameter configuration, and chooses the configuration that yields the smallest compressed (IDAT) output. At the user's option, the program can explore a few (fewer than 10) or many (a brute-force traversal over more than 100) configurations. The method of selecting the parameters for "few" trials is particularly effective, and the use of a brute-force traversal is generally not recommended.
In addition, pngcrush offers a multitude of extra features, such as recovery of erroneous PNG files (e.g. files containing bad CRCs), and chunk-level editing of PNG meta-data.
OptiPNG by Cosmin Truţa, available at http://www.cs.toronto.edu/pngtech/optipng, is a newer open-source program, inspired by pngcrush, but designed to be more flexible and to run faster. Unlike pngcrush, OptiPNG performs the trials entirely in memory, and writes only the final output file to disk. Moreover, it offers multiple optimization presets to the user, who can choose among a range of options from "very few trials" to "very many trials" (in contrast to the coarser "smart vs. brute" option offered by pngcrush).
It is important to mention that the achieved compression ratio is less and less likely to improve when higher-level presets (triggering more trials) are being used. Even though the program is capable of searching automatically over more than 200 configurations (and advanced users have access to more than 1000 configurations!), a preset that selects around 10 trials should be satisfactory for most users. Furthermore, a preset that selects between 30-40 trials should be satisfactory for all users, for it is very, very unlikely to be beaten significantly by any wider search. The rest of the trial configurations are offered rather as a curiosity (but they were used in the experimentation from which we concluded they are indeed useless!).
AdvanceCOMP by Andrea Mazzoleni is a set of tools for optimizing ZIP/GZIP, PNG and MNG files, based on the powerful 7-Zip deflation engine. The name of the PNG optimization tool is AdvPNG. At the time of this writing, AdvPNG does not perform image reductions, so the use of pngrewrite or OptiPNG prior to optimization may be necessary. However, given the effectiveness of 7-Zip deflation, AdvanceCOMP is a powerful contender.
The AdvanceCOMP tool set is a part of the AdvanceMAME project, available at http://advancemame.sourceforge.net.
PNGOut by Ken Silverman, available at http://advsys.net/ken/utils.htm, is a freely-available compiled program (no source code), running on Windows and Linux. According to our tests, the compression ratio achieved by PNGOut is comparable to that of AdvPNG. Unfortunately, due to the lack of information, we cannot say much about this tool.
A nice GUI frontend for PNGOut, named PNGGauntlet, is available at http://www.numbera.com/software/pnggauntlet.aspx.
An extra note on losslessness #
What is lossless PNG optimization, after all? This is a straightforward question, whose answer is intuitive, yet not so straightforward.
Losslessness in the strictest sense, where no information whatsoever is lost, can only be achieved by leaving the original file (any file) intact, or by transforming it (e.g. compressing it, encrypting it) in such a way that there is an inverse transformation which recovers it completely, bit by bit.
In the case of PNG images, this condition of strict losslessness has little relevance to the casual graphics user, and is, therefore, too strong. There are instances where strict losslessness is required; for example, when handling certified PNG files whose integrity is guaranteed by an external checksum like MD5 or SHA, or by a digital signature such as dSIG. Most of the time, however, it is desirable to relax the notion of PNG losslessness, to the extent of not losing any information that pertains to the rendered image and to the semantic value of the metadata that accompanies the image. This allows the user to concentrate on what is really important when it comes to preserving the contents of a PNG image, and enables the concept of PNG optimization tools.
A lossless transform of a PNG image file is a transform which fully preserves the rendered RGB triples (the RGB triples that come either directly, or from a palette index, or from a gray->RGB expansion), the rendered transparency (the alpha samples that come either directly, or from a tRNS chunk, or the implicit 100% opacity assumed due to the lack of any explicit transparency information), the order of rendering (sequential or interlaced), and the semantics contained by the ancillary chunks. This definition allows the execution of the above-mentioned image reduction operations, and the recompression of IDAT. It also allows the alteration or the elimination of other pieces of information that are technically valid, but have no influence on any presentation of the image pixels:
- The information that pertains to Deflate streams, either inside IDAT, or in other compressed chunks like zTXt, iTXt or iCCP; e.g. the LZ77 window size, the type and size of Deflate blocks, etc. (The only thing that matters is that the decompressed byte sequence must remain the same.)
- The order of palette entries inside a PLTE chunk. (When changing this order, the information that depends on it, such as the palette-encoded pixels or the tRNS information, must be updated accordingly.)
- RGB triples that do not correspond to any pixel in the actual image, but are stored in a tRNS chunk.
- Fully opaque tRNS entries in a palette image.
- Gamma correction (gAMA) or significant bit (sBIT) information inside an image that consists exclusively of samples whose intensity is either minimum (0) or maximum (2^bitdepth-1).
- The fact that a textual comment is stored uncompressed in a tEXt chunk, compressed in a zTXt chunk, or with no translation in an iTXt chunk.
- Etcetera.
If any of the discardable information is important in a particular application, and lossless PNG optimization is still desirable, it is recommended to store this information in ancillary chunks, rather than hack it inside critical chunks.
For example, if sterile palette entries are necessary (e.g. for later editing stages), it is recommended to store them inside a suggested palette (sPLT) chunk, rather than keeping them inside PLTE.
Selective bibliography #
Besides the discussed specifications, the references below provide essential information necessary to comprehend the contents of this article.
- Thomas Boutell, Glenn Randers-Pehrson et al. Portable Network Graphics (PNG) Specification, Second Edition. ISO/IEC 15948:2003(E); W3C Recommendation, 10 November 2003.
- David A. Huffman. A method for the construction of minimum redundancy codes. Proceedings of the Institute of Radio Engineers, vol. 40, no. 9, pp. 1098-1101, September 1952.
- Jacob Ziv and Abraham Lempel. A universal algorithm for data compression. IEEE Transactions on Information Theory, vol. IT-23, no. 3, pp. 337-343, May 1977. (Due to a historical accident, the famous algorithm is better known as the "Lempel-Ziv (LZ) algorithm", even though the "Ziv-Lempel algorithm" is a more legitimate name.)
- Greg Roelofs. PNG: The Definitive Guide. O'Reilly and Associates, 1999.
Copyright © 2003-2008 Cosmin Truţa. Permission to distribute freely.
First published: 7 April 2003.
Last updated: 10 May 2008.