Snappy Algorithm
How can we analyze the Snappy source code? This post walks through Snappy's compression and decompression paths, with pseudocode, to help you understand how the algorithm works.
Introduction
Snappy is a compression/decompression library. It does not aim for maximum compression, or compatibility with any other compression library; instead, it aims for very high speeds and reasonable compression. For instance, compared to the fastest mode of zlib, Snappy is an order of magnitude faster for most inputs, but the resulting compressed files are anywhere from 20% to 100% bigger. (For more information, see “Performance”, below.)
Snappy has the following properties:
- Fast: Compression speeds at 250 MB/sec and beyond, with no assembler code. See “Performance” below.
- Stable: Over the last few years, Snappy has compressed and decompressed petabytes of data in Google’s production environment. The Snappy bitstream format is stable and will not change between versions.
- Robust: The Snappy decompressor is designed not to crash in the face of corrupted or malicious input.
- Free and open source software: Snappy is licensed under a BSD-type license. For more information, see the included COPYING file.
Snappy has previously been called “Zippy” in some Google presentations and the like.
Snappy in RocksDB
- How to link: https://github.com/facebook/rocksdb/blob/main/build_tools/build_detect_platform
if ! test $ROCKSDB_DISABLE_SNAPPY; then
  # Test whether Snappy library is installed
  $CXX $PLATFORM_CXXFLAGS -x c++ - -o /dev/null 2>/dev/null <<EOF
    #include <snappy.h>
    int main() {}
EOF
- Where to use: https://github.com/facebook/rocksdb/blob/main/util/compression.h
inline bool Snappy_Compress(const CompressionInfo& /*info*/, const char* input,
                            size_t length, ::std::string* output) {
#ifdef SNAPPY
  output->resize(snappy::MaxCompressedLength(length));
  size_t outlen;
  snappy::RawCompress(input, length, &(*output)[0], &outlen);
  output->resize(outlen);
  return true;
#else
  (void)input;
  (void)length;
  (void)output;
  return false;
#endif
}

inline CacheAllocationPtr Snappy_Uncompress(
    const char* input, size_t length, size_t* uncompressed_size,
    MemoryAllocator* allocator = nullptr) {
#ifdef SNAPPY
  size_t uncompressed_length = 0;
  if (!snappy::GetUncompressedLength(input, length, &uncompressed_length)) {
    return nullptr;
  }
  CacheAllocationPtr output = AllocateBlock(uncompressed_length, allocator);
  if (!snappy::RawUncompress(input, length, output.get())) {
    return nullptr;
  }
  *uncompressed_size = uncompressed_length;
  return output;
#else
  (void)input;
  (void)length;
  (void)uncompressed_size;
  (void)allocator;
  return nullptr;
#endif
}

The #ifdef SNAPPY branches are compiled in only when RocksDB is built with Snappy support; otherwise the stubs simply report failure. RocksDB mainly calls two Snappy interfaces: RawCompress and RawUncompress.
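For orientation, here is a minimal, self-contained sketch (not RocksDB code) of how these raw APIs are typically driven:

// Hypothetical stand-alone example of the raw Snappy API used above.
// Compile with something like: g++ example.cc -lsnappy
#include <snappy.h>

#include <cassert>
#include <string>

int main() {
  std::string input(10000, 'a');  // highly compressible sample data

  // Compress: the caller provides a buffer of MaxCompressedLength() bytes,
  // and RawCompress reports how much of it was actually used.
  std::string compressed(snappy::MaxCompressedLength(input.size()), '\0');
  size_t compressed_length = 0;
  snappy::RawCompress(input.data(), input.size(), &compressed[0],
                      &compressed_length);
  compressed.resize(compressed_length);

  // Uncompress: recover the original length from the varint header first,
  // then decompress into a buffer of exactly that size.
  size_t uncompressed_length = 0;
  bool ok = snappy::GetUncompressedLength(compressed.data(), compressed.size(),
                                          &uncompressed_length);
  assert(ok);
  std::string restored(uncompressed_length, '\0');
  ok = snappy::RawUncompress(compressed.data(), compressed.size(), &restored[0]);
  assert(ok && restored == input);
  return 0;
}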
Snappy
Source: https://github.com/google/snappy/
First we look at the format, then analyze Snappy's compression and decompression paths starting from RawCompress and RawUncompress respectively.
Format
format_description.txt in the repository describes the encoding ("Snappy compressed format description"). In short, a compressed stream starts with the uncompressed length encoded as a little-endian varint, followed by a sequence of elements: literals (tag type 00), which carry raw bytes, and copies (tag types 01, 10 and 11 for 1-, 2- and 4-byte offsets), which reference bytes that have already been emitted.
Compress
- RawCompress
void RawCompress(const char* input, size_t input_length, char* compressed,
                 size_t* compressed_length) {
  ByteArraySource reader(input, input_length);
  UncheckedByteArraySink writer(compressed);
  Compress(&reader, &writer);
  // Compute how many bytes were added
  *compressed_length = (writer.CurrentDestination() - compressed);
}

RawCompress first builds a reader and a writer from its arguments, then calls Compress to do the actual work, and finally computes compressed_length. Let's look at the structure of the reader and the writer.
- ByteArraySource
// A Source is an interface that yields a sequence of bytes
class Source {
public:
Source() { }
virtual ~Source();
// Return the number of bytes left to read from the source
virtual size_t Available() const = 0;
// Peek at the next flat region of the source. Does not reposition
// the source. The returned region is empty iff Available()==0.
//
// Returns a pointer to the beginning of the region and store its
// length in *len.
//
// The returned region is valid until the next call to Skip() or
// until this object is destroyed, whichever occurs first.
//
// The returned region may be larger than Available() (for example
// if this ByteSource is a view on a substring of a larger source).
// The caller is responsible for ensuring that it only reads the
// Available() bytes.
virtual const char* Peek(size_t* len) = 0;
// Skip the next n bytes. Invalidates any buffer returned by
// a previous call to Peek().
// REQUIRES: Available() >= n
virtual void Skip(size_t n) = 0;
private:
// No copying
Source(const Source&);
void operator=(const Source&);
};
// A Source implementation that yields the contents of a flat array
class ByteArraySource : public Source {
public:
ByteArraySource(const char* p, size_t n) : ptr_(p), left_(n) { }
~ByteArraySource() override;
size_t Available() const override;
const char* Peek(size_t* len) override;
void Skip(size_t n) override;
private:
const char* ptr_;
size_t left_;
};

Available: returns how many bytes are left in the source. Peek: returns a pointer to the bytes that can currently be viewed, together with their length; the returned buffer must remain valid until Skip is called. Skip: tells the Source that the first n bytes are no longer needed and can be skipped.
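The flat-array implementation of this interface is tiny; roughly (a sketch of what snappy-sinksource.cc does, not a verbatim copy):

// Sketch: the whole input is one contiguous region, so Peek can expose
// everything that is left in a single call.
size_t ByteArraySource::Available() const { return left_; }

const char* ByteArraySource::Peek(size_t* len) {
  *len = left_;  // hand out the entire remaining region
  return ptr_;
}

void ByteArraySource::Skip(size_t n) {
  left_ -= n;  // REQUIRES: Available() >= n
  ptr_ += n;
}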
- UncheckedByteArraySink
// A Sink is an interface that consumes a sequence of bytes.
class Sink {
public:
Sink() { }
virtual ~Sink();
// Append "bytes[0,n-1]" to this.
virtual void Append(const char* bytes, size_t n) = 0;
// Returns a writable buffer of the specified length for appending.
// May return a pointer to the caller-owned scratch buffer which
// must have at least the indicated length. The returned buffer is
// only valid until the next operation on this Sink.
//
// After writing at most "length" bytes, call Append() with the
// pointer returned from this function and the number of bytes
// written. Many Append() implementations will avoid copying
// bytes if this function returned an internal buffer.
//
// If a non-scratch buffer is returned, the caller may only pass a
// prefix of it to Append(). That is, it is not correct to pass an
// interior pointer of the returned array to Append().
//
// The default implementation always returns the scratch buffer.
virtual char* GetAppendBuffer(size_t length, char* scratch);
// For higher performance, Sink implementations can provide custom
// AppendAndTakeOwnership() and GetAppendBufferVariable() methods.
// These methods can reduce the number of copies done during
// compression/decompression.
// Append "bytes[0,n-1] to the sink. Takes ownership of "bytes"
// and calls the deleter function as (*deleter)(deleter_arg, bytes, n)
// to free the buffer. deleter function must be non NULL.
//
// The default implementation just calls Append and frees "bytes".
// Other implementations may avoid a copy while appending the buffer.
virtual void AppendAndTakeOwnership(
char* bytes, size_t n, void (*deleter)(void*, const char*, size_t),
void *deleter_arg);
// Returns a writable buffer for appending and writes the buffer's capacity to
// *allocated_size. Guarantees *allocated_size >= min_size.
// May return a pointer to the caller-owned scratch buffer which must have
// scratch_size >= min_size.
//
// The returned buffer is only valid until the next operation
// on this ByteSink.
//
// After writing at most *allocated_size bytes, call Append() with the
// pointer returned from this function and the number of bytes written.
// Many Append() implementations will avoid copying bytes if this function
// returned an internal buffer.
//
// If the sink implementation allocates or reallocates an internal buffer,
// it should use the desired_size_hint if appropriate. If a caller cannot
// provide a reasonable guess at the desired capacity, it should set
// desired_size_hint = 0.
//
// If a non-scratch buffer is returned, the caller may only pass
// a prefix to it to Append(). That is, it is not correct to pass an
// interior pointer to Append().
//
// The default implementation always returns the scratch buffer.
virtual char* GetAppendBufferVariable(
size_t min_size, size_t desired_size_hint, char* scratch,
size_t scratch_size, size_t* allocated_size);
private:
// No copying
Sink(const Sink&);
void operator=(const Sink&);
};
// A Sink implementation that writes to a flat array without any bound checks.
class UncheckedByteArraySink : public Sink {
public:
explicit UncheckedByteArraySink(char* dest) : dest_(dest) { }
~UncheckedByteArraySink() override;
void Append(const char* data, size_t n) override;
char* GetAppendBuffer(size_t len, char* scratch) override;
char* GetAppendBufferVariable(
size_t min_size, size_t desired_size_hint, char* scratch,
size_t scratch_size, size_t* allocated_size) override;
void AppendAndTakeOwnership(
char* bytes, size_t n, void (*deleter)(void*, const char*, size_t),
void *deleter_arg) override;
// Return the current output pointer so that a caller can see how
// many bytes were produced.
// Note: this is not a Sink method.
char* CurrentDestination() const { return dest_; }
private:
char* dest_;
};

Append: writes the byte range bytes[0, n-1] to the sink. GetAppendBuffer: hands out a writable buffer of the requested length, which must remain valid until Append is called with it; the implementation may also simply return scratch (a buffer allocated by the surrounding framework).
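For the unchecked flat-array sink, a sketch of how these two methods can satisfy that contract (close to what snappy-sinksource.cc does): GetAppendBuffer hands out the destination itself, so Append usually has nothing to copy.

// Sketch, assuming <cstring> is included: the sink writes straight into the
// flat output array, so the append buffer is simply the current write position.
void UncheckedByteArraySink::Append(const char* data, size_t n) {
  if (data != dest_) {  // data may already have been written in place
    std::memcpy(dest_, data, n);
  }
  dest_ += n;  // no bounds check -- hence "unchecked"
}

char* UncheckedByteArraySink::GetAppendBuffer(size_t /*len*/, char* /*scratch*/) {
  return dest_;
}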
- Compress
size_t Compress(Source* reader, Sink* writer) {
size_t written = 0;
size_t N = reader->Available();
const size_t uncompressed_size = N;
char ulength[Varint::kMax32];
char* p = Varint::Encode32(ulength, N);
writer->Append(ulength, p - ulength);
written += (p - ulength);
internal::WorkingMemory wmem(N);
while (N > 0) {
// Get next block to compress (without copying if possible)
size_t fragment_size;
const char* fragment = reader->Peek(&fragment_size);
assert(fragment_size != 0); // premature end of input
const size_t num_to_read = std::min(N, kBlockSize);
size_t bytes_read = fragment_size;
size_t pending_advance = 0;
if (bytes_read >= num_to_read) {
// Buffer returned by reader is large enough
pending_advance = num_to_read;
fragment_size = num_to_read;
} else {
char* scratch = wmem.GetScratchInput();
std::memcpy(scratch, fragment, bytes_read);
reader->Skip(bytes_read);
while (bytes_read < num_to_read) {
fragment = reader->Peek(&fragment_size);
size_t n = std::min<size_t>(fragment_size, num_to_read - bytes_read);
std::memcpy(scratch + bytes_read, fragment, n);
bytes_read += n;
reader->Skip(n);
}
assert(bytes_read == num_to_read);
fragment = scratch;
fragment_size = num_to_read;
}
assert(fragment_size == num_to_read);
// Get encoding table for compression
int table_size;
uint16_t* table = wmem.GetHashTable(num_to_read, &table_size);
// Compress input_fragment and append to dest
const int max_output = MaxCompressedLength(num_to_read);
// Need a scratch buffer for the output, in case the byte sink doesn't
// have room for us directly.
// Since we encode kBlockSize regions followed by a region
// which is <= kBlockSize in length, a previously allocated
// scratch_output[] region is big enough for this iteration.
char* dest = writer->GetAppendBuffer(max_output, wmem.GetScratchOutput());
char* end = internal::CompressFragment(fragment, fragment_size, dest, table,
table_size);
writer->Append(dest, end - dest);
written += (end - dest);
N -= num_to_read;
reader->Skip(pending_advance);
}
Report("snappy_compress", written, uncompressed_size);
return written;
}

- The header is the length of the uncompressed input, encoded with the variable-length integer routine Varint::Encode32.
inline char* Varint::Encode32(char* sptr, uint32_t v) {
// Operate on characters as unsigneds
uint8_t* ptr = reinterpret_cast<uint8_t*>(sptr);
static const uint8_t B = 128;
if (v < (1 << 7)) {
*(ptr++) = static_cast<uint8_t>(v);
} else if (v < (1 << 14)) {
*(ptr++) = static_cast<uint8_t>(v | B);
*(ptr++) = static_cast<uint8_t>(v >> 7);
} else if (v < (1 << 21)) {
*(ptr++) = static_cast<uint8_t>(v | B);
*(ptr++) = static_cast<uint8_t>((v >> 7) | B);
*(ptr++) = static_cast<uint8_t>(v >> 14);
} else if (v < (1 << 28)) {
*(ptr++) = static_cast<uint8_t>(v | B);
*(ptr++) = static_cast<uint8_t>((v >> 7) | B);
*(ptr++) = static_cast<uint8_t>((v >> 14) | B);
*(ptr++) = static_cast<uint8_t>(v >> 21);
} else {
*(ptr++) = static_cast<uint8_t>(v | B);
*(ptr++) = static_cast<uint8_t>((v>>7) | B);
*(ptr++) = static_cast<uint8_t>((v>>14) | B);
*(ptr++) = static_cast<uint8_t>((v>>21) | B);
*(ptr++) = static_cast<uint8_t>(v >> 28);
}
return reinterpret_cast<char*>(ptr);
}
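As a quick illustration (a hypothetical check, not part of snappy, assuming <cassert> and <cstdint> are included): the value 300 needs two 7-bit groups, so Encode32 produces two bytes.

char buf[Varint::kMax32];
char* end = Varint::Encode32(buf, 300);
assert(end - buf == 2);
assert(static_cast<uint8_t>(buf[0]) == 0xAC);  // (300 & 0x7f) | 0x80, continuation bit set
assert(static_cast<uint8_t>(buf[1]) == 0x02);  // 300 >> 7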
- Get the fragment and its fragment_size from the reader.
- Call CompressFragment on the fragment.
// Flat array compression that does not emit the "uncompressed length"
// prefix. Compresses "input" string to the "*op" buffer.
//
// REQUIRES: "input" is at most "kBlockSize" bytes long.
// REQUIRES: "op" points to an array of memory that is at least
// "MaxCompressedLength(input.size())" in size.
// REQUIRES: All elements in "table[0..table_size-1]" are initialized to zero.
// REQUIRES: "table_size" is a power of two
//
// Returns an "end" pointer into "op" buffer.
// "end - op" is the compressed size of "input".
namespace internal {
char* CompressFragment(const char* input, size_t input_size, char* op,
uint16_t* table, const int table_size) {
// "ip" is the input pointer, and "op" is the output pointer.
const char* ip = input;
assert(input_size <= kBlockSize);
assert((table_size & (table_size - 1)) == 0); // table must be power of two
const uint32_t mask = table_size - 1;
const char* ip_end = input + input_size;
const char* base_ip = ip;
const size_t kInputMarginBytes = 15;
if (SNAPPY_PREDICT_TRUE(input_size >= kInputMarginBytes)) {
const char* ip_limit = input + input_size - kInputMarginBytes;
for (uint32_t preload = LittleEndian::Load32(ip + 1);;) {
// Bytes in [next_emit, ip) will be emitted as literal bytes. Or
// [next_emit, ip_end) after the main loop.
const char* next_emit = ip++;
uint64_t data = LittleEndian::Load64(ip);
// The body of this loop calls EmitLiteral once and then EmitCopy one or
// more times. (The exception is that when we're close to exhausting
// the input we goto emit_remainder.)
//
// In the first iteration of this loop we're just starting, so
// there's nothing to copy, so calling EmitLiteral once is
// necessary. And we only start a new iteration when the
// current iteration has determined that a call to EmitLiteral will
// precede the next call to EmitCopy (if any).
//
// Step 1: Scan forward in the input looking for a 4-byte-long match.
// If we get close to exhausting the input then goto emit_remainder.
//
// Heuristic match skipping: If 32 bytes are scanned with no matches
// found, start looking only at every other byte. If 32 more bytes are
// scanned (or skipped), look at every third byte, etc.. When a match is
// found, immediately go back to looking at every byte. This is a small
// loss (~5% performance, ~0.1% density) for compressible data due to more
// bookkeeping, but for non-compressible data (such as JPEG) it's a huge
// win since the compressor quickly "realizes" the data is incompressible
// and doesn't bother looking for matches everywhere.
//
// The "skip" variable keeps track of how many bytes there are since the
// last match; dividing it by 32 (ie. right-shifting by five) gives the
// number of bytes to move ahead for each iteration.
uint32_t skip = 32;
const char* candidate;
if (ip_limit - ip >= 16) {
auto delta = ip - base_ip;
for (int j = 0; j < 4; ++j) {
for (int k = 0; k < 4; ++k) {
int i = 4 * j + k;
// These for-loops are meant to be unrolled. So we can freely
// special case the first iteration to use the value already
// loaded in preload.
uint32_t dword = i == 0 ? preload : static_cast<uint32_t>(data);
assert(dword == LittleEndian::Load32(ip + i));
uint32_t hash = HashBytes(dword, mask);
candidate = base_ip + table[hash];
assert(candidate >= base_ip);
assert(candidate < ip + i);
table[hash] = delta + i;
if (SNAPPY_PREDICT_FALSE(LittleEndian::Load32(candidate) == dword)) {
*op = LITERAL | (i << 2);
UnalignedCopy128(next_emit, op + 1);
ip += i;
op = op + i + 2;
goto emit_match;
}
data >>= 8;
}
data = LittleEndian::Load64(ip + 4 * j + 4);
}
ip += 16;
skip += 16;
}
while (true) {
assert(static_cast<uint32_t>(data) == LittleEndian::Load32(ip));
uint32_t hash = HashBytes(data, mask);
uint32_t bytes_between_hash_lookups = skip >> 5;
skip += bytes_between_hash_lookups;
const char* next_ip = ip + bytes_between_hash_lookups;
if (SNAPPY_PREDICT_FALSE(next_ip > ip_limit)) {
ip = next_emit;
goto emit_remainder;
}
candidate = base_ip + table[hash];
assert(candidate >= base_ip);
assert(candidate < ip);
table[hash] = ip - base_ip;
if (SNAPPY_PREDICT_FALSE(static_cast<uint32_t>(data) ==
LittleEndian::Load32(candidate))) {
break;
}
data = LittleEndian::Load32(next_ip);
ip = next_ip;
}
// Step 2: A 4-byte match has been found. We'll later see if more
// than 4 bytes match. But, prior to the match, input
// bytes [next_emit, ip) are unmatched. Emit them as "literal bytes."
assert(next_emit + 16 <= ip_end);
op = EmitLiteral</*allow_fast_path=*/true>(op, next_emit, ip - next_emit);
// Step 3: Call EmitCopy, and then see if another EmitCopy could
// be our next move. Repeat until we find no match for the
// input immediately after what was consumed by the last EmitCopy call.
//
// If we exit this loop normally then we need to call EmitLiteral next,
// though we don't yet know how big the literal will be. We handle that
// by proceeding to the next iteration of the main loop. We also can exit
// this loop via goto if we get close to exhausting the input.
emit_match:
do {
// We have a 4-byte match at ip, and no need to emit any
// "literal bytes" prior to ip.
const char* base = ip;
std::pair<size_t, bool> p =
FindMatchLength(candidate + 4, ip + 4, ip_end, &data);
size_t matched = 4 + p.first;
ip += matched;
size_t offset = base - candidate;
assert(0 == memcmp(base, candidate, matched));
if (p.second) {
op = EmitCopy</*len_less_than_12=*/true>(op, offset, matched);
} else {
op = EmitCopy</*len_less_than_12=*/false>(op, offset, matched);
}
if (SNAPPY_PREDICT_FALSE(ip >= ip_limit)) {
goto emit_remainder;
}
// Expect 5 bytes to match
assert((data & 0xFFFFFFFFFF) ==
(LittleEndian::Load64(ip) & 0xFFFFFFFFFF));
// We are now looking for a 4-byte match again. We read
// table[Hash(ip, shift)] for that. To improve compression,
// we also update table[Hash(ip - 1, mask)] and table[Hash(ip, mask)].
table[HashBytes(LittleEndian::Load32(ip - 1), mask)] = ip - base_ip - 1;
uint32_t hash = HashBytes(data, mask);
candidate = base_ip + table[hash];
table[hash] = ip - base_ip;
// Measurements on the benchmarks have shown the following probabilities
// for the loop to exit (ie. avg. number of iterations is reciprocal).
// BM_Flat/6 txt1 p = 0.3-0.4
// BM_Flat/7 txt2 p = 0.35
// BM_Flat/8 txt3 p = 0.3-0.4
// BM_Flat/9 txt3 p = 0.34-0.4
// BM_Flat/10 pb p = 0.4
// BM_Flat/11 gaviota p = 0.1
// BM_Flat/12 cp p = 0.5
// BM_Flat/13 c p = 0.3
} while (static_cast<uint32_t>(data) == LittleEndian::Load32(candidate));
// Because the least significant 5 bytes matched, we can utilize data
// for the next iteration.
preload = data >> 8;
}
}
emit_remainder:
// Emit the remaining bytes as a literal
if (ip < ip_end) {
op = EmitLiteral</*allow_fast_path=*/false>(op, ip, ip_end - ip);
}
return op;
}
}  // end namespace internal
- The core of the function is the big loop controlled by for (uint32_t preload = LittleEndian::Load32(ip + 1);;).
- j and k drive two nested (unrolled) for loops; the input pointer advances one byte at a time (the inner loop increments k and shifts data right by 8 bits), and each 4-byte value dword at the current position is recorded in the hash table.
- If candidate == dword is hit inside these loops, the bytes starting at next_emit are written to op as a literal (UnalignedCopy128 copies 16 bytes, of which only the first i + 1 belong to the literal), and we goto emit_match.

if (SNAPPY_PREDICT_FALSE(LittleEndian::Load32(candidate) == dword)) {
*op = LITERAL | (i << 2);
UnalignedCopy128(next_emit, op + 1);
ip += i;
op = op + i + 2;
goto emit_match;
}
- Otherwise, execution falls into the while loop below.
while (true) {
assert(static_cast<uint32_t>(data) == LittleEndian::Load32(ip));
uint32_t hash = HashBytes(data, mask);
uint32_t bytes_between_hash_lookups = skip >> 5;
skip += bytes_between_hash_lookups;
const char* next_ip = ip + bytes_between_hash_lookups;
if (SNAPPY_PREDICT_FALSE(next_ip > ip_limit)) {
ip = next_emit;
goto emit_remainder;
}
candidate = base_ip + table[hash];
assert(candidate >= base_ip);
assert(candidate < ip);
table[hash] = ip - base_ip;
if (SNAPPY_PREDICT_FALSE(static_cast<uint32_t>(data) ==
LittleEndian::Load32(candidate))) {
break;
}
data = LittleEndian::Load32(next_ip);
ip = next_ip;
}

This is the heuristic match skipping described in the comments: the stride is skip >> 5, so every byte is checked for the first 32 bytes, then every other byte, then every third byte, and so on; bytes_between_hash_lookups is simply how far to move ahead between hash lookups. The loop ends in one of two ways: either next_ip passes ip_limit and the rest of the input is emitted as a literal via emit_remainder, or data matches the 4 bytes at candidate and we break out of the loop.
- After the while loop ends we have a 4-byte match; the still-unmatched bytes before it, [next_emit, ip), are first emitted as a literal into op (a sketch of the literal encoding follows this list).

assert(next_emit + 16 <= ip_end);
op = EmitLiteral</*allow_fast_path=*/true>(op, next_emit, ip - next_emit);
- Then control enters the block labeled emit_match.

emit_match:
do {
// We have a 4-byte match at ip, and no need to emit any
// "literal bytes" prior to ip.
const char* base = ip;
std::pair<size_t, bool> p =
FindMatchLength(candidate + 4, ip + 4, ip_end, &data);
size_t matched = 4 + p.first;
ip += matched;
size_t offset = base - candidate;
assert(0 == memcmp(base, candidate, matched));
if (p.second) {
op = EmitCopy</*len_less_than_12=*/true>(op, offset, matched);
} else {
op = EmitCopy</*len_less_than_12=*/false>(op, offset, matched);
}
if (SNAPPY_PREDICT_FALSE(ip >= ip_limit)) {
goto emit_remainder;
}
// Expect 5 bytes to match
assert((data & 0xFFFFFFFFFF) ==
(LittleEndian::Load64(ip) & 0xFFFFFFFFFF));
// We are now looking for a 4-byte match again. We read
// table[Hash(ip, shift)] for that. To improve compression,
// we also update table[Hash(ip - 1, mask)] and table[Hash(ip, mask)].
table[HashBytes(LittleEndian::Load32(ip - 1), mask)] = ip - base_ip - 1;
uint32_t hash = HashBytes(data, mask);
candidate = base_ip + table[hash];
table[hash] = ip - base_ip;
// Measurements on the benchmarks have shown the following probabilities
// for the loop to exit (ie. avg. number of iterations is reciprocal).
// BM_Flat/6 txt1 p = 0.3-0.4
// BM_Flat/7 txt2 p = 0.35
// BM_Flat/8 txt3 p = 0.3-0.4
// BM_Flat/9 txt3 p = 0.34-0.4
// BM_Flat/10 pb p = 0.4
// BM_Flat/11 gaviota p = 0.1
// BM_Flat/12 cp p = 0.5
// BM_Flat/13 c p = 0.3
} while (static_cast<uint32_t>(data) == LittleEndian::Load32(candidate));
// Because the least significant 5 bytes matched, we can utilize data
// for the next iteration.
preload = data >> 8;

FindMatchLength computes how far the match extends; the offset and matched length are written to op via EmitCopy (a sketch of the copy encoding also follows this list), and the hash table is updated around the new position. If data no longer matches the bytes at candidate, the do-while loop exits.
- After CompressFragment returns, control goes back to Compress, which finally appends the result to the writer via writer->Append(dest, end - dest).
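EmitLiteral and EmitCopy themselves are not reproduced above, but their output follows directly from format_description.txt. The two sketches below are simplified, hypothetical helpers (the real snappy.cc versions add over-copy fast paths and split long elements); they show what gets written for a single literal and for a single copy of at most 64 bytes.

#include <cstddef>
#include <cstring>

// Sketch of literal emission: tag byte with low bits 00; lengths 1..60 store
// (len - 1) in the upper six bits, longer literals store 60 + k in the tag and
// put (len - 1) in k little-endian bytes that follow.
char* EmitLiteralSketch(char* op, const char* literal, size_t len) {
  size_t n = len - 1;  // the format always stores length - 1
  if (n < 60) {
    *op++ = static_cast<char>(n << 2);  // low two bits 00 => LITERAL
  } else {
    char* tag = op++;
    int count = 0;
    while (n > 0) {  // emit (len - 1) little-endian, one byte at a time
      *op++ = static_cast<char>(n & 0xff);
      n >>= 8;
      ++count;
    }
    *tag = static_cast<char>((59 + count) << 2);  // 60..63 => 1..4 length bytes
  }
  std::memcpy(op, literal, len);
  return op + len;
}

// Sketch of a single copy element (4 <= len <= 64; the real EmitCopy splits
// longer matches into several such elements).
char* EmitCopySketch(char* op, size_t offset, size_t len) {
  if (len < 12 && offset < 2048) {
    // COPY_1_BYTE_OFFSET (tag 01): 3 bits of (len - 4) plus the high 3 bits
    // of the offset in the tag byte, low 8 offset bits in the next byte.
    *op++ = static_cast<char>(1 | ((len - 4) << 2) | ((offset >> 8) << 5));
    *op++ = static_cast<char>(offset & 0xff);
  } else {
    // COPY_2_BYTE_OFFSET (tag 10): (len - 1) in the upper six bits of the
    // tag, followed by a 16-bit little-endian offset.
    *op++ = static_cast<char>(((len - 1) << 2) | 2);
    *op++ = static_cast<char>(offset & 0xff);
    *op++ = static_cast<char>(offset >> 8);
  }
  return op;
}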
Uncompress
- RawUncompress
bool RawUncompress(const char* compressed, size_t compressed_length,
char* uncompressed) {
ByteArraySource reader(compressed, compressed_length);
return RawUncompress(&reader, uncompressed);
}

This constructs a ByteArraySource and calls the overloaded RawUncompress with the reader as its argument.

bool RawUncompress(Source* compressed, char* uncompressed) {
SnappyArrayWriter output(uncompressed);
return InternalUncompress(compressed, &output);
}

This constructs a SnappyArrayWriter and calls InternalUncompress with the output as its argument.

template <typename Writer>
static bool InternalUncompress(Source* r, Writer* writer) {
// Read the uncompressed length from the front of the compressed input
SnappyDecompressor decompressor(r);
uint32_t uncompressed_len = 0;
if (!decompressor.ReadUncompressedLength(&uncompressed_len)) return false;
return InternalUncompressAllTags(&decompressor, writer, r->Available(),
uncompressed_len);
}

A SnappyDecompressor is built from the source, the uncompressed length is read from the varint header via ReadUncompressedLength, and InternalUncompressAllTags is called.

template <typename Writer>
static bool InternalUncompressAllTags(SnappyDecompressor* decompressor,
Writer* writer, uint32_t compressed_len,
uint32_t uncompressed_len) {
Report("snappy_uncompress", compressed_len, uncompressed_len);
writer->SetExpectedLength(uncompressed_len);
// Process the entire input
decompressor->DecompressAllTags(writer);
writer->Flush();
return (decompressor->eof() && writer->CheckLength());
}

The expected uncompressed_len is set on the writer, and the decompressor's DecompressAllTags(writer) performs the actual decompression.

void DecompressAllTags(Writer* writer) {
const char* ip = ip_;
ResetLimit(ip);
auto op = writer->GetOutputPtr();
// We could have put this refill fragment only at the beginning of the loop.
// However, duplicating it at the end of each branch gives the compiler more
// scope to optimize the <ip_limit_ - ip> expression based on the local
// context, which overall increases speed.
// At the start of the for loop below the least significant byte of preload
// contains the tag.
uint32_t preload;
MAYBE_REFILL();
for (;;) {
{
ptrdiff_t op_limit_min_slop;
auto op_base = writer->GetBase(&op_limit_min_slop);
if (op_base) {
auto res =
DecompressBranchless(reinterpret_cast<const uint8_t*>(ip),
reinterpret_cast<const uint8_t*>(ip_limit_),
op - op_base, op_base, op_limit_min_slop);
ip = reinterpret_cast<const char*>(res.first);
op = op_base + res.second;
MAYBE_REFILL();
}
}
const uint8_t c = static_cast<uint8_t>(preload);
ip++;
// Ratio of iterations that have LITERAL vs non-LITERAL for different
// inputs.
//
// input LITERAL NON_LITERAL
// -----------------------------------
// html|html4|cp 23% 77%
// urls 36% 64%
// jpg 47% 53%
// pdf 19% 81%
// txt[1-4] 25% 75%
// pb 24% 76%
// bin 24% 76%
if (SNAPPY_PREDICT_FALSE((c & 0x3) == LITERAL)) {
size_t literal_length = (c >> 2) + 1u;
if (writer->TryFastAppend(ip, ip_limit_ - ip, literal_length, &op)) {
assert(literal_length < 61);
ip += literal_length;
// NOTE: There is no MAYBE_REFILL() here, as TryFastAppend()
// will not return true unless there's already at least five spare
// bytes in addition to the literal.
preload = static_cast<uint8_t>(*ip);
continue;
}
if (SNAPPY_PREDICT_FALSE(literal_length >= 61)) {
// Long literal.
const size_t literal_length_length = literal_length - 60;
literal_length =
ExtractLowBytes(LittleEndian::Load32(ip), literal_length_length) +
1;
ip += literal_length_length;
}
size_t avail = ip_limit_ - ip;
while (avail < literal_length) {
if (!writer->Append(ip, avail, &op)) goto exit;
literal_length -= avail;
reader_->Skip(peeked_);
size_t n;
ip = reader_->Peek(&n);
avail = n;
peeked_ = avail;
if (avail == 0) goto exit;
ip_limit_ = ip + avail;
ResetLimit(ip);
}
if (!writer->Append(ip, literal_length, &op)) goto exit;
ip += literal_length;
MAYBE_REFILL();
} else {
if (SNAPPY_PREDICT_FALSE((c & 3) == COPY_4_BYTE_OFFSET)) {
const size_t copy_offset = LittleEndian::Load32(ip);
const size_t length = (c >> 2) + 1;
ip += 4;
if (!writer->AppendFromSelf(copy_offset, length, &op)) goto exit;
} else {
const ptrdiff_t entry = kLengthMinusOffset[c];
preload = LittleEndian::Load32(ip);
const uint32_t trailer = ExtractLowBytes(preload, c & 3);
const uint32_t length = entry & 0xff;
assert(length > 0);
// copy_offset/256 is encoded in bits 8..10. By just fetching
// those bits, we get copy_offset (since the bit-field starts at
// bit 8).
const uint32_t copy_offset = trailer - entry + length;
if (!writer->AppendFromSelf(copy_offset, length, &op)) goto exit;
ip += (c & 3);
// By using the result of the previous load we reduce the critical
// dependency chain of ip to 4 cycles.
preload >>= (c & 3) * 8;
if (ip < ip_limit_min_maxtaglen_) continue;
}
MAYBE_REFILL();
}
}
exit:
writer->SetOutputPtr(op);
}
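The writer's Append and AppendFromSelf calls above do the actual output. AppendFromSelf is the interesting one: it copies length bytes starting offset bytes back from the current output position, and because offset may be smaller than length the copy must handle overlap (this is how repeated runs are expanded). A simplified, bounds-checked sketch of that behaviour (a hypothetical helper; the real SnappyArrayWriter uses faster slop-based copies):

// Hypothetical, simplified version of a writer's AppendFromSelf.
bool AppendFromSelfSketch(char* base, char** op_ptr, char* op_limit,
                          size_t offset, size_t length) {
  char* op = *op_ptr;
  // The offset must point inside what has already been produced.
  if (offset == 0 || static_cast<size_t>(op - base) < offset) return false;
  if (static_cast<size_t>(op_limit - op) < length) return false;
  const char* src = op - offset;
  for (size_t i = 0; i < length; ++i) op[i] = src[i];  // forward, overlap-safe
  *op_ptr = op + length;
  return true;
}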
Pseudocode
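Putting the pieces together, here is a condensed, informal summary of the two paths traced above (pseudocode, not the real code):

Compress(reader, writer):
    N = reader.Available()
    append Varint32(N) to writer                     # uncompressed-length header
    for each fragment of at most kBlockSize bytes:
        reset the hash table sized for this fragment
        ip = start of fragment
        while not near the end of the fragment:
            scan forward from ip, hashing each 4-byte window into the table,
                skipping more and more bytes the longer no match is found
            if the 4 bytes at ip equal the 4 bytes at a remembered position:
                emit bytes [next_emit, ip) as a LITERAL
                repeat:
                    extend the match as far as possible (FindMatchLength)
                    emit COPY(offset = ip - candidate, length = matched)
                    advance ip past the match and update the hash table
                until the 4 bytes at ip no longer match a table entry
        emit whatever is left of the fragment as a LITERAL

Uncompress(reader, writer):
    read the Varint32 uncompressed length; writer.SetExpectedLength(it)
    while input remains:
        tag = next byte
        if tag is a LITERAL:
            read the literal length (from the tag or the following bytes)
            append that many input bytes to the output
        else:  # copy
            decode (offset, length) from the tag and 1/2/4 trailing bytes
            append length bytes taken from offset bytes back in the output
    check that exactly the expected number of bytes was produced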
Performance
Snappy is intended to be fast. On a single core of a Core i7 processor in 64-bit mode, it compresses at about 250 MB/sec or more and decompresses at about 500 MB/sec or more. (These numbers are for the slowest inputs in our benchmark suite; others are much faster.) In our tests, Snappy usually is faster than algorithms in the same class (e.g. LZO, LZF, QuickLZ, etc.) while achieving comparable compression ratios.
Typical compression ratios (based on the benchmark suite) are about 1.5-1.7x for plain text, about 2-4x for HTML, and of course 1.0x for JPEGs, PNGs and other already-compressed data. Similar numbers for zlib in its fastest mode are 2.6-2.8x, 3-7x and 1.0x, respectively. More sophisticated algorithms are capable of achieving yet higher compression rates, although usually at the expense of speed. Of course, compression ratio will vary significantly with the input.
Although Snappy should be fairly portable, it is primarily optimized for 64-bit x86-compatible processors, and may run slower in other environments.
In particular:
- Snappy uses 64-bit operations in several places to process more data at once than would otherwise be possible.
- Snappy assumes unaligned 32- and 64-bit loads and stores are cheap. On some platforms, these must be emulated with single-byte loads and stores, which is much slower.
- Snappy assumes little-endian throughout, and needs to byte-swap data in several places if running on a big-endian platform.
Experience has shown that even heavily tuned code can be improved. Performance optimizations, whether for 64-bit x86 or other platforms, are of course most welcome; see the "Contact" section of the Snappy README.
Reference
Snappy Algorithm