What if Dictionary size in LZW algorithm is full?
I have been studying LZW compression, and there is one thing I can't satisfy myself with: while building up the dictionary in LZW, the maximum limit is usually set to 4096 entries. Why is that? Also, if the dictionary gets full, it is reset; but what if the next few characters about to be read were present in the dictionary before the reset? Is this a limitation, or is my understanding incorrect?
algorithm lzw
asked Oct 15 '16 at 2:03 by Syed Ahmed Jamil; edited Nov 11 at 7:52 by Kromster
One prominent early LZW implementation was BSD compress(1). It started with 9-bit codes (dictionary size 512 = 2^9); Huffman coding the LZW code stream would have been slow, and arithmetic coding wasn't common, let alone speedy encoders unencumbered by patents. It grew the code size as needed, up to a limit of 16 bits, parametrisable down to 12. The compression ratio was monitored, and a reset occurred only when it got bad. Limiting the dictionary size allowed computers with little memory to decompress the stream - think a third of a century ago, 1200/75 modems, … "Little input left" is a valid concern, not addressed AFAIR.
– greybeard
Oct 15 '16 at 5:55
1 Answer
The dictionary size is limited by the output symbol size: 12-bit codes can address 4096 distinct values, which makes 4096 a common choice for lecture notes and simple/assignment implementations.
However, any symbol size larger than the source symbol size can be used: a 16-bit symbol would allow 65,536 dictionary entries. More bits mean more entries can exist in the current dictionary, which can increase the "maximum" compression. Conversely, because each output symbol is larger, more bits may decrease the compression ratio, especially for smaller inputs (not enough data to build up a dictionary) and for more random data (less ability to reuse the symbols in the dictionary). In practice, 19-20 bits seems to be about the useful limit[2], while 16-bit symbols align naturally to byte boundaries.
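To make the size relationship concrete, here is a minimal fixed-width LZW encoder sketch in Python. This is illustrative only, not any particular library's API; the freeze-when-full policy it uses is just one of the options discussed below:

```python
def lzw_compress(data: bytes, code_bits: int = 12) -> list[int]:
    """Minimal LZW encoder: fixed-width output codes, dictionary frozen
    (never reset) once it holds 2**code_bits entries."""
    max_entries = 1 << code_bits              # 4096 for 12-bit codes
    dictionary = {bytes([i]): i for i in range(256)}
    next_code = 256
    output: list[int] = []
    current = b""
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in dictionary:
            current = candidate               # keep extending the match
        else:
            output.append(dictionary[current])
            if next_code < max_entries:       # stop growing when full
                dictionary[candidate] = next_code
                next_code += 1
            current = bytes([byte])
    if current:
        output.append(dictionary[current])    # flush the final match
    return output
```

Packing each code into exactly `code_bits` bits is left out; that packing step is where the 12-bit choice translates into output size.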
It is also possible to use an adaptive symbol size based on log2 of the current number of mapped symbols[1], but this benefit shrinks as the data size increases, since the dictionary fills quickly. It is also largely superseded by Huffman coding the output codes.
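A sketch of that adaptive sizing (a hypothetical helper, mirroring how compress grows its code width from 9 bits upward, per the comment above):

```python
def code_width(next_code: int, min_bits: int = 9, max_bits: int = 16) -> int:
    """Bits needed to emit any code below `next_code`, clamped to a range;
    the width grows with the dictionary (9, 10, ... up to max_bits)."""
    needed = max(next_code - 1, 1).bit_length()   # ceil(log2(next_code))
    return min(max(needed, min_bits), max_bits)
```

For example, with 512 dictionary entries this yields 9 bits; adding a 513th entry bumps the width to 10.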
When the dictionary is "reset", the effect is the same as compressing multiple chunks of data independently and concatenating the compressed output: the dictionaries are separate. However, the input is "split" dynamically, at the point where the dictionary fills, as opposed to, say, every X bytes of input. Since the symbol size is fixed, it is more efficient to let the dictionary fill up completely before making the decision.
The primary purpose of resetting the dictionary is to avoid "fixating" the symbols on data characteristics from one part of the input that may not hold for later data. A compressor can use a single non-resetting dictionary, reset the dictionary as soon as it is full, reset the dictionary only when it is full and a drop in compression is encountered, etc.: the goal is to achieve the highest compression within the given domain and parameters.
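One plausible policy function, roughly matching the compress behaviour described in the comment above (ratio here is compressed bytes over original bytes, so lower is better; the names and the slack threshold are illustrative assumptions):

```python
def should_reset(dict_full: bool, recent_ratio: float,
                 best_ratio: float, slack: float = 0.02) -> bool:
    """Reset only once the dictionary is full AND the compression ratio
    measured over a recent window has degraded past the best seen so far."""
    return dict_full and recent_ratio > best_ratio + slack
```

An encoder that resets must also signal the decoder, typically by emitting a reserved clear code, so both sides rebuild their dictionaries in lockstep.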
Many LZ77/LZ78/LZW variants (and the optimizations they use) are briefly discussed in "On parsing optimality for dictionary-based text compression—the Zip case" by Alessio Langiu; the excerpts contain many useful details for further research.
1"Improving LZW' by R. Nigel Horspool goes into some details on adaptive symbol sizes. Nigel's "The Effect of Non-Greedy Parsing
in Ziv-Lempel Compression Methods" paper also includes a summary on compress
's handling of dictionary resets.
2"The Relative Efficiency of Data Compression by LZW and LZSS" by Yair Wiseman includes a sample graph of symbols sizes vs. compression efficiency. The graph is highly dependent on the data.
answered Nov 11 at 7:24 by user2864740 (accepted); edited Nov 12 at 5:23
Your link is broken.
– Michael Foukarakis
Nov 11 at 7:47