How can I optimize Huffman Decoding?
up vote
0
down vote
favorite
So I've been trying to decode using huffman, and I have this working function but it has horrible time and space complexity. What I've been doing so far is reading each byte, getting each bit and adding it to the currentBitString. I then reversed the string, and added it to a huge string which bascially contains all the byte data for the file. After that, I would trace the giant string and look for the Huffman code and then if its a match, i would write to file. This code takes about 60 seconds to decode a 200kb, which is very bad, but I'm not really sure how to improve it? I know I could for starters, write more than one byte to the file at a time, but it didn't seem to improve the time when I tried?
public static void decode(File f) throws Exception {
BufferedInputStream fin = new BufferedInputStream(new FileInputStream(f));
int i = f.getName().lastIndexOf('.');
String extension="txt";
String newFileName=f.getName().substring(0, i)+extension;
File nf = new File(newFileName);
BufferedOutputStream fw = new BufferedOutputStream(new FileOutputStream(nf));
int c;
byte bits;
byte current;
String currentBitString="";
String bitString="";
//read each byte from file, reverse it, add to giant bitString
//reads ALL BYTES
while( (c=fin.read())!=-1 ) {
current=(byte) c;
currentBitString="";
bits=0;
for(int q=0;q<8;q++) {
bits=getBit(current,q);
currentBitString+=bits;
}
StringBuilder bitStringReverse=new StringBuilder(currentBitString);
bitString+=bitStringReverse.reverse().toString();
}
currentBitString="";
boolean foundCode=false;
for(int j=0;j<bitString.length();j++) {
currentBitString+=bitString.charAt(j);
for(int k=0;k<nodes.length;k++) {
//nodes is an array of huffman nodes which contains the the byte
//data and the huffman codes for each byte
if(nodes[k].code.compareTo(currentBitString.trim())==0) {
fw.write(nodes[k].data);
foundCode=true;
break;
}
}
if(foundCode) {
currentBitString="";
foundCode=false;
}
}
fw.flush();
fw.close();
fin.close();
}
here's the gitBit function
public static byte getBit(byte ID, int position) {
// return cretin bit in selected byte
return (byte) ((ID >> position) & 1);
}
here's the data members of the HuffmanNode class (the nodes array is a array of HuffmanNodes)
public class HuffmanNode{
byte data;
int repetitions;
String code;
HuffmanNode right;
HuffmanNode left;
}
java huffman-code
add a comment |
up vote
0
down vote
favorite
So I've been trying to decode using huffman, and I have this working function but it has horrible time and space complexity. What I've been doing so far is reading each byte, getting each bit and adding it to the currentBitString. I then reversed the string, and added it to a huge string which bascially contains all the byte data for the file. After that, I would trace the giant string and look for the Huffman code and then if its a match, i would write to file. This code takes about 60 seconds to decode a 200kb, which is very bad, but I'm not really sure how to improve it? I know I could for starters, write more than one byte to the file at a time, but it didn't seem to improve the time when I tried?
public static void decode(File f) throws Exception {
BufferedInputStream fin = new BufferedInputStream(new FileInputStream(f));
int i = f.getName().lastIndexOf('.');
String extension="txt";
String newFileName=f.getName().substring(0, i)+extension;
File nf = new File(newFileName);
BufferedOutputStream fw = new BufferedOutputStream(new FileOutputStream(nf));
int c;
byte bits;
byte current;
String currentBitString="";
String bitString="";
//read each byte from file, reverse it, add to giant bitString
//reads ALL BYTES
while( (c=fin.read())!=-1 ) {
current=(byte) c;
currentBitString="";
bits=0;
for(int q=0;q<8;q++) {
bits=getBit(current,q);
currentBitString+=bits;
}
StringBuilder bitStringReverse=new StringBuilder(currentBitString);
bitString+=bitStringReverse.reverse().toString();
}
currentBitString="";
boolean foundCode=false;
for(int j=0;j<bitString.length();j++) {
currentBitString+=bitString.charAt(j);
for(int k=0;k<nodes.length;k++) {
//nodes is an array of huffman nodes which contains the the byte
//data and the huffman codes for each byte
if(nodes[k].code.compareTo(currentBitString.trim())==0) {
fw.write(nodes[k].data);
foundCode=true;
break;
}
}
if(foundCode) {
currentBitString="";
foundCode=false;
}
}
fw.flush();
fw.close();
fin.close();
}
here's the gitBit function
public static byte getBit(byte ID, int position) {
// return cretin bit in selected byte
return (byte) ((ID >> position) & 1);
}
here's the data members of the HuffmanNode class (the nodes array is a array of HuffmanNodes)
public class HuffmanNode{
byte data;
int repetitions;
String code;
HuffmanNode right;
HuffmanNode left;
}
java huffman-code
Maybe this will be helpful geeksforgeeks.org/…
– Maor Refaeli
Nov 11 at 8:31
You may have a very inefficiently coded implementation but complexity doesn't say anything about that. It's still O(n) time and from a quick scan, O(n) space too. You're asking the wrong question, you can't xhsnde it's complexity but you want to optimize the code. And the question is likely too broad for SO.
– Erwin Bolwidt
Nov 11 at 9:07
1
I have really no plan what your code is doing. But I observe that it's super heavy regarding memory allocation: For each character you read, you allocate a StringBuilder instance, toString() allocates a String instance and+=
allocates another 9 String instance. There is tremendous potential for optimization.
– Codo
Nov 11 at 10:09
1
See rules.sonarsource.com/java/RSPEC-1643. Also, why build the bitstring left-to-right and then reverse it (thus using up more memory and taking up more time) when you can simply build it the right way to begin with, usingfor (int q=7;q>=0;q--)
.
– Klitos Kyriacou
Nov 11 at 12:07
add a comment |
up vote
0
down vote
favorite
up vote
0
down vote
favorite
So I've been trying to decode using huffman, and I have this working function but it has horrible time and space complexity. What I've been doing so far is reading each byte, getting each bit and adding it to the currentBitString. I then reversed the string, and added it to a huge string which bascially contains all the byte data for the file. After that, I would trace the giant string and look for the Huffman code and then if its a match, i would write to file. This code takes about 60 seconds to decode a 200kb, which is very bad, but I'm not really sure how to improve it? I know I could for starters, write more than one byte to the file at a time, but it didn't seem to improve the time when I tried?
public static void decode(File f) throws Exception {
BufferedInputStream fin = new BufferedInputStream(new FileInputStream(f));
int i = f.getName().lastIndexOf('.');
String extension="txt";
String newFileName=f.getName().substring(0, i)+extension;
File nf = new File(newFileName);
BufferedOutputStream fw = new BufferedOutputStream(new FileOutputStream(nf));
int c;
byte bits;
byte current;
String currentBitString="";
String bitString="";
//read each byte from file, reverse it, add to giant bitString
//reads ALL BYTES
while( (c=fin.read())!=-1 ) {
current=(byte) c;
currentBitString="";
bits=0;
for(int q=0;q<8;q++) {
bits=getBit(current,q);
currentBitString+=bits;
}
StringBuilder bitStringReverse=new StringBuilder(currentBitString);
bitString+=bitStringReverse.reverse().toString();
}
currentBitString="";
boolean foundCode=false;
for(int j=0;j<bitString.length();j++) {
currentBitString+=bitString.charAt(j);
for(int k=0;k<nodes.length;k++) {
//nodes is an array of huffman nodes which contains the the byte
//data and the huffman codes for each byte
if(nodes[k].code.compareTo(currentBitString.trim())==0) {
fw.write(nodes[k].data);
foundCode=true;
break;
}
}
if(foundCode) {
currentBitString="";
foundCode=false;
}
}
fw.flush();
fw.close();
fin.close();
}
here's the gitBit function
public static byte getBit(byte ID, int position) {
// return cretin bit in selected byte
return (byte) ((ID >> position) & 1);
}
here's the data members of the HuffmanNode class (the nodes array is a array of HuffmanNodes)
public class HuffmanNode{
byte data;
int repetitions;
String code;
HuffmanNode right;
HuffmanNode left;
}
java huffman-code
So I've been trying to decode using huffman, and I have this working function but it has horrible time and space complexity. What I've been doing so far is reading each byte, getting each bit and adding it to the currentBitString. I then reversed the string, and added it to a huge string which bascially contains all the byte data for the file. After that, I would trace the giant string and look for the Huffman code and then if its a match, i would write to file. This code takes about 60 seconds to decode a 200kb, which is very bad, but I'm not really sure how to improve it? I know I could for starters, write more than one byte to the file at a time, but it didn't seem to improve the time when I tried?
public static void decode(File f) throws Exception {
BufferedInputStream fin = new BufferedInputStream(new FileInputStream(f));
int i = f.getName().lastIndexOf('.');
String extension="txt";
String newFileName=f.getName().substring(0, i)+extension;
File nf = new File(newFileName);
BufferedOutputStream fw = new BufferedOutputStream(new FileOutputStream(nf));
int c;
byte bits;
byte current;
String currentBitString="";
String bitString="";
//read each byte from file, reverse it, add to giant bitString
//reads ALL BYTES
while( (c=fin.read())!=-1 ) {
current=(byte) c;
currentBitString="";
bits=0;
for(int q=0;q<8;q++) {
bits=getBit(current,q);
currentBitString+=bits;
}
StringBuilder bitStringReverse=new StringBuilder(currentBitString);
bitString+=bitStringReverse.reverse().toString();
}
currentBitString="";
boolean foundCode=false;
for(int j=0;j<bitString.length();j++) {
currentBitString+=bitString.charAt(j);
for(int k=0;k<nodes.length;k++) {
//nodes is an array of huffman nodes which contains the the byte
//data and the huffman codes for each byte
if(nodes[k].code.compareTo(currentBitString.trim())==0) {
fw.write(nodes[k].data);
foundCode=true;
break;
}
}
if(foundCode) {
currentBitString="";
foundCode=false;
}
}
fw.flush();
fw.close();
fin.close();
}
here's the gitBit function
public static byte getBit(byte ID, int position) {
// return cretin bit in selected byte
return (byte) ((ID >> position) & 1);
}
here's the data members of the HuffmanNode class (the nodes array is a array of HuffmanNodes)
public class HuffmanNode{
byte data;
int repetitions;
String code;
HuffmanNode right;
HuffmanNode left;
}
java huffman-code
java huffman-code
edited Nov 11 at 10:25
asked Nov 11 at 8:28
TheHaruWhoCodes
207
207
Maybe this will be helpful geeksforgeeks.org/…
– Maor Refaeli
Nov 11 at 8:31
You may have a very inefficiently coded implementation but complexity doesn't say anything about that. It's still O(n) time and from a quick scan, O(n) space too. You're asking the wrong question, you can't xhsnde it's complexity but you want to optimize the code. And the question is likely too broad for SO.
– Erwin Bolwidt
Nov 11 at 9:07
1
I have really no plan what your code is doing. But I observe that it's super heavy regarding memory allocation: For each character you read, you allocate a StringBuilder instance, toString() allocates a String instance and+=
allocates another 9 String instance. There is tremendous potential for optimization.
– Codo
Nov 11 at 10:09
1
See rules.sonarsource.com/java/RSPEC-1643. Also, why build the bitstring left-to-right and then reverse it (thus using up more memory and taking up more time) when you can simply build it the right way to begin with, usingfor (int q=7;q>=0;q--)
.
– Klitos Kyriacou
Nov 11 at 12:07
add a comment |
Maybe this will be helpful geeksforgeeks.org/…
– Maor Refaeli
Nov 11 at 8:31
You may have a very inefficiently coded implementation but complexity doesn't say anything about that. It's still O(n) time and from a quick scan, O(n) space too. You're asking the wrong question, you can't xhsnde it's complexity but you want to optimize the code. And the question is likely too broad for SO.
– Erwin Bolwidt
Nov 11 at 9:07
1
I have really no plan what your code is doing. But I observe that it's super heavy regarding memory allocation: For each character you read, you allocate a StringBuilder instance, toString() allocates a String instance and+=
allocates another 9 String instance. There is tremendous potential for optimization.
– Codo
Nov 11 at 10:09
1
See rules.sonarsource.com/java/RSPEC-1643. Also, why build the bitstring left-to-right and then reverse it (thus using up more memory and taking up more time) when you can simply build it the right way to begin with, usingfor (int q=7;q>=0;q--)
.
– Klitos Kyriacou
Nov 11 at 12:07
Maybe this will be helpful geeksforgeeks.org/…
– Maor Refaeli
Nov 11 at 8:31
Maybe this will be helpful geeksforgeeks.org/…
– Maor Refaeli
Nov 11 at 8:31
You may have a very inefficiently coded implementation but complexity doesn't say anything about that. It's still O(n) time and from a quick scan, O(n) space too. You're asking the wrong question, you can't xhsnde it's complexity but you want to optimize the code. And the question is likely too broad for SO.
– Erwin Bolwidt
Nov 11 at 9:07
You may have a very inefficiently coded implementation but complexity doesn't say anything about that. It's still O(n) time and from a quick scan, O(n) space too. You're asking the wrong question, you can't xhsnde it's complexity but you want to optimize the code. And the question is likely too broad for SO.
– Erwin Bolwidt
Nov 11 at 9:07
1
1
I have really no plan what your code is doing. But I observe that it's super heavy regarding memory allocation: For each character you read, you allocate a StringBuilder instance, toString() allocates a String instance and
+=
allocates another 9 String instance. There is tremendous potential for optimization.– Codo
Nov 11 at 10:09
I have really no plan what your code is doing. But I observe that it's super heavy regarding memory allocation: For each character you read, you allocate a StringBuilder instance, toString() allocates a String instance and
+=
allocates another 9 String instance. There is tremendous potential for optimization.– Codo
Nov 11 at 10:09
1
1
See rules.sonarsource.com/java/RSPEC-1643. Also, why build the bitstring left-to-right and then reverse it (thus using up more memory and taking up more time) when you can simply build it the right way to begin with, using
for (int q=7;q>=0;q--)
.– Klitos Kyriacou
Nov 11 at 12:07
See rules.sonarsource.com/java/RSPEC-1643. Also, why build the bitstring left-to-right and then reverse it (thus using up more memory and taking up more time) when you can simply build it the right way to begin with, using
for (int q=7;q>=0;q--)
.– Klitos Kyriacou
Nov 11 at 12:07
add a comment |
2 Answers
2
active
oldest
votes
up vote
1
down vote
You can replace the String concatination +=
with StringBuilder
. This allocates less objects and puts less load on the garbage collector.
int c;
StringBuilder bitString = new StringBuilder();
//read each byte from file, reverse it, add to giant bitString
//reads ALL BYTES
while ((c = fin.read()) != -1) {
byte current = (byte) c;
StringBuilder currentBitString = new StringBuilder();
for (int q = 0; q < 8; q++) {
byte bits = getBit(current, q);
currentBitString.append(bits);
}
bitString.append(currentBitString.reverse());
}
Instead of putting the codes and data into an array nodes
you should use a HashMap
here. You are comparing the code by iterating over the whole array until you find the right match. On average that are n/2
calls to String#equals()
per item. With a HashMap
you reduce this to ~1.
Fill your map with the data for the codes as keys.
Map<String, Integer> nodes = new HashMap<>();
nodes.put(code, data);
Access the data from the map
boolean foundCode = false;
for (int j = 0; j < bitString.length(); j++) {
currentBitString.append(bitString.charAt(j));
Integer data = nodes.get(currentBitString.toString().trim());
if (data != null) {
fw.write(data);
foundCode = true;
}
if (foundCode) {
currentBitString = new StringBuilder();
foundCode = false;
}
}
2
Doesn't the break statement just get out of the entire loop and not check if the code is found?
– TheHaruWhoCodes
Nov 11 at 11:21
I tried it out and put the break statement in the second loop but it didn't work, there seems to be an infinite loop somewhere
– TheHaruWhoCodes
Nov 11 at 11:37
The break statement is no longer needed as the HashMap removed the second loop to iterate over the array. Updated the code. If the code is not 100% right I think you still get the idea.
– Simulant
Nov 11 at 11:56
I get the idea and I've implemented it, but the time hasn't really gone down for some reason, it still takes a minute to decode a 200kb file.
– TheHaruWhoCodes
Nov 11 at 12:13
add a comment |
up vote
0
down vote
Don't read the whole thing into memory. Process your codes as they are encountered. Read enough bits to decode the next code, decode it, retain the unused bits for the subsequent code, repeat.
Don't use strings of characters to represent bits, where you are representing one bit per character. Use bits to represent bits. The shift, and, and or operators are what you should be using. You would have an integer as a bit buffer, with all the bits you need to decode the next code.
Don't do a search over all code lengths, and inside that a linear search of all codes to find your code! I would be hard pressed to come up with a slower approach. You should use a tree descent or table lookup for decoding. If you generate a canonical Huffman code in the first place, there is an easy lookup approach that can be implemented. See puff.c for an example. The textbook approach (which is slower than what puff.c does) is to build the same Huffman tree on the receiving end, and go down that tree bit-by-bit until you get to a symbol. Emit the symbol and repeat.
You should be able to process 200K of compressed input in a few milliseconds on a single core of a modern processor.
add a comment |
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
1
down vote
You can replace the String concatination +=
with StringBuilder
. This allocates less objects and puts less load on the garbage collector.
int c;
StringBuilder bitString = new StringBuilder();
//read each byte from file, reverse it, add to giant bitString
//reads ALL BYTES
while ((c = fin.read()) != -1) {
byte current = (byte) c;
StringBuilder currentBitString = new StringBuilder();
for (int q = 0; q < 8; q++) {
byte bits = getBit(current, q);
currentBitString.append(bits);
}
bitString.append(currentBitString.reverse());
}
Instead of putting the codes and data into an array nodes
you should use a HashMap
here. You are comparing the code by iterating over the whole array until you find the right match. On average that are n/2
calls to String#equals()
per item. With a HashMap
you reduce this to ~1.
Fill your map with the data for the codes as keys.
Map<String, Integer> nodes = new HashMap<>();
nodes.put(code, data);
Access the data from the map
boolean foundCode = false;
for (int j = 0; j < bitString.length(); j++) {
currentBitString.append(bitString.charAt(j));
Integer data = nodes.get(currentBitString.toString().trim());
if (data != null) {
fw.write(data);
foundCode = true;
}
if (foundCode) {
currentBitString = new StringBuilder();
foundCode = false;
}
}
2
Doesn't the break statement just get out of the entire loop and not check if the code is found?
– TheHaruWhoCodes
Nov 11 at 11:21
I tried it out and put the break statement in the second loop but it didn't work, there seems to be an infinite loop somewhere
– TheHaruWhoCodes
Nov 11 at 11:37
The break statement is no longer needed as the HashMap removed the second loop to iterate over the array. Updated the code. If the code is not 100% right I think you still get the idea.
– Simulant
Nov 11 at 11:56
I get the idea and I've implemented it, but the time hasn't really gone down for some reason, it still takes a minute to decode a 200kb file.
– TheHaruWhoCodes
Nov 11 at 12:13
add a comment |
up vote
1
down vote
You can replace the String concatination +=
with StringBuilder
. This allocates less objects and puts less load on the garbage collector.
int c;
StringBuilder bitString = new StringBuilder();
//read each byte from file, reverse it, add to giant bitString
//reads ALL BYTES
while ((c = fin.read()) != -1) {
byte current = (byte) c;
StringBuilder currentBitString = new StringBuilder();
for (int q = 0; q < 8; q++) {
byte bits = getBit(current, q);
currentBitString.append(bits);
}
bitString.append(currentBitString.reverse());
}
Instead of putting the codes and data into an array nodes
you should use a HashMap
here. You are comparing the code by iterating over the whole array until you find the right match. On average that are n/2
calls to String#equals()
per item. With a HashMap
you reduce this to ~1.
Fill your map with the data for the codes as keys.
Map<String, Integer> nodes = new HashMap<>();
nodes.put(code, data);
Access the data from the map
boolean foundCode = false;
for (int j = 0; j < bitString.length(); j++) {
currentBitString.append(bitString.charAt(j));
Integer data = nodes.get(currentBitString.toString().trim());
if (data != null) {
fw.write(data);
foundCode = true;
}
if (foundCode) {
currentBitString = new StringBuilder();
foundCode = false;
}
}
2
Doesn't the break statement just get out of the entire loop and not check if the code is found?
– TheHaruWhoCodes
Nov 11 at 11:21
I tried it out and put the break statement in the second loop but it didn't work, there seems to be an infinite loop somewhere
– TheHaruWhoCodes
Nov 11 at 11:37
The break statement is no longer needed as the HashMap removed the second loop to iterate over the array. Updated the code. If the code is not 100% right I think you still get the idea.
– Simulant
Nov 11 at 11:56
I get the idea and I've implemented it, but the time hasn't really gone down for some reason, it still takes a minute to decode a 200kb file.
– TheHaruWhoCodes
Nov 11 at 12:13
add a comment |
up vote
1
down vote
up vote
1
down vote
You can replace the String concatination +=
with StringBuilder
. This allocates less objects and puts less load on the garbage collector.
int c;
StringBuilder bitString = new StringBuilder();
//read each byte from file, reverse it, add to giant bitString
//reads ALL BYTES
while ((c = fin.read()) != -1) {
byte current = (byte) c;
StringBuilder currentBitString = new StringBuilder();
for (int q = 0; q < 8; q++) {
byte bits = getBit(current, q);
currentBitString.append(bits);
}
bitString.append(currentBitString.reverse());
}
Instead of putting the codes and data into an array nodes
you should use a HashMap
here. You are comparing the code by iterating over the whole array until you find the right match. On average that are n/2
calls to String#equals()
per item. With a HashMap
you reduce this to ~1.
Fill your map with the data for the codes as keys.
Map<String, Integer> nodes = new HashMap<>();
nodes.put(code, data);
Access the data from the map
boolean foundCode = false;
for (int j = 0; j < bitString.length(); j++) {
currentBitString.append(bitString.charAt(j));
Integer data = nodes.get(currentBitString.toString().trim());
if (data != null) {
fw.write(data);
foundCode = true;
}
if (foundCode) {
currentBitString = new StringBuilder();
foundCode = false;
}
}
You can replace the String concatination +=
with StringBuilder
. This allocates less objects and puts less load on the garbage collector.
int c;
StringBuilder bitString = new StringBuilder();
//read each byte from file, reverse it, add to giant bitString
//reads ALL BYTES
while ((c = fin.read()) != -1) {
byte current = (byte) c;
StringBuilder currentBitString = new StringBuilder();
for (int q = 0; q < 8; q++) {
byte bits = getBit(current, q);
currentBitString.append(bits);
}
bitString.append(currentBitString.reverse());
}
Instead of putting the codes and data into an array nodes
you should use a HashMap
here. You are comparing the code by iterating over the whole array until you find the right match. On average that are n/2
calls to String#equals()
per item. With a HashMap
you reduce this to ~1.
Fill your map with the data for the codes as keys.
Map<String, Integer> nodes = new HashMap<>();
nodes.put(code, data);
Access the data from the map
boolean foundCode = false;
for (int j = 0; j < bitString.length(); j++) {
currentBitString.append(bitString.charAt(j));
Integer data = nodes.get(currentBitString.toString().trim());
if (data != null) {
fw.write(data);
foundCode = true;
}
if (foundCode) {
currentBitString = new StringBuilder();
foundCode = false;
}
}
edited Nov 11 at 11:55
answered Nov 11 at 10:27
Simulant
11.3k63972
11.3k63972
2
Doesn't the break statement just get out of the entire loop and not check if the code is found?
– TheHaruWhoCodes
Nov 11 at 11:21
I tried it out and put the break statement in the second loop but it didn't work, there seems to be an infinite loop somewhere
– TheHaruWhoCodes
Nov 11 at 11:37
The break statement is no longer needed as the HashMap removed the second loop to iterate over the array. Updated the code. If the code is not 100% right I think you still get the idea.
– Simulant
Nov 11 at 11:56
I get the idea and I've implemented it, but the time hasn't really gone down for some reason, it still takes a minute to decode a 200kb file.
– TheHaruWhoCodes
Nov 11 at 12:13
add a comment |
2
Doesn't the break statement just get out of the entire loop and not check if the code is found?
– TheHaruWhoCodes
Nov 11 at 11:21
I tried it out and put the break statement in the second loop but it didn't work, there seems to be an infinite loop somewhere
– TheHaruWhoCodes
Nov 11 at 11:37
The break statement is no longer needed as the HashMap removed the second loop to iterate over the array. Updated the code. If the code is not 100% right I think you still get the idea.
– Simulant
Nov 11 at 11:56
I get the idea and I've implemented it, but the time hasn't really gone down for some reason, it still takes a minute to decode a 200kb file.
– TheHaruWhoCodes
Nov 11 at 12:13
2
2
Doesn't the break statement just get out of the entire loop and not check if the code is found?
– TheHaruWhoCodes
Nov 11 at 11:21
Doesn't the break statement just get out of the entire loop and not check if the code is found?
– TheHaruWhoCodes
Nov 11 at 11:21
I tried it out and put the break statement in the second loop but it didn't work, there seems to be an infinite loop somewhere
– TheHaruWhoCodes
Nov 11 at 11:37
I tried it out and put the break statement in the second loop but it didn't work, there seems to be an infinite loop somewhere
– TheHaruWhoCodes
Nov 11 at 11:37
The break statement is no longer needed as the HashMap removed the second loop to iterate over the array. Updated the code. If the code is not 100% right I think you still get the idea.
– Simulant
Nov 11 at 11:56
The break statement is no longer needed as the HashMap removed the second loop to iterate over the array. Updated the code. If the code is not 100% right I think you still get the idea.
– Simulant
Nov 11 at 11:56
I get the idea and I've implemented it, but the time hasn't really gone down for some reason, it still takes a minute to decode a 200kb file.
– TheHaruWhoCodes
Nov 11 at 12:13
I get the idea and I've implemented it, but the time hasn't really gone down for some reason, it still takes a minute to decode a 200kb file.
– TheHaruWhoCodes
Nov 11 at 12:13
add a comment |
up vote
0
down vote
Don't read the whole thing into memory. Process your codes as they are encountered. Read enough bits to decode the next code, decode it, retain the unused bits for the subsequent code, repeat.
Don't use strings of characters to represent bits, where you are representing one bit per character. Use bits to represent bits. The shift, and, and or operators are what you should be using. You would have an integer as a bit buffer, with all the bits you need to decode the next code.
Don't do a search over all code lengths, and inside that a linear search of all codes to find your code! I would be hard pressed to come up with a slower approach. You should use a tree descent or table lookup for decoding. If you generate a canonical Huffman code in the first place, there is an easy lookup approach that can be implemented. See puff.c for an example. The textbook approach (which is slower than what puff.c does) is to build the same Huffman tree on the receiving end, and go down that tree bit-by-bit until you get to a symbol. Emit the symbol and repeat.
You should be able to process 200K of compressed input in a few milliseconds on a single core of a modern processor.
add a comment |
up vote
0
down vote
Don't read the whole thing into memory. Process your codes as they are encountered. Read enough bits to decode the next code, decode it, retain the unused bits for the subsequent code, repeat.
Don't use strings of characters to represent bits, where you are representing one bit per character. Use bits to represent bits. The shift, and, and or operators are what you should be using. You would have an integer as a bit buffer, with all the bits you need to decode the next code.
Don't do a search over all code lengths, and inside that a linear search of all codes to find your code! I would be hard pressed to come up with a slower approach. You should use a tree descent or table lookup for decoding. If you generate a canonical Huffman code in the first place, there is an easy lookup approach that can be implemented. See puff.c for an example. The textbook approach (which is slower than what puff.c does) is to build the same Huffman tree on the receiving end, and go down that tree bit-by-bit until you get to a symbol. Emit the symbol and repeat.
You should be able to process 200K of compressed input in a few milliseconds on a single core of a modern processor.
add a comment |
up vote
0
down vote
up vote
0
down vote
Don't read the whole thing into memory. Process your codes as they are encountered. Read enough bits to decode the next code, decode it, retain the unused bits for the subsequent code, repeat.
Don't use strings of characters to represent bits, where you are representing one bit per character. Use bits to represent bits. The shift, and, and or operators are what you should be using. You would have an integer as a bit buffer, with all the bits you need to decode the next code.
Don't do a search over all code lengths, and inside that a linear search of all codes to find your code! I would be hard pressed to come up with a slower approach. You should use a tree descent or table lookup for decoding. If you generate a canonical Huffman code in the first place, there is an easy lookup approach that can be implemented. See puff.c for an example. The textbook approach (which is slower than what puff.c does) is to build the same Huffman tree on the receiving end, and go down that tree bit-by-bit until you get to a symbol. Emit the symbol and repeat.
You should be able to process 200K of compressed input in a few milliseconds on a single core of a modern processor.
Don't read the whole thing into memory. Process your codes as they are encountered. Read enough bits to decode the next code, decode it, retain the unused bits for the subsequent code, repeat.
Don't use strings of characters to represent bits, where you are representing one bit per character. Use bits to represent bits. The shift, and, and or operators are what you should be using. You would have an integer as a bit buffer, with all the bits you need to decode the next code.
Don't do a search over all code lengths, and inside that a linear search of all codes to find your code! I would be hard pressed to come up with a slower approach. You should use a tree descent or table lookup for decoding. If you generate a canonical Huffman code in the first place, there is an easy lookup approach that can be implemented. See puff.c for an example. The textbook approach (which is slower than what puff.c does) is to build the same Huffman tree on the receiving end, and go down that tree bit-by-bit until you get to a symbol. Emit the symbol and repeat.
You should be able to process 200K of compressed input in a few milliseconds on a single core of a modern processor.
answered Nov 11 at 15:14
Mark Adler
56.6k760107
56.6k760107
add a comment |
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Some of your past answers have not been well-received, and you're in danger of being blocked from answering.
Please pay close attention to the following guidance:
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53247027%2fhow-can-i-optimize-huffman-decoding%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Maybe this will be helpful geeksforgeeks.org/…
– Maor Refaeli
Nov 11 at 8:31
You may have a very inefficiently coded implementation but complexity doesn't say anything about that. It's still O(n) time and from a quick scan, O(n) space too. You're asking the wrong question, you can't xhsnde it's complexity but you want to optimize the code. And the question is likely too broad for SO.
– Erwin Bolwidt
Nov 11 at 9:07
1
I have really no plan what your code is doing. But I observe that it's super heavy regarding memory allocation: For each character you read, you allocate a StringBuilder instance, toString() allocates a String instance and
+=
allocates another 9 String instance. There is tremendous potential for optimization.– Codo
Nov 11 at 10:09
1
See rules.sonarsource.com/java/RSPEC-1643. Also, why build the bitstring left-to-right and then reverse it (thus using up more memory and taking up more time) when you can simply build it the right way to begin with, using
for (int q=7;q>=0;q--)
.– Klitos Kyriacou
Nov 11 at 12:07