How can I optimize Huffman Decoding?











So I've been trying to decode using Huffman, and I have this working function, but it has horrible time and space complexity. What I've been doing so far is reading each byte, getting each bit and adding it to currentBitString. I then reverse that string and append it to a huge string which basically contains all the bit data for the file. After that, I trace through the giant string looking for a Huffman code, and when there's a match I write the decoded byte to the file. This code takes about 60 seconds to decode a 200 KB file, which is very bad, but I'm not really sure how to improve it. I know I could, for starters, write more than one byte to the file at a time, but that didn't seem to improve the time when I tried it.



    public static void decode(File f) throws Exception {
        BufferedInputStream fin = new BufferedInputStream(new FileInputStream(f));
        int i = f.getName().lastIndexOf('.');
        String extension = ".txt";
        String newFileName = f.getName().substring(0, i) + extension;
        File nf = new File(newFileName);
        BufferedOutputStream fw = new BufferedOutputStream(new FileOutputStream(nf));
        int c;
        byte bits;
        byte current;
        String currentBitString = "";
        String bitString = "";
        // read each byte from the file, reverse its bits, append to the giant bitString
        // reads ALL BYTES
        while ((c = fin.read()) != -1) {
            current = (byte) c;
            currentBitString = "";
            bits = 0;
            for (int q = 0; q < 8; q++) {
                bits = getBit(current, q);
                currentBitString += bits;
            }
            StringBuilder bitStringReverse = new StringBuilder(currentBitString);
            bitString += bitStringReverse.reverse().toString();
        }
        // grow the current code one bit at a time until it matches a Huffman code
        currentBitString = "";
        boolean foundCode = false;
        for (int j = 0; j < bitString.length(); j++) {
            currentBitString += bitString.charAt(j);
            for (int k = 0; k < nodes.length; k++) {
                // nodes is an array of Huffman nodes which contains the byte
                // data and the Huffman codes for each byte
                if (nodes[k].code.compareTo(currentBitString.trim()) == 0) {
                    fw.write(nodes[k].data);
                    foundCode = true;
                    break;
                }
            }
            if (foundCode) {
                currentBitString = "";
                foundCode = false;
            }
        }
        fw.flush();
        fw.close();
        fin.close();
    }


Here's the getBit function:



    public static byte getBit(byte ID, int position) {
        // return the bit at the given position in the byte
        return (byte) ((ID >> position) & 1);
    }


Here are the data members of the HuffmanNode class (the nodes array is an array of HuffmanNodes):



    public class HuffmanNode {
        byte data;
        int repetitions;
        String code;
        HuffmanNode right;
        HuffmanNode left;
    }









java huffman-code

asked Nov 11 at 8:28 by TheHaruWhoCodes, edited Nov 11 at 10:25

  • Maybe this will be helpful geeksforgeeks.org/…
    – Maor Refaeli
    Nov 11 at 8:31










  • You may have a very inefficiently coded implementation, but complexity doesn't say anything about that. It's still O(n) time and, from a quick scan, O(n) space too. You're asking the wrong question: you can't change its complexity, but you want to optimize the code. And the question is likely too broad for SO.
    – Erwin Bolwidt
    Nov 11 at 9:07






  • I have really no plan what your code is doing. But I observe that it's super heavy regarding memory allocation: for each character you read, you allocate a StringBuilder instance, toString() allocates a String instance, and += allocates another 9 String instances. There is tremendous potential for optimization.
    – Codo
    Nov 11 at 10:09






  • See rules.sonarsource.com/java/RSPEC-1643. Also, why build the bitstring left-to-right and then reverse it (thus using up more memory and taking up more time) when you can simply build it the right way to begin with, using for (int q=7;q>=0;q--).
    – Klitos Kyriacou
    Nov 11 at 12:07















2 Answers







You can replace the String concatenation += with a StringBuilder. This allocates fewer objects and puts less load on the garbage collector.



int c;
StringBuilder bitString = new StringBuilder();
// read each byte from file, reverse it, add to giant bitString
// reads ALL BYTES
while ((c = fin.read()) != -1) {
    byte current = (byte) c;
    StringBuilder currentBitString = new StringBuilder();
    for (int q = 0; q < 8; q++) {
        byte bits = getBit(current, q);
        currentBitString.append(bits);
    }
    bitString.append(currentBitString.reverse());
}


Instead of putting the codes and data into an array nodes, you should use a HashMap here. You are comparing the code by iterating over the whole array until you find the right match. On average that is n/2 calls to String#equals() per item. With a HashMap you reduce this to ~1.



Fill your map using the codes as keys and the data as values.



Map<String, Integer> nodes = new HashMap<>();
nodes.put(code, data);
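
For example, a rough sketch of filling such a map from the nodes array in the question (named codeTable here so it doesn't clash with the array; the cast is needed because data is a byte):

Map<String, Integer> codeTable = new HashMap<>();
for (HuffmanNode n : nodes) {
    // map each Huffman code string to the byte value it decodes to
    codeTable.put(n.code, (int) n.data);
}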


Access the data from the map



boolean foundCode = false;
for (int j = 0; j < bitString.length(); j++) {
    currentBitString.append(bitString.charAt(j));
    Integer data = nodes.get(currentBitString.toString().trim());
    if (data != null) {
        fw.write(data);
        foundCode = true;
    }
    if (foundCode) {
        currentBitString = new StringBuilder();
        foundCode = false;
    }
}





answered Nov 11 at 10:27, edited Nov 11 at 11:55 – Simulant

  • Doesn't the break statement just get out of the entire loop and not check if the code is found?
    – TheHaruWhoCodes
    Nov 11 at 11:21










  • I tried it out and put the break statement in the second loop, but it didn't work; there seems to be an infinite loop somewhere.
    – TheHaruWhoCodes
    Nov 11 at 11:37










  • The break statement is no longer needed, as the HashMap removes the second loop that iterated over the array. Updated the code. Even if the code is not 100% right, I think you still get the idea.
    – Simulant
    Nov 11 at 11:56












  • I get the idea and I've implemented it, but the time hasn't really gone down for some reason; it still takes a minute to decode a 200 KB file.
    – TheHaruWhoCodes
    Nov 11 at 12:13
































  1. Don't read the whole thing into memory. Process your codes as they are encountered. Read enough bits to decode the next code, decode it, retain the unused bits for the subsequent code, repeat.


  2. Don't use strings of characters to represent bits, where you are representing one bit per character. Use bits to represent bits. The shift, and, and or operators are what you should be using. You would have an integer as a bit buffer, with all the bits you need to decode the next code.


  3. Don't do a search over all code lengths, and inside that a linear search of all codes to find your code! I would be hard pressed to come up with a slower approach. You should use a tree descent or table lookup for decoding. If you generate a canonical Huffman code in the first place, there is an easy lookup approach that can be implemented. See puff.c for an example. The textbook approach (which is slower than what puff.c does) is to build the same Huffman tree on the receiving end, and go down that tree bit-by-bit until you get to a symbol. Emit the symbol and repeat.



You should be able to process 200K of compressed input in a few milliseconds on a single core of a modern processor.
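
A rough sketch of what points 1–3 could look like in Java, using the HuffmanNode tree from the question (this assumes the tree root is available as root, that leaves have both children null, and that bits were packed most-significant-bit first; any padding bits in the last byte would need separate handling):

    static void decodeStreaming(BufferedInputStream fin, BufferedOutputStream fw,
                                HuffmanNode root) throws IOException {
        HuffmanNode node = root;
        int c;
        while ((c = fin.read()) != -1) {
            // consume the 8 bits of this byte, most significant bit first
            for (int pos = 7; pos >= 0; pos--) {
                int bit = (c >> pos) & 1;
                node = (bit == 0) ? node.left : node.right;
                if (node.left == null && node.right == null) {
                    fw.write(node.data); // reached a leaf: emit the decoded byte
                    node = root;         // restart at the root for the next code
                }
            }
        }
    }

This avoids both the giant in-memory bit string and the per-bit linear search over all codes.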






answered Nov 11 at 15:14 – Mark Adler