How can I optimize Huffman Decoding?











So I've been trying to decode using Huffman, and I have this working function, but it has horrible time and space complexity. What I've been doing so far is reading each byte, getting each bit and adding it to currentBitString. I then reverse that string and append it to a huge string which basically contains all the bit data for the file. After that, I trace through the giant string looking for a Huffman code, and when there's a match I write the decoded byte to the file. This code takes about 60 seconds to decode a 200 KB file, which is very bad, but I'm not really sure how to improve it. I know I could, for starters, write more than one byte to the file at a time, but that didn't seem to improve the time when I tried it.



    public static void decode(File f) throws Exception {
        BufferedInputStream fin = new BufferedInputStream(new FileInputStream(f));
        int i = f.getName().lastIndexOf('.');
        String extension = ".txt";
        String newFileName = f.getName().substring(0, i) + extension;
        File nf = new File(newFileName);
        BufferedOutputStream fw = new BufferedOutputStream(new FileOutputStream(nf));
        int c;
        byte bits;
        byte current;
        String currentBitString = "";
        String bitString = "";
        // read each byte from the file, reverse its bits, append to the giant bitString
        // reads ALL BYTES
        while ((c = fin.read()) != -1) {
            current = (byte) c;
            currentBitString = "";
            bits = 0;
            for (int q = 0; q < 8; q++) {
                bits = getBit(current, q);
                currentBitString += bits;
            }
            StringBuilder bitStringReverse = new StringBuilder(currentBitString);
            bitString += bitStringReverse.reverse().toString();
        }
        // grow the current code one bit at a time until it matches a Huffman code
        currentBitString = "";
        boolean foundCode = false;
        for (int j = 0; j < bitString.length(); j++) {
            currentBitString += bitString.charAt(j);
            for (int k = 0; k < nodes.length; k++) {
                // nodes is an array of Huffman nodes which contains the byte
                // data and the Huffman codes for each byte
                if (nodes[k].code.compareTo(currentBitString.trim()) == 0) {
                    fw.write(nodes[k].data);
                    foundCode = true;
                    break;
                }
            }
            if (foundCode) {
                currentBitString = "";
                foundCode = false;
            }
        }
        fw.flush();
        fw.close();
        fin.close();
    }


Here's the getBit function:



    public static byte getBit(byte ID, int position) {
        // return the bit at the given position in the byte
        return (byte) ((ID >> position) & 1);
    }


Here are the data members of the HuffmanNode class (the nodes array is an array of HuffmanNodes):



    public class HuffmanNode {
        byte data;
        int repetitions;
        String code;
        HuffmanNode right;
        HuffmanNode left;
    }









java huffman-code

asked Nov 11 at 8:28 by TheHaruWhoCodes, edited Nov 11 at 10:25

  • Maybe this will be helpful geeksforgeeks.org/…
    – Maor Refaeli
    Nov 11 at 8:31










  • You may have a very inefficiently coded implementation, but complexity doesn't say anything about that. It's still O(n) time and, from a quick scan, O(n) space too. You're asking the wrong question: you can't change its complexity, but you want to optimize the code. And the question is likely too broad for SO.
    – Erwin Bolwidt
    Nov 11 at 9:07






  • I have really no plan what your code is doing. But I observe that it's super heavy regarding memory allocation: for each character you read, you allocate a StringBuilder instance, toString() allocates a String instance, and += allocates another 9 String instances. There is tremendous potential for optimization.
    – Codo
    Nov 11 at 10:09






  • See rules.sonarsource.com/java/RSPEC-1643. Also, why build the bitstring left-to-right and then reverse it (thus using up more memory and taking up more time) when you can simply build it the right way to begin with, using for (int q=7;q>=0;q--).
    – Klitos Kyriacou
    Nov 11 at 12:07















2 Answers







You can replace the String concatenation += with a StringBuilder. This allocates fewer objects and puts less load on the garbage collector.



int c;
StringBuilder bitString = new StringBuilder();
// read each byte from file, reverse it, add to giant bitString
// reads ALL BYTES
while ((c = fin.read()) != -1) {
    byte current = (byte) c;
    StringBuilder currentBitString = new StringBuilder();
    for (int q = 0; q < 8; q++) {
        byte bits = getBit(current, q);
        currentBitString.append(bits);
    }
    bitString.append(currentBitString.reverse());
}


Instead of putting the codes and data into an array nodes, you should use a HashMap here. You are comparing the code by iterating over the whole array until you find the right match. On average that is n/2 calls to String#equals() per item. With a HashMap you reduce this to ~1.



Fill your map using the codes as keys and the data as values.



Map<String, Integer> nodes = new HashMap<>();
nodes.put(code, data);
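
For example, a rough sketch of filling such a map from the nodes array in the question (named codeTable here so it doesn't clash with the array; the cast is needed because data is a byte):

Map<String, Integer> codeTable = new HashMap<>();
for (HuffmanNode n : nodes) {
    // map each Huffman code string to the byte value it decodes to
    codeTable.put(n.code, (int) n.data);
}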


Access the data from the map



boolean foundCode = false;
for (int j = 0; j < bitString.length(); j++) {
    currentBitString.append(bitString.charAt(j));
    Integer data = nodes.get(currentBitString.toString().trim());
    if (data != null) {
        fw.write(data);
        foundCode = true;
    }
    if (foundCode) {
        currentBitString = new StringBuilder();
        foundCode = false;
    }
}





answered Nov 11 at 10:27, edited Nov 11 at 11:55 – Simulant

  • Doesn't the break statement just get out of the entire loop and not check if the code is found?
    – TheHaruWhoCodes
    Nov 11 at 11:21










  • I tried it out and put the break statement in the second loop, but it didn't work; there seems to be an infinite loop somewhere.
    – TheHaruWhoCodes
    Nov 11 at 11:37










  • The break statement is no longer needed, as the HashMap removes the second loop that iterated over the array. Updated the code. Even if the code is not 100% right, I think you still get the idea.
    – Simulant
    Nov 11 at 11:56












  • I get the idea and I've implemented it, but the time hasn't really gone down for some reason; it still takes a minute to decode a 200 KB file.
    – TheHaruWhoCodes
    Nov 11 at 12:13
































  1. Don't read the whole thing into memory. Process your codes as they are encountered. Read enough bits to decode the next code, decode it, retain the unused bits for the subsequent code, repeat.


  2. Don't use strings of characters to represent bits, where you are representing one bit per character. Use bits to represent bits. The shift, and, and or operators are what you should be using. You would have an integer as a bit buffer, with all the bits you need to decode the next code.


  3. Don't do a search over all code lengths, and inside that a linear search of all codes to find your code! I would be hard pressed to come up with a slower approach. You should use a tree descent or table lookup for decoding. If you generate a canonical Huffman code in the first place, there is an easy lookup approach that can be implemented. See puff.c for an example. The textbook approach (which is slower than what puff.c does) is to build the same Huffman tree on the receiving end, and go down that tree bit-by-bit until you get to a symbol. Emit the symbol and repeat.



You should be able to process 200K of compressed input in a few milliseconds on a single core of a modern processor.
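
A rough sketch of what points 1–3 could look like in Java, using the HuffmanNode tree from the question (this assumes the tree root is available as root, that leaves have both children null, and that bits were packed most-significant-bit first; any padding bits in the last byte would need separate handling):

    static void decodeStreaming(BufferedInputStream fin, BufferedOutputStream fw,
                                HuffmanNode root) throws IOException {
        HuffmanNode node = root;
        int c;
        while ((c = fin.read()) != -1) {
            // consume the 8 bits of this byte, most significant bit first
            for (int pos = 7; pos >= 0; pos--) {
                int bit = (c >> pos) & 1;
                node = (bit == 0) ? node.left : node.right;
                if (node.left == null && node.right == null) {
                    fw.write(node.data); // reached a leaf: emit the decoded byte
                    node = root;         // restart at the root for the next code
                }
            }
        }
    }

This avoids both the giant in-memory bit string and the per-bit linear search over all codes.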






answered Nov 11 at 15:14 – Mark Adler