Improve runtime of python code with indexing and string concatenation

I tried hard to improve the runtime of the following code snippet, which turned out to be the CPU-bottleneck in an asyncio-client package that I'm developing:

data = [''] * n

for i, ix in enumerate(indices):

    data[ix] = elements[i]

s = 't'.join(data)

return s

What I do is basically very simple:

elements is a list of str (each <= 7 characters) that I eventually write at specific positions into a tab-separated file.

indices is a list of int giving the positions of each of the elements in the file

If there is no element at a certain position, an empty string is inserted

I finally write the string into a text file using aiofiles.

So far, I tried to use a generator to create the data on the fly, as well as to use numpy for faster indexing, but with no success. Any idea how to make this code run faster would be great. Here is a reproducible example with timing:

import numpy as np

import timeit



n = 1_000_000  # total number of items

k = 500_000  # number of elements to insert

elements = ['ade/gua'] * k  # elements to insert, <= 7 unicode characters

indices = list(range(0, n, 2))  # indices where to insert, sorted

assert len(elements) == len(indices)



# This is where I started

def baseline():

    data = [''] * n

    for i, ix in enumerate(indices):

        data[ix] = elements[i]

    s = 't'.join(data)

    return s



# Generate values on the fly

def generator():

    def f():

        it = iter(enumerate(indices))

        i, ix = next(it)

        for j in range(n):

            if j == ix:

                yield elements[i]

                try:

                    i, ix = next(it)

                except:

                    pass

            else:

                yield ''

    s = 't'.join(f())  # iterating though generation seem too costly

    return s



# Harness numpy

indices_np = np.array(indices)  # indices could also be numpy array

def numpy():

    data = np.full(n, '', dtype='<U7')

    data[indices_np] = elements  # this is faster with numpy

    s = 't'.join(data)  # much slower. array2string or savetxt does not help

    return s



assert baseline() == generator() == numpy()



timeit.timeit(baseline, number=10)  # 0.8463204780127853

timeit.timeit(generator, number=10)  # 2.048296730965376 -> great job

timeit.timeit(numpy, number=10)  # 4.486689139157534 -> life sucks

Edit 1

To address some of the points raised in the comments:

I write the string aiofiles.open(filename, mode='w') as file and file.write()

Indices can generally not be expressed as a range

Indices can assumed to be always sorted at no extra cost.

ASCII characters are sufficient

Edit 2

Based on the answer of Mad Physicist, I tried the following code, with no success.

def buffer_plumbing():

m = len(elements) # total number of data points to insert

k = 7  # each element is 7 bytes long, only ascii 

total_bytes = n - 1 + m * 7  # total number of bytes for the buffer



# find out the number of preceeding gaps for each element

gap = np.empty_like(indices_np)

gap[0] = indices_np[0]  # that many gaps a the beginning

np.subtract(indices_np[1:], indices_np[:-1], out=gap[1:])

gap[1:] -= 1  # subtract one to get the gaps (except for the first)



# pre-allocate a large enough byte buffer

s = np.full(total_bytes , 't', dtype='S1')



# write element into the buffer

start = 0

for i, (g, e) in enumerate(zip(gap, elements)):

    start += g

    s[start: start + k].view(dtype=('S', k))[:] = e

    start += k + 1

return s.tostring().decode('utf-8')



timeit.timeit(buffer_plumbing, number=10)  # 26.82

edited Nov 16 '18 at 12:02

asked Nov 15 '18 at 13:25

dmy

132

I have to ask, what does 'ade/cyt' mean?

– Mad Physicist
Nov 15 '18 at 13:32

You are relocating the string umpteen times when you do the join. Your best optimization may be to avoid constructing it in the first place. What is the final purpose of that string? Are you writing it somewhere?

– Mad Physicist
Nov 15 '18 at 13:36

@MadPhysicist ade/cyt refered to nucleotides (as was biologically wrong, btw). I write it to a text file.

– dmy
Nov 15 '18 at 13:40

Why don't you replace the 'ade/gua' and other strings with integers. For example 1 for 'ade/gua', 2 for 'ade/cyt', etc... And with data = np.zeros(n, dtype=np.int64) you do data[ix] = elements[i]. Or what is your expected output? How many combinations do you expect?

– Scotty1-
Nov 15 '18 at 14:02

1

Can you show how you write the string exactly. I think your biggest speedup will come from not concatenating the whole thing if you're writing it to a file.

– Mad Physicist
Nov 16 '18 at 3:41

|
show 4 more comments

I tried hard to improve the runtime of the following code snippet, which turned out to be the CPU-bottleneck in an asyncio-client package that I'm developing:

data = [''] * n

for i, ix in enumerate(indices):

    data[ix] = elements[i]

s = 't'.join(data)

return s

What I do is basically very simple:

elements is a list of str (each <= 7 characters) that I eventually write at specific positions into a tab-separated file.

indices is a list of int giving the positions of each of the elements in the file

If there is no element at a certain position, an empty string is inserted

I finally write the string into a text file using aiofiles.

import numpy as np

import timeit



n = 1_000_000  # total number of items

k = 500_000  # number of elements to insert

elements = ['ade/gua'] * k  # elements to insert, <= 7 unicode characters

indices = list(range(0, n, 2))  # indices where to insert, sorted

assert len(elements) == len(indices)



# This is where I started

def baseline():

    data = [''] * n

    for i, ix in enumerate(indices):

        data[ix] = elements[i]

    s = 't'.join(data)

    return s



# Generate values on the fly

def generator():

    def f():

        it = iter(enumerate(indices))

        i, ix = next(it)

        for j in range(n):

            if j == ix:

                yield elements[i]

                try:

                    i, ix = next(it)

                except:

                    pass

            else:

                yield ''

    s = 't'.join(f())  # iterating though generation seem too costly

    return s



# Harness numpy

indices_np = np.array(indices)  # indices could also be numpy array

def numpy():

    data = np.full(n, '', dtype='<U7')

    data[indices_np] = elements  # this is faster with numpy

    s = 't'.join(data)  # much slower. array2string or savetxt does not help

    return s



assert baseline() == generator() == numpy()



timeit.timeit(baseline, number=10)  # 0.8463204780127853

timeit.timeit(generator, number=10)  # 2.048296730965376 -> great job

timeit.timeit(numpy, number=10)  # 4.486689139157534 -> life sucks

Edit 1

To address some of the points raised in the comments:

I write the string aiofiles.open(filename, mode='w') as file and file.write()

Indices can generally not be expressed as a range

Indices can assumed to be always sorted at no extra cost.

ASCII characters are sufficient

Edit 2

Based on the answer of Mad Physicist, I tried the following code, with no success.

def buffer_plumbing():

m = len(elements) # total number of data points to insert

k = 7  # each element is 7 bytes long, only ascii 

total_bytes = n - 1 + m * 7  # total number of bytes for the buffer



# find out the number of preceeding gaps for each element

gap = np.empty_like(indices_np)

gap[0] = indices_np[0]  # that many gaps a the beginning

np.subtract(indices_np[1:], indices_np[:-1], out=gap[1:])

gap[1:] -= 1  # subtract one to get the gaps (except for the first)



# pre-allocate a large enough byte buffer

s = np.full(total_bytes , 't', dtype='S1')



# write element into the buffer

start = 0

for i, (g, e) in enumerate(zip(gap, elements)):

    start += g

    s[start: start + k].view(dtype=('S', k))[:] = e

    start += k + 1

return s.tostring().decode('utf-8')



timeit.timeit(buffer_plumbing, number=10)  # 26.82

edited Nov 16 '18 at 12:02

asked Nov 15 '18 at 13:25

dmy

132

I have to ask, what does 'ade/cyt' mean?

– Mad Physicist
Nov 15 '18 at 13:32

You are relocating the string umpteen times when you do the join. Your best optimization may be to avoid constructing it in the first place. What is the final purpose of that string? Are you writing it somewhere?

– Mad Physicist
Nov 15 '18 at 13:36

@MadPhysicist ade/cyt refered to nucleotides (as was biologically wrong, btw). I write it to a text file.

– dmy
Nov 15 '18 at 13:40

Why don't you replace the 'ade/gua' and other strings with integers. For example 1 for 'ade/gua', 2 for 'ade/cyt', etc... And with data = np.zeros(n, dtype=np.int64) you do data[ix] = elements[i]. Or what is your expected output? How many combinations do you expect?

– Scotty1-
Nov 15 '18 at 14:02

1

Can you show how you write the string exactly. I think your biggest speedup will come from not concatenating the whole thing if you're writing it to a file.

– Mad Physicist
Nov 16 '18 at 3:41

|
show 4 more comments

I tried hard to improve the runtime of the following code snippet, which turned out to be the CPU-bottleneck in an asyncio-client package that I'm developing:

data = [''] * n

for i, ix in enumerate(indices):

    data[ix] = elements[i]

s = 't'.join(data)

return s

What I do is basically very simple:

elements is a list of str (each <= 7 characters) that I eventually write at specific positions into a tab-separated file.

indices is a list of int giving the positions of each of the elements in the file

If there is no element at a certain position, an empty string is inserted

I finally write the string into a text file using aiofiles.

import numpy as np

import timeit



n = 1_000_000  # total number of items

k = 500_000  # number of elements to insert

elements = ['ade/gua'] * k  # elements to insert, <= 7 unicode characters

indices = list(range(0, n, 2))  # indices where to insert, sorted

assert len(elements) == len(indices)



# This is where I started

def baseline():

    data = [''] * n

    for i, ix in enumerate(indices):

        data[ix] = elements[i]

    s = 't'.join(data)

    return s



# Generate values on the fly

def generator():

    def f():

        it = iter(enumerate(indices))

        i, ix = next(it)

        for j in range(n):

            if j == ix:

                yield elements[i]

                try:

                    i, ix = next(it)

                except:

                    pass

            else:

                yield ''

    s = 't'.join(f())  # iterating though generation seem too costly

    return s



# Harness numpy

indices_np = np.array(indices)  # indices could also be numpy array

def numpy():

    data = np.full(n, '', dtype='<U7')

    data[indices_np] = elements  # this is faster with numpy

    s = 't'.join(data)  # much slower. array2string or savetxt does not help

    return s



assert baseline() == generator() == numpy()



timeit.timeit(baseline, number=10)  # 0.8463204780127853

timeit.timeit(generator, number=10)  # 2.048296730965376 -> great job

timeit.timeit(numpy, number=10)  # 4.486689139157534 -> life sucks

Edit 1

To address some of the points raised in the comments:

I write the string aiofiles.open(filename, mode='w') as file and file.write()

Indices can generally not be expressed as a range

Indices can assumed to be always sorted at no extra cost.

ASCII characters are sufficient

Edit 2

Based on the answer of Mad Physicist, I tried the following code, with no success.

def buffer_plumbing():

m = len(elements) # total number of data points to insert

k = 7  # each element is 7 bytes long, only ascii 

total_bytes = n - 1 + m * 7  # total number of bytes for the buffer



# find out the number of preceeding gaps for each element

gap = np.empty_like(indices_np)

gap[0] = indices_np[0]  # that many gaps a the beginning

np.subtract(indices_np[1:], indices_np[:-1], out=gap[1:])

gap[1:] -= 1  # subtract one to get the gaps (except for the first)



# pre-allocate a large enough byte buffer

s = np.full(total_bytes , 't', dtype='S1')



# write element into the buffer

start = 0

for i, (g, e) in enumerate(zip(gap, elements)):

    start += g

    s[start: start + k].view(dtype=('S', k))[:] = e

    start += k + 1

return s.tostring().decode('utf-8')



timeit.timeit(buffer_plumbing, number=10)  # 26.82

edited Nov 16 '18 at 12:02

asked Nov 15 '18 at 13:25

dmy

132

I tried hard to improve the runtime of the following code snippet, which turned out to be the CPU-bottleneck in an asyncio-client package that I'm developing:

data = [''] * n

for i, ix in enumerate(indices):

    data[ix] = elements[i]

s = 't'.join(data)

return s

What I do is basically very simple:

elements is a list of str (each <= 7 characters) that I eventually write at specific positions into a tab-separated file.

indices is a list of int giving the positions of each of the elements in the file

If there is no element at a certain position, an empty string is inserted

I finally write the string into a text file using aiofiles.

import numpy as np

import timeit



n = 1_000_000  # total number of items

k = 500_000  # number of elements to insert

elements = ['ade/gua'] * k  # elements to insert, <= 7 unicode characters

indices = list(range(0, n, 2))  # indices where to insert, sorted

assert len(elements) == len(indices)



# This is where I started

def baseline():

    data = [''] * n

    for i, ix in enumerate(indices):

        data[ix] = elements[i]

    s = 't'.join(data)

    return s



# Generate values on the fly

def generator():

    def f():

        it = iter(enumerate(indices))

        i, ix = next(it)

        for j in range(n):

            if j == ix:

                yield elements[i]

                try:

                    i, ix = next(it)

                except:

                    pass

            else:

                yield ''

    s = 't'.join(f())  # iterating though generation seem too costly

    return s



# Harness numpy

indices_np = np.array(indices)  # indices could also be numpy array

def numpy():

    data = np.full(n, '', dtype='<U7')

    data[indices_np] = elements  # this is faster with numpy

    s = 't'.join(data)  # much slower. array2string or savetxt does not help

    return s



assert baseline() == generator() == numpy()



timeit.timeit(baseline, number=10)  # 0.8463204780127853

timeit.timeit(generator, number=10)  # 2.048296730965376 -> great job

timeit.timeit(numpy, number=10)  # 4.486689139157534 -> life sucks

Edit 1

To address some of the points raised in the comments:

I write the string aiofiles.open(filename, mode='w') as file and file.write()

Indices can generally not be expressed as a range

Indices can assumed to be always sorted at no extra cost.

ASCII characters are sufficient

Edit 2

Based on the answer of Mad Physicist, I tried the following code, with no success.

def buffer_plumbing():

m = len(elements) # total number of data points to insert

k = 7  # each element is 7 bytes long, only ascii 

total_bytes = n - 1 + m * 7  # total number of bytes for the buffer



# find out the number of preceeding gaps for each element

gap = np.empty_like(indices_np)

gap[0] = indices_np[0]  # that many gaps a the beginning

np.subtract(indices_np[1:], indices_np[:-1], out=gap[1:])

gap[1:] -= 1  # subtract one to get the gaps (except for the first)



# pre-allocate a large enough byte buffer

s = np.full(total_bytes , 't', dtype='S1')



# write element into the buffer

start = 0

for i, (g, e) in enumerate(zip(gap, elements)):

    start += g

    s[start: start + k].view(dtype=('S', k))[:] = e

    start += k + 1

return s.tostring().decode('utf-8')



timeit.timeit(buffer_plumbing, number=10)  # 26.82

python numpy optimization

edited Nov 16 '18 at 12:02

asked Nov 15 '18 at 13:25

dmy

132

edited Nov 16 '18 at 12:02

asked Nov 15 '18 at 13:25

dmy

132

edited Nov 16 '18 at 12:02

asked Nov 15 '18 at 13:25

dmy

132

asked Nov 15 '18 at 13:25

dmy

132

asked Nov 15 '18 at 13:25

dmy

132

I have to ask, what does 'ade/cyt' mean?

– Mad Physicist
Nov 15 '18 at 13:32

You are relocating the string umpteen times when you do the join. Your best optimization may be to avoid constructing it in the first place. What is the final purpose of that string? Are you writing it somewhere?

– Mad Physicist
Nov 15 '18 at 13:36

@MadPhysicist ade/cyt refered to nucleotides (as was biologically wrong, btw). I write it to a text file.

– dmy
Nov 15 '18 at 13:40

Why don't you replace the 'ade/gua' and other strings with integers. For example 1 for 'ade/gua', 2 for 'ade/cyt', etc... And with data = np.zeros(n, dtype=np.int64) you do data[ix] = elements[i]. Or what is your expected output? How many combinations do you expect?

– Scotty1-
Nov 15 '18 at 14:02

1

Can you show how you write the string exactly. I think your biggest speedup will come from not concatenating the whole thing if you're writing it to a file.

– Mad Physicist
Nov 16 '18 at 3:41

|
show 4 more comments

I have to ask, what does 'ade/cyt' mean?

– Mad Physicist
Nov 15 '18 at 13:32

You are relocating the string umpteen times when you do the join. Your best optimization may be to avoid constructing it in the first place. What is the final purpose of that string? Are you writing it somewhere?

– Mad Physicist
Nov 15 '18 at 13:36

@MadPhysicist ade/cyt refered to nucleotides (as was biologically wrong, btw). I write it to a text file.

– dmy
Nov 15 '18 at 13:40

Why don't you replace the 'ade/gua' and other strings with integers. For example 1 for 'ade/gua', 2 for 'ade/cyt', etc... And with data = np.zeros(n, dtype=np.int64) you do data[ix] = elements[i]. Or what is your expected output? How many combinations do you expect?

– Scotty1-
Nov 15 '18 at 14:02

1

Can you show how you write the string exactly. I think your biggest speedup will come from not concatenating the whole thing if you're writing it to a file.

– Mad Physicist
Nov 16 '18 at 3:41

I have to ask, what does 'ade/cyt' mean?

– Mad Physicist
Nov 15 '18 at 13:32

You are relocating the string umpteen times when you do the join. Your best optimization may be to avoid constructing it in the first place. What is the final purpose of that string? Are you writing it somewhere?

– Mad Physicist
Nov 15 '18 at 13:36

@MadPhysicist ade/cyt refered to nucleotides (as was biologically wrong, btw). I write it to a text file.

– dmy
Nov 15 '18 at 13:40

Why don't you replace the 'ade/gua' and other strings with integers. For example 1 for 'ade/gua', 2 for 'ade/cyt', etc... And with data = np.zeros(n, dtype=np.int64) you do data[ix] = elements[i]. Or what is your expected output? How many combinations do you expect?

– Scotty1-
Nov 15 '18 at 14:02

Can you show how you write the string exactly. I think your biggest speedup will come from not concatenating the whole thing if you're writing it to a file.

– Mad Physicist
Nov 16 '18 at 3:41

|
show 4 more comments

1 Answer
1

active

oldest

votes

You can pre-sort your data, after converting it to a pair of numpy arrays. This will allow you to manipulate a single pre-existing buffer rather than copying strings over and over as you reallocate them. The difference between my suggestion and your attempt is that we will use ndarray.tobytes (or ndarray.tostring) with the assumption that you only have ASCII characters. In fact, you can completely bypass the copy operation involved in converting into a bytes object by using ndarray.tofile directly.

If you have elements in-hand, you know that the total length of your line will be the combined length of the elements and n-1 tab separators. The start of an element in the full string is therefore it's index (the number of tabs that precede it) plus the cumulative length of all the elements that come before it. Here is a simple implementation of a single-buffer fill using mostly Python loops:

lengths = np.array([len(e) for e in elements])

indices = np.asanyarray(indices)

elements = np.array(elements, dtype='S7')

order = np.argsort(indices)



elements = elements[order]

indices = indices[order]

lengths = lengths[order]



cumulative = np.empty_like(lengths)

cumulative[0] = 0

np.cumsum(lengths[:-1], out=cumulative[1:])

cumulative += lengths



s = np.full(cumulative[-1] + n - 1, 't', dtype='S1')

for i, l, e in zip(cumulative, lengths, elements):

    s[i:i + l].view(dtype=('S', l))[:] = e

There are lots of possible optimizations to play with here, such as the possibility of allocating s using np.empty and only filling in the required elements with tabs. This will be left as an excise for the reader.

Another possibility is to avoid converting elements to a numpy array entirely (it probably just wastes space and time). You can then rewrite the for loop as

for i, l, o in zip(cumulative, lengths, order):

    s[i:i + l].view(dtype=('S', l))[:] = elements[o]

You can dump the result into a bytes object with

s = s.tobytes()

s = s.tostring()

You can write the result as-is to a file opened for binary writing. In fact, if you don't need a copy of the buffer in the form of a bytes, you can just write to the file directly:

s.tofile(f)

That will save you some memory and processing time.

In the same vein, you may be better off just writing to a file directly, piece by piece. This saves you not only the need to allocate the full buffer, but also the cumulative lengths. In fact, the only thing you need this way is the diff of successive indices to tell you how many tabs to insert:

indices = np.asanyarray(indices)

order = np.argsort(indices)



indices = indices[order]



tabs = np.empty_like(indices)

tabs[0] = indices[0]

np.subtract(indices[1:], indices[:-1], out=tabs[1:])

tabs[1:] -= 1



for t, o in zip(tabs, order):

    f.write('t' * t)

    f.write(elements[o])

f.write('t' * (n - indices[-1] - 1))

This second approach has two major advantages besides the reduced amount of calculation. The first is that it works with unicode characters rather than ASCII only. The second is that it does not allocate any buffers besides strings of tabs, which should be extremely fast.

In both cases, having elements and indices sorted into ascending order by index will speed things up dramatically. The first case reduces to

lengths = np.array([len(e) for e in elements])

indices = np.asanyarray(indices)



cumulative = np.empty_like(lengths)

cumulative[0] = 0

np.cumsum(lengths[:-1], out=cumulative[1:])

cumulative += lengths



s = np.full(cumulative[-1] + n - 1, 't', dtype='S1')

for i, l, e in zip(cumulative, lengths, elements):

    s[i:i + l].view(dtype=('S', l))[:] = e

And the second becomes just

indices = np.asanyarray(indices)



tabs = np.empty_like(indices)

tabs[0] = indices[0]

np.subtract(indices[1:], indices[:-1], out=tabs[1:])

tabs[1:] -= 1



for t, e in zip(tabs, elements):

    f.write('t' * t)

    f.write(e)

f.write('t' * (n - indices[-1] - 1))

answered Nov 16 '18 at 5:57

Mad Physicist

38k1674108

Thanks for the extensive answer. It tried the last approach, but I think there is a bug, because I need 't' also between consecutive elements as a delimitier. Besides, this massively slows down everything du to the large number of calls to file.write.

– dmy
Nov 16 '18 at 9:41

I tried everything now and its just getting worse. Seems like time to surrender...

– dmy
Nov 16 '18 at 12:03

@dmy. I'll play with it some more once I'm on a desktop. This is interesting. Remember to include the I/O in your timing. That's likely to be a bigger bottleneck than the in-memory stuff.

– Mad Physicist
Nov 16 '18 at 13:54

I'm using asyncio for this reason so I made the CPU to be my bottleneck.

– dmy
Nov 16 '18 at 14:04

If you're writing to a file, you're writing to a file. Writing x number of bytes will take a fixed amount of time, which will generally only go up if you try to parallelize with other disk operations.

– Mad Physicist
Nov 16 '18 at 14:07

|
show 1 more comment

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53320519%2fimprove-runtime-of-python-code-with-indexing-and-string-concatenation%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

1 Answer
1

active

oldest

votes

1 Answer
1

active

oldest

votes

lengths = np.array([len(e) for e in elements])

indices = np.asanyarray(indices)

elements = np.array(elements, dtype='S7')

order = np.argsort(indices)



elements = elements[order]

indices = indices[order]

lengths = lengths[order]



cumulative = np.empty_like(lengths)

cumulative[0] = 0

np.cumsum(lengths[:-1], out=cumulative[1:])

cumulative += lengths



s = np.full(cumulative[-1] + n - 1, 't', dtype='S1')

for i, l, e in zip(cumulative, lengths, elements):

    s[i:i + l].view(dtype=('S', l))[:] = e

Another possibility is to avoid converting elements to a numpy array entirely (it probably just wastes space and time). You can then rewrite the for loop as

for i, l, o in zip(cumulative, lengths, order):

    s[i:i + l].view(dtype=('S', l))[:] = elements[o]

You can dump the result into a bytes object with

s = s.tobytes()

s = s.tostring()

You can write the result as-is to a file opened for binary writing. In fact, if you don't need a copy of the buffer in the form of a bytes, you can just write to the file directly:

s.tofile(f)

That will save you some memory and processing time.

indices = np.asanyarray(indices)

order = np.argsort(indices)



indices = indices[order]



tabs = np.empty_like(indices)

tabs[0] = indices[0]

np.subtract(indices[1:], indices[:-1], out=tabs[1:])

tabs[1:] -= 1



for t, o in zip(tabs, order):

    f.write('t' * t)

    f.write(elements[o])

f.write('t' * (n - indices[-1] - 1))

In both cases, having elements and indices sorted into ascending order by index will speed things up dramatically. The first case reduces to

lengths = np.array([len(e) for e in elements])

indices = np.asanyarray(indices)



cumulative = np.empty_like(lengths)

cumulative[0] = 0

np.cumsum(lengths[:-1], out=cumulative[1:])

cumulative += lengths



s = np.full(cumulative[-1] + n - 1, 't', dtype='S1')

for i, l, e in zip(cumulative, lengths, elements):

    s[i:i + l].view(dtype=('S', l))[:] = e

And the second becomes just

indices = np.asanyarray(indices)



tabs = np.empty_like(indices)

tabs[0] = indices[0]

np.subtract(indices[1:], indices[:-1], out=tabs[1:])

tabs[1:] -= 1



for t, e in zip(tabs, elements):

    f.write('t' * t)

    f.write(e)

f.write('t' * (n - indices[-1] - 1))

answered Nov 16 '18 at 5:57

Mad Physicist

38k1674108

Thanks for the extensive answer. It tried the last approach, but I think there is a bug, because I need 't' also between consecutive elements as a delimitier. Besides, this massively slows down everything du to the large number of calls to file.write.

– dmy
Nov 16 '18 at 9:41

I tried everything now and its just getting worse. Seems like time to surrender...

– dmy
Nov 16 '18 at 12:03

@dmy. I'll play with it some more once I'm on a desktop. This is interesting. Remember to include the I/O in your timing. That's likely to be a bigger bottleneck than the in-memory stuff.

– Mad Physicist
Nov 16 '18 at 13:54

I'm using asyncio for this reason so I made the CPU to be my bottleneck.

– dmy
Nov 16 '18 at 14:04

If you're writing to a file, you're writing to a file. Writing x number of bytes will take a fixed amount of time, which will generally only go up if you try to parallelize with other disk operations.

– Mad Physicist
Nov 16 '18 at 14:07

|
show 1 more comment

lengths = np.array([len(e) for e in elements])

indices = np.asanyarray(indices)

elements = np.array(elements, dtype='S7')

order = np.argsort(indices)



elements = elements[order]

indices = indices[order]

lengths = lengths[order]



cumulative = np.empty_like(lengths)

cumulative[0] = 0

np.cumsum(lengths[:-1], out=cumulative[1:])

cumulative += lengths



s = np.full(cumulative[-1] + n - 1, 't', dtype='S1')

for i, l, e in zip(cumulative, lengths, elements):

    s[i:i + l].view(dtype=('S', l))[:] = e

Another possibility is to avoid converting elements to a numpy array entirely (it probably just wastes space and time). You can then rewrite the for loop as

for i, l, o in zip(cumulative, lengths, order):

    s[i:i + l].view(dtype=('S', l))[:] = elements[o]

You can dump the result into a bytes object with

s = s.tobytes()

s = s.tostring()

You can write the result as-is to a file opened for binary writing. In fact, if you don't need a copy of the buffer in the form of a bytes, you can just write to the file directly:

s.tofile(f)

That will save you some memory and processing time.

indices = np.asanyarray(indices)

order = np.argsort(indices)



indices = indices[order]



tabs = np.empty_like(indices)

tabs[0] = indices[0]

np.subtract(indices[1:], indices[:-1], out=tabs[1:])

tabs[1:] -= 1



for t, o in zip(tabs, order):

    f.write('t' * t)

    f.write(elements[o])

f.write('t' * (n - indices[-1] - 1))

In both cases, having elements and indices sorted into ascending order by index will speed things up dramatically. The first case reduces to

lengths = np.array([len(e) for e in elements])

indices = np.asanyarray(indices)



cumulative = np.empty_like(lengths)

cumulative[0] = 0

np.cumsum(lengths[:-1], out=cumulative[1:])

cumulative += lengths



s = np.full(cumulative[-1] + n - 1, 't', dtype='S1')

for i, l, e in zip(cumulative, lengths, elements):

    s[i:i + l].view(dtype=('S', l))[:] = e

And the second becomes just

indices = np.asanyarray(indices)



tabs = np.empty_like(indices)

tabs[0] = indices[0]

np.subtract(indices[1:], indices[:-1], out=tabs[1:])

tabs[1:] -= 1



for t, e in zip(tabs, elements):

    f.write('t' * t)

    f.write(e)

f.write('t' * (n - indices[-1] - 1))

answered Nov 16 '18 at 5:57

Mad Physicist

38k1674108

Thanks for the extensive answer. It tried the last approach, but I think there is a bug, because I need 't' also between consecutive elements as a delimitier. Besides, this massively slows down everything du to the large number of calls to file.write.

– dmy
Nov 16 '18 at 9:41

I tried everything now and its just getting worse. Seems like time to surrender...

– dmy
Nov 16 '18 at 12:03

@dmy. I'll play with it some more once I'm on a desktop. This is interesting. Remember to include the I/O in your timing. That's likely to be a bigger bottleneck than the in-memory stuff.

– Mad Physicist
Nov 16 '18 at 13:54

I'm using asyncio for this reason so I made the CPU to be my bottleneck.

– dmy
Nov 16 '18 at 14:04

If you're writing to a file, you're writing to a file. Writing x number of bytes will take a fixed amount of time, which will generally only go up if you try to parallelize with other disk operations.

– Mad Physicist
Nov 16 '18 at 14:07

|
show 1 more comment

lengths = np.array([len(e) for e in elements])

indices = np.asanyarray(indices)

elements = np.array(elements, dtype='S7')

order = np.argsort(indices)



elements = elements[order]

indices = indices[order]

lengths = lengths[order]



cumulative = np.empty_like(lengths)

cumulative[0] = 0

np.cumsum(lengths[:-1], out=cumulative[1:])

cumulative += lengths



s = np.full(cumulative[-1] + n - 1, 't', dtype='S1')

for i, l, e in zip(cumulative, lengths, elements):

    s[i:i + l].view(dtype=('S', l))[:] = e

Another possibility is to avoid converting elements to a numpy array entirely (it probably just wastes space and time). You can then rewrite the for loop as

for i, l, o in zip(cumulative, lengths, order):

    s[i:i + l].view(dtype=('S', l))[:] = elements[o]

You can dump the result into a bytes object with

s = s.tobytes()

s = s.tostring()

You can write the result as-is to a file opened for binary writing. In fact, if you don't need a copy of the buffer in the form of a bytes, you can just write to the file directly:

s.tofile(f)

That will save you some memory and processing time.

indices = np.asanyarray(indices)

order = np.argsort(indices)



indices = indices[order]



tabs = np.empty_like(indices)

tabs[0] = indices[0]

np.subtract(indices[1:], indices[:-1], out=tabs[1:])

tabs[1:] -= 1



for t, o in zip(tabs, order):

    f.write('t' * t)

    f.write(elements[o])

f.write('t' * (n - indices[-1] - 1))

In both cases, having elements and indices sorted into ascending order by index will speed things up dramatically. The first case reduces to

lengths = np.array([len(e) for e in elements])

indices = np.asanyarray(indices)



cumulative = np.empty_like(lengths)

cumulative[0] = 0

np.cumsum(lengths[:-1], out=cumulative[1:])

cumulative += lengths



s = np.full(cumulative[-1] + n - 1, 't', dtype='S1')

for i, l, e in zip(cumulative, lengths, elements):

    s[i:i + l].view(dtype=('S', l))[:] = e

And the second becomes just

indices = np.asanyarray(indices)



tabs = np.empty_like(indices)

tabs[0] = indices[0]

np.subtract(indices[1:], indices[:-1], out=tabs[1:])

tabs[1:] -= 1



for t, e in zip(tabs, elements):

    f.write('t' * t)

    f.write(e)

f.write('t' * (n - indices[-1] - 1))

answered Nov 16 '18 at 5:57

Mad Physicist

38k1674108

lengths = np.array([len(e) for e in elements])

indices = np.asanyarray(indices)

elements = np.array(elements, dtype='S7')

order = np.argsort(indices)



elements = elements[order]

indices = indices[order]

lengths = lengths[order]



cumulative = np.empty_like(lengths)

cumulative[0] = 0

np.cumsum(lengths[:-1], out=cumulative[1:])

cumulative += lengths



s = np.full(cumulative[-1] + n - 1, 't', dtype='S1')

for i, l, e in zip(cumulative, lengths, elements):

    s[i:i + l].view(dtype=('S', l))[:] = e

Another possibility is to avoid converting elements to a numpy array entirely (it probably just wastes space and time). You can then rewrite the for loop as

for i, l, o in zip(cumulative, lengths, order):

    s[i:i + l].view(dtype=('S', l))[:] = elements[o]

You can dump the result into a bytes object with

s = s.tobytes()

s = s.tostring()

You can write the result as-is to a file opened for binary writing. In fact, if you don't need a copy of the buffer in the form of a bytes, you can just write to the file directly:

s.tofile(f)

That will save you some memory and processing time.

indices = np.asanyarray(indices)

order = np.argsort(indices)



indices = indices[order]



tabs = np.empty_like(indices)

tabs[0] = indices[0]

np.subtract(indices[1:], indices[:-1], out=tabs[1:])

tabs[1:] -= 1



for t, o in zip(tabs, order):

    f.write('t' * t)

    f.write(elements[o])

f.write('t' * (n - indices[-1] - 1))

In both cases, having elements and indices sorted into ascending order by index will speed things up dramatically. The first case reduces to

lengths = np.array([len(e) for e in elements])

indices = np.asanyarray(indices)



cumulative = np.empty_like(lengths)

cumulative[0] = 0

np.cumsum(lengths[:-1], out=cumulative[1:])

cumulative += lengths



s = np.full(cumulative[-1] + n - 1, 't', dtype='S1')

for i, l, e in zip(cumulative, lengths, elements):

    s[i:i + l].view(dtype=('S', l))[:] = e

And the second becomes just

indices = np.asanyarray(indices)



tabs = np.empty_like(indices)

tabs[0] = indices[0]

np.subtract(indices[1:], indices[:-1], out=tabs[1:])

tabs[1:] -= 1



for t, e in zip(tabs, elements):

    f.write('t' * t)

    f.write(e)

f.write('t' * (n - indices[-1] - 1))

answered Nov 16 '18 at 5:57

Mad Physicist

38k1674108

answered Nov 16 '18 at 5:57

Mad Physicist

38k1674108

answered Nov 16 '18 at 5:57

Mad Physicist

38k1674108

answered Nov 16 '18 at 5:57

Mad Physicist

38k1674108

Thanks for the extensive answer. It tried the last approach, but I think there is a bug, because I need 't' also between consecutive elements as a delimitier. Besides, this massively slows down everything du to the large number of calls to file.write.

– dmy
Nov 16 '18 at 9:41

I tried everything now and its just getting worse. Seems like time to surrender...

– dmy
Nov 16 '18 at 12:03

@dmy. I'll play with it some more once I'm on a desktop. This is interesting. Remember to include the I/O in your timing. That's likely to be a bigger bottleneck than the in-memory stuff.

– Mad Physicist
Nov 16 '18 at 13:54

I'm using asyncio for this reason so I made the CPU to be my bottleneck.

– dmy
Nov 16 '18 at 14:04

If you're writing to a file, you're writing to a file. Writing x number of bytes will take a fixed amount of time, which will generally only go up if you try to parallelize with other disk operations.

– Mad Physicist
Nov 16 '18 at 14:07

|
show 1 more comment

Thanks for the extensive answer. It tried the last approach, but I think there is a bug, because I need 't' also between consecutive elements as a delimitier. Besides, this massively slows down everything du to the large number of calls to file.write.

– dmy
Nov 16 '18 at 9:41

I tried everything now and its just getting worse. Seems like time to surrender...

– dmy
Nov 16 '18 at 12:03

@dmy. I'll play with it some more once I'm on a desktop. This is interesting. Remember to include the I/O in your timing. That's likely to be a bigger bottleneck than the in-memory stuff.

– Mad Physicist
Nov 16 '18 at 13:54

I'm using asyncio for this reason so I made the CPU to be my bottleneck.

– dmy
Nov 16 '18 at 14:04

If you're writing to a file, you're writing to a file. Writing x number of bytes will take a fixed amount of time, which will generally only go up if you try to parallelize with other disk operations.

– Mad Physicist
Nov 16 '18 at 14:07

Thanks for the extensive answer. It tried the last approach, but I think there is a bug, because I need 't' also between consecutive elements as a delimitier. Besides, this massively slows down everything du to the large number of calls to file.write.

– dmy
Nov 16 '18 at 9:41

I tried everything now and its just getting worse. Seems like time to surrender...

– dmy
Nov 16 '18 at 12:03

@dmy. I'll play with it some more once I'm on a desktop. This is interesting. Remember to include the I/O in your timing. That's likely to be a bigger bottleneck than the in-memory stuff.

– Mad Physicist
Nov 16 '18 at 13:54

I'm using asyncio for this reason so I made the CPU to be my bottleneck.

– dmy
Nov 16 '18 at 14:04

If you're writing to a file, you're writing to a file. Writing x number of bytes will take a fixed amount of time, which will generally only go up if you try to parallelize with other disk operations.

– Mad Physicist
Nov 16 '18 at 14:07

|
show 1 more comment

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

E,gOSt n1AVXe7R,mPq7U

搜尋此網誌

Vfrdtyky

Improve runtime of python code with indexing and string concatenation

Edit 1

Edit 2

Edit 1

Edit 2

Edit 1

Edit 2

Edit 1

Edit 2

1 Answer
1

Your Answer

Post as a guest

1 Answer
1

1 Answer
1

Post as a guest

Popular posts from this blog

Bressuire

Vorschmack

Xamarin.iOS Cant Deploy on Iphone

Improve runtime of python code with indexing and string concatenation

Edit 1

Edit 2

Edit 1

Edit 2

Edit 1

Edit 2

Edit 1

Edit 2

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

1 Answer 1

1 Answer 1

Sign up or log in

Post as a guest

Post as a guest

Sign up or log in

Post as a guest

Sign up or log in

Post as a guest

Sign up or log in

Post as a guest

Popular posts from this blog

Bressuire

Vorschmack

Xamarin.iOS Cant Deploy on Iphone

1 Answer
1

1 Answer
1

1 Answer
1