[ACCEPTED] Python string join performance
Most of the performance issues with string concatenation are ones of asymptotic performance, so the differences become most significant when you are concatenating many long strings. In your sample, you are performing the same concatenation many times. You aren't building up any long string, and it may be that the Python interpreter is optimizing your loops. This would explain why the time increases when you move to str.join and path.join - they are more complex functions that are not as easily reduced. (os.path.join does a lot of checking on the strings to see if they need to be rewritten in any way before they are concatenated. This sacrifices some performance for the sake of portability.)
By the way, since file paths are not usually very long, you almost certainly want to use os.path.join for the sake of portability. If the performance of the concatenation is a problem, you're doing something very odd with your filesystem.
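Here's a rough sketch of the distinction (the piece sizes and counts are made up for illustration): building one long string out of many pieces is the workload where += can suffer, while str.join does a single pass over the pieces.

import timeit

pieces = ['x' * 50] * 10_000   # many pieces being built into one long string

def build_with_plus():
    out = ''
    for p in pieces:
        out += p               # the growing result may be re-copied on each step
    return out

def build_with_join():
    return ''.join(pieces)     # one pass; each character is copied once

print('+= :', timeit.timeit(build_with_plus, number=100))
print('join:', timeit.timeit(build_with_join, number=100))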
The advice is about concatenating a lot of strings.
To compute s = s1 + s2 + ... + sn:
1) using +. A new string s1+s2 is created, then a new string s1+s2+s3 is created, ..., etc., so a lot of memory allocation and copy operations are involved. In fact, s1 is copied n-1 times, s2 is copied n-2 times, ..., etc.
2) using "".join([s1,s2,...,sn]). The 6 concatenation is done in one pass, and each 5 char in the strings is copied only once.
In your code, join is called on each iteration, so it's just like using +. The correct way is to collect the items in a list, then call join on it once.
edit: fixed the typo
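A minimal sketch of the two patterns (the item values are illustrative):

items = ['a', 'b', 'c', 'd']

# Anti-pattern: calling join inside the loop re-copies the accumulated
# string on every iteration, just like repeated +.
out = ''
for s in items:
    out = ''.join([out, s])

# Recommended: collect the pieces first, then join once at the end.
parts = []
for s in items:
    parts.append(s)
result = ''.join(parts)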
It is true you should not use '+'. Your example is quite special; try the same code with:
s1='*'*100000
s2='+'*100000
Then the second version (str.join) is much faster.
Shouldn't it be exactly the other way round?
Not necessarily. I don't know the internals of Python well enough to comment specifically, but some common observations are that your first loop uses a simple operator +, which is probably implemented as a primitive by the runtime. In contrast, the other loops first have to resolve a module name, resolve the variable/class found there, and then call a member function on that.
Another note is that your loop might simply be too small to yield significant numbers. Considering your low overall running time, this probably makes your tests useless.
Also, your test case is highly specialized on two short strings. Such cases never give a clear picture of edge-case performance.
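One way to check both points (the attribute lookup cost and getting enough iterations for stable numbers) is a sketch along these lines; the strings and repetition counts are illustrative:

import timeit

s1, s2 = 'abc', 'def'

def plus():
    return s1 + s2                # plain operator, handled directly by the interpreter

def join_lookup():
    return ''.join([s1, s2])      # attribute lookup happens on every call

joiner = ''.join                  # resolve the method once, up front
def join_prebound():
    return joiner([s1, s2])

for fn in (plus, join_lookup, join_prebound):
    print(fn.__name__, timeit.timeit(fn, number=1_000_000))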
String concatenation (+) has an optimized implementation on CPython. But this may not be the case on other implementations like Jython or IronPython. So when you want your code to perform well on these interpreters, you should use the .join() method on strings.
os.path.join() is specifically meant to join filesystem paths. It takes care of different path separators, too. This would be the right way to build a file name.
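For instance (the directory and file names here are made up):

import os.path

# Builds a platform-appropriate path: 'data/logs/run1.txt' on POSIX,
# 'data\\logs\\run1.txt' on Windows.
path = os.path.join('data', 'logs', 'run1.txt')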
I would like to add a link to the Python wiki, where there are notes regarding string concatenation, including that "this section is somewhat wrong with python2.5. Python 2.5 string concatenation is fairly fast".
I believe that string concatenation has improved a lot since 2.5, and that although str.join is still faster (especially for big strings), you will not see as much improvement as in older Python versions.
http://wiki.python.org/moin/PythonSpeed/PerformanceTips#StringConcatenation
Trying to do the same test again in 2020 with Python 3.9, join remains plenty fast, but plain concatenation has seen some improvement too:
from io import StringIO
from array import array
loop_count = 10000
strings = [str(num) for num in range(loop_count)]
bytestrings = [b'%d' % num for num in range(loop_count)]
# 1.1453419709578156 seconds for 1000 repetitions (fastest of 5)
def concat():
    out = ''
    for s in strings:
        out += s
    return out

# 1.468063525040634 seconds for 1000 repetitions
# Removing decode() does not make it faster
def bytearray():
    out = array('b')
    for b in bytestrings:
        out.frombytes(b)
    return out.tobytes().decode()

# 0.9110414159949869 seconds for 1000 repetitions
def join():
    # I am rebuilding the list on purpose: I don't want to include
    # the overhead of printing numbers, but I do want to include
    # the overhead of building the list
    str_list = []
    for s in strings:
        str_list.append(s)
    return ''.join(str_list)

# 1.174764319031965 seconds for 1000 repetitions
def stringio():
    io = StringIO()
    for s in strings:
        io.write(s)
    return io.getvalue()
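The timings quoted in the comments above were presumably collected with a harness along these lines (a sketch using timeit against the functions defined above; the exact measurement code is not shown):

import timeit

for func in (concat, bytearray, join, stringio):
    best = min(timeit.repeat(func, number=1000, repeat=5))
    print(f'{func.__name__}: {best} seconds for 1000 repetitions (fastest of 5)')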