161

Is there some string class in Python like StringBuilder in C#?

Smi
icn
  • 9
    This is a duplicate of [Python equivalent of Java StringBuffer](https://stackoverflow.com/questions/19926089/python-equivalent-of-java-stringbuffer). **CAUTION: The answers here are way out of date and have, in fact, become misleading.** See [that other question](https://stackoverflow.com/questions/19926089/python-equivalent-of-java-stringbuffer) for answers that are more relevant to modern Python versions (certainly 2.7 and above). – Jean-François Corbett Nov 20 '17 at 08:52

8 Answers

128

There is no one-to-one correspondence. For a really good article, please see Efficient String Concatenation in Python:

Building long strings in the Python programming language can sometimes result in very slow running code. In this article I investigate the computational performance of various string concatenation methods.

TL;DR: the fastest method is below. It's extremely compact and also pretty understandable:

def method6():
    # loop_count comes from the benchmark script; the backquotes in the
    # original Python 2 code are the old syntax for repr()
    return ''.join([repr(num) for num in xrange(loop_count)])
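
For reference, a rough modern-Python sketch of the same method (the code above is Python 2; loop_count is a variable from the article, and str() stands in for the old backquote/repr syntax):

def method6_py3(loop_count=100000):
    # join a generator expression instead of a list comprehension;
    # either works, and range() replaces xrange() on Python 3
    return ''.join(str(num) for num in range(loop_count))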
Sameer Alibhai
Andrew Hare
  • 32
    Note that this article was written based on Python 2.2. The tests would likely come out somewhat differently in a modern version of Python (CPython usually successfully optimizes concatenation, but you don't want to depend on this in important code) and a generator expression where he uses a list comprehension would be worthy of consideration. – Mike Graham Mar 10 '10 at 06:35
  • 6
    It would be good to pull in some highlights in that article, at the least a couple of the implementations (to avoid link rot problems). – jpmc26 Jul 29 '14 at 22:22
  • 4
    Method 1: resultString += appendString is the fastest according to tests by @Antoine-tran below – Justas Dec 31 '15 at 17:47
  • 7
    Your quote doesn't at all answer the question. Please include the relevant parts in your answer itself, to comply with new guidelines. – Nic Oct 21 '16 at 16:48
41

Relying on compiler optimizations is fragile. The benchmarks linked in the accepted answer and numbers given by Antoine-tran are not to be trusted. Andrew Hare makes the mistake of including a call to repr in his methods. That slows all the methods equally but obscures the real penalty in constructing the string.

Use join. It's very fast and more robust.

$ ipython3
Python 3.5.1 (default, Mar  2 2016, 03:38:02) 
IPython 4.1.2 -- An enhanced Interactive Python.

In [1]: values = [str(num) for num in range(int(1e3))]

In [2]: %%timeit
   ...: ''.join(values)
   ...: 
100000 loops, best of 3: 7.37 µs per loop

In [3]: %%timeit
   ...: result = ''
   ...: for value in values:
   ...:     result += value
   ...: 
10000 loops, best of 3: 82.8 µs per loop

In [4]: import io

In [5]: %%timeit
   ...: writer = io.StringIO()
   ...: for value in values:
   ...:     writer.write(value)
   ...: writer.getvalue()
   ...: 
10000 loops, best of 3: 81.8 µs per loop
GrantJ
  • 2
    Yes, the `repr` call dominates the runtime, but there's no need to make the mistake personal. – Alex Reinking Aug 17 '18 at 21:43
  • 9
    @AlexReinking sorry, nothing personal meant. I'm not sure what made you think it was personal. But if it was the use of their names, I used those only to refer to the user's answers (matches usernames, not sure if there's a better way). – GrantJ Aug 18 '18 at 19:15
  • 1
    good timing example that separates data initialization and concatenation operations – aiodintsov Jun 29 '19 at 22:37
29

I have used the code of Oliver Crow (link given by Andrew Hare) and adapted it a bit for Python 2.7.3 (using the timeit package). I ran it on my personal computer, a Lenovo T61 with 6GB RAM, running Debian GNU/Linux 6.0.6 (squeeze).

Here is the result for 10,000 iterations:

method1:  0.0538418292999 secs
process size 4800 kb
method2:  0.22602891922 secs
process size 4960 kb
method3:  0.0605459213257 secs
process size 4980 kb
method4:  0.0544030666351 secs
process size 5536 kb
method5:  0.0551080703735 secs
process size 5272 kb
method6:  0.0542731285095 secs
process size 5512 kb

and for 5,000,000 iterations (method 2 was ignored because it ran far too slowly, practically forever):

method1:  5.88603997231 secs
process size 37976 kb
method3:  8.40748500824 secs
process size 38024 kb
method4:  7.96380496025 secs
process size 321968 kb
method5:  8.03666186333 secs
process size 71720 kb
method6:  6.68192911148 secs
process size 38240 kb

It is quite obvious that the Python developers have done a pretty great job of optimizing string concatenation, and as Hoare said: "premature optimization is the root of all evil" :-)
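
For reference, a minimal sketch of how two of those methods (the naive += loop, method 1, and the join-based method 6) can be compared with timeit; Oliver Crow's actual harness differs in its details and also measures process size:

import timeit

loop_count = 10000

def method1():
    # naive += concatenation
    out_str = ''
    for num in range(loop_count):
        out_str += str(num)
    return out_str

def method6():
    # join over a list comprehension
    return ''.join([str(num) for num in range(loop_count)])

for method in (method1, method6):
    # best total time of 3 repeats, 100 calls each
    best = min(timeit.repeat(method, number=100, repeat=3))
    print('%s: %.5f secs per run' % (method.__name__, best / 100))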

Antoine-tran
  • 3
    Apparently Hoare does not accept that: http://hans.gerwitz.com/2004/08/12/premature-optimization-is-the-root-of-all-evil.html – Pimin Konstantin Kefaloukos Dec 11 '12 at 13:13
  • 6
    It is not a premature optimization to avoid fragile, interpreter-dependant optimizations. If you ever want to port to PyPy or risk hitting [one of the many subtle failure cases](http://stackoverflow.com/questions/24040198/cpython-string-addition-optimisation-failure-case) for the optimization, do things the right way. – Veedrac Nov 03 '14 at 21:46
  • 1
    Looks like Method 1 is easier for the compiler to optimize. – mbomb007 Apr 29 '15 at 18:21
22

Python has several things that fulfill similar purposes:

  • One common way to build large strings from pieces is to grow a list of strings and join it when you are done. This is a frequently-used Python idiom (see the sketch after this list).
    • To build strings incorporating data with formatting, you would do the formatting separately.
  • For insertion and deletion at a character level, you would keep a list of length-one strings. (To make this from a string, you'd call list(your_string).) You could also use a UserString.MutableString for this (Python 2 only; it was removed in Python 3).
  • (c)StringIO.StringIO (io.StringIO in Python 3) is useful for things that would otherwise take a file, but less so for general string building.
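
A minimal sketch of the list-and-join idiom and of the character-level list (the data here is made up for illustration):

parts = []
for i in range(3):
    # do any formatting on each piece separately ...
    parts.append('row %d: %s' % (i, 'some value'))
# ... then join everything once at the end
result = '\n'.join(parts)

# character-level editing with a list of length-one strings
chars = list('helo')
chars.insert(3, 'l')          # insert a single character
word = ''.join(chars)         # 'hello'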
Mike Graham
16

Using method 5 from above (the pseudo file), we can get very good performance and flexibility:

from cStringIO import StringIO  # Python 2 module; use io.StringIO on Python 3

class StringBuilder:
    _file_str = None

    def __init__(self):
        self._file_str = StringIO()

    def Append(self, text):
        self._file_str.write(text)

    def __str__(self):
        return self._file_str.getvalue()

Now, using it:

sb = StringBuilder()

sb.Append("Hello\n")
sb.Append("World")

print sb
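
cStringIO exists only on Python 2; a rough Python 3 equivalent of the same idea, built on io.StringIO, might look like this (not part of the original answer):

from io import StringIO

class StringBuilder:
    def __init__(self):
        self._file_str = StringIO()

    def append(self, text):
        self._file_str.write(text)

    def __str__(self):
        return self._file_str.getvalue()

sb = StringBuilder()
sb.append("Hello\n")
sb.append("World")
print(sb)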
Thomas Watson
6

You can try StringIO or cStringIO.
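
For example, a minimal sketch with io.StringIO (the Python 3 module; StringIO and cStringIO are the Python 2 names for the same idea):

import io

buf = io.StringIO()
buf.write("Hello, ")
buf.write("world")
print(buf.getvalue())   # Hello, world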

Dominic K
ghostdog74
0

There is no explicit analogue. I think you are expected to use plain string concatenation (which, as noted above, is often optimized) or a third-party class (and I doubt those are much more efficient: Python lists are dynamically typed, so there is no fast char[] to serve as a buffer, as far as I can tell). StringBuilder-like classes are not premature optimization: they exist because strings in many languages are immutable by design, which enables many optimizations (for example, slices/substrings can reference the same underlying buffer). StringBuilder/StringBuffer/stringstream-like classes can be much faster than repeated concatenation (which produces many small temporary objects that still need allocation and garbage collection), and faster than printf-style formatting tools as well, since they avoid the overhead of interpreting a format pattern, which adds up over many format calls.
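
As a toy illustration of that immutability (not part of the original answer): Python strings cannot be modified in place, so every "edit" has to build a new string object:

s = "abc"
try:
    s[0] = "x"          # str does not support item assignment ...
except TypeError as err:
    print(err)          # ... so this raises TypeError
s = "x" + s[1:]         # the only way to "change" it is to build a new string
print(s)                # xbc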

Mastermind
-6

If you are here looking for a fast string concatenation method in Python, you do not need a special StringBuilder class. Simple concatenation works just as well, without the performance penalty seen in C#.

resultString = ""

resultString += "Append 1"
resultString += "Append 2"

See Antoine-tran's answer for the performance results.

Justas