String.Join vs. StringBuilder: which is faster?
在之前关于将
简短回答:视情况而定。
长答案:如果您已经有一个字符串数组要连接在一起(使用分隔符),那么
如果事先没有将字符串作为数组,那么使用
编辑:这是一个对
编辑:当我在家的时候,我会写一个基准,这对于
这是我的测试平台,使用
1 2 | Join: 9420ms (chk: 210710000 OneBuilder: 9021ms (chk: 210710000 |
(更新
1 2 | Join: 11635ms (chk: 210710000 OneBuilder: 11385ms (chk: 210710000 |
(更新re 2048*64*150)
1 2 | Join: 11620ms (chk: 206409600 OneBuilder: 11132ms (chk: 206409600 |
并且启用了优化测试:
1 2 | Join: 11180ms (chk: 206409600 OneBuilder: 10784ms (chk: 206409600 |
如此之快,但并非如此;装备(在控制台、释放模式下运行等):
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 | using System; using System.Collections.Generic; using System.Diagnostics; using System.Text; namespace ConsoleApplication2 { class Program { static void Collect() { GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced); GC.WaitForPendingFinalizers(); GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced); GC.WaitForPendingFinalizers(); } static void Main(string[] args) { const int ROWS = 500, COLS = 20, LOOPS = 2000; int[][] data = new int[ROWS][]; Random rand = new Random(123456); for (int row = 0; row < ROWS; row++) { int[] cells = new int[COLS]; for (int col = 0; col < COLS; col++) { cells[col] = rand.Next(); } data[row] = cells; } Collect(); int chksum = 0; Stopwatch watch = Stopwatch.StartNew(); for (int i = 0; i < LOOPS; i++) { chksum += Join(data).Length; } watch.Stop(); Console.WriteLine("Join: {0}ms (chk: {1}", watch.ElapsedMilliseconds, chksum); Collect(); chksum = 0; watch = Stopwatch.StartNew(); for (int i = 0; i < LOOPS; i++) { chksum += OneBuilder(data).Length; } watch.Stop(); Console.WriteLine("OneBuilder: {0}ms (chk: {1}", watch.ElapsedMilliseconds, chksum); Console.WriteLine("done"); Console.ReadLine(); } public static string Join(int[][] array) { return String.Join(Environment.NewLine, Array.ConvertAll(array, row => String.Join(",", Array.ConvertAll(row, x => x.ToString())))); } public static string OneBuilder(IEnumerable<int[]> source) { StringBuilder sb = new StringBuilder(); bool firstRow = true; foreach (var row in source) { if (firstRow) { firstRow = false; } else { sb.AppendLine(); } if (row.Length > 0) { sb.Append(row[0]); for (int i = 1; i < row.Length; i++) { sb.Append(',').Append(row[i]); } } } return sb.ToString(); } } } |
我不这么认为。通过Reflector,
我创建了两种测试方法来比较它们:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 | public static string TestStringJoin(double[][] array) { return String.Join(Environment.NewLine, Array.ConvertAll(array, row => String.Join(",", Array.ConvertAll(row, x => x.ToString())))); } public static string TestStringBuilder(double[][] source) { // based on Marc Gravell's code StringBuilder sb = new StringBuilder(); foreach (var row in source) { if (row.Length > 0) { sb.Append(row[0]); for (int i = 1; i < row.Length; i++) { sb.Append(',').Append(row[i]); } } } return sb.ToString(); } |
我对每种方法运行了50次,通过一个大小为
1 2 3 4 5 6 7 | // with zeros: TestStringJoin took 00:00:02.2755280 TestStringBuilder took 00:00:02.3536041 // with random values: TestStringJoin took 00:00:05.6412147 TestStringBuilder took 00:00:05.8394650 |
将数组的大小增加到
1 2 3 4 5 6 7 | // with zeros: TestStringJoin took 00:00:03.7146628 TestStringBuilder took 00:00:03.8886978 // with random values: TestStringJoin took 00:00:09.4991765 TestStringBuilder took 00:00:09.3033365 |
结果是可重复的(几乎;由不同的随机值引起的小波动)。显然,大多数情况下,
这是我用来测试的代码:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 | const int Iterations = 50; const int Rows = 2048; const int Cols = 64; // 512 static void Main() { OptimizeForTesting(); // set process priority to RealTime // test 1: zeros double[][] array = new double[Rows][]; for (int i = 0; i < array.Length; ++i) array[i] = new double[Cols]; CompareMethods(array); // test 2: random values Random random = new Random(); double[] template = new double[Cols]; for (int i = 0; i < template.Length; ++i) template[i] = random.NextDouble(); for (int i = 0; i < array.Length; ++i) array[i] = template; CompareMethods(array); } static void CompareMethods(double[][] array) { Stopwatch stopwatch = Stopwatch.StartNew(); for (int i = 0; i < Iterations; ++i) TestStringJoin(array); stopwatch.Stop(); Console.WriteLine("TestStringJoin took" + stopwatch.Elapsed); stopwatch.Reset(); stopwatch.Start(); for (int i = 0; i < Iterations; ++i) TestStringBuilder(array); stopwatch.Stop(); Console.WriteLine("TestStringBuilder took" + stopwatch.Elapsed); } static void OptimizeForTesting() { Thread.CurrentThread.Priority = ThreadPriority.Highest; Process currentProcess = Process.GetCurrentProcess(); currentProcess.PriorityClass = ProcessPriorityClass.RealTime; if (Environment.ProcessorCount > 1) { // use last core only currentProcess.ProcessorAffinity = new IntPtr(1 << (Environment.ProcessorCount - 1)); } } |
除非1%的差异在整个程序运行所需的时间方面变得非常重要,否则这看起来像是微观优化。我会编写最可读/最易懂的代码,不用担心1%的性能差异。
大约一个月前,阿特伍德有一篇与此相关的文章:
http://www.codingsurror.com/blog/archives/001218.html
对。如果你做的连接不止两个,那会快得多。
执行string.join时,运行时必须:
如果进行两次联接,则必须复制数据两次,依此类推。
StringBuilder为一个缓冲区分配了备用空间,因此可以在不复制原始字符串的情况下追加数据。由于缓冲区中还有剩余空间,因此可以直接将附加字符串写入缓冲区。然后它只需要在末尾复制整个字符串一次。