Please note, new blog at http://www.acheron.org/darryl/

Java vs. ColdFusion: String Concatenation and CSV Generation

For a little while now I've been playing around with Java and trying to replicate some common ColdFusion concepts in the language. Recently I came across a problem where generating a CSV file was taking a very long time, often making the server unresponsive. The code in question built a large string using concatenation and then wrote the string to a file. It occurred to me that in-line Java might be a better tool for the job. A work colleague suggested using the StringBuffer class (or StringBuilder in Java 5.0), as it is much more efficient than the String class for repeated appends. As you will see below, I ultimately ended up using the BufferedWriter class, and the execution time went from 5 minutes to 40 seconds!

Testing the theory

I decided to run some tests to see how much quicker (or slower) in-line Java is than ColdFusion. There were two tests. The first was a script that appended a string to an existing string 1,000 or 10,000 (or more) times. The second was a set of scripts that built a string and/or appended a string to a CSV file.

ColdFusion vs. Java: String concatenation

I concluded that Java is a lot better at string concatenation than ColdFusion, especially when dealing with very large strings. I deliberately did not test the CFSAVECONTENT tag, as it is not a string concatenation method. When concatenating a string 1,000 times, there wasn't much between them: ColdFusion took 172ms, whereas the Java class took 78ms. However, at 10,000 iterations, ColdFusion took 17547ms (17.5s) and Java took only 406ms! I didn't dare test ColdFusion any further, as I know from experience that it would have taken quite some time.
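To make the comparison concrete, here is a minimal Java sketch of the two ways of building a large string. It is not the original test script: the class and method names (ConcatSketch, concatWithString, concatWithBuilder) are made up for illustration, and the timings it prints will vary by machine and JVM.

```java
// Sketch only: compares repeated String concatenation with StringBuilder.
// Class and method names are hypothetical, not taken from the original tests.
final class ConcatSketch {

    // Naive approach: every += copies the whole string built so far.
    static String concatWithString(String piece, int iterations) {
        String result = "";
        for (int i = 0; i < iterations; i++) {
            result += piece;
        }
        return result;
    }

    // StringBuilder appends into a growable buffer, so far less is copied.
    static String concatWithBuilder(String piece, int iterations) {
        StringBuilder builder = new StringBuilder();
        for (int i = 0; i < iterations; i++) {
            builder.append(piece);
        }
        return builder.toString();
    }

    public static void main(String[] args) {
        String piece = "The quick brown fox jumped over the fence: ";
        int iterations = 10_000;

        long start = System.nanoTime();
        String viaString = concatWithString(piece, iterations);
        long middle = System.nanoTime();
        String viaBuilder = concatWithBuilder(piece, iterations);
        long end = System.nanoTime();

        System.out.println("String +=     : " + (middle - start) / 1_000_000 + "ms");
        System.out.println("StringBuilder : " + (end - middle) / 1_000_000 + "ms");
        System.out.println("Same result   : " + viaString.equals(viaBuilder));
    }
}
```

Both methods return the same string; only the amount of copying differs, which is why the gap widens as the iteration count grows.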

ColdFusion vs. Java: CSV generation

I pitted Java's BufferedWriter class against ColdFusion's CFSAVECONTENT and CFFILE tags. I used the CFSAVECONTENT tag because it is a lot quicker than concatenation. The Java test simply appended data to a file as it went, whereas the ColdFusion test built the string using CFSAVECONTENT and then wrote it to a file.

Ultimately I concluded that the BufferedWriter class is a lot more efficient than the ColdFusion method. Building a 1,000 row CSV file took Java 15ms, whereas ColdFusion took 47ms. When building a 10,000 row CSV file, the gap increased: Java 64ms, ColdFusion 328ms. A 1,000,000 row file (49.6MB) took Java 6547ms, whereas ColdFusion took 10860ms.

In the 1,000,000 row test, the amount of memory used by CFSAVECONTENT was quite high. The server became sluggish afterwards, and the memory did not seem to be freed. With the BufferedWriter class, however, you could barely notice any memory usage (as you would expect). As soon as the buffer (8192 characters by default) fills, its contents are written to the file and the buffer is emptied for reuse. It is a fair statement that if you're generating a CSV file, then you should use Java; it is a much more scalable solution.
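As a rough illustration of the streaming approach, the following plain Java sketch writes rows through a BufferedWriter as it goes instead of building the whole file in memory first. The class name, the writeRows helper and the temp-file naming are assumptions made for this example; it is not the code that produced the timings above.

```java
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;

// Sketch only: streams rows to a Writer via BufferedWriter.
// CsvStreamSketch and writeRows are hypothetical names.
final class CsvStreamSketch {

    static final String ROW_PREFIX = "The quick brown fox jumped over the fence: ";

    // Writes `rows` lines to `sink`; only the small internal buffer is held in memory.
    // The caller remains responsible for closing `sink`.
    static void writeRows(Writer sink, int rows) {
        try {
            BufferedWriter writer = new BufferedWriter(sink);
            for (int i = 1; i <= rows; i++) {
                writer.write(ROW_PREFIX);
                writer.write(Integer.toString(i));
                writer.write("\r\n");
            }
            writer.flush(); // push whatever is still sitting in the buffer
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        File target = File.createTempFile("csv_sketch_", ".txt");
        long start = System.currentTimeMillis();
        // Second argument `true` opens the file in append mode.
        try (Writer file = new FileWriter(target, true)) {
            writeRows(file, 10_000);
        }
        long total = System.currentTimeMillis() - start;
        System.out.println("Wrote " + target.length() + " bytes in " + total + "ms");
        target.delete();
    }
}
```

Taking a Writer rather than a file name keeps the helper easy to try against an in-memory sink such as a StringWriter.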

Results

Code Examples

Generating CSV with CFSAVECONTENT

Note: I have moved some code to its own line for the sake of readability.

    <cfscript>
        variables.iNrTimesToLoop = url.nr;
        variables.sStringToConcat = "The quick brown fox jumped over the fence: ";
        variables.sFileName = "#getcurrenttemplatepath()#_test_#gettickcount()#.txt";
        start = gettickcount();
    </cfscript>
    <cfset variables.sstring = "">
    <cfoutput>
        <cfsavecontent variable="variables.sString">
            <cfloop from="1" to="#variables.iNrTimesToLoop#" index="variables.x">
                #variables.sStringToConcat##variables.x##chr(13)##chr(10)#
            </cfloop>
        </cfsavecontent>
    </cfoutput>
    <cffile action="append" file="#variables.sFileName#" output="#variables.sString#">
    <cfscript>
        end = gettickcount();
        total = end-start;
    </cfscript>
    <cfoutput>
        Total time: #total#ms
    </cfoutput>

Java StringBuffer class

    // initSize is the initial buffer capacity in characters; set it before this call
    oStringBuffer = CreateObject("java", "java.lang.StringBuffer").init(JavaCast("int",initSize));
    // Append a string
    oStringBuffer.append("A string");
    // Get string back from buffer
    sString = oStringBuffer.toString();

Java BufferedWriter class

    // Open the file in append mode (second argument is true)
    oFileWriter = CreateObject("java", "java.io.FileWriter").init("filename",JavaCast("boolean","true"));
    oBufferedWriter = CreateObject("java", "java.io.BufferedWriter").init(oFileWriter);
    // Write a string to buffer
    oBufferedWriter.write("A string");
    // Flush any remaining buffered text and close the file when done
    oBufferedWriter.close();

By Anonymous Anonymous, at 7/05/2005 11:11:00 pm  

Can you post the CFML code? Or email it to me directly (mnimer@macromedia.com). Thanks!



By Anonymous Anonymous, at 7/06/2005 12:24:00 am  

I ran into this problem a few years back, and it has to do with the way ColdFusion writes variables. Specifically, it looks to me like it doesn't concatenate so much as rewrite the old variable plus the new portion to a new variable. So the longer the string gets, the longer each concatenation takes.
This can be confirmed by using cfflush in conjunction with writing out a "." on every loop iteration. You will see that the process slows down over time. The way I found around this is to concatenate to a temp variable in each iteration, and concatenate the temp variable to the master variable at the end of each loop. That is, if you don't want to bother using Java. It sounds like your solution works for you.
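A rough Java analogue of this temp-variable workaround is sketched below. It is one interpretation of the comment, not the commenter's code: the names (ChunkedConcatSketch, concatInChunks, chunkSize) are hypothetical, and plain String concatenation is used on purpose to mimic the copy-on-append behaviour described.

```java
// Sketch only: appends to a small temp string and folds it into the
// master string once per chunk, so the large copy happens less often.
final class ChunkedConcatSketch {

    // chunkSize must be greater than zero.
    static String concatInChunks(String piece, int iterations, int chunkSize) {
        String master = "";
        String chunk = "";
        for (int i = 1; i <= iterations; i++) {
            chunk += piece;            // cheap: the temp string stays small
            if (i % chunkSize == 0) {
                master += chunk;       // expensive copy, but only once per chunk
                chunk = "";
            }
        }
        return master + chunk;         // fold in any leftover partial chunk
    }

    public static void main(String[] args) {
        String result = concatInChunks("The quick brown fox jumped over the fence: ", 10_000, 100);
        System.out.println("Built " + result.length() + " characters");
    }
}
```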



By Anonymous Anonymous, at 7/06/2005 01:01:00 am  

I have written a udf called QueryToCSV2 to show the same concept. You can get this udf from cflib.org

http://cflib.org/udf.cfm?ID=1197



By Anonymous Anonymous, at 7/06/2005 06:58:00 am  

Has anyone tested this against concatenation in the DB using good old SQL?



By Blogger Darryl Lyons, at 7/06/2005 10:31:00 pm  

Qasim, Nice UDF. I initialised our CSV generation component with a writer component -- which could either be a StringBuffer or a BufferedWriter class.



By Blogger Darryl Lyons, at 7/06/2005 10:32:00 pm  

Mike, I'll try to get the full code examples online soon.



By Blogger Darryl Lyons, at 7/07/2005 08:09:00 pm  

I've added a CFML code example -- it's the test I used originally for the CFSAVECONTENT method.



By Anonymous Anonymous, at 7/14/2005 03:12:00 am  

This is a problem that I have struggled with for a long time. I have found that using cffile with action="append" is the best method. After implementing your method by using the BufferedWriter class, it seems to me that the two methods are doing roughly the same thing--appending small strings to the end of a file, instead of appending small strings into one large string and writing it to a file. However, even with these two methods I find that the process is using an excessive amount of memory (300MB for a query of 50,000 records). Am I doing something wrong? Is there a way to reuse one chunk of memory to process all of the small strings individually? Right now a new string object is created for every string that is appended to the file. I have heard that you can rewrite the toString() method on the StringBuffer class so that it won't create a new string object, but I don't know how to do it and I imagine it would make my CF code much less portable. I would love any help on this one!!!



By Blogger Darryl Lyons, at 7/14/2005 06:37:00 pm  

Using the BufferedWriter class is not the same as CFFILE action=APPEND. ColdFusion is essentially opening the file every time (to my knowledge), whereas BufferedWriter writes to a stream.

The best way to achieve what you want is to use the java.io.BufferedWriter class directly instead of your string concatenation. This way the string accumulates in the buffer until it reaches 8192 characters, and is then written to the file. I've found this method uses far less memory than regular string concatenation.

You can email me at darryllyons at fastmail.com.au if you want me to send you a code sample (or I can put up a more detailed post?)
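The buffering behaviour described here can be observed with a small plain Java sketch. It wraps an in-memory StringWriter (standing in for the file) so you can see when text actually reaches the underlying writer. The class and method names are hypothetical, and the exact point of the first flush depends on the JVM's default buffer size, so the sketch only reports the lengths it observes.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;

// Sketch only: shows that BufferedWriter holds text back until its buffer fills.
final class BufferFlushSketch {

    // Writes a 10-character string `smallWrites` times, then `largeWrites` more
    // times, then flushes. Returns how much text had reached the sink after
    // each of the three steps.
    static int[] observeSinkLengths(int smallWrites, int largeWrites) {
        StringWriter sink = new StringWriter();            // stands in for the file
        BufferedWriter writer = new BufferedWriter(sink);  // default buffer size
        try {
            for (int i = 0; i < smallWrites; i++) {
                writer.write("0123456789");
            }
            int afterSmall = sink.getBuffer().length();
            for (int i = 0; i < largeWrites; i++) {
                writer.write("0123456789");
            }
            int afterLarge = sink.getBuffer().length();
            writer.flush();
            int afterFlush = sink.getBuffer().length();
            return new int[] {afterSmall, afterLarge, afterFlush};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] seen = observeSinkLengths(10, 900);
        System.out.println("After 100 chars:  " + seen[0] + " chars in sink");
        System.out.println("After 9100 chars: " + seen[1] + " chars in sink");
        System.out.println("After flush():    " + seen[2] + " chars in sink");
    }
}
```

The small write stays entirely in the buffer, the larger run forces at least one write to the sink, and the final flush() (or close()) delivers the remainder, which is why the writer must always be flushed or closed at the end.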



By Anonymous Anonymous, at 8/24/2005 11:24:00 pm  

With the move to CFMX, string concatenation seemed to get a bit slower--very much slower on large strings. I have found an easy workaround that performs well. Instead of concatenating a string, perform an ArrayAppend(someArray,"someString" ) function inside your loop. Then when you are ready to output the data or write to a file, use the ArrayToList(someArray,"#chr(13)##chr(10)#" ) function. In a test I performed, the new ArrayAppend approach took 93 milliseconds and the old took 86657 milliseconds. Can Java do better than that? The test loop was 10000 iterations and the total size of the string was 1211Kb. ColdFusion seems to work with arrays much more efficiently than strings.
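For comparison, a rough Java counterpart of the ArrayAppend/ArrayToList idea is to collect the pieces in a list and join them once at the end. This is only a sketch: the names JoinSketch and buildLines are made up for illustration, and no timings are claimed for it.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: collect lines in a list, then join them in a single pass.
final class JoinSketch {

    static String buildLines(String piece, int iterations) {
        List<String> lines = new ArrayList<>(iterations);
        for (int i = 1; i <= iterations; i++) {
            lines.add(piece + i);             // counterpart of ArrayAppend
        }
        return String.join("\r\n", lines);    // counterpart of ArrayToList with a CRLF delimiter
    }

    public static void main(String[] args) {
        String csv = buildLines("The quick brown fox jumped over the fence: ", 10_000);
        System.out.println("Built " + csv.length() + " characters");
    }
}
```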



By Blogger Darryl Lyons, at 8/25/2005 06:22:00 pm  

Greg,

I'll have to look into that one -- good find.



By Anonymous Anonymous, at 2/03/2006 03:08:00 am  

Darryl,
I have a need to process large files as well, and I am trying to adapt your Java BufferedWriter methodology, but I am not proficient in Java at all. Could you post or send me the complete source that utilizes the Java classes? I see and understand the CF example fine, but don't quite know how to incorporate the Java into your CF example. Thanks a lot!

bjk@glengrp.com


