The immutable form is provided as an experiment based on discussions from the Boost list (see References). Because an immutable string never changes after construction, it is thread safe and some operations can be optimized. However, it is slower on some kinds of operations. See Performance for more details.
Some of the key functions provided include:
#include "super_string/super_string.hpp"
//...
super_string s(" (456789) [123] 2006-10-01 abcdef ");
s.to_upper();
cout << s << endl;
s.trim(); //lop off the whitespace on both sides
cout << s << endl;
double dbl = 1.23456;
s.append(dbl); //append any streamable type
s+= " ";
cout << s << endl;
date d(2006, Jul, 1);
s.insert_at(28, d); //insert any streamable type
cout << s << endl;
//find the yyyy-mm-dd date format
if (s.contains_regex("\\d{4}-\\d{2}-\\d{2}")) {
  //replace parens around digits with square brackets [the digits]
  s.replace_all_regex("\\(([0-9]+)\\)", "__[$1]__");
  cout << s << endl;
  //split the string on white space to process parts
  super_string::string_vector out_vec;
  unsigned int count = s.split_regex("\\s+", out_vec);
  if (count) {
    for(int i=0; i < out_vec.size(); ++i) {
      out_vec[i].replace_first("__",""); //get rid of first __ in string
      cout << i << " " << out_vec[i] << endl;
    }
  }
}
//wide strings too...
wsuper_string ws(L" hello world ");
ws.trim_left();
wcout << ws << endl;
In the immutable form, a mutating function does not change the string it is called on; it returns the result, which must be assigned to a string to take effect. In this example the result is always assigned back to the same string.
#include "super_string/const_super_string.hpp"
//...
//const_super_string is immutable
const_super_string s(" (456789) [123] 2006-10-01 abcdef ");
s = s.to_upper(); //" (456789) [123] 2006-10-01 ABCDEF "
cout << s << endl;
s = s.trim(); //"(456789) [123] 2006-10-01 ABCDEF"
cout << s << endl;
double dbl = 1.23456;
s = s.append(dbl); //"(456789) [123] 2006-10-01 ABCDEF1.23456"
cout << s << endl;
//find the yyyy-mm-dd date format
if (s.contains_regex("\\d{4}-\\d{2}-\\d{2}")) {
  //replace parens around digits with square brackets [the digits]
  s = s.replace_all_regex("\\(([0-9]+)\\)", "__[$1]__");
  cout << s << endl;
  //split the string on white space to process parts
  const_super_string::string_vector out_vec;
  unsigned int count = s.split_regex("\\s+", out_vec);
  if (count) {
    for(int i=0; i < out_vec.size(); ++i) {
      //get rid of first __ in string; the result must be assigned back
      out_vec[i] = out_vec[i].replace_first("__","");
      cout << i << " " << out_vec[i] << endl;
    }
  }
}
//wide strings too...
wconst_super_string ws(L" hello world ");
ws = ws.trim_left(); //assign back here as well
wcout << ws << endl;
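The assign-back rule follows from how an immutable string can be built. The sketch below is not the const_super_string implementation; it is a minimal illustration using only the standard library. The names immutable_text, to_upper, trim, str, and immutable_text_demo are hypothetical, and it assumes the text is held through a shared pointer to a const std::string so that copies share one buffer and every "mutating" call returns a new object.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <memory>
#include <string>
#include <utility>

// Hypothetical illustration only; not the const_super_string implementation.
class immutable_text {
public:
    explicit immutable_text(std::string s)
        : data_(std::make_shared<const std::string>(std::move(s))) {}

    // Returns a new object; *this is never modified.
    immutable_text to_upper() const {
        std::string copy(*data_);
        std::transform(copy.begin(), copy.end(), copy.begin(),
                       [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
        return immutable_text(std::move(copy));
    }

    // Returns a new object with whitespace removed from both ends.
    immutable_text trim() const {
        const char* ws = " \t\n\r\f\v";
        const std::string::size_type first = data_->find_first_not_of(ws);
        if (first == std::string::npos) return immutable_text(std::string());
        const std::string::size_type last = data_->find_last_not_of(ws);
        return immutable_text(data_->substr(first, last - first + 1));
    }

    const std::string& str() const { return *data_; }

private:
    std::shared_ptr<const std::string> data_;  // shared, never written after construction
};

// Shows why the result has to be assigned back.
inline void immutable_text_demo() {
    immutable_text s("  abc  ");
    s.trim();                      // result discarded: s is unchanged
    assert(s.str() == "  abc  ");
    s = s.trim().to_upper();       // assign back to keep the result
    assert(s.str() == "ABC");
}
```

Since the buffer is never written after construction, copies in this sketch can be read from several threads without locking. The cost is a fresh allocation for every modified value, which is one plausible reason an immutable form can be slower on some operations.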
Functional Requirements
Non-Functional Requirements
Overall, this class is mostly a convenience wrapper around functions available in boost.string_algo and boost.regex.
Files:
Test code:
Example code:
Docs
Here's an example of how the type-based interface results in clearer client code:
std::string s1("foo");
std::string s2("bar");
std::string s3("foo");
//The next line makes me go read the docs again, every time
replace_all(s1,s2,s3); //which string is modified exactly?
or
s1.replace_all(s2, s3); //obvious which string is modified here
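One way to get the member-style call is a thin class that forwards to a free algorithm. The sketch below uses only the standard library rather than boost.string_algo, and mini_string, replace_all_in, and mini_string_demo are hypothetical names introduced for illustration; it is not how super_string itself is written.

```cpp
#include <cassert>
#include <string>

// Hypothetical free function: replaces every occurrence of `from` in `target`
// with `to`. Which argument is modified is only visible in the signature.
inline void replace_all_in(std::string& target, const std::string& from,
                           const std::string& to) {
    if (from.empty()) return;  // avoid an infinite loop on empty search text
    std::string::size_type pos = 0;
    while ((pos = target.find(from, pos)) != std::string::npos) {
        target.replace(pos, from.size(), to);
        pos += to.size();      // continue after the inserted text
    }
}

// Hypothetical wrapper: the object being modified is the one before the dot.
class mini_string : public std::string {
public:
    mini_string(const char* s) : std::string(s) {}

    void replace_all(const std::string& from, const std::string& to) {
        replace_all_in(*this, from, to);
    }
};

inline void mini_string_demo() {
    mini_string s("foo bar foo");
    s.replace_all("foo", "baz");   // clearly modifies s
    assert(s == "baz bar baz");
}
```

The wrapper adds no new algorithm. It only fixes the role of each argument, so the call site shows which string changes.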
Another reason for super_string is the simplification of documentation. Generic libraries have many template parameters, which often make it difficult to focus on the user documentation. Just take regex_replace as a case in point:
template <class OutputIterator, class BidirectionalIterator, class traits, class charT>
OutputIterator regex_replace(OutputIterator out,
                             BidirectionalIterator first,
                             BidirectionalIterator last,
                             const basic_regex<charT, traits>& e,
                             const basic_string<charT>& fmt,
                             match_flag_type flags = match_default);
template <class traits, class charT>
basic_string<charT> regex_replace(const basic_string<charT>& s,
                                  const basic_regex<charT, traits>& e,
                                  const basic_string<charT>& fmt,
                                  match_flag_type flags = match_default);
My first reaction when I read this is, wow, interesting, but how do I use it? It's hard for even an experienced guy like me to see the forest for the template trees here. So I scroll down to the example and start reading the example code. Ok, now I see it and I can go back, consume it, ponder more...then realize, ok I guess it's the second signature because I'm using an std::string...now I can go write some code. (Of course, I normally don't do it like this because I go and look up some regex code I've already written).
Now let's compare the JavaString.replaceAll short description.
String replaceAll(String regex, String replacement)
Replaces each substring of this string that matches the given
regular expression with the given replacement.
Wow, ok, I don't need to see the example code, I can write code now. I might need to read more about the regex string rules, but no biggie, they follow expected conventions. After 2 minutes I'm testing code.
Of course, JavaString.replaceAll is just lame compared to what regex can do. But, you know, it covers most of what I use for typical day to day string processing. It's clean, easy, fast -- I can focus on other parts of my app rather than the template parameters for the string function.
Now let's examine the hastily created *pre-alpha* super_string docs:
template<class char_type>
void basic_super_string< char_type >::replace_all_regex(
    const base_string_type & match_regex,
    const base_string_type & replace_format)
Replace all instances of the match_regex with the replace_format.
super_string s("(abc)3333()(456789) [123] (1) (cde)");
//replace parens around digits with #--the digits--#
s.replace_all_regex("\\(([0-9]+)\\)", "#--$1--#");
//s == "(abc)3333()#--456789--# [123] #--1--# (cde)"
Right from the start there's only one signature and only one template parameter to document -- char_type is pretty easy to understand, doesn't even really require explanation -- but really the docs would be nicer without that distraction. The context is string processing, so I don't have to worry about explaining that the regex function can work on vector<char> or whatever sequence I want. I've ditched a couple of function parameters -- always going for the regex defaults. So super_string is more like JavaString -- very limited compared to full up regex or string_algo, but it's easier to document and use for common cases.
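To make the "one signature, regex defaults" idea concrete, here is a rough sketch of how such a member could hide the regex machinery. It uses std::regex from the standard library as a stand-in for boost.regex, and regex_string and regex_string_demo are hypothetical names; the real basic_super_string may be written quite differently.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical stand-in for illustration; std::regex replaces boost.regex here.
class regex_string : public std::string {
public:
    regex_string(const char* s) : std::string(s) {}

    // One signature, default regex flags: replace every match in place.
    void replace_all_regex(const std::string& match_regex,
                           const std::string& replace_format) {
        const std::regex re(match_regex);
        assign(std::regex_replace(static_cast<const std::string&>(*this),
                                  re, replace_format));
    }

    // True if any part of the string matches the pattern.
    bool contains_regex(const std::string& match_regex) const {
        return std::regex_search(static_cast<const std::string&>(*this),
                                 std::regex(match_regex));
    }
};

inline void regex_string_demo() {
    regex_string s("(abc)3333()(456789) [123] (1) (cde)");
    // replace parens around digits with #--the digits--#
    s.replace_all_regex("\\(([0-9]+)\\)", "#--$1--#");
    assert(s == "(abc)3333()#--456789--# [123] #--1--# (cde)");
}
```

The caller never sees iterators, traits, or match flags. That is the trade being described: less flexibility in exchange for an interface that fits in one line of documentation.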
Most of the explanation comes from this discussion thread from the Boost list.
Put another way, just because a function isn't a member of a class doesn't mean it isn't part of the programmer interface. If you consider the size of the interface defined by boost string algorithms, boost regex, and boost format the super_string is small in comparison.
The following is the output of the performance program on Linux compiled with gcc-4.0 with -O3.
500000 iterations of const append test: 0 --> 00:00:02.455757
500000 iterations of const append test: 1 --> 00:00:02.525834
500000 iterations of const append test: 2 --> 00:00:02.515612
500000 iterations of const append test: 3 --> 00:00:02.483702
500000 iterations of const append test: 4 --> 00:00:02.512949
500000 iterations of const append test: 5 --> 00:00:02.508725
500000 iterations of const append test: 6 --> 00:00:02.534567
500000 iterations of const append test: 7 --> 00:00:02.513080
500000 iterations of const append test: 8 --> 00:00:02.515201
500000 iterations of const append test: 9 --> 00:00:02.506260
const append test --> 10 trials 500000 iterations/trial total elapsed: 00:00:25.071687

500000 iterations of mutable append test: 0 --> 00:00:02.136063
500000 iterations of mutable append test: 1 --> 00:00:02.190057
500000 iterations of mutable append test: 2 --> 00:00:02.149970
500000 iterations of mutable append test: 3 --> 00:00:02.185335
500000 iterations of mutable append test: 4 --> 00:00:02.188219
500000 iterations of mutable append test: 5 --> 00:00:02.206448
500000 iterations of mutable append test: 6 --> 00:00:02.274351
500000 iterations of mutable append test: 7 --> 00:00:02.192235
500000 iterations of mutable append test: 8 --> 00:00:02.193230
500000 iterations of mutable append test: 9 --> 00:00:02.188578
mutable append test --> 10 trials 500000 iterations/trial total elapsed: 00:00:21.904486

1000000 iterations of const trim test: 0 --> 00:00:01.484179
1000000 iterations of const trim test: 1 --> 00:00:01.488741
1000000 iterations of const trim test: 2 --> 00:00:01.489407
1000000 iterations of const trim test: 3 --> 00:00:01.498976
1000000 iterations of const trim test: 4 --> 00:00:01.501453
1000000 iterations of const trim test: 5 --> 00:00:01.501348
1000000 iterations of const trim test: 6 --> 00:00:01.503969
1000000 iterations of const trim test: 7 --> 00:00:01.512235
1000000 iterations of const trim test: 8 --> 00:00:01.501978
1000000 iterations of const trim test: 9 --> 00:00:01.495517
const trim test --> 10 trials 1000000 iterations/trial total elapsed: 00:00:14.977803

1000000 iterations of mutable trim test: 0 --> 00:00:01.160393
1000000 iterations of mutable trim test: 1 --> 00:00:01.160549
1000000 iterations of mutable trim test: 2 --> 00:00:01.165937
1000000 iterations of mutable trim test: 3 --> 00:00:01.173893
1000000 iterations of mutable trim test: 4 --> 00:00:01.173779
1000000 iterations of mutable trim test: 5 --> 00:00:01.182267
1000000 iterations of mutable trim test: 6 --> 00:00:01.173212
1000000 iterations of mutable trim test: 7 --> 00:00:01.172464
1000000 iterations of mutable trim test: 8 --> 00:00:01.171245
1000000 iterations of mutable trim test: 9 --> 00:00:01.159949
mutable trim test --> 10 trials 1000000 iterations/trial total elapsed: 00:00:11.693688

100000 iterations of const contains regex test: 0 --> 00:00:02.701513
100000 iterations of const contains regex test: 1 --> 00:00:02.729457
100000 iterations of const contains regex test: 2 --> 00:00:02.704867
100000 iterations of const contains regex test: 3 --> 00:00:02.700814
100000 iterations of const contains regex test: 4 --> 00:00:02.702312
100000 iterations of const contains regex test: 5 --> 00:00:02.699262
100000 iterations of const contains regex test: 6 --> 00:00:02.703698
100000 iterations of const contains regex test: 7 --> 00:00:02.703122
100000 iterations of const contains regex test: 8 --> 00:00:02.704925
100000 iterations of const contains regex test: 9 --> 00:00:02.694748
const contains regex test --> 10 trials 100000 iterations/trial total elapsed: 00:00:27.044718

100000 iterations of mutable contains regex test: 0 --> 00:00:02.781685
100000 iterations of mutable contains regex test: 1 --> 00:00:02.759013
100000 iterations of mutable contains regex test: 2 --> 00:00:02.762457
100000 iterations of mutable contains regex test: 3 --> 00:00:02.761785
100000 iterations of mutable contains regex test: 4 --> 00:00:02.761454
100000 iterations of mutable contains regex test: 5 --> 00:00:02.761979
100000 iterations of mutable contains regex test: 6 --> 00:00:02.760832
100000 iterations of mutable contains regex test: 7 --> 00:00:02.763009
100000 iterations of mutable contains regex test: 8 --> 00:00:02.760816
100000 iterations of mutable contains regex test: 9 --> 00:00:02.760578
mutable contains regex test --> 10 trials 100000 iterations/trial total elapsed: 00:00:27.633608

100000 iterations of const split regex test: 0 --> 00:00:01.636014
100000 iterations of const split regex test: 1 --> 00:00:01.636228
100000 iterations of const split regex test: 2 --> 00:00:01.635880
100000 iterations of const split regex test: 3 --> 00:00:01.632247
100000 iterations of const split regex test: 4 --> 00:00:01.637720
100000 iterations of const split regex test: 5 --> 00:00:01.691922
100000 iterations of const split regex test: 6 --> 00:00:01.682272
100000 iterations of const split regex test: 7 --> 00:00:01.657319
100000 iterations of const split regex test: 8 --> 00:00:01.654802
100000 iterations of const split regex test: 9 --> 00:00:01.652675
const split regex test --> 10 trials 100000 iterations/trial total elapsed: 00:00:16.517079

100000 iterations of mutable split regex test: 0 --> 00:00:01.766323
100000 iterations of mutable split regex test: 1 --> 00:00:01.766020
100000 iterations of mutable split regex test: 2 --> 00:00:01.739266
100000 iterations of mutable split regex test: 3 --> 00:00:01.766605
100000 iterations of mutable split regex test: 4 --> 00:00:01.767521
100000 iterations of mutable split regex test: 5 --> 00:00:01.739031
100000 iterations of mutable split regex test: 6 --> 00:00:01.767348
100000 iterations of mutable split regex test: 7 --> 00:00:01.765335
100000 iterations of mutable split regex test: 8 --> 00:00:01.740199
100000 iterations of mutable split regex test: 9 --> 00:00:01.765060
mutable split regex test --> 10 trials 100000 iterations/trial total elapsed: 00:00:17.582708
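The totals are easier to compare as ratios. The small helper below is not part of the performance program; it only divides the const total by the mutable total for each test, using the elapsed figures reported in the output. The names timing, totals, const_to_mutable_ratio, and print_ratios are hypothetical.

```cpp
#include <cassert>
#include <cstdio>

// Totals in seconds (10 trials each), copied from the reported output.
struct timing {
    const char* test;
    double const_total;
    double mutable_total;
};

static const timing totals[] = {
    {"append",         25.071687, 21.904486},
    {"trim",           14.977803, 11.693688},
    {"contains regex", 27.044718, 27.633608},
    {"split regex",    16.517079, 17.582708},
};

// A ratio above 1.0 means the immutable (const) form took longer.
inline double const_to_mutable_ratio(const timing& t) {
    assert(t.mutable_total > 0.0);
    return t.const_total / t.mutable_total;
}

// Prints one line per test with the ratio to three decimals.
inline void print_ratios() {
    for (const timing& t : totals) {
        std::printf("%-15s const/mutable = %.3f\n",
                    t.test, const_to_mutable_ratio(t));
    }
}
```

On these numbers the const form is roughly 14% slower for append and 28% slower for trim, while it is about 2% faster for contains regex and 6% faster for split regex. These are single-machine figures from one compiler, so the ratios are indicative only.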
version 2