[ACCEPTED]-How could I speed up comparison of std::string against string literals?-stl
Similar to Dietmar's solution, but with 8 slightly less editing: you can wrap the 7 string (once) instead of each literal
#include <string>
#include <cstring>
struct FastLiteralWrapper {
std::string const &s;
explicit FastLiteralWrapper(std::string const &s_) : s(s_) {}
template <std::size_t ArrayLength>
bool operator== (char const (&other)[ArrayLength]) {
std::size_t const StringLength = ArrayLength - 1;
return StringLength == s.size()
&& std::memcmp(s.data(), other, StringLength) == 0;
}
};
and 6 your code becomes:
const std:string someStdString = "blahblahblah";
// just for the context of the comparison:
FastLiteralWrapper someString(someStdString);
if( someString == "(" ) {
//do something
} else if( someString == ")" ) {
//do something else
} else if// this chain can be very long
NB. the fastest solution 5 - at the cost of more editing - is probably 4 to build a (perfect) hash or trie mapping 3 string literals to enumerated constants, and 2 then just switch
on the looked-up value. Long 1 if
/else if
chains usually smell bad IMO.
Well, aside from C++14's string_literal
, you could easily 4 code up a solution:
For comparison with a 3 single character, use a character literal 2 and:
bool operator==(const std::string& s, char c) { return s.size() == 1 && s[0] == c; }
For comparison with a string literal, you 1 can use something like this:
template<std::size_t N> bool operator==(const std::string& s, char const (&literal)[N]) { return s.size() == N && std::memcmp(s.data(), literal, N-1) == 0; }
Disclaimer:
- The first might even be superfluous,
- Only do this if you measure an improvement over what you had.
If you have long chain of string literals 22 to compare to there is likely some potential 21 to deal with comparing prefixes to group 20 common processing. Especially when comparing 19 a known set of strings for equality with 18 an input string, there is also the option 17 to use a perfect hash and key the operations off an 16 integer produced by those.
Since the use 15 of a perfect hash will probably have the 14 best performance but also requires major 13 changes of the code layout, an alternative 12 could be to determine the size of the string 11 literals at compile time and use this size 10 while comparing. For example:
class Literal {
char const* d_base;
std::size_t d_length;
public:
template <std::size_t Length>
Literal(char const (&base)[Length]): d_base(base), d_length(Length - 1) {}
bool operator== (std::string const& other) const {
return other.size() == this->d_length
&& !other.memcmp(this->d_base, other.c_str(), this->d_length);
}
bool operator!=(std::string const& other) const { return !(*this == other); }
};
bool operator== (std::string const& str, Literal const& literal) {
return literal == str;
}
bool operator!= (std::string const& str, Literal const& literal) {
return !(str == literal);
}
Obviously, this 9 assumes that your literals don't embed null 8 characters ('\0') other than the implicitly 7 added terminating null character as the 6 static length would otherwise be distorted. Using 5 C++11 constexpr
it would be possible to guard against 4 that possibility but the code gets somewhat 3 more complicated without any good reason. You'd 2 then compare your strings using something 1 like
if (someString == Literal("(")) {
...
}
else if (someString == Literal(")")) {
...
}
The fastest string comparison you can get 25 is by interning the strings: Build a large 24 hash table that contains all strings that are 23 ever created. Ensure that whenever a string 22 object is created, it is first looked up 21 from the hash table, only creating a new 20 object if no preexisting object is found. Naturally, this 19 functionality should be encapsulated in 18 your own string class.
Once you have done 17 this, string comparison is equivalent to 16 comparing their addresses.
This is actually 15 quite an old technique first popularized 14 with the LISP language.
The point, why this 13 is faster, is that every string only has 12 to be created once. If you are careful, you'll 11 never generate the same string twice from 10 the same input bytes, so string creation 9 overhead is controlled by the amount of 8 input data you work through. And hashing 7 all your input data once is not a big deal.
The 6 comparisons, on the other hand, tend to 5 involve the same strings over and over again 4 (like your comparing to literal strings) when 3 you write some kind of a parser or interpreter. And 2 these comparisons are reduced to a single 1 machine instruction.
2 other ideas :
A) Build a FSA using a lexical 15 analyser tool like flex, so the string is 14 converted to an integer token value, depending 13 what it matches.
B) Use length, to break 12 up long elseif chains, possibly partly table 11 driven
Why not get the length of the string 10 something, at the top then just compare 9 against the literals it could possibly match.
If 8 there's a lot of them, it may be worth making 7 it table driven and use a map and function 6 pointers. You could just special case the 5 single character literals, for example perhaps 4 using a function lookup table.
Finding non-matches 3 fast and the common lengths may suffice, and 2 not require too much code restructuring, but 1 be more maintainable as well as faster.
int len = strlen (something);
if ( ! knownliterallength[ len]) {
// not match
...
} else {
// First char may be used to index search, or literals are stored in map with *func()
switch (len)
{
case 1: // Could use a look table index by char and *func()
processchar( something[0]);
break;
case 2: // Short strings
case 3:
case 4:
processrunts( something);
break
default:
// First char used to index search, or literals are stored in map with *func()
processlong( something);
break
}
}
This is not the prettiest solution but it 8 has proved quite fast when there is a lot 7 of short strings to be compared (like operators 6 and control characters/keywords in a script 5 parser?).
Create a search tree based on 4 string length and only compare characters. Try 3 to represent known strings as an enumeration 2 if this makes it cleaner in the particular 1 implementation.
Short example:
enum StrE {
UNKNOWN = 0 ,
RIGHT_PAR ,
LEFT_PAR ,
NOT_EQUAL ,
EQUAL
};
StrE strCmp(std::string str)
{
size_t l = str.length();
switch(l)
{
case 1:
{
if(str[0] == ')') return RIGHT_PAR;
if(str[0] == '(') return LEFT_PAR;
// ...
break;
}
case 2:
{
if(str[0] == '!' && str[1] == '=') return NOT_EQUAL;
if(str[0] == '=' && str[1] == '=') return EQUAL;
// ...
break;
}
// ...
}
return UNKNOWN;
}
int main()
{
std::string input = "==";
switch(strCmp(input))
{
case RIGHT_PAR:
printf("right par");
break;
case LEFT_PAR:
printf("left par");
break;
case NOT_EQUAL:
printf("not equal");
break;
case EQUAL:
printf("equal");
break;
case UNKNOWN:
printf("unknown");
break;
}
}
More Related questions
We use cookies to improve the performance of the site. By staying on our site, you agree to the terms of use of cookies.