benchmark/top_tweet/README.md
The top_tweet benchmark finds the most-retweeted tweet in a twitter API response.
This scenario tends to measure an implementation's laziness: its ability to avoid parsing unneeded values, without knowing beforehand which values are needed.
To find the top tweet, an implementation needs to iterate through all tweets, remembering which one had the highest retweet count. While it scans, it will find many "candidate" tweets with the highest retweet count up to that point. However, While the implementation iterates through tweets, it will have many "candidate" tweets. Essentially, it has to keep track of the "top tweet so far" while it searches. However, only the text and screen_name of the final top tweet need to be parsed. Therefore, JSON parsers that can only parse values on the first pass (such as DOM or streaming parsers) will be forced to parse text and screen_name of every candidate (if not every single tweet). Parsers which can delay parsing of values until later will therefore shine in scenarios like this.
The benchmark will be called with run(padded_string &json, int64_t max_retweet_count, top_tweet_result &result).
The benchmark must:
The abridged schema (objects contain more fields than listed here):
{
"statuses": [
{
"text": "i like to tweet", // text containing UTF-8 and escape characters
"user": {
"screen_name": "AlexanderHamilton" // string containing UTF-8 (and escape characters?)
},
"retweet_count": 2, // uint32
},
...
]
}