So that means those 100% hexagons are more likely the result of a very small number of tweets overall rather than a statistically large number of tweets containing the keyword.
50,000 divided by 200 is only 250 tweets per hex if they were evenly distributed (not likely), so it wouldn't take much to end up with a number of hexes with very small tweet counts.
If you can't increase the sample size, it would at least make sense to report the total number of tweets per hex; the map is not very informative otherwise. And the legend should probably read "proportion of tweets with keyword" rather than "frequency", right?
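Something along these lines is all I have in mind (the field names below are placeholders, not anything from your code): keep the total count next to the proportion and flag hexes whose totals are too small to trust.

```ts
// Minimal sketch: compute the proportion per hex and flag hexes whose total
// count is too small for the proportion to mean much.
interface HexBin {
  id: string;
  keywordCount: number; // sampled tweets in this hex containing the keyword
  totalCount: number;   // all sampled tweets in this hex
}

const MIN_TWEETS = 30; // assumed cutoff, tune as needed

function summarizeHex(hex: HexBin) {
  const proportion = hex.totalCount > 0 ? hex.keywordCount / hex.totalCount : 0;
  return {
    id: hex.id,
    proportion,                             // value to symbolize
    totalCount: hex.totalCount,             // report this in the legend/tooltip
    lowSample: hex.totalCount < MIN_TWEETS, // e.g. hatch or grey these out
  };
}

// A hex with 2 of 2 keyword tweets reads as 100% but gets flagged as low-sample:
console.log(summarizeHex({ id: "hex-042", keywordCount: 2, totalCount: 2 }));
```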
We randomly sample 50,000 tweets (for every query), then take the keyword tweets per hexagon normalized by the total tweets in that hexagon. I think there are something like 200 hexagons, so we feel it is a fair sample size. The database is too big to use all the tweets.
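For reference, the normalization is essentially this (a simplified sketch with assumed data shapes, not the production code): bin the 50,000 sampled tweets by hexagon, then divide keyword tweets by total tweets in each hexagon.

```ts
// Simplified sketch of the per-hex normalization described above.
interface SampledTweet {
  hexId: string;       // id of the hexagon the tweet falls in
  hasKeyword: boolean; // whether the tweet matched the keyword query
}

function proportionPerHex(sample: SampledTweet[]): Map<string, number> {
  const counts = new Map<string, { keyword: number; total: number }>();
  for (const t of sample) {
    const bin = counts.get(t.hexId) ?? { keyword: 0, total: 0 };
    bin.total += 1;
    if (t.hasKeyword) bin.keyword += 1;
    counts.set(t.hexId, bin);
  }
  const proportions = new Map<string, number>();
  for (const [hexId, bin] of counts) {
    proportions.set(hexId, bin.keyword / bin.total);
  }
  return proportions;
}
```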
I understand what you are saying. I quickly changed the denominator to the whole population and got some funky results back because of the classification scheme provided by D3.js. I think it would be worth looking into changing the functionality to normalize against all NYC tweets... or adding a function to re-symbolize using raw point data and/or proportional symbols.
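As a rough sketch of what I think is happening with the D3.js breaks (the values and colors here are made up, not the app's actual config): once the denominator is the whole population, the proportions become tiny, so a scale whose domain is still [0, 1] dumps every hex into the bottom class. Recomputing the domain from the data (or using a quantile scale) keeps the classes meaningful.

```ts
// Illustration of the "funky results": fixed vs. data-driven class breaks.
import * as d3 from "d3";

const proportions = [0.0004, 0.0011, 0.0002, 0.0032, 0.0008]; // made-up values

// Fixed [0, 1] domain: everything maps to the first color.
const fixedScale = d3.scaleQuantize<string>()
  .domain([0, 1])
  .range(["#eff3ff", "#bdd7e7", "#6baed6", "#3182bd", "#08519c"]);

// Data-driven domain: the classes spread across the observed range instead.
const [lo, hi] = d3.extent(proportions) as [number, number];
const adaptiveScale = d3.scaleQuantize<string>()
  .domain([lo, hi])
  .range(["#eff3ff", "#bdd7e7", "#6baed6", "#3182bd", "#08519c"]);

console.log(proportions.map(fixedScale));    // all "#eff3ff"
console.log(proportions.map(adaptiveScale)); // spread across the five classes
```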