C ++矢量计算每个元素的出现 [英] C++ vector counting each element's occurance
问题描述
我有一个类型为 vector< unsigned>
的向量,我想找出每个元素在该向量中出现了多少次。
I have a vector of type vector<unsigned>
and I want to find out how many times each element occurred in this vector.
此向量可能很大,所以循环遍历并不是一个好主意。
This vector could be pretty large, so looping through it wouldn't be a good idea I guess.
最有效的方法是什么?
推荐答案
除非向量已经排序(或至少分组为相同的形式)元素都在一起),不可避免地要查看向量中的每个项目(即使已排序,除非您期望有很多重复项,否则仍然很可能看到每个项目都是首选方法) 1 。
Unless the vector is already sorted (or at least grouped so identical elements are all together), looking at every item in the vector is unavoidable (and even if it is sorted, unless you expect a lot of duplicates, chances are pretty good that looking at every item will be the preferred method anyway)1.
一种显而易见的方法是遍历向量,并使用 std :: unordered_map $计数元素c $ c>:
One obvious method would be to walk through the vector, and count element with a std::unordered_map
:
std::unordered_map<unsigned, size_t> counts;
for (auto v : your_vector)
++counts[v];
然后,您可以(例如)打印出值:
Then you can (for example) print out the values:
for (auto const &p : counts)
std::cout << "The value: " << p.first << " occurred " << p.second << "times\n";
- 如果您这样做(期望)有很多重复项,并且这些项是有序的,您可以使用二进制搜索来查找当前值运行的结尾。理论上的收支平衡点是,如果给定值的重复项的平均数量等于总体大小的以2为底的对数,那么如果重复项的数量大于该值,则二进制搜索将需要较少的比较。现代的CPU从可预测的线性访问模式中获得了足够的收益,您可能实际上需要平均每个值的更多重复(平均)才能使二分查找成为一个赢。
- If you do (expect to) have lots of duplicates, and the items are ordered, you can use a binary search to find the end of the run of the current value. The theoretical break-even point for this is if the average number of duplicates of a given value is equal to the base 2 logarithm of the overall size, so if the number of duplicates is greater than that, a binary search will require fewer comparisons. A modern CPU gains enough from a predictable, linear access pattern that you'd probably need substantially more duplicates of each value (on average) for a binary search to be a win though.
这篇关于C ++矢量计算每个元素的出现的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!