Finding the intersection between two collections in MongoDB

Question

I have two very large collections (30000+ documents): one contains words extracted from a text file (collection name 'word'), and the other contains words from a dictionary (collection name 'dictionary').

How can I get the words that exist in both collections?

(I've simplified the situation: documents inside the 'word' collection contain metadata about the words, so each word has to be a separate document.)

Recommended answer

Copy both collections into a single collection (include a discriminator field if necessary, so you can tell whether each document is a word instance or a dictionary entry).

Run map-reduce on that collection.

In Map, emit the word as the key and a value, say {instance:1, dict:0} or {instance:0, dict:1}, depending on whether the document being mapped is an instance or a dictionary entry. (You could add more fields to the values here as necessary.)

In Reduce, accumulate the scores as usual, summing the instance and dict counts for each word.

Now query the results for instance > 0 and dict > 0, and you have all of the words that are in both collections.
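
The steps above can be sketched in plain Python, with no MongoDB server involved, just to show the shape of the map, reduce and final filter. This is an illustration under assumptions rather than the answer's actual code: the sample documents, the field names `word` and `kind`, and the helper names `map_doc`, `reduce_values` and `intersect` are all hypothetical. In MongoDB itself the map and reduce functions would be written in JavaScript and run by the server.

```python
from collections import defaultdict

# Hypothetical sample data standing in for the two collections.
words = [
    {"word": "apple", "line": 3},
    {"word": "zzyzx", "line": 7},
    {"word": "pear", "line": 9},
    {"word": "apple", "line": 12},
]
dictionary = [{"word": "apple"}, {"word": "pear"}, {"word": "plum"}]

# Merge into one collection, tagging each document with a discriminator
# field ("kind" is an assumed name).
merged = [dict(doc, kind="instance") for doc in words]
merged += [dict(doc, kind="dict") for doc in dictionary]


def map_doc(doc):
    """Emit (word, counts) with a 1 in the slot matching the document's kind."""
    is_instance = doc["kind"] == "instance"
    return doc["word"], {"instance": int(is_instance), "dict": int(not is_instance)}


def reduce_values(values):
    """Sum the per-kind counts emitted for a single word."""
    total = {"instance": 0, "dict": 0}
    for value in values:
        total["instance"] += value["instance"]
        total["dict"] += value["dict"]
    return total


def intersect(docs):
    """Map every document, group by word, reduce, then keep words seen in both sources."""
    grouped = defaultdict(list)
    for doc in docs:
        key, value = map_doc(doc)
        grouped[key].append(value)
    reduced = {key: reduce_values(values) for key, values in grouped.items()}
    return {k: v for k, v in reduced.items() if v["instance"] > 0 and v["dict"] > 0}


common = intersect(merged)
print(sorted(common))  # ['apple', 'pear']
```

Because the reduced value keeps both counts, the same output also tells you how many times each common word occurred in the text, which may be useful given that the 'word' documents carry per-word metadata.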
