特定单词的NLTK搭配 [英] NLTK collocations for specific words

查看:239
本文介绍了特定单词的NLTK搭配的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我知道如何使用NLTK来获得二元组和三元组的搭配,并将它们应用于我自己的语料库.代码如下.

I know how to get bigram and trigram collocations using NLTK and I apply them to my own corpora. The code is below.

但是我不确定(1)如何获取特定单词的搭配? (2)NLTK是否具有基于对数似然比的搭配指标?

I'm not sure however about (1) how to get the collocations for a particular word? (2) does NLTK have a collocation metric based on Log-Likelihood Ratio?

import nltk
from nltk.collocations import *
from nltk.tokenize import word_tokenize

text = "this is a foo bar bar black sheep  foo bar bar black sheep foo bar bar black  sheep shep bar bar black sentence"

trigram_measures = nltk.collocations.TrigramAssocMeasures()
finder = TrigramCollocationFinder.from_words(word_tokenize(text))

for i in finder.score_ngrams(trigram_measures.pmi):
    print i

推荐答案

尝试以下代码:

import nltk
from nltk.collocations import *
bigram_measures = nltk.collocations.BigramAssocMeasures()
trigram_measures = nltk.collocations.TrigramAssocMeasures()

# Ngrams with 'creature' as a member
creature_filter = lambda *w: 'creature' not in w


## Bigrams
finder = BigramCollocationFinder.from_words(
   nltk.corpus.genesis.words('english-web.txt'))
# only bigrams that appear 3+ times
finder.apply_freq_filter(3)
# only bigrams that contain 'creature'
finder.apply_ngram_filter(creature_filter)
# return the 10 n-grams with the highest PMI
print finder.nbest(bigram_measures.likelihood_ratio, 10)


## Trigrams
finder = TrigramCollocationFinder.from_words(
   nltk.corpus.genesis.words('english-web.txt'))
# only trigrams that appear 3+ times
finder.apply_freq_filter(3)
# only trigrams that contain 'creature'
finder.apply_ngram_filter(creature_filter)
# return the 10 n-grams with the highest PMI
print finder.nbest(trigram_measures.likelihood_ratio, 10)

它使用似然度量,还过滤掉不包含生物"一词的Ngram.

It uses the likelihood measure and also filters out Ngrams that don't contain the word 'creature'

这篇关于特定单词的NLTK搭配的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆