Extract words from a file
Problem description
I'm trying to create a dictionary of words from a collection of files. Is there a simple way to print all the words in a file, one per line?
Solution
You could use grep:
-E '\w+' matches each run of word characters (letters, digits, and underscores)
-o prints only the portion of the line that matches, one match per line
    % cat temp
    Some examples use "The quick brown fox jumped over the lazy dog,"
    rather than "Lorem ipsum dolor sit amet, consectetur adipiscing elit"
    for example text.

    # if you don't care whether words repeat
    % grep -o -E '\w+' temp
    Some
    examples
    use
    The
    quick
    brown
    fox
    jumped
    over
    the
    lazy
    dog
    rather
    than
    Lorem
    ipsum
    dolor
    sit
    amet
    consectetur
    adipiscing
    elit
    for
    example
    text
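Note that \w is not part of POSIX extended regular expressions, although GNU and BSD grep both accept it. A minimal sketch of the same extraction using a POSIX character class instead, assuming a grep that supports -o; the function name extract_words is hypothetical and only used for illustration:

```shell
# extract_words is a hypothetical helper name, not part of grep.
# It prints every run of letters, digits, or underscores found in
# the file passed as the first argument, one match per line.
extract_words() {
    grep -o -E '[[:alnum:]_]+' "$1"
}
```

Calling extract_words temp should list the same words as the \w+ version above.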
If you only want to print each word once, disregarding case, you can pipe the output through sort:
-u prints each word only once
-f tells sort to ignore case when comparing words
    # if you only want each word once
    % grep -o -E '\w+' temp | sort -u -f
    adipiscing
    amet
    brown
    consectetur
    dog
    dolor
    elit
    example
    examples
    for
    fox
    ipsum
    jumped
    lazy
    Lorem
    over
    quick
    rather
    sit
    Some
    text
    than
    The
    use
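Since the question asks about a collection of files, the same pipeline can take several files at once. This is only a sketch: build_wordlist is a hypothetical helper name, and it assumes a grep that supports -o, -h, and \w (GNU and BSD grep do). Without -h, grep prefixes each match with the name of the file it came from when given more than one file.

```shell
# build_wordlist is a hypothetical helper name used for illustration.
# -h suppresses the "filename:" prefix grep adds when searching several files.
# sort -u -f then keeps one copy of each word, comparing case-insensitively.
build_wordlist() {
    grep -h -o -E '\w+' "$@" | sort -u -f
}
```

For example, build_wordlist *.txt > words.txt would write the combined word list to words.txt. Which capitalization of a repeated word survives depends on sort, so do not rely on it.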