Python – 对乱码字中的字进行高效狩猎
我想你可以将其分类为拼字游戏风格的问题,但是由于朋友提到英国电视测验节目倒数计时,它开始了.演出中的各种表演包括参赛者被提出了一组乱码的信件,他们必须提出最长的话.我的朋友提到的是“RAEPKWAEN”. 在相当短的时间内,我用Python鞭打了一些东西来处理这个问题,使用PyEnchant来处理字典查找,但是我注意到它真的不能扩展这些. 这是我目前所在的 #!/usr/bin/python from itertools import permutations import enchant from sys import argv def find_longest(origin): s = enchant.Dict("en_US") for i in range(len(origin),-1): print "Checking against words of length %d" % i pool = permutations(origin,i) for comb in pool: word = ''.join(comb) if s.check(word): return word return "" if (__name__)== '__main__': result = find_longest(argv[1]) print result 就像他们在节目中使用的9个字母的例子,9个阶乘= 362,880和8个阶乘= 40,320就好了.在这个规模上,即使它必须检查所有可能的排列和字长度,这不是很多. 然而,一旦你达到了14个字符,这是87,178,291,200个可能的组合,这意味着你依赖运气,迅速找到一个14个字符的字. 用上面的例子,我的机器大约需要12 1/2秒才能找到“reawaken”.使用14个字符的加密字,我们可以在23天的时间内谈话,只是为了检查所有可能的14个字符的排列. 有没有更有效的方法来处理这个? 解决方法实施 Jeroen Coupé想法 his answer与字母数:from collections import defaultdict,Counter def find_longest(origin,known_words): return iter_longest(origin,known_words).next() def iter_longest(origin,known_words,min_length=1): origin_map = Counter(origin) for i in xrange(len(origin) + 1,min_length - 1,-1): for word in known_words[i]: if check_same_letters(origin_map,word): yield word def check_same_letters(origin_map,word): new_map = Counter(word) return all(new_map[let] <= origin_map[let] for let in word) def load_words_from(file_path): known_words = defaultdict(list) with open(file_path) as f: for line in f: word = line.strip() known_words[len(word)].append(word) return known_words if __name__ == '__main__': known_words = load_words_from('words_list.txt') origin = 'raepkwaen' big_origin = 'raepkwaenaqwertyuiopasdfghjklzxcvbnmqwertyuiopasdfghjklzxcvbnmqwertyuiopasdfghjklzxcvbnmqwertyuiopasdfghjklzxcvbnm' print find_longest(big_origin,known_words) print list(iter_longest(origin,5)) 输出(对于我的小58000字dict): counterrevolutionaries ['reawaken','awaken','enwrap','weaken','weaker','apnea','arena','awake','aware','newer','paean','parka','pekan','prank','prawn','preen','renew','waken','wreak'] 笔记: >这是简单的实现,没有优化. UPDATE 如果我们只需要找到一次词,并且我们有字典,按长度排序的单词,例如通过这个脚本: with open('words_list.txt') as f: words = f.readlines() with open('words_by_len.txt','w') as f: for word in sorted(words,key=lambda w: len(w),reverse=True): f.write(word) 我们可以找到最长的字,而不是加载完整的dict到内存: from collections import Counter import sys def check_same_letters(origin_map,word): new_map = Counter(word) return all(new_map[let] <= origin_map[let] for let in word) def iter_longest_from_file(origin,file_path,min_length=1): origin_map = Counter(origin) origin_len = len(origin) with open(file_path) as f: for line in f: word = line.strip() if len(word) > origin_len: continue if len(word) < min_length: return if check_same_letters(origin_map,word): yield word def find_longest_from_file(origin,file_path): return iter_longest_from_file(origin,file_path).next() if __name__ == '__main__': origin = sys.argv[1] if len(sys.argv) > 1 else 'abcdefghijklmnopqrstuvwxyz' print find_longest_from_file(origin,'words_by_len.txt') (编辑:甘南站长网) 【声明】本站内容均来自网络,其相关言论仅代表作者个人观点,不代表本站立场。若无意侵犯到您的权利,请及时与联系站长删除相关内容! |