almost 3 years ago

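When learning the Hadoop ecosystem, counting the words in a text is the usual "Hello World". I'm a bit embarrassed to admit that although I've played with Hadoop for a while, I had never actually implemented it in plain Python (without MapReduce). So I'm taking this chance to work through the exercise once.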

The source text is Alice's Adventures in Wonderland.

Skills reviewed

  • Using sys.stdin and sys.stdout
  • Using Counter from the collections module
  • Nested list comprehensions
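As a quick warm-up, here is a minimal sketch of the last two items on a tiny in-memory sample. The `lines` list is a made-up stand-in for sys.stdin and is not part of the script below.

```python
from collections import Counter

# Hypothetical sample input standing in for sys.stdin (not taken from the book).
lines = ["the cat and the hat\n", "The end\n"]

# Nested comprehension: the first "for" walks the lines,
# the second "for" walks the words of each line.
words = [word.lower() for line in lines for word in line.split()]

# Counter tallies how many times each word occurs.
counter = Counter(words)
top = counter.most_common(1)
print(top)
```

The clauses of a nested comprehension read left to right in the same order as the equivalent nested for loops.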

Code

## most_common_words counting

# usage : most_common_words.py num_words

from collections import Counter
import sys

try:
    # number of top words to print; must be an integer
    num_words = int(sys.argv[1])
except (IndexError, ValueError):
    print("usage : most_common_words.py num_words")
    sys.exit(1)
    
counter = Counter(
    word.lower()                      # lowercase each word
    for line in sys.stdin             # read the input line by line
    for word in line.strip().split()  # split on whitespace
    if word                           # skip empty words
)


## the nested comprehension above visits the same words as these loops
# for line in sys.stdin:
#     for word in line.strip().split():
#         print(word.lower())


for word,counts in counter.most_common(int(num_words)):
    sys.stdout.write(str(counts))
    sys.stdout.write("\t")
    sys.stdout.write(str(word))
    sys.stdout.write("\n")

## alternative output with str.format (prints the word first, then the count)
# for word,counts in counter.most_common(int(num_words)):
#     print("word:{},\tcounts:{}".format(word,counts))
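
To try the same logic without piping a file in, one option is to wrap it in a function that accepts any iterable of lines. This is only a sketch: `count_words` and the sample sentence are names and data I made up for illustration, and `io.StringIO` stands in for sys.stdin.

```python
import io
from collections import Counter


def count_words(stream, num_words):
    """Return the num_words most common lowercase words in stream."""
    counter = Counter(
        word.lower()
        for line in stream           # any iterable of text lines
        for word in line.split()     # split on whitespace
    )
    return counter.most_common(num_words)


# Hypothetical sample text; io.StringIO behaves like a file opened for reading.
sample = io.StringIO(u"Alice was beginning\nto get very tired. Alice sighed.\n")
result = count_words(sample, 1)

for word, counts in result:
    print("{}\t{}".format(counts, word))
```

Punctuation stays attached to the words here (for example `tired.`), just as in the script above, because both split on whitespace only.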