A stemmer for Hindi implemented in Python.
This program implements the suffix-stripping algorithm described in "A Lightweight Stemmer for Hindi" by Ananthakrishnan Ramanathan and Durgesh D Rao.
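The general idea of suffix stripping can be sketched as follows. This is a minimal illustration, not the code in hindi_stemmer.py: the name `strip_suffix`, the small suffix table, and the minimum stem length are assumptions chosen for the example, and the paper's full suffix list is considerably longer.

```python
# Illustrative suffix table, keyed by suffix length in Unicode code points.
# ASSUMPTION: a small hand-picked subset for demonstration, not the full
# suffix list from the paper or from hindi_stemmer.py.
SUFFIXES = {
    1: ["ो", "े", "ू", "ु", "ी", "ि", "ा"],
    2: ["ों", "ें", "ां", "ीं", "िए"],
}


def strip_suffix(word):
    """Remove the longest matching suffix from word.

    ASSUMPTION: a suffix is only stripped if at least two code points
    of stem remain, so very short words are left untouched.
    """
    # Try longer suffixes first so that "ों" wins over "ो"-style matches.
    for length in sorted(SUFFIXES, reverse=True):
        if len(word) > length + 1:
            for suffix in SUFFIXES[length]:
                if word.endswith(suffix):
                    return word[:-length]
    return word


print(strip_suffix("किताबों"))  # किताब
print(strip_suffix("दर्शिका"))  # दर्शिक
print(strip_suffix("के"))       # के (too short to strip)
```

Checking longer suffixes before shorter ones keeps the match greedy, and the length guard stops short function words such as "के" from being reduced to a single character.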
The file (hindi_stemmer.py) may be used as a standalone program or as a module. When used as a program, it reads text from stdin and writes the stemmed text to stdout. Example:
$ echo "ख़रीदारों के लिए मार्ग दर्शिका" | ./hindi_stemmer.py
खरीदार के लिए मार्ग दर्शिक
To stem a whole file, redirect stdin and stdout:
$ ./hindi_stemmer.py < input.txt > output.txt
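The stdin-to-stdout behaviour can be approximated with a small line filter. This is only a sketch under stated assumptions: `stem_stream` and `demo_stem` are hypothetical names, `demo_stem` is a placeholder rather than the real stemmer, and splitting tokens on whitespace is an assumption about how the input is tokenised.

```python
import io


def stem_stream(stem, infile, outfile):
    """Stem each whitespace-separated token, one output line per input line."""
    for line in infile:
        outfile.write(" ".join(stem(token) for token in line.split()) + "\n")


def demo_stem(word):
    # Placeholder stemmer (ASSUMPTION): strips a single trailing "ा" only.
    if len(word) > 2 and word.endswith("ा"):
        return word[:-1]
    return word


# In-memory streams stand in for stdin and stdout in this demonstration.
src = io.StringIO("मार्ग दर्शिका\n")
dst = io.StringIO()
stem_stream(demo_stem, src, dst)
print(dst.getvalue(), end="")  # मार्ग दर्शिक
```

Passing `sys.stdin` and `sys.stdout` in place of the `StringIO` objects would turn the same function into a command-line filter.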
Download: hindi_stemmer_rev0.tar.gz
The program requires Python ≥ 3.1. Tested on Linux.
This code is released under a Creative Commons Attribution 3.0 Unported License.
Luís Gomes <luismsgomes@gmail.com>, http://hlt.di.fct.unl.pt/luis/ (updated 19 November 2010)