There are four types of stemming strategies:
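- Porter (reduction stemming): a transforming algorithm that reduces the inflected forms of a word, such as "runs" and "running", to a common root, e.g. "run". Irregular forms such as "ran" are not covered by its suffix-stripping rules. Porter stemming must be performed both at insertion time and at query time, so that indexed terms and query terms are reduced to the same root.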
- Lucene-Hunspell: aims to provide features such as stemming, decompounding, spellchecking, normalization and term expansion by taking advantage of the lexical resources already created and widely used in projects like OpenOffice. It is still an alpha version, but it has an impressive list of supported languages.
- Expansion stemming: takes a root word and expands it to all of its various forms. It can be used either at insertion time or at query time. One way to approach this is to use the SynonymFilterFactory.
- KStem: an alternative to Porter for developers looking for a less aggressive stemmer.
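
The difference between reduction and expansion stemming can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `reduce_stem` is a crude, hypothetical suffix stripper (not the Porter algorithm), `EXPANSIONS` is a hand-written table standing in for a synonyms file, and the three sample documents are invented for the example.

```python
# Toy illustration only: reduce_stem is NOT the Porter algorithm,
# just a crude suffix stripper with made-up rules.
def reduce_stem(token):
    """Strip a few common English suffixes from a token."""
    for suffix in ("ning", "ing", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

# Hypothetical expansion table: root word -> all of its surface forms.
EXPANSIONS = {"run": ["run", "runs", "running", "ran"]}

def expand(token):
    """Return every known form of a root word, or the token itself."""
    return EXPANSIONS.get(token, [token])

def build_index(docs, analyze):
    """Map each analyzed term to the set of document ids containing it."""
    index = {}
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index.setdefault(analyze(token), set()).add(doc_id)
    return index

docs = {1: "she runs daily", 2: "running shoes", 3: "he ran home"}

# Reduction: the same stemmer runs at insertion time and at query time.
reduced_index = build_index(docs, reduce_stem)
reduced_hits = reduced_index.get(reduce_stem("running"), set())

# Skipping the stemmer at query time misses the reduced terms entirely.
unstemmed_query_hits = reduced_index.get("running", set())

# Expansion at query time: the index is left unstemmed and the query
# root is expanded into all of its forms.
plain_index = build_index(docs, lambda token: token)
expanded_hits = set()
for form in expand("run"):
    expanded_hits |= plain_index.get(form, set())

print(reduced_hits, unstemmed_query_hits, expanded_hits)
```

In this sketch the reduced index only matches when the query goes through the same stemmer, which is why reduction has to happen on both sides. The expansion table needs to be applied on one side only, and because it is a lookup rather than a set of suffix rules it can also list irregular forms.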
page revision: 0, last edited: 19 Sep 2010 00:04