[ACCEPTED]-Stemming English words with Lucene-porter-stemmer

Accepted answer
Score: 27

SnowballAnalyzer is deprecated; you can use the Lucene Porter Stemmer instead:

 PorterStemmer stem = new PorterStemmer();
 stem.setCurrent(word);   // word: the term you want to stem
 stem.stem();
 String result = stem.getCurrent();

Hope this helps!

Score: 23
import org.apache.lucene.analysis.PorterStemmer;

String stemTerm (String term) {
    PorterStemmer stemmer = new PorterStemmer();
    return stemmer.stem(term);
}

See here for more details. If stemming is all you want to do, then you should use this instead of Lucene.

Edit: You should lowercase term before passing it to stem().
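
To illustrate the lowercasing advice, here is a minimal sketch in plain Java. The class name, `stemNormalized`, and `TOY_STEMMER` are made up for this example; the toy stemmer only strips a lowercase "ing" suffix and is not the Porter algorithm or Lucene's PorterStemmer. It just shows how a case-sensitive suffix rule can miss an uppercase term:

```java
import java.util.Locale;
import java.util.function.UnaryOperator;

public class LowercaseBeforeStem {
    // Hypothetical stand-in for a real stemmer: strips a lowercase "ing" suffix only.
    // It is NOT the Porter algorithm; it only mimics a case-sensitive suffix rule.
    static final UnaryOperator<String> TOY_STEMMER =
        w -> (w.endsWith("ing") && w.length() > 5) ? w.substring(0, w.length() - 3) : w;

    // Trim and lowercase with a fixed locale, then hand the term to the stemmer.
    static String stemNormalized(String term, UnaryOperator<String> stemmer) {
        return stemmer.apply(term.trim().toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        // Without lowercasing, the suffix rule does not match.
        System.out.println(TOY_STEMMER.apply("BANKING"));           // BANKING
        // With lowercasing first, the rule applies.
        System.out.println(stemNormalized("BANKING", TOY_STEMMER)); // bank
    }
}
```

In real code you would pass your actual stemmer call in place of TOY_STEMMER. Locale.ROOT is used so the result does not depend on the machine's default locale.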

Score: 6

Why aren't you using the "EnglishAnalyzer"? It's simple to use and I think it'd solve your problem:

EnglishAnalyzer en_an = new EnglishAnalyzer(Version.LUCENE_34);
QueryParser parser = new QueryParser(Version.LUCENE_34, "your_field", en_an);
String str = "amenities";
System.out.println("result: " + parser.parse(str)); //amenit

Hope it helps you!

Score: 5

The previous example applies stemming to a search query, so if you are interested in stemming a full text you can try the following:

import java.io.*;
import org.apache.lucene.analysis.*;
import org.apache.lucene.analysis.tokenattributes.*;
import org.apache.lucene.analysis.snowball.*;
import org.apache.lucene.util.*;
public class Stemmer{
    public static String Stem(String text, String language){
        StringBuffer result = new StringBuffer();
        if (text!=null && text.trim().length()>0){
            StringReader tReader = new StringReader(text);
            Analyzer analyzer = new SnowballAnalyzer(Version.LUCENE_35,language);
            TokenStream tStream = analyzer.tokenStream("contents", tReader);
            TermAttribute term = tStream.addAttribute(TermAttribute.class);

            try {
                // Append each stemmed token, separated by a space
                while (tStream.incrementToken()){
                    result.append(term.term());
                    result.append(" ");
                }
            } catch (IOException ioe){
                System.out.println("Error: "+ioe.getMessage());
            }
        }

        // If, for some reason, the stemming did not happen, return the original text
        if (result.length()==0)
            result.append(text);
        return result.toString().trim();
    }

    public static void main (String[] args){
        System.out.println(Stemmer.Stem("Michele Bachmann amenities pressed her allegations that the former head of her Iowa presidential bid was bribed by the campaign of rival Ron Paul to endorse him, even as one of her own aides denied the charge.", "English"));
    }
}

The TermAttribute class has been deprecated and will no longer be supported in Lucene 4, but the documentation is not clear on what to use in its place.

Also, in the first example the PorterStemmer is not available as a class (hidden), so you cannot use it directly.

Hope this helps.

Score: 3

Here is how you can use the Snowball Stemmer in Java:

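import org.tartarus.snowball.ext.EnglishStemmer;

EnglishStemmer english = new EnglishStemmer();
// tokenizer(...) is not defined in this answer; assume a helper of your own that splits text into words
String[] words = tokenizer("bank banker banking");
for(int i = 0; i < words.length; i++){
    english.setCurrent(words[i]);
    english.stem();
    System.out.println(english.getCurrent());
}

Score: 0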

LingPipe provides a number of tokenizers. They can be used for stemming and stop word removal. It's a simple and effective means of stemming.
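
To show the general idea of chaining tokenization, stop word removal and stemming, here is a rough sketch in plain Java. It does not use LingPipe's API: the class name, `process`, and the tiny stop list are all assumptions for illustration, and the stemmer is passed in as a function so any real stemmer could be plugged in.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.UnaryOperator;

public class TokenChainSketch {
    // Tiny illustrative stop list; real stop lists are much longer.
    static final Set<String> STOP_WORDS = Set.of("a", "an", "and", "of", "the", "to");

    // Lowercase, split into tokens, drop stop words, then stem what is left.
    static List<String> process(String text, UnaryOperator<String> stemmer) {
        List<String> out = new ArrayList<>();
        // Split on any run of characters that are not letters or digits.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (token.isEmpty() || STOP_WORDS.contains(token)) {
                continue; // skip empty splits and stop words
            }
            out.add(stemmer.apply(token));
        }
        return out;
    }

    public static void main(String[] args) {
        // The identity function stands in for a real stemmer to keep this dependency-free.
        System.out.println(process("The head of the campaign", UnaryOperator.identity()));
        // prints [head, campaign]
    }
}
```

A library tokenizer chain does the same steps with better tokenization rules and a real stemmer behind the function argument.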

Score: 0

Since the PorterStemmer is not public, we can't call the stem function of PorterStemmer.

Instead, we can use KStemmer/KStemFilter to stem words to their root form.

Below is a Scala code snippet which accepts a string and transforms it into a stemmed string:

import org.apache.lucene.analysis.core.WhitespaceTokenizer
import org.apache.lucene.analysis.en.KStemFilter
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute
import java.io.StringReader

object Stemmer {
  def stem(input: String): String = {
    val stemmed_string = new StringBuilder()
    val inputReader = new StringReader(input.toLowerCase)
    val whitespaceTokenizer = new WhitespaceTokenizer()
    whitespaceTokenizer.setReader(inputReader)
    val kStemmedTokenStream = new KStemFilter(whitespaceTokenizer)
    val charTermAttribute = kStemmedTokenStream.addAttribute(classOf[CharTermAttribute])

    kStemmedTokenStream.reset() // must be called before the first incrementToken()
    while (kStemmedTokenStream.incrementToken()) {
      val term = charTermAttribute.toString
      stemmed_string.append(term + " ")
    }
    kStemmedTokenStream.end()
    kStemmedTokenStream.close()
    stemmed_string.toString.trim
  }
}

