JSword / JS-251 Fix all Language problems / JS-195

Conflict between translations of iso639.properties and language names used in ConfigurableSnowballAnalyzer

    Details

      Description

      After applying the fix for JS-192 (iso639full.properties was always used and iso639.properties was always ignored),
      I now find that JS-189 (SnowballAnalyzer configured for unavailable stemmer Spanish (Español)) is occurring again.

      Reason
      The reason appears to be that iso639full.properties contains
      es=Spanish

      But iso639_en.properties contains
      es=Spanish (Espa\u00F1ol)

      Also iso639.properties contains
      es=Espa\u00F1ol
      (There are many other differences, e.g. French, German, ..)
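
To make the difference concrete, here is a minimal sketch that parses the three `es` entries quoted above with `java.util.Properties`. The class name `Iso639NameDemo` and the helper `lookup` are made up for illustration and are not part of JSword; the entries are fed in as strings rather than read from the real files.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class Iso639NameDemo {
    /** Parses properties-file text and returns the value stored under the given key. */
    static String lookup(String fileText, String key) {
        Properties props = new Properties();
        try {
            // Properties.load decodes \\uXXXX escapes such as \\u00F1 into the real character.
            props.load(new StringReader(fileText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        // The three entries quoted in this report, one per properties file.
        System.out.println(lookup("es=Spanish", "es"));                  // iso639full.properties
        System.out.println(lookup("es=Spanish (Espa\\u00F1ol)", "es")); // iso639_en.properties
        System.out.println(lookup("es=Espa\\u00F1ol", "es"));           // iso639.properties
    }
}
```

The same language code therefore yields three different display names depending on which file is consulted.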

      ConfigurableSnowballAnalyzer contains a list of language stemmers that matches only the language names in iso639full.properties and in no other iso* file:
      private static Pattern allowedStemmers = Pattern.compile("(Danish|Dutch|English|Finnish|French|German2|German|Italian|Kp|Lovins|Norwegian|Porter|Portuguese|Russian|Spanish|Swedish)");
      The plain English names in this pattern are the form that appears in iso639full.properties.
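
The mismatch can be reproduced in isolation. The sketch below copies the pattern from above and tests the three names for `es`. It assumes the analyzer checks the whole name with `Matcher.matches()`; the report does not show that call, and the class name `StemmerNameCheck` and method `isAllowed` are hypothetical.

```java
import java.util.regex.Pattern;

public class StemmerNameCheck {
    // Pattern copied verbatim from the report.
    private static final Pattern ALLOWED_STEMMERS = Pattern.compile(
        "(Danish|Dutch|English|Finnish|French|German2|German|Italian|Kp|Lovins"
        + "|Norwegian|Porter|Portuguese|Russian|Spanish|Swedish)");

    /** Returns true only if the entire name is one of the listed stemmers (assumed whole-string match). */
    static boolean isAllowed(String languageName) {
        return ALLOWED_STEMMERS.matcher(languageName).matches();
    }

    public static void main(String[] args) {
        System.out.println(isAllowed("Spanish"));                // true
        System.out.println(isAllowed("Spanish (Espa\u00F1ol)")); // false
        System.out.println(isAllowed("Espa\u00F1ol"));           // false
    }
}
```

Only the iso639full.properties form passes, which is consistent with JS-189 reappearing once the other files are actually read.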

      The fix looks non-trivial; I tried using the language code instead of the name but got the error:
      java.lang.ClassNotFoundException: org.tartarus.snowball.ext.esStemmer

      I am going to roll back the fix for JS-192 until DM has a chance to look at this.
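
The ClassNotFoundException above suggests that the stemmer class name is formed by appending "Stemmer" to the configured name inside org.tartarus.snowball.ext, so a bare code such as `es` cannot work. One possible direction, shown here only as a sketch and not as the fix adopted for this issue, is a lookup table from ISO 639 code to stemmer name, so the choice no longer depends on any translated language name. `StemmerLookup`, `STEMMER_BY_CODE` and `stemmerClassName` are hypothetical names; the table covers only the language-specific stemmers from the pattern and leaves out German2, Kp, Lovins and Porter.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class StemmerLookup {
    // Hypothetical table: ISO 639-1 code -> Snowball stemmer name used in the pattern above.
    private static final Map<String, String> STEMMER_BY_CODE = new HashMap<>();
    static {
        STEMMER_BY_CODE.put("da", "Danish");
        STEMMER_BY_CODE.put("nl", "Dutch");
        STEMMER_BY_CODE.put("en", "English");
        STEMMER_BY_CODE.put("fi", "Finnish");
        STEMMER_BY_CODE.put("fr", "French");
        STEMMER_BY_CODE.put("de", "German");
        STEMMER_BY_CODE.put("it", "Italian");
        STEMMER_BY_CODE.put("no", "Norwegian");
        STEMMER_BY_CODE.put("pt", "Portuguese");
        STEMMER_BY_CODE.put("ru", "Russian");
        STEMMER_BY_CODE.put("es", "Spanish");
        STEMMER_BY_CODE.put("sv", "Swedish");
    }

    /** Returns the fully qualified stemmer class name for a language code, or empty if none is listed. */
    static Optional<String> stemmerClassName(String languageCode) {
        return Optional.ofNullable(STEMMER_BY_CODE.get(languageCode))
            .map(name -> "org.tartarus.snowball.ext." + name + "Stemmer");
    }

    public static void main(String[] args) {
        // Prints Optional[org.tartarus.snowball.ext.SpanishStemmer]
        System.out.println(stemmerClassName("es"));
        // Prints Optional.empty because no stemmer is listed for this code
        System.out.println(stemmerClassName("he"));
    }
}
```

Because the sketch only builds the class name and never loads the class, it runs without the Snowball library on the classpath.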

            People

            • Assignee:
              dmsmith DM Smith
              Reporter:
              mjdenham Martin Denham