This repository contains lexicons for the elasticsearch-analysis-lemmagen plugin, which provides a lemmatizer as an Elasticsearch token filter.
Lexicons are organized into two directories:
- `free` - lexicons for 11 languages (`bg`, `cs`, `en`, `et`, `fr`, `hu`, `ro`, `sk`, `sl-rozaj`, `sl`, `uk`), which are distributed under Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0).
- `non-free` - lexicons for 5 languages (`fa`, `mk`, `pl`, `ru`, `sr`), which are distributed under Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0). Lexicons located in the `non-free` directory can't be used commercially.
For more information, see the README.markdown in each directory.