Adding Languages to Solr

After Solr is installed and its instances are set up, indexing and BI search work for values in Russian and English. If required, the list of languages can be extended. To do this, add a new Solr instance and adjust the configuration files.

To add a new instance of Solr for a new language, execute the following operations:

  1. Create a copy of the sourceDataSchema_en.xml configuration file in the Conf folder.

  2. Rename the copied file by changing the language suffix. For example, for German: sourceDataSchema_de.xml.

  3. In a browser, open the page that lists the language analyzers supported in Solr: http://wiki.apache.org/solr/LanguageAnalysis. Find the configuration for the required language, for example, German:

    <filter class="solr.SnowballPorterFilterFactory" language="German2" />

  4. Open the copied sourceDataSchema_de.xml file for editing and find every occurrence of the string <filter class="solr.KStemFilterFactory"/>:

    <...>
    <fieldType name="name_searcher" class="solr.TextField">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
        <filter class="solr.KStemFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
        <filter class="solr.KStemFilterFactory"/>
      </analyzer>
    </fieldType>
    <...>

  5. Replace every string found with the configuration string for the required language:

    <...>
    <fieldType name="name_searcher" class="solr.TextField">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory" language="German2" />
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory" language="German2" />
      </analyzer>
    </fieldType>
    <...>

  6. Save the file.
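
The schema-editing steps above can also be scripted. The following is a minimal Python sketch, not part of the product: the function names localize_schema and add_language_schema are hypothetical, the path to the Conf folder must be supplied by the caller, and UTF-8 file encoding is an assumption. It copies sourceDataSchema_en.xml under the new language suffix and replaces every KStem filter with the Snowball filter for the chosen language.

```python
import re
from pathlib import Path

# Matches the English stemmer filter, tolerating optional whitespace before "/>".
KSTEM_FILTER = re.compile(r'<filter\s+class="solr\.KStemFilterFactory"\s*/>')


def localize_schema(schema_text: str, snowball_language: str) -> str:
    """Return schema_text with every KStem filter replaced by a Snowball filter."""
    replacement = (
        '<filter class="solr.SnowballPorterFilterFactory" '
        f'language="{snowball_language}" />'
    )
    # A callable replacement avoids any backslash processing of the string.
    new_text, count = KSTEM_FILTER.subn(lambda _match: replacement, schema_text)
    if count == 0:
        raise ValueError("no solr.KStemFilterFactory filter found in the schema")
    return new_text


def add_language_schema(conf_dir: str, lang_code: str, snowball_language: str) -> Path:
    """Copy sourceDataSchema_en.xml to sourceDataSchema_<lang_code>.xml and patch it."""
    conf = Path(conf_dir)
    source = conf / "sourceDataSchema_en.xml"
    target = conf / f"sourceDataSchema_{lang_code}.xml"
    # Assumption: the schema files are UTF-8 encoded.
    text = source.read_text(encoding="utf-8")
    target.write_text(localize_schema(text, snowball_language), encoding="utf-8")
    return target
```

For the German example, the call would be add_language_schema("Conf", "de", "German2"), where "Conf" stands for the actual path to the Conf folder. The function raises an error instead of writing an unchanged copy when no KStem filter is found, so a schema that already uses another stemmer is not silently duplicated.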

The configuration file for the added language is ready. To create the Solr instance for this language, open the solr.xml file in the solr-4.4.0\solr\app folder for editing. Copy the string that describes the current Solr instance, paste it, and adjust it for the added language, for example:

<core schema="sourceDataSchema_de.xml" instanceDir="BISearch_SourceData\" name="SourceData_de1" config="sourceData_solrconfig.xml" dataDir="indexData/SourceData_de1"/>

Specify the name of the created configuration file in the schema attribute, the name of the Solr instance for the added language in the name attribute, and the folder that stores the index files of the added language in the dataDir attribute. All attributes and their purposes are listed in Item 5 of the Solr Instances Setup subsection.
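
To illustrate how the three language-dependent attributes relate to each other, here is a minimal Python sketch that builds the core string from a language code. The function name build_core_entry is hypothetical, and the naming pattern (including the trailing 1 in the instance name) is only an assumption copied from the German example; adjust it to the names actually used in your solr.xml.

```python
from xml.sax.saxutils import quoteattr


def build_core_entry(lang_code: str) -> str:
    """Build a <core .../> string for the given language code."""
    # Assumption: instance names follow the pattern of the German example.
    name = f"SourceData_{lang_code}1"
    attrs = [
        ("schema", f"sourceDataSchema_{lang_code}.xml"),  # schema file created earlier
        ("instanceDir", "BISearch_SourceData\\"),          # unchanged between languages
        ("name", name),                                    # Solr instance name
        ("config", "sourceData_solrconfig.xml"),           # unchanged between languages
        ("dataDir", f"indexData/{name}"),                  # folder for the index files
    ]
    # quoteattr wraps each value in quotes and escapes XML special characters.
    rendered = " ".join(f"{key}={quoteattr(value)}" for key, value in attrs)
    return f"<core {rendered}/>"
```

Calling build_core_entry("de") reproduces the German string shown above; the result still has to be pasted into solr.xml next to the existing core strings.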

Restart Tomcat. If everything is correct, the manager web page looks as follows:

To use the added languages, also set up the desktop application or BI server.

See also:

Installing and Setting Up Software