Search-info.properties
From ZekrWiki
This file contains a number of regular expression patterns that enhance searching. The enhancements range from matching similar characters, such as A, À, and Ä, to recognizing diacritics and punctuation.
The original, global search-info.properties is located in the <zekr installation folder>/res/config folder. Each user also has a copy (made once from the original location into the user's workspace), which is in the config folder of the workspace:
<workspace>/config/search-info.properties.
Elements
This file contains replacement patterns and stopwords for different languages, so each element (such as search.pattern.replace) may have a .xy suffix, giving search.pattern.replace.xy for language xy. Here is a short description of its elements:
- search.version: this element specifies the version of this file. If you install a new version of Zekr, your search-info.properties file is updated from <zekr installation folder>/res/config/search-info.properties.
- search.pattern.replace: this element may appear more than once. Its value has the form <someregex>=value or <someregex1>=value1,<someregex2>=value2. The second form is equivalent to two separate search.pattern.replace entries, written on one line. When you search for something, Zekr applies these replacement patterns to your search string. It also applies them when filtering sura names in the Goto dialog. Applying a pattern means replacing every occurrence of <someregex> with value.
- search.pattern.punct: Punctuation marks used in a language. \\p{Punct} matches the punctuation of most European languages. For other languages, the user may add more punctuation marks. In the example on this page, these characters are also added for Persian and Arabic: ،؛«»؟. Punctuation is ignored when the user searches for a whole-word match. For example, searching for "Salaamun Alaikum" will also find "Salaamun, Alaikum" (if the comma is included in the punctuation of the text's language).
- search.pattern.letter: A regular expression matching the range of letters in a language. For example, for English, this range would be [a-z] (there is no need to specify both cases, e.g. [a-zA-Z], as search already ignores letter case).
- search.pattern.diacritics: A regular expression matching the diacritics of a language (the example on this page spells this key search.pattern.diacr). Diacritics after a letter of this language are ignored by default, so سلام will also find سَلام.
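To illustrate how the letter and diacritics patterns can work together, here is a minimal Python sketch. Zekr itself is a Java application, so this is not its actual code. It assumes the Arabic letter and diacritics ranges shown in the example on this page, and the function name diacritic_insensitive_regex is hypothetical.

```python
import re

# Ranges copied from the Arabic section of the example file on this page.
DIACR_AR = "[\u064b-\u0653\u0670]"
LETTER_AR = "[\u0621-\u064a\u0671-\u06d3]"


def diacritic_insensitive_regex(term, letter=LETTER_AR, diacr=DIACR_AR):
    """Build a regex that matches `term` with optional diacritics after each letter.

    Hypothetical helper: it only illustrates the idea of ignoring diacritics
    that follow a letter of the language.
    """
    letter_re = re.compile(letter)
    parts = []
    for ch in term:
        parts.append(re.escape(ch))
        if letter_re.fullmatch(ch):
            # Allow any number of diacritics after a letter of this language.
            parts.append(diacr + "*")
    return re.compile("".join(parts))


PLAIN = "\u0633\u0644\u0627\u0645"            # seen, lam, alef, meem
VOCALIZED = "\u0633\u064e\u0644\u0627\u0645"  # same word with a fatha after seen

rx = diacritic_insensitive_regex(PLAIN)
print(rx.search(VOCALIZED) is not None)  # True
print(rx.search(PLAIN) is not None)      # True
```

The regex is built from the search term rather than by stripping diacritics from the text, so the searched text stays unchanged. Stripping diacritics from both sides before comparing would be an equally reasonable design.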
Example
Here is an example of this file used for Zekr 0.7.6:
# Zekr replace patterns and stop words to be used in search
search.version = 0.7.6
# Generic patterns
search.pattern.replace = [\u00e0-\u00e5\u0100-\u0105]=a,[\u00e8-\u00eb\u0112-\u011b]=e,[\u00ec-\u00ef\u0128-\u0131]=i
search.pattern.replace = [\u00f2-\u00f6\u00f8\u014c-\u0151]=o,[\u00f9-\u00fc\u0168-\u0173]=u
# remove al, etc. for sura names in goto form
search.pattern.replace = a.{1\,3}-=
# default stop words. To be used when no stopword is specified by language.
search.stopword =
# default punctuation marks. To be used when no punctuation is specified by language.
search.pattern.punct = [\\p{Punct}]
############ Persian Language
# YEH shapes
search.pattern.replace.fa = [\u0626\u0649\u064a]=\u06cc
# KAF shapes
search.pattern.replace.fa = [\u06aa-\u06ae\u0643]=\u06a9
# ALEF shapes
search.pattern.replace.fa = [\u0623\u0625]=\u0627
# diacritics
search.pattern.diacr.fa = [\u064b-\u0655\u0670]
# punctuation marks
search.pattern.punct.fa = [\\p{Punct}\u2010-\u2014،؛«»؟\u200c\u200b]
# letter range
search.pattern.letter.fa = [\u0621-\u064a\u0671-\u06d3]
search.stopword.fa = به,از,تا,در,می,که,هر,ها,را,و,با,آن
############ Arabic Language
# TEH shapes
search.pattern.replace.ar = \u0629=\u062a
# ALEF shapes
search.pattern.replace.ar = [\u0622\u0623\u0625\u0671]=\u0627
# YEH shapes
search.pattern.replace.ar = [\u06cc\u06d2\u06e6\u0626\u0649]=\u064a
# WAW shapes
search.pattern.replace.ar = [\u0624\u06e5]=\u0648
# KAF shapes
search.pattern.replace.ar = [\u06aa\u06a9]=\u0643
# remove extra signs (waqf, aya/sajda/hizb marks, etc.)
search.pattern.replace.ar = [\u06d6-\u06ed]=
# remove extra whitespace
#search.pattern.replace.ar = \\s+=\u0020
# diacritics
search.pattern.diacr.ar = [\u064b-\u0653\u0670]
# punctuation marks
search.pattern.punct.ar = [\\p{Punct}\u2010-\u2014،؛«»؟\u200c\u200b]
# letter range
search.pattern.letter.ar = [\u0621-\u064a\u0671-\u06d3]
############ Urdu
# diacritics
search.pattern.diacr.ur = [\u064b-\u0653\u0670]
# punctuation marks
search.pattern.punct.ur = [\\p{Punct}\u2010-\u2014،؛«»؟\u200c\u200b]
# letter range
search.pattern.letter.ur = [\u0621-\u064a\u0671-\u06d3]
############ Bosnian
search.pattern.replace.bs = ć=c,č=c,đ=dj,š=s,ž=z
search.pattern.punct.bs = [\\p{Punct}]
# remove el, al, ... for sura names in goto form
search.pattern.replace.bs = e..?-=,a..?-=
############ English
#search.stopword.en = an,and,are,as,at,be,but,by,for,if,in,into,is,it,no,not,of,on,or,s,such,t,that,the,their,then,there,these,they,this,to,was,will,with
search.stopword.en = a,as,at,be,by,if,in,is,it,no,on,or,s,t,to
# punctuation marks
#search.pattern.replace.en = ['"`<>\,\\.;:\\[\\]{}\\(\\)\\-\\+\\!\\?]=
search.pattern.punct.en = [\\p{Punct}]
# remove al, a. for sura names in goto form
search.pattern.replace.en = a..?-=
############ Turkish
# search.stopword.tr = ve,ile,bir,pek,daha,gibi,bu,şu,o,bunlar,şunlar,onlar,ben,sen,biz,siz,bizler,sizler,onlar,değil,daha
search.stopword.tr = ve,bu,şu,o,ben,sen,biz,siz
# replace accented letters
search.pattern.replace.tr = â=a,û=u,î=i,Î=İ
# remove el, al, ... for sura names in goto form
search.pattern.replace.tr = e..?-=,a..?-=
# punctuation marks
search.pattern.punct.tr = [\\p{Punct}\u2018\u2019\u201C\u201D\u2025\u0226]
############ Russian
search.pattern.replace.ru = а.{2}-=
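The search.pattern.replace lines in the example above can be read as lists of regex=value pairs. The following Python sketch shows one way to split and apply such values. It is an illustration only, not Zekr's implementation. It assumes that a backslash-escaped comma (as in a.{1\,3}-=) belongs to the regex rather than separating pairs, that the last = in a pair separates the regex from its value, and that \uXXXX escapes are resolved by the regex engine. The function names are hypothetical.

```python
import re


def parse_replace_patterns(value):
    """Split 'regex1=value1,regex2=value2' into (compiled_regex, replacement) pairs."""
    pairs = []
    # Split only on commas that are not preceded by a backslash.
    for entry in re.split(r"(?<!\\),", value):
        entry = entry.replace("\\,", ",").strip()
        if "=" not in entry:
            continue  # ignore empty or malformed entries
        # Assumption: the last '=' separates the regex from its replacement.
        pattern, _, replacement = entry.rpartition("=")
        pairs.append((re.compile(pattern), replacement))
    return pairs


def apply_replace_patterns(text, pairs):
    """Replace every occurrence of each regex with its value, in order."""
    for regex, replacement in pairs:
        # A function replacement keeps the value literal (no backslash handling).
        text = regex.sub(lambda match, r=replacement: r, text)
    return text


# Values taken from the generic and Bosnian sections of the example.
generic = parse_replace_patterns(
    r"[\u00e0-\u00e5\u0100-\u0105]=a,[\u00e8-\u00eb\u0112-\u011b]=e"
) + parse_replace_patterns(r"a.{1\,3}-=")
bosnian = parse_replace_patterns("ć=c,č=c,đ=dj,š=s,ž=z")

print(apply_replace_patterns("al-fàtiha", generic))  # fatiha
print(apply_replace_patterns("đžš", bosnian))        # djzs
```

An empty value, as in a.{1\,3}-=, simply deletes the matched text, which is how the example removes prefixes such as "al-" from sura names.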

