The Indexer

Indexing is not a fast process: depending on index size and computer performance, throughput ranges from about 30Kb to 250Kb per second. The indexer therefore shouldn't be started too often; how often to run it depends on how frequently the web site is updated. For a static site, a single run of the indexer is enough.
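As a rough planning aid, indexing time is the amount of content divided by throughput. The sketch below assumes "Kb" means kilobytes and uses a hypothetical 50 MB (51200 Kb) site. The function name is illustrative only.

```python
def estimate_indexing_seconds(site_size_kb: float, rate_kb_per_s: float) -> float:
    """Rough indexing time in seconds: total size divided by throughput."""
    if rate_kb_per_s <= 0:
        raise ValueError("rate must be positive")
    return site_size_kb / rate_kb_per_s


if __name__ == "__main__":
    # Hypothetical 50 MB site, evaluated at both ends of the quoted range.
    for rate in (30, 250):
        minutes = estimate_indexing_seconds(51200, rate) / 60
        print(f"{rate} Kb/s: about {minutes:.0f} min")
```

At 30 Kb/s this comes to about 28 minutes and at 250 Kb/s to about 3 minutes, which is why restarting the indexer often on a large site is costly.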

During indexing, three files are created:

The indexer can also create a statistics file, stats.log, which can be processed right after the server has been indexed in order to store the information in a database.

Two indexing modes are available:


Starting the Indexer

To start the Indexer, run searchctl(.exe) with the following options:

Example

Windows:

C:\searchctl.exe localhost
or
C:\searchctl.exe --config=D:\www\ disk

Unix, Linux:

./searchctl name_of_task
or
./searchctl --config=/home/www/ disk
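If the indexer is launched from a script (for example, a scheduled job), the two documented command forms can be assembled as an argument list. This Python sketch assumes that --config points to the directory holding search.conf, as the examples suggest; the helper names are hypothetical, and the exit status is returned without interpretation.

```python
import subprocess
from typing import List, Optional


def build_searchctl_command(task: str, config_dir: Optional[str] = None,
                            executable: str = "./searchctl") -> List[str]:
    """Mirror the documented forms: 'searchctl task' and 'searchctl --config=dir task'."""
    argv = [executable]
    if config_dir is not None:
        argv.append(f"--config={config_dir}")
    argv.append(task)
    return argv


def run_indexer(task: str, config_dir: Optional[str] = None,
                executable: str = "./searchctl") -> int:
    # Passing a list (no shell) avoids quoting problems with paths such as D:\www\.
    result = subprocess.run(build_searchctl_command(task, config_dir, executable))
    return result.returncode
```

For example, build_searchctl_command("disk", "/home/www/") yields ["./searchctl", "--config=/home/www/", "disk"]; on Windows, pass executable=r"C:\searchctl.exe".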

Working with 'search.conf' file

All indexer settings are stored in the 'search.conf' file, which has the following structure:


[Job name_of_task]
[Action1]
Parameter1	Value1
Parameter2	Value2
Parameter3	Value3
[Action2]
Parameter1	Value1
Parameter2	Value2
Parameter3	Value3

The configuration file must not contain empty lines or comments.
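To check a configuration before running the indexer, the layout above can be read with a small parser. This is a sketch, not the indexer's own parser: it assumes job headers of the form [Job name], action headers such as [Index] or [Runner], and parameters separated from their values by the first run of whitespace (tab or space), so that values like Params may themselves contain spaces.

```python
from typing import Dict, List, Optional, Tuple

# (action name, {parameter: value})
Action = Tuple[str, Dict[str, str]]


def parse_search_conf(text: str) -> Dict[str, List[Action]]:
    """Parse '[Job name]' / '[Action]' / 'Parameter<whitespace>Value' lines."""
    jobs: Dict[str, List[Action]] = {}
    current_job: Optional[str] = None
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            raise ValueError(f"line {lineno}: empty lines are not allowed")
        if line.startswith("[") and line.endswith("]"):
            header = line[1:-1].strip()
            if header.startswith("Job "):
                current_job = header[4:].strip()
                jobs[current_job] = []
            elif current_job is None:
                raise ValueError(f"line {lineno}: action outside of a job")
            else:
                jobs[current_job].append((header, {}))
            continue
        if current_job is None or not jobs[current_job]:
            raise ValueError(f"line {lineno}: parameter outside of an action")
        # Name is the first token; everything after the first whitespace run is the value.
        parts = line.split(None, 1)
        value = parts[1] if len(parts) > 1 else ""
        jobs[current_job][-1][1][parts[0]] = value
    return jobs
```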


Action 'Index'

Action Index indexes a site: it starts the indexing system. At least one parameter must be specified in HTTP indexing mode, and at least two in local drive indexing mode.

More about parameters:


URL <url>

URL	url

An address starting with 'http://...' in HTTP mode, or a local path in local drive mode.

Example:

For HTTP:
URL	http://www.novgorod.ru/frisbee/

For disk (Windows): 
URL	c:/pub/home/frisbee/

For disk (Unix): 
URL	/pub/home/frisbee/

Extensions <ext>

Extensions ext1,ext2,ext3

Sets the list of extensions of files to be indexed. Used in local drive mode only; ignored in HTTP indexing mode. Extensions are separated by "," (comma).

Example:

Extensions htm,html,shtml,shtm
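In local drive mode, the Extensions list acts as a filter over the files found under URL. The sketch below shows that filtering in Python; treating extensions case-insensitively and tolerating spaces or a leading dot are assumptions, not documented behaviour, and the function names are hypothetical.

```python
import os
from typing import Iterator, Set


def parse_extensions(value: str) -> Set[str]:
    """'htm,html,shtml' -> {'htm', 'html', 'shtml'}; blanks and a leading dot are dropped."""
    return {e.strip().lower().lstrip(".") for e in value.split(",") if e.strip()}


def matches_extension(filename: str, extensions: Set[str]) -> bool:
    # Case-insensitive comparison is an assumption about the indexer's behaviour.
    ext = os.path.splitext(filename)[1].lower().lstrip(".")
    return ext in extensions


def files_to_index(root: str, extensions_value: str) -> Iterator[str]:
    """Walk a local directory tree and yield files whose extension is listed."""
    extensions = parse_extensions(extensions_value)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if matches_extension(name, extensions):
                yield os.path.join(dirpath, name)
```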

Path <path>

Path path

Specifies the working directory. Index files and the log file are saved to this directory.

Example:

Path c:\www\novgorod
or
Path /home/www/novgorod

CharSet <cset>

CharSet cset

Sets how the character encoding of the files to be indexed is determined. The possible values are:

Example:

CharSet ByHTTPHeader

MaxFiles <num>

MaxFiles num

Sets the maximum number of files to be indexed (10000 by default). Choose the value carefully, because many servers contain huge numbers of links, for example http://news.novgorod.ru/

Example:

MaxFiles 50
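MaxFiles caps how many documents are taken from the crawl queue, which is what keeps a link-heavy site from being crawled indefinitely. The sketch assumes a breadth-first queue (the indexer's traversal order is not documented) and uses a hypothetical get_links callback in place of real page fetching.

```python
from collections import deque
from typing import Callable, Iterable, List


def crawl_order(start: str, get_links: Callable[[str], Iterable[str]],
                max_files: int = 10000) -> List[str]:
    """Breadth-first crawl that stops once max_files documents have been taken."""
    seen = {start}
    queue = deque([start])
    indexed: List[str] = []
    while queue and len(indexed) < max_files:
        url = queue.popleft()
        indexed.append(url)
        for link in get_links(url):
            if link not in seen:  # queue each address only once
                seen.add(link)
                queue.append(link)
    return indexed
```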

Statistic <stat>

Statistic stat

Sets how reports are saved. Reports are generated at the end of action Index and are saved to the file stats.log. Available options:


Example:

Statistic Append

Exclude <excl>

Exclude excl1,excl2,excl3

Sets a list of words to exclude. Addresses containing at least one of the excluded words are not added to the indexing queue. Words are separated by "," (comma).

Example:

Exclude editpost.php?,reply.php?,admin/
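The Exclude rule amounts to a substring test on each address before it is queued. This sketch assumes a case-sensitive plain substring match against the full URL; whether the indexer checks only the path, or ignores case, is not stated. Function names are hypothetical.

```python
from typing import List


def parse_exclude(value: str) -> List[str]:
    """'editpost.php?,reply.php?,admin/' -> ['editpost.php?', 'reply.php?', 'admin/']."""
    return [w for w in (s.strip() for s in value.split(",")) if w]


def is_excluded(url: str, exclude_words: List[str]) -> bool:
    # Plain substring test: '?' and '/' have no special meaning here.
    return any(word in url for word in exclude_words)
```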

AddOption <opt>

AddOption opt

Sets the indexing method. Used in HTTP indexing mode only. The following values are available:

Example:

AddOption SubPages

Language <lng>

Language lng

Sets the language. If this parameter is specified, an 'Accept-Language' field is included in the HTTP request header. This may affect document content on some sites.

Example:

Language ru
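For illustration, this is how a Language value would translate into a request header using Python's standard urllib. It only shows the header the parameter adds; it is not the indexer's own HTTP client, and make_request is a hypothetical name.

```python
import urllib.request
from typing import Optional


def make_request(url: str, language: Optional[str] = None) -> urllib.request.Request:
    """Build a GET request, adding Accept-Language only when a language is set."""
    headers = {}
    if language:
        headers["Accept-Language"] = language
    return urllib.request.Request(url, headers=headers)
```

Note that urllib normalises stored header names, so the value is read back with req.get_header("Accept-language").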

AFrom <path>

AFrom path

Sets a substring of the URL that will be replaced by the string specified in parameter ATo.

Example:

AFrom  /home/dir/mysite/
ATo    http://search.codenet.ru/

ATo <url>

ATo url

Sets the substring that replaces AFrom in the URL. Used together with AFrom.

Example:

AFrom http://127.0.0.1/
ATo   http://www.codenet.ru/

or

AFrom c:/documents/www/www.codenet.ru/
ATo   http://www.codenet.ru/
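The AFrom/ATo pair is a substring substitution applied to each stored address, typically turning a local path into the public URL shown in search results. This sketch assumes only the first occurrence is replaced and that paths already use forward slashes, as in the examples; map_url is a hypothetical name.

```python
def map_url(location: str, a_from: str, a_to: str) -> str:
    """Replace the AFrom substring with ATo (first occurrence only, by assumption)."""
    if a_from and a_from in location:
        return location.replace(a_from, a_to, 1)
    return location
```

For example, map_url("c:/documents/www/www.codenet.ru/news/1.html", "c:/documents/www/www.codenet.ru/", "http://www.codenet.ru/") gives "http://www.codenet.ru/news/1.html".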

StartWord <word>

StartWord word

Sets the starting word. The page description is composed of the words that follow the starting word, which makes it possible to keep menus and similar elements out of the description. The starting word is obligatory.

Example:

StartWord about
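A minimal sketch of composing a description from the words after the starting word. The description length (max_words), case-insensitive whole-token matching, and falling back to the start of the text when the word is absent are all assumptions made for illustration.

```python
from typing import Optional


def describe(text: str, start_word: Optional[str], max_words: int = 30) -> str:
    """Return up to max_words words that follow start_word in text."""
    words = text.split()
    if start_word:
        lowered = [w.lower() for w in words]
        try:
            idx = lowered.index(start_word.lower())
            words = words[idx + 1:]  # skip everything up to and including the start word
        except ValueError:
            pass  # start word not found: use the beginning of the text (assumption)
    return " ".join(words[:max_words])
```

With text "Home News Forum About Our club plays ultimate frisbee", describe(text, "about", 3) returns "Our club plays", so the menu words before "About" are left out.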

MetaDescription <yesno>

MetaDescription yesno

Sets the page description method. The description can be displayed in search results using the special symbol %E. Available values are "Yes" or "No"; the default is "No". If "Yes" is set, the system attempts to take the description from the '<META name="description"...' tag. If the tag cannot be found, or the value is "No", the description is composed of the first words of the document (see StartWord).

Example:

MetaDescription Yes
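The fallback logic described above (META description first, otherwise the opening words of the page) can be sketched with Python's html.parser. This is an illustration, not the indexer's code: it skips <script> and <style> text but still counts <title> text as document words, and the 30-word limit is an assumption.

```python
from html.parser import HTMLParser
from typing import List, Optional


class _MetaDescriptionParser(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.description: Optional[str] = None
        self.text_parts: List[str] = []
        self._skip = 0  # depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "meta" and self.description is None:
            values = {k: (v or "") for k, v in attrs}  # attribute names arrive lowercased
            if values.get("name", "").lower() == "description":
                self.description = values.get("content", "").strip()

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.text_parts.append(data)


def page_description(html: str, use_meta: bool, max_words: int = 30) -> str:
    parser = _MetaDescriptionParser()
    parser.feed(html)
    parser.close()
    if use_meta and parser.description:
        return parser.description
    words = " ".join(parser.text_parts).split()
    return " ".join(words[:max_words])
```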

MetaRobots <yesno>

MetaRobots yesno

If the parameter is set to "No", the tag '<META name="robots"...' is ignored; otherwise the tag is checked for NOINDEX, NOFOLLOW, and NONE. More details can be found in section Use of "Robots" META-tags. The default value is "Yes".

Example:

MetaRobots No
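The check above can be expressed as a small function over the tag's content attribute. The sketch follows the common robots META convention that NONE means NOINDEX plus NOFOLLOW; tokenising on commas and ignoring case are assumptions, and robots_directives is a hypothetical name.

```python
from typing import Tuple


def robots_directives(content: str, honour_meta_robots: bool = True) -> Tuple[bool, bool]:
    """Return (may_index, may_follow) for a robots META content value."""
    if not honour_meta_robots:  # MetaRobots No: the tag is ignored
        return True, True
    tokens = {t.strip().upper() for t in content.split(",")}
    if "NONE" in tokens:  # NONE is shorthand for NOINDEX, NOFOLLOW
        return False, False
    return "NOINDEX" not in tokens, "NOFOLLOW" not in tokens
```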

UseRobotsTxt <yesno>

UseRobotsTxt yesno

If set to "Yes", indexing rules are taken from the file 'robots.txt' stored in the web server's root directory. The default value is "No". More information about working with 'robots.txt' is available in section robots.txt - Exclusions Standard for Robots. The robot's name is "CNSearch".

Example:

UseRobotsTxt yes
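To preview how a site's robots.txt treats the robot name "CNSearch", Python's standard urllib.robotparser can stand in for the indexer's own parser. The two implementations may differ in edge cases, so treat this as an approximation; the helper names are hypothetical.

```python
from urllib.robotparser import RobotFileParser

ROBOT_NAME = "CNSearch"  # robot name stated in this documentation


def load_robots_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed(parser: RobotFileParser, url: str, use_robots_txt: bool = True) -> bool:
    # UseRobotsTxt No (the default) means robots.txt is not consulted at all.
    if not use_robots_txt:
        return True
    return parser.can_fetch(ROBOT_NAME, url)
```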

Working through proxy-server

Starting with version 0.91, the indexer can work through a proxy server. Four new directives were added: ProxyServer, ProxyPort, ProxyLogin, and ProxyPassword.


ProxyServer <serv>

ProxyServer server

Specifies the proxy server. By default, the indexer connects directly. Used together with ProxyPort.

Example:

ProxyServer proxy.domain.ru

ProxyPort <port>

ProxyPort port

Sets proxy port. Works with ProxyServer.

Example:

ProxyPort 8080

ProxyLogin <login>

ProxyLogin login

Sets the proxy login. Used only if the proxy server requires authorization. Used together with ProxyPassword.

Example:

ProxyLogin alex

ProxyPassword <password>

ProxyPassword password

Sets the proxy password. Used only if the proxy server requires authorization. Used together with ProxyLogin.

Example:

ProxyPassword qwerty
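The four proxy directives map naturally onto a proxy URL of the form http://login:password@server:port. This Python sketch shows that mapping with urllib's ProxyHandler, which sends Basic proxy credentials taken from the URL; whether the indexer itself uses Basic authentication is an assumption, and the function names are hypothetical.

```python
import urllib.request
from typing import Optional
from urllib.parse import quote


def proxy_url(server: Optional[str], port: Optional[int] = None,
              login: Optional[str] = None, password: Optional[str] = None) -> Optional[str]:
    """Combine ProxyServer/ProxyPort/ProxyLogin/ProxyPassword into one proxy URL."""
    if not server:
        return None  # no ProxyServer: connect directly
    auth = ""
    if login:
        auth = quote(login, safe="")
        if password:
            auth += ":" + quote(password, safe="")
        auth += "@"
    hostport = f"{server}:{port}" if port else server
    return f"http://{auth}{hostport}"


def make_opener(server: Optional[str], port: Optional[int] = None,
                login: Optional[str] = None,
                password: Optional[str] = None) -> urllib.request.OpenerDirector:
    url = proxy_url(server, port, login, password)
    # An empty mapping disables proxies, including any set in the environment.
    proxies = {"http": url, "https": url} if url else {}
    return urllib.request.build_opener(urllib.request.ProxyHandler(proxies))
```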

[Runner] - Start External Application.

Runner is used to execute external applications. An external application can, for example, process the log file and store its contents in a database, or copy the index files.


Filename <file>

Filename file

Sets the name of the file to execute.

Example:

Filename /home/alex/parser.pl

Params <prm>

Params prm

Sets command line parameters for Filename.

Example:

Params --user=root --password=jfiekf
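The Runner action boils down to executing Filename with Params split into separate arguments. This sketch assumes Params is tokenised with POSIX shell quoting rules (via shlex) and that the program is started without a shell; both are assumptions, and the helper names are hypothetical.

```python
import shlex
import subprocess
from typing import List


def runner_argv(filename: str, params: str = "") -> List[str]:
    """Combine Filename and Params into an argument list."""
    return [filename] + shlex.split(params)


def run_external(filename: str, params: str = "") -> int:
    # Returns the external program's exit status.
    return subprocess.run(runner_argv(filename, params)).returncode
```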