Koozali.org: home of the SME Server
Obsolete Releases => SME 7.x Contribs => Topic started by: holck on September 19, 2009, 10:57:24 PM
-
I have a number of PDF-documents on my server and wanted my users to be able to search the contents of these documents. So I installed ksearch from http://www.kscripts.com/, and after some adjustments it seems to work fine. So two questions:
- Should I (and others) try to make this into a contrib?
- Do you know of other similar (or better) solutions?
-
Thanks. Start with a howto: it is easier to refine than a contrib, and it makes creating a contrib later easier.
-
This is my first attempt at a HowTo so please bear with me and help improve it ...
I needed a document search facility for my users, essentially so that they can search through various notes, memos etc. available on the web server. I found a usable script at www.kscripts.com and adjusted it a bit to make it better suited to the SME server. The resulting file package is available here: http://ibsgaardenprivat.dk/ksearch1.5b.tgz
Here is a copy of my new README, part of the file package:
== GENERAL INSTALLATION INSTRUCTIONS: ==
You will need a text editor, and access to your server to edit and run scripts. See FAQs.html for details.
The contents of the directory "search" will be copied to a newly created directory on the web server "/opt/ksearch".
- $sudo yum install xpdf (if you want to index PDF files)
- Open search_form.html
- In line 14 change "../index.html" to the URL to the web page you want the user to return to, after searching
- In line 19 change "/ksearch/ksearch.cgi" to the URL to the script ksearch.cgi
- Open search_tips.html
- In line 18 change "../index.html" to the URL to the web page you want the user to return to, after searching
- Open configuration/configuration.pl, necessary changes:
- Line 13: $INDEXER_START is the path to the directory in which files will be searched, including sub-directories. The directory may be the ibay's html directory or any sub-directory of this. All files in this directory must of course be accessible from WWW.
- Line 17: $BASE_URL is the URL pointing to the directory in line 13
- Line 20: $SEARCH_URL is the absolute URL to ksearch.cgi
- Line 23: $KSEARCH_DIR is the file path to the ksearch directory
- Line 26: $KSEARCH_URL is the URL to the ksearch directory
- Line 31: If you want to restrict access to indexer.cgi (and hence ability to initiate the indexing process) to certain domains, set @VALID_REFERERS to a list of acceptable domains. NOTE: There is a difference between http://www.mydomain.com and http://mydomain.com. An empty list means that all domains are accepted.
- Line 32: $INDEXER_URL is the absolute URL to indexer.cgi
- Line 33: $PASSWORD is a self-chosen password required to access indexer.cgi
- Line 72: $LOG_SEARCH is the path to search_log.txt, used for logging searches
- All other configuration.pl changes are optional. If you don't know what they are, then don't change them.
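Before moving on, it may help to check that none of the required settings was overlooked. The following is only a rough sketch: the function name missing_ksearch_vars is made up for illustration, and the check is deliberately crude (it only looks for a line that mentions the variable and contains an "=" sign, so it cannot tell whether the value is sensible).

```shell
#!/bin/sh
# Hypothetical helper (not part of ksearch): print every required
# configuration.pl variable from the list above that never appears
# on a line containing "=" in the given file.
missing_ksearch_vars() {
    config_file=$1
    for name in '$INDEXER_START' '$BASE_URL' '$SEARCH_URL' '$KSEARCH_DIR' \
                '$KSEARCH_URL' '@VALID_REFERERS' '$INDEXER_URL' '$PASSWORD' \
                '$LOG_SEARCH'; do
        # grep -F treats the name literally, so "$" and "@" are not special.
        if ! grep -F -- "$name" "$config_file" | grep -q '='; then
            printf '%s\n' "$name"
        fi
    done
}
```

For example, missing_ksearch_vars configuration/configuration.pl prints nothing when every variable in the list is mentioned on an assignment-like line.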
- Ignore Files and Folders: ignore_files.txt.
Add the full path of files/folders you do NOT want to index to the ignore files list, on separate lines. =NOTE=: After indexing, you may discover files/folders you don't want to include in your search engine. You may later come back and add files/folders -- however, you'll need to re-index your website using indexer.cgi
- Stop Terms: stop_terms.txt
Add terms you want to IGNORE to the search engine stop terms list, on separate lines. =NOTE=: After indexing, you may discover terms you don't want to include in your search engine. You may later come back and add terms to the file -- however, you'll need to re-index your website using indexer.cgi
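Since both ignore_files.txt and stop_terms.txt are plain lists with one entry per line, entries can also be added from the command line. A minimal sketch, assuming a made-up helper name (add_list_entry); it only appends the entry if that exact line is not already present:

```shell
#!/bin/sh
# Hypothetical helper: append one entry to a one-entry-per-line list file
# (ignore_files.txt or stop_terms.txt) unless the exact line already exists.
add_list_entry() {
    list_file=$1
    entry=$2
    # -x: match whole lines only, -F: literal text, -q: no output.
    # A missing list file simply counts as "entry not present".
    if ! grep -qxF -- "$entry" "$list_file" 2>/dev/null; then
        printf '%s\n' "$entry" >> "$list_file"
    fi
}
```

Called as add_list_entry ignore_files.txt /full/path/to/skip (the path here is just a placeholder), it leaves the file unchanged on a second call with the same entry. Re-indexing is still needed afterwards, as noted above.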
- Copy the contents of the directory "search" to /opt/ksearch:
$sudo mkdir /opt/ksearch
$sudo cp -R search/* /opt/ksearch/
The 5 files not included in directory "search" (CHANGELOG.txt, GNU.txt, HISTORY.txt, README.txt, and FAQs.html) are for personal reference, troubleshooting, and future use, and need not be copied.
- Change the ownership of all copied files to www.www:
$sudo chown -R www.www /opt/ksearch
Using the chmod command, set permissions for each copied file and directory as follows
$sudo chmod 755 /opt/ksearch/*.cgi /opt/ksearch/indexer.pl
$sudo chmod 744 /opt/ksearch/configuration/*
$sudo chmod 755 /opt/ksearch/ks_images
$sudo chmod 644 /opt/ksearch/ks_images/*
$sudo chmod 644 /opt/ksearch/*html
$sudo chmod 644 /opt/ksearch/templates/*
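To double-check the result of the chmod commands, something like the following could be used. This is only a sketch: check_ksearch_permissions is a made-up name, the directory is passed as a parameter, and the -maxdepth option assumes GNU find (as shipped with SME Server). It prints every path whose mode differs from the values listed above and prints nothing if all is well.

```shell
#!/bin/sh
# Hypothetical helper: list files under a ksearch directory whose permissions
# differ from the modes given in the chmod commands above.
check_ksearch_permissions() {
    target=$1
    # Scripts in the top-level directory should be 755.
    find "$target" -maxdepth 1 -type f \( -name '*.cgi' -o -name 'indexer.pl' \) ! -perm 755
    # Configuration files should be 744.
    find "$target/configuration" -maxdepth 1 -type f ! -perm 744
    # The image directory itself should be 755, the files inside it 644.
    find "$target/ks_images" -maxdepth 0 ! -perm 755
    find "$target/ks_images" -maxdepth 1 -type f ! -perm 644
    # HTML pages and templates should be 644.
    find "$target" -maxdepth 1 -type f -name '*html' ! -perm 644
    find "$target/templates" -maxdepth 1 -type f ! -perm 644
}
```

Running check_ksearch_permissions /opt/ksearch after the chmod step should give no output.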
- Make an addition to httpd.conf by creating the file
/etc/e-smith/templates-custom/etc/httpd/conf/httpd.conf/98Ksearch
With the following contents:
Alias /ksearch /opt/ksearch
<Directory /opt/ksearch >
Options +ExecCGI
order deny,allow
deny from all
allow from { "$localAccess $externalSSLAccess"; }
</Directory>
Expand the template:
$sudo /sbin/e-smith/expand-template /etc/httpd/conf/httpd.conf
Restart httpd:
$sudo /etc/init.d/httpd-e-smith restart
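If you prefer to create the 98Ksearch fragment from the command line instead of a text editor, a sketch along these lines should work. The function name write_ksearch_fragment is made up for illustration; the fragment contents are copied from above. The one thing to watch is the quoted 'EOF': without the quotes the shell would try to expand $localAccess and $externalSSLAccess itself, and the template would end up with empty values.

```shell
#!/bin/sh
# Hypothetical helper: write the 98Ksearch template fragment into the given
# templates-custom directory (created if it does not exist yet).
write_ksearch_fragment() {
    fragment_dir=$1
    mkdir -p "$fragment_dir"
    # The quoted 'EOF' delimiter disables shell expansion inside the here-document,
    # so the template variables reach the file unchanged.
    cat > "$fragment_dir/98Ksearch" <<'EOF'
Alias /ksearch /opt/ksearch
<Directory /opt/ksearch >
Options +ExecCGI
order deny,allow
deny from all
allow from { "$localAccess $externalSSLAccess"; }
</Directory>
EOF
}
```

On the server this would be called as root with /etc/e-smith/templates-custom/etc/httpd/conf/httpd.conf as the argument, followed by the expand-template and restart commands shown above.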
- Run the INDEXER:
Open your browser and run the indexer script, e.g.: http://www.MyWebsite.com/ksearch/indexer.cgi
The time required will depend on the size of your site and your server's CPU.
=NOTE=: You need to use the same URL path as specified in @VALID_REFERERS in configuration.pl (Line 31 in the list above).
- Test it out:
Open the search_form.html (e.g. http://www.MyWebsite.com/ksearch/search_form.html)
Run a search. If you have questions or problems, first read the enclosed FAQs.html file.
- As an alternative to doing indexing via a browser and the indexer.cgi script, you may do indexing from a command line with indexer.pl. For this to work, you will probably need to change the line in indexer.pl, starting with "my $configuration_file" to make sure it points to the correct configuration file.
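To see where indexer.pl currently looks for its configuration before editing it, the line in question can be located with grep. A small sketch (show_config_line is a made-up name; the search text is the one quoted above):

```shell
#!/bin/sh
# Hypothetical helper: print the line number and text of the line in the given
# script that contains "my $configuration_file".
show_config_line() {
    # The backslash keeps grep from treating "$" as an end-of-line anchor.
    grep -n 'my \$configuration_file' "$1"
}
```

For example, show_config_line /opt/ksearch/indexer.pl prints the matching line prefixed with its line number, which makes it easy to find in an editor.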
-
Please put your howto in the wiki, in the Howto category, rather than in the forums, as documentation finds a better place there. Thanks in advance.
-
I think that editing the files directly is not a good idea because:
- each time you upgrade you will lose your work
- each time you upgrade the files may be different, so editing them again will be difficult
my 2c
-
I have tried to add this as a HowTo: http://wiki.contribs.org/Document_search
-
Thanks very much. Some quick advice, as I really need to be doing something else right now:
Use preformatted text (indent with a space) for command instructions as well as for the content of files. Please do not use all caps in headers; let the wiki formatting do its work on them. And while we are on the topic of headers, please do not use second-level (==) headers, but start with third-level (===).
Traditionally user commands (even the original author's when not immediately relevant to the instructions) are placed on the discussion/talk pages.
Thanks for your work so far.
-
Thanks for the comments and suggestions.
I agree with Stefano that in general it is not a good idea to make your own changes to others' code, for the reasons he mentions, and I will contact ksearch and ask them to include my changes. But there were errors in source code and tar-archive, and some of my changes made the code and instructions more convenient for the SME-server.
I will follow Cactus' recommendations, but can't figure out how to use pre-formatted text in lists, and I don't know what is meant by "user commands"?
-
I would ask them to provide an include file in which to store all the variables. That way we could easily generate it (via a template), and integration with SME would be easier.
ciao
-
I have not looked at the installation routines very closely, but if ksearch provides an RPM, we can write a howto on making a template for it, which we can use if someone ever creates an (integration) smeserver-ksearch RPM.