Koozali.org: home of the SME Server

Document search facility?

holck
« on: September 19, 2009, 10:57:24 PM »
I have a number of PDF-documents on my server and wanted my users to be able to search the contents of these documents. So I installed ksearch from http://www.kscripts.com/, and after some adjustments it seems to work fine. So two questions:
  • Should I (and others) try to make this into a contrib?
  • Do you know of other similar (or better) solutions?

stephen noble
Re: Document search facility?
« Reply #1 on: September 20, 2009, 02:23:54 AM »
Thanks. Start with a howto: it is easier to refine than a contrib, and it makes creating a contrib later easier.

holck
Document search facility: How To
« Reply #2 on: September 30, 2009, 05:23:52 PM »
This is my first attempt at a HowTo so please bear with me and help improve it ...

I needed a document search facility for my users, so that they can search through various notes, memos etc. available on the web server. I found a usable script at www.kscripts.com and have adjusted it a bit to make it better suited to the SME Server. The resulting file package is available here: http://ibsgaardenprivat.dk/ksearch1.5b.tgz

Here is a copy of my new README, part of the file package:


== GENERAL INSTALLATION INSTRUCTIONS: ==

You will need a text editor, and access to your server to edit and run scripts. See FAQs.html for details.

The contents of the directory "search" will be copied to a newly created directory on the web server "/opt/ksearch".

  • $ sudo yum install xpdf (if you want to index PDF files)
  • Open search_form.html
    • In line 14 change "../index.html" to the URL of the web page you want the user to return to after searching
    • In line 19 change "/ksearch/ksearch.cgi" to the URL of the script ksearch.cgi
  • Open search_tips.html
    • In line 18 change "../index.html" to the URL of the web page you want the user to return to after searching
  • Open configuration/configuration.pl, necessary changes:
    • Line 13: $INDEXER_START is the path to the directory in which files will be searched, including sub-directories. The directory may be the ibay's html directory or any sub-directory of this. All files in this directory must of course be accessible from WWW.
    • Line 17: $BASE_URL is the URL pointing to the directory in line 13
    • Line 20: $SEARCH_URL is the absolute URL to ksearch.cgi
    • Line 23: $KSEARCH_DIR is the file path to the ksearch directory
    • Line 26: $KSEARCH_URL is the URL to the ksearch directory
    • Line 31: If you want to restrict access to indexer.cgi (and hence ability to initiate the indexing process) to certain domains, set @VALID_REFERERS to a list of acceptable domains. NOTE: There is a difference between http://www.mydomain.com and http://mydomain.com. An empty list means that all domains are accepted.
    • Line 32: $INDEXER_URL is the absolute URL to indexer.cgi
    • Line 33: $PASSWORD is a self-chosen password required to access indexer.cgi
    • Line 72: $LOG_SEARCH is the path to search_log.txt, used for logging searches
    • All other configuration.pl changes are optional. If you don't know what they are, then don't change them.
  • Ignore Files and Folders: ignore_files.txt
    Add the full path of each file/folder you do NOT want to index to the ignore files list, one per line. =NOTE=: After indexing, you may discover files/folders you don't want to include in your search engine. You can come back later and add them, but you will then need to re-index your website using indexer.cgi
  • Stop Terms: stop_terms.txt
    Add terms you want to IGNORE to the search engine stop terms list, one per line. =NOTE=: After indexing, you may discover terms you don't want to include in your search engine. You can come back later and add them to the file, but you will then need to re-index your website using indexer.cgi
  • Copy the contents of the directory "search" to /opt/ksearch:
                    $ sudo mkdir /opt/ksearch
                    $ sudo cp -R search/* /opt/ksearch/

    The 5 files not included in directory "search" (CHANGELOG.txt, GNU.txt, HISTORY.txt, README.txt, and FAQs.html) are for personal reference, troubleshooting, and future use, and need not be copied.
  • Change the ownership of all copied files to user www and group www:
                    $ sudo chown -R www:www /opt/ksearch

    Using the chmod command, set permissions for each copied file and directory as follows:
                    $ sudo chmod 755 /opt/ksearch/*.cgi /opt/ksearch/indexer.pl
                    $ sudo chmod 744 /opt/ksearch/configuration/*
                    $ sudo chmod 755 /opt/ksearch/ks_images
                    $ sudo chmod 644 /opt/ksearch/ks_images/*
                    $ sudo chmod 644 /opt/ksearch/*html
                    $ sudo chmod 644 /opt/ksearch/templates/*

  • Make an addition to httpd.conf by creating the file
                    /etc/e-smith/templates-custom/etc/httpd/conf/httpd.conf/98Ksearch
    With the following contents:
                    Alias /ksearch /opt/ksearch
                    <Directory /opt/ksearch >
                            Options +ExecCGI
                            order deny,allow
                            deny from all
                            allow from { "$localAccess $externalSSLAccess"; }
                    </Directory>


    Expand the template:
                    $ sudo /sbin/e-smith/expand-template /etc/httpd/conf/httpd.conf

    Restart httpd:
                    $ sudo /etc/init.d/httpd-e-smith restart
                   
  • Run the INDEXER:
                    Open your browser and run the indexer script, e.g.: http://www.MyWebsite.com/ksearch/indexer.cgi
                    The time required will depend on the size of your site and your server's CPU.
                            =NOTE=: You need to use the same URL path as specified in @VALID_REFERERS in configuration.pl (listed above as line 31).
  • Test it out:
                    Open search_form.html (e.g. http://www.MyWebsite.com/ksearch/search_form.html)
                    Run a search. If you have questions or problems, first read the enclosed FAQs.html file.
  • As an alternative to doing indexing via a browser and the indexer.cgi script, you may do indexing from a command line with indexer.pl. For this to work, you will probably need to change the line in indexer.pl, starting with "my $configuration_file" to make sure it points to the correct configuration file.
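
The file-system steps above (copying, ownership, permissions, and the httpd.conf template fragment) can be collected into a small shell script. What follows is only a sketch and not part of the ksearch package. The function names ksearch_install and ksearch_write_fragment, and the optional root prefix argument, are made up for illustration so that the steps can be tried in a scratch directory first. It assumes a user and group named www exist, and it skips the chown when not run as root.

```shell
#!/bin/sh
# Sketch only: function names and the "root prefix" argument are hypothetical.

# Copy the unpacked "search" directory to its destination and set permissions.
ksearch_install() {
    src=$1    # unpacked "search" directory
    dest=$2   # target directory, /opt/ksearch in the instructions

    mkdir -p "$dest" || return 1
    cp -R "$src"/. "$dest"/ || return 1

    # Only root can change ownership; assumption: user and group "www" exist.
    if [ "$(id -u)" -eq 0 ] && id www >/dev/null 2>&1; then
        chown -R www:www "$dest" || return 1
    fi

    # Same modes as in the chmod list above.
    chmod 755 "$dest"/*.cgi "$dest"/indexer.pl &&
    chmod 744 "$dest"/configuration/* &&
    chmod 755 "$dest"/ks_images &&
    chmod 644 "$dest"/ks_images/* &&
    chmod 644 "$dest"/*html &&
    chmod 644 "$dest"/templates/*
}

# Write the custom template fragment 98Ksearch.
# $1 is an optional prefix for trial runs; leave it empty to write under /etc.
ksearch_write_fragment() {
    root=${1:-}
    dir="$root/etc/e-smith/templates-custom/etc/httpd/conf/httpd.conf"
    mkdir -p "$dir" || return 1
    # Quoted heredoc: the template expression must reach the file unexpanded.
    cat > "$dir/98Ksearch" <<'EOF'
Alias /ksearch /opt/ksearch
<Directory /opt/ksearch >
        Options +ExecCGI
        order deny,allow
        deny from all
        allow from { "$localAccess $externalSSLAccess"; }
</Directory>
EOF
}
```

Run as root, this would be called as: ksearch_install search /opt/ksearch, then ksearch_write_fragment with no argument. The template still has to be expanded and httpd restarted afterwards, as described above.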
« Last Edit: September 30, 2009, 05:31:59 PM by holck »

cactus
Re: Document search facility?
« Reply #3 on: September 30, 2009, 05:54:24 PM »
Please put your howto in the wiki, in the howto category, rather than in the forums; documentation finds a better place there. Thanks in advance.
Be careful whose advice you buy, but be patient with those who supply it. Advice is a form of nostalgia, dispensing it is a way of fishing the past from the disposal, wiping it off, painting over the ugly parts and recycling it for more than it's worth ~ Baz Luhrmann - Everybody's Free (To Wear Sunscreen)

Stefano
Re: Document search facility?
« Reply #4 on: September 30, 2009, 06:16:29 PM »
I think that editing the files directly is not a good idea because:
- each time you upgrade you will lose your work
- each time you upgrade the files may be different, so editing them will be difficult

my 2c

holck
Re: Document search facility?
« Reply #5 on: September 30, 2009, 10:49:14 PM »
I have tried to add this as a HowTo: http://wiki.contribs.org/Document_search

cactus
Re: Document search facility?
« Reply #6 on: September 30, 2009, 11:08:42 PM »
Quote from holck: "I have tried to add this as a HowTo: http://wiki.contribs.org/Document_search"
Thanks very much. Some quick advice, as I really need to be doing something else right now:

Use preformatted text (indent with a space) for command instructions as well as the content of files. Please do not use all caps in headers; let the wiki formatting do its work on them. And while we are on the topic of headers, please do not use second-level headers (==), but start with third level (===).

Traditionally user commands (even the original author's when not immediately relevant to the instructions) are placed on the discussion/talk pages.

Thanks for your work so far.

holck
Re: Document search facility?
« Reply #7 on: October 01, 2009, 08:52:22 AM »
Thanks for the comments and suggestions.

I agree with Stefano that in general it is not a good idea to make your own changes to others' code, for the reasons he mentions, and I will contact ksearch and ask them to include my changes. But there were errors in the source code and the tar archive, and some of my changes made the code and instructions more convenient for the SME Server.

I will follow Cactus' recommendations, but can't figure out how to use pre-formatted text in lists, and I don't know what is meant by "user commands"?

Stefano
Re: Document search facility?
« Reply #8 on: October 01, 2009, 09:00:13 AM »
Quote from holck: "I agree with Stefano that in general it is not a good idea to make your own changes to others' code, for the reasons he mentions, and I will contact ksearch and ask them to include my changes. But there were errors in the source code and the tar archive, and some of my changes made the code and instructions more convenient for the SME Server."

I would ask them to provide an include file in which to store all the variables. That way we could easily generate it (via a template), and integration with SME would be easier.

ciao
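
To illustrate the idea, here is a minimal sketch of what generating such an include file could look like. Everything in it is hypothetical: nothing in this thread shows that ksearch reads a separate include file, and the file name ksearch_settings.pl and the function write_ksearch_settings are invented. Only the variable names are taken from configuration.pl as described in the HowTo. On SME Server the values would come from a template rather than from shell variables; the shell version only shows the shape of the generated file.

```shell
#!/bin/sh
# Hypothetical sketch: write the site-specific ksearch variables to a Perl
# include file. Values are read from shell variables of the same name and
# must not contain single quotes.
write_ksearch_settings() {
    out=$1   # path of the file to generate (invented name, see text above)
    # Unquoted heredoc: \$ stays a literal dollar sign, ${...} is expanded.
    cat > "$out" <<EOF
# Generated file: change the source values, not this file.
\$INDEXER_START = '${INDEXER_START}';
\$BASE_URL      = '${BASE_URL}';
\$KSEARCH_DIR   = '${KSEARCH_DIR}';
\$KSEARCH_URL   = '${KSEARCH_URL}';
1;
EOF
}

# Example call (all values are placeholders):
#   INDEXER_START=/path/to/ibay/html
#   BASE_URL=http://www.MyWebsite.com/
#   KSEARCH_DIR=/opt/ksearch
#   KSEARCH_URL=http://www.MyWebsite.com/ksearch
#   write_ksearch_settings /opt/ksearch/configuration/ksearch_settings.pl
```

The trailing "1;" is there because a Perl file loaded with require has to return a true value.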

cactus
Re: Document search facility?
« Reply #9 on: October 01, 2009, 11:44:29 AM »
Quote from Stefano: "I would ask them to provide an include file in which to store all the variables. That way we could easily generate it (via a template), and integration with SME would be easier. ciao"
I have not looked at the installation routines very closely, but if ksearch provides an RPM, we can write a howto on how to make a template for them, which we can use if someone ever creates an (integration) smeserver-ksearch RPM.