Koozali.org: home of the SME Server

Case insensitive web URLs in apache webserver -- samba shares

Offline n0lqu

There was an old thread discussing this (http://forums.contribs.org/index.php?topic=6457.0), and I wanted to bring it up again to find the best solution for us and others in the same boat, and to see whether there are other techniques to recommend or suggestions on how to implement them.  The problem is that we're moving a large website from a web server on an OS that was not case sensitive, and we have large quantities of URL links that don't match the case of the actual html files, so when we move to the Apache web server in SME, a lot of links will break.

Possible solutions:

1. Fix all the links.  Obviously best in the long run, but impractical at the moment.

2. Activate the mod_speling.so module as described in the link above.  Although its main feature is one-character spelling correction (a feature we don't necessarily want, especially as it shows the end user a list when multiple files match), it also provides case insensitivity.  It looks like starting around Apache 2.1/2.2 there is a "CheckCaseOnly on" directive that turns off the "speling" feature and leaves only the case feature active, but I think SME 7.3 is running Apache 2.0.  A side effect is that this could help us do #1 by detecting the bad URLs, assuming it logs the cases where it was activated.

3. Access the web pages as a SAMBA share.  Since SAMBA files are case-insensitive, a request from Apache for a file would succeed regardless of the case.

4. Any other ideas?
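
For option 1, one way to find the links that actually break after the move is to count the 404 responses in the Apache access log. The following is only a sketch: the function name list_404s is made up for illustration, and it assumes the log uses the Common or Combined Log Format, where the request path is field 7 and the status code is field 9. The log path in the usage note is also an assumption.

```shell
# Print "count path" for every request that returned 404, most frequent first.
# Assumption: Common/Combined Log Format (field 7 = path, field 9 = status).
list_404s() {
    awk '$9 == 404 { hits[$7]++ }
         END { for (p in hits) print hits[p], p }' "$1" | sort -rn
}
```

Usage would be something like list_404s /var/log/httpd/access_log, assuming that is where your server writes its access log.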

I think I can handle #2 from the other thread, so I'm especially interested in the mechanics of how we could do #3.  If I were to put the html files in an ibay's 'files' directory rather than its 'html' directory, what would be the best way within SME to get Apache to see them via a SAMBA share on the same machine?

Thanks!

Offline cactus

Re: Case insensitive web URLs in apache webserver -- samba shares
« Reply #1 on: May 21, 2008, 09:42:25 PM »
In the end I think you are indeed best off converting all links. But if you are linking to files only (which is not completely clear to me from this conversation), it might help to convert all file names in the tree below the current directory to lowercase (or uppercase, if you really prefer):

Code: [Select]
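# Needs the Perl rename (not the util-linux one); assumes directory names are already lowercase
find ./ -type f -exec rename 'y/A-Z/a-z/' {} \;
and then set up a RewriteMap in the httpd.conf file, using the template system, with something like this: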

Code: [Select]
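RewriteEngine On
RewriteMap lc int:tolower
# Redirect only when the path contains an uppercase letter; otherwise an all-lowercase URL would redirect to itself forever
RewriteCond %{REQUEST_URI} [A-Z]
RewriteRule (.*) ${lc:$1} [R,L]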

The first line might not be necessary, as the rewrite engine is most likely already turned on. The remaining lines redirect every URL to its lowercase form, but this will clearly add CPU load if you serve a lot of page requests, so I think you should fix things at the root by converting the case of the files and links.
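
If the Perl rename is not available, or directory names also need converting, the renaming step could be sketched with standard tools only. This is an untested-on-SME illustration, not a drop-in script: the function name lowercase_tree is made up, it assumes no path contains a newline, and it skips (rather than overwrites) any entry whose lowercase name already exists.

```shell
# Rename every file and directory below $1 to lowercase (not $1 itself).
lowercase_tree() {
    root=$1
    # -depth lists a directory's contents before the directory itself,
    # so children are renamed while their parent path is still valid.
    find "$root" -depth | while IFS= read -r path; do
        [ "$path" = "$root" ] && continue
        dir=$(dirname "$path")
        base=$(basename "$path")
        lower=$(printf '%s' "$base" | tr '[:upper:]' '[:lower:]')
        [ "$base" = "$lower" ] && continue
        if [ -e "$dir/$lower" ]; then
            # Two names differ only by case; leave both for manual review.
            echo "skipped (target exists): $path" >&2
        else
            mv -- "$path" "$dir/$lower"
        fi
    done
}
```

Run it on a copy first, for example lowercase_tree ./html-copy, and check the "skipped" messages before touching the live tree.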

Be sure to have a backup as I have not tested the instructions above, no warranties implied.
« Last Edit: May 21, 2008, 09:45:34 PM by cactus »
Be careful whose advice you buy, but be patient with those who supply it. Advice is a form of nostalgia, dispensing it is a way of fishing the past from the disposal, wiping it off, painting over the ugly parts and recycling it for more than its worth ~ Baz Luhrmann - Everybody's Free (To Wear Sunscreen)

Offline n0lqu

Re: Case insensitive web URLs in apache webserver -- samba shares
« Reply #2 on: May 21, 2008, 11:53:40 PM »
Thanks for a #4 solution; it looks like another good and simple option for people in my boat.  My biggest concern is that it would disallow the use of uppercase ever in the future.  For example, I rename all the currently existing files to be lowercase, then someone comes along and creates a new html file with a mixed-case filename, tries to access it via the URL, and finds the page doesn't come up, even though the URL looks exactly right.

I'm not sure what you mean by "but if you are linking to files (only)" -- what else might I be linking to?  I'm talking about Apache looking at a website that's a bunch of files and directories in an ibay's 'html' directory.  Or are you thinking of directories?  Yes, they are also likely to be mixed-case and inconsistent as well.  Some directories and files are all UPPERCASE, from back in the MS-DOS days of uppercase 8.3 type filename limitations.

The other concern over fixing all our internal links to be case-correct is that it doesn't address possibly case-incorrect external links -- people's bookmarks, search engines, etc. that we have no control over.  This is a website that has been in existence since the stone ages.  :-)  Your suggestion, as well as options 2-3, all of which basically make the site case-insensitive to the world, would allow those external links to continue working, whereas the "correct" method of fixing our website to conform to a case-sensitive web server would not.  So while I'd like to fix our links to clean things up, I think ultimately I would like the site to be case-forgiving forever.

So far, the #3 samba solution looks the closest to allowing our website to be case-agnostic without complications like trying to be flexible with spelling (which we don't really want to do), presenting the end user with weird "we found multiple matches, pick one" pages, or forcing our web designers to use lower-case filenames.  I just don't know the simplest/best way to get this set up within the framework of the SME Server.

[As an aside: I don't see much point in a file system or a website being case-sensitive, actually.  Humans aren't, really, and filenames/URL's are mostly for the benefit of humans (otherwise we'd be accessing files by their sector numbers, ip addresses, and things like that).  Would be really nice if a person could set a parameter in Linux (and any other case-sensitive OS or FS) and just make it case insensitive, even nicer if that were the default.  But I don't want this thread to turn into a case-sensitive vs case-ignoring debate, so I'll shut up about it.]

Offline cactus

Re: Case insensitive web URLs in apache webserver -- samba shares
« Reply #3 on: May 22, 2008, 07:37:22 AM »
Quote
Thanks for a #4 solution; it looks like another good and simple option for people in my boat.  My biggest concern is that it would disallow the use of uppercase ever in the future.  For example, I rename all the currently existing files to be lowercase, then someone comes along and creates a new html file with a mixed-case filename, tries to access it via the URL, and finds the page doesn't come up, even though the URL looks exactly right.

Mixed-case URLs are rare. All lowercase is the generally accepted convention, and therefore more or less the default, and I do not see the need to create mixed-case URLs. You should educate your users to use all-lowercase filenames and all-lowercase URLs, since the web has been mostly lowercase since the Netscape days.

Quote
I'm not sure what you mean by "but if you are linking to files (only)" -- what else might I be linking to?  I'm talking about Apache looking at a website that's a bunch of files and directories in an ibay's 'html' directory.  Or are you thinking of directories?

You could also be serving dynamically generated content, using the POST and GET methods, which means your web pages accept variables. The case of those parameters should be handled by the website's scripts, not by the web server, and mixed-case parameters are possible. If you do not use these methods, you do not need to worry about this.

Quote
Yes, they are also likely to be mixed-case and inconsistent as well.  Some directories and files are all UPPERCASE, from back in the MS-DOS days of uppercase 8.3 type filename limitations.

I did not know that DOS ever used only uppercase, but you can convert directories to lowercase as well. I do not come across many mixed-case or uppercase URLs...

Quote
As an aside: I don't see much point in a file system or a website being case-sensitive, actually.  Humans aren't, really, and filenames/URLs are mostly for the benefit of humans (otherwise we'd be accessing files by their sector numbers, ip addresses, and things like that).  Would be really nice if a person could set a parameter in Linux (and any other case-sensitive OS or FS) and just make it case insensitive, even nicer if that were the default.  But I don't want this thread to turn into a case-sensitive vs case-ignoring debate, so I'll shut up about it.

I think you are taking things much too lightly here. A file with an uppercase or mixed-case name is most likely not the same as a file with an all-lowercase name; they could very well have different content. The fact that M$ systems assume there is no case sensitivity and treat such names as the same file does not mean every OS should do so. The underlying OS of your web server is case-sensitive, so the web server should be as well. How would it otherwise know whether to present INDEX.HTM, index.htm, or any other spelling of index.htm when a user requests iNdEx.Htm?
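
Whether a tree really contains names that differ only by case can be checked before deciding either way. A minimal sketch, assuming one path per line on standard input (so no newlines inside names); the function name case_collisions is made up for illustration:

```shell
# Read paths on stdin and print those that collide once case is ignored.
case_collisions() {
    awk '{
             key = tolower($0)            # compare paths without regard to case
             count[key]++
             paths[key] = paths[key] $0 "\n"
         }
         END {
             for (key in count)
                 if (count[key] > 1) printf "%s", paths[key]
         }'
}
```

It would be fed from find, for example: find /path/to/html | case_collisions. No output means no two paths in the tree differ only by case.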

Offline n0lqu

Re: Case insensitive web URLs in apache webserver -- samba shares
« Reply #4 on: September 10, 2009, 06:46:21 PM »
I did succeed in making our main web server case insensitive; let's see if I can remember what I did for those who might be trying to do the same thing (leaving the philosophical issues to another place and time).  Your mileage may vary.

First, I created a file /etc/auto.smb.servername and put the following into it:
Code: [Select]
username=someuser
password=userpassword
Give it a valid user and password on your system that has read access to the ibay you're using, and set the permissions on the file to protect it from snoops (chmod 600 /etc/auto.smb.servername).
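
Creating the file and restricting it in one step avoids a window where the password is world-readable. This is only a sketch: the function name write_cifs_credentials is made up, and passing the password as an argument may expose it in the process list, so on a shared machine you may prefer to edit the file by hand.

```shell
# Write a two-line credentials file that only its owner can read.
write_cifs_credentials() {
    file=$1 user=$2 pass=$3
    (
        umask 077   # a newly created file is readable by its owner only
        printf 'username=%s\npassword=%s\n' "$user" "$pass" > "$file"
    )
    chmod 600 "$file"   # also tighten the mode if the file already existed
}
```

For the setup described here the call would look like write_cifs_credentials /etc/auto.smb.servername someuser userpassword, run as root.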

Next, I added the following line to /etc/fstab (replace servername with your server's name, and ibayname with the name of your ibay):
Code: [Select]
//servername/ibayname/files  /home/e-smith/files/ibays/ibayname/html  cifs  credentials=/etc/auto.smb.servername 0 0
What this will do is take the /html directory (it should be empty) in your ibay, and map it to the /files directory in the same ibay.  Once you mount this (mount //servername/ibayname/files), the /files and /html will seem to show the same files, except that the /html one, which is what the web server uses, will be case insensitive.  Your users will modify files in /files, and if they are accessing it as a network drive via samba it'll also be case insensitive, minimizing the possibility of having two files with the same name but different cases.

Some caveats.  On our system, the mounted /html folder is read-only.  This is probably a good thing security-wise; makes it less likely some piece of malicious or buggy web code might try to change the web site.  If you have scripts that do need to change stuff, they will need to know to look in /files instead of /html.  Users too, for that matter.  Also, in my example, we kept everything within one ibay.  It might be less confusing to your users to have a non-world ibay they access, and a second ibay which is the one open to the world and the users don't need to know about, and then map the /files from the user ibay to the /html of the world ibay.

Offline kruhm

Re: Case insensitive web URLs in apache webserver -- samba shares
« Reply #5 on: September 13, 2009, 10:45:50 PM »
Quote
The problem is that we're moving a large website from a web server on an OS that was not case sensitive, and we have large quantities of URL links that don't match the case of the actual html files, so when we move to the Apache web server in SME, a lot of links will break.

Possibly just sed the files:

Code: [Select]
cd /to/your/ibay/html/
sed -i 's/OldLinkFormat\.com/newlinkformat.com/g' *

Careful, make a backup and test.
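
If the files themselves have been renamed to lowercase, the same sed idea could be stretched to lowercase the local links inside the pages. This is a rough sketch that relies on GNU sed extensions (\L, \E, \b and the I flag); the function name lowercase_local_links is made up. It only handles double-quoted href and src values, skips any value containing a colon (so http: and mailto: links are left alone), and will also lowercase query strings and #fragments, which may not be what you want.

```shell
# Filter: lowercase double-quoted href/src values that contain no colon.
# Assumption: GNU sed.
lowercase_local_links() {
    sed -E 's/\b(href|src)="([^":]*)"/\1="\L\2\E"/gI'
}
```

For example, lowercase_local_links < page.html > page.new.html, then compare the two files before replacing the original.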