
Apache: index.php as custom error page?


@Zarathos: If a crawler follows a link to a file that doesn't exist, it will simply assume the page doesn't exist. If the file appears later, the crawler will eventually follow that link again, see that the page exists now, and index it. You seem to be confused about how the index.php file works, though. index.php is only the file used to print the page, not the actual page the bot is looking at. As long as you send an HTTP status code when your index.php processes an error page, you should be fine.
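For context, the setup under discussion looks roughly like this (the paths are illustrative, not the poster's actual config):

```apache
# Apache internally re-routes failed requests to index.php;
# the URL in the visitor's address bar stays unchanged.
ErrorDocument 403 /index.php?error=403
ErrorDocument 404 /index.php?error=404
```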

Good note about resending status codes.

Remember, if you use a PHP file as your error document, you need to resend the HTTP status code using the header() function inside PHP so the error is detected properly, like so:
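A minimal sketch (assuming PHP 5.4+ for http_response_code(); Apache sets REDIRECT_STATUS on ErrorDocument requests):

```php
<?php
// Resend the status code that triggered the ErrorDocument,
// falling back to 404 if it isn't available.
$status = isset($_SERVER['REDIRECT_STATUS'])
    ? (int) $_SERVER['REDIRECT_STATUS']
    : 404;
http_response_code($status);

// ...render the custom error page below...
```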

So, for example, if I print a 403 error on my page, I also need to send a 403 header, right? As far as crawlers and temporary errors are concerned, it's OK, I understand... but does it also work if a user enters, for example, "/non-existant.gif" and gets redirected to "/index.php?error=404"? What would happen? The crawler hits index.php and receives a 404 status code... isn't that bad for indexing? And what kind of "robots" meta tag should I use in both cases?

The only really safe way to display a custom page for a 500 status code is to use plain text, or a basic .html or .shtml file that doesn't try to access anything else on your server, so the error page itself can't keep triggering more 500s while it loads.
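In ErrorDocument terms, that means pointing 500 at a static file (the path here is illustrative):

```apache
# Keep the 500 handler static: if PHP itself is broken,
# a PHP-based error page would just fail with another 500.
ErrorDocument 500 /errors/500.html
```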

Usually if a crawler encounters a 500, it will just ignore the page temporarily. A 500 code is recoverable; it doesn't necessarily mean there is no page there, just that the server is having trouble at the moment. Bots are smart enough to work out what the error codes mean, as long as the page always sends the status code in its response headers.

Usually if there is a 500 status code, then Apache has messed something up and can't run your index.php file, which results in another 500. Apache repeats this error loop for a few iterations before it finally gives up and sends its own built-in error page.
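Apache exposes that loop guard as the core LimitInternalRecursion directive (default 10), if you want to tune it:

```apache
# Cap how many internal redirects (e.g. ErrorDocument hops)
# Apache will follow before bailing out with its own error page.
LimitInternalRecursion 5
```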

When I print my homepage (so I'm not on /index.php because an ErrorDocument directive called it), I would like the crawlers to navigate every link and index it, so I put "index, follow" in the robots meta tag. But if /index.php is printing a custom error, I would like to change that to "noindex, nofollow", because I don't want an error page to be indexed and followed by crawlers. Since it's always the same file... could this tag swap cause problems for my site's indexing?
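What's being described is roughly this (a sketch; checking REDIRECT_STATUS is one way index.php could tell the two cases apart):

```php
<?php
// True when Apache invoked index.php via an ErrorDocument directive.
$isErrorPage = isset($_SERVER['REDIRECT_STATUS'])
    && (int) $_SERVER['REDIRECT_STATUS'] >= 400;
?>
<meta name="robots"
      content="<?php echo $isErrorPage ? 'noindex, nofollow' : 'index, follow'; ?>">
```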

When a user hits a file they don't have access to, or one that doesn't exist, whatever the reason, the page doesn't actually get redirected. The URL in the address bar stays the same; Apache just changes which file gets executed. That file can record data about the error and display a custom page, but it also needs to send an appropriate error code (such as a 404) so bots, or whoever accessed it, will stop trying to access it. You're not saying index.php doesn't exist; you're saying the path they accessed doesn't exist. I'm not sure what you mean by the robots meta tag.
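A small sketch of that point: inside the error document, the originally requested path is still what the request is "about" (assuming Apache's internal ErrorDocument redirect, which preserves REQUEST_URI):

```php
<?php
// The request URI still names the missing resource, not index.php.
$requested = $_SERVER['REQUEST_URI'];   // e.g. "/non-existant.gif"
http_response_code(404);                // that path, not index.php, is "missing"
error_log("404 served for {$requested}");
```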
