MSN Search and referer logs
November 22nd, 2006 Ryan Jones
If anybody from Microsoft or MSN Search reads my blog, give me a shout (ryan at no slang dot com). MSN really needs a URL removal function beyond robots.txt and the like. Let me explain why:
To any developers out there, here’s some important advice too:
Always password protect your site and create a robots.txt file FIRST, even if the site is still just a prototype. I learned that lesson the hard way.
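For anyone who wants to double-check their robots.txt before a prototype goes anywhere near the web, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt contents, the `is_blocked` helper, and the example URL are all made up for illustration; "msnbot" is the crawler name MSN Search used. Keep in mind that robots.txt is only a request to well-behaved crawlers, so it goes alongside password protection rather than replacing it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a prototype site: ask every crawler to stay out.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_blocked(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt tells user_agent not to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Placeholder URL standing in for an internal application page.
    url = "http://internal.example.com/app?id=1"
    print(is_blocked(ROBOTS_TXT, "msnbot", url))
```

An empty robots.txt blocks nothing, so passing `""` to `is_blocked` is a quick way to see what a site with no file at all looks like to a crawler.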
It seems MSN has a perfect cached copy of some internal web applications of mine that aren’t meant for external eyes. The page in question (while still in development) had links to external websites on it, and somebody working on the site clicked them. The problem is that one external website’s referer log is publicly accessible and indexed by MSN Search (FYI, it’s not a good idea to share your server logs with the world). Thus, the URL of the internal application (complete with parameters) shows up in their stats page.
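To make the leak concrete: when someone clicks a link, the browser typically sends the full URL of the page they came from in the Referer header, query string and all, and that is what ends up in the other site's log. The sketch below is only an illustration under assumptions: the internal URL is invented, and `strip_referer` is a hypothetical helper showing how much less would be exposed if only the scheme and host were sent (roughly what a browser sends under a Referrer-Policy of `origin`).

```python
from urllib.parse import urlsplit, urlunsplit


def strip_referer(url: str) -> str:
    """Reduce a URL to scheme and host, dropping path, query, and fragment."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, "/", "", ""))


if __name__ == "__main__":
    # Invented internal URL, standing in for the kind of address that leaked.
    internal = "http://internal.example.com/admin/report.php?customer=42&view=full"
    print("Full Referer value:    ", internal)
    print("Origin-only equivalent:", strip_referer(internal))
```

Even the trimmed version still reveals the hostname, so it reduces the damage rather than removing the need for password protection.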
For some reason, it only took a day for MSN to index this URL, and now information that should have been password protected and excluded via robots.txt is showing up for some obscure searches.
The problem is this site isn’t linked anywhere anymore, so spiders aren’t likely to return anytime soon. There’s also no way to remove a site from MSN other than by using .htaccess and robots.txt (I’ve done that). A user can’t see the current site, but that cached version looks like it’s going to be there for a long time.
Lesson learned, I guess: always prevent spiders from accessing stuff… even if it’s not linked anywhere, and especially if it’s still in development.