aardcom (Friend)
- Join date:
- February 2008
- Posts:
- 183
- Downloads:
- 66
- Uploads:
- 3
- Thanks:
- 46
- Thanked:
- 30 times in 4 posts
July 5, 2011 at 3:45 pm #165971

I came across a suggestion for an alternative robots.txt, but it resulted in some pages being dropped from the index. The suggested alternative was as follows:
Sitemap: http://www.example.com/sitemap.xml
User-agent: *
Allow: /index.php?option=com_xmap&sitemap=1&view=xml
Allow: /sitemap.xml
Allow: /index.php
Allow: /index.html
Allow: /index.htm
Allow: /home
Disallow: /

User-agent: Googlebot-Image
Disallow: /

The idea is that Google will index the sitemap links, and one doesn't need to worry about Google linking to pages you do not want indexed. It seems like a good concept. In practice, though, we found that pages such as http://www.example.com/sitelink did not get indexed, as they appear to be blocked by the robots.txt even though they are present in the sitemap.
Does anyone have suggestions or alternatives to the standard robots.txt when using SEO-friendly URLs? We're trying to avoid listing all the links we do not want indexed by allowing only those we do. Any other suggestions are welcome as well.
We realize the Googlebot-Image disallow directive is a matter of preference (some want images indexed, others do not), but we're more concerned with the main issue.
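The blocking behaviour described above can be reproduced with Python's standard-library `urllib.robotparser`. This is a minimal sketch using a trimmed-down version of the quoted file (the example.com URLs are placeholders, as in the original post): any path not explicitly on the allow list, such as /sitelink, falls through to the final `Disallow: /` rule, even if it appears in the sitemap.

```python
import urllib.robotparser

# Trimmed-down version of the allow-list robots.txt quoted above.
rules = """\
User-agent: *
Allow: /sitemap.xml
Allow: /index.php
Allow: /home
Disallow: /
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# Paths on the allow list are fetchable...
print(rp.can_fetch("*", "http://www.example.com/sitemap.xml"))  # True

# ...but any other path, even one listed in the sitemap, is blocked
# by the catch-all "Disallow: /" rule:
print(rp.can_fetch("*", "http://www.example.com/sitelink"))     # False
```

This matches what the poster observed: listing a URL in the sitemap does not override a robots.txt disallow rule that covers it.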
July 6, 2011 at 7:43 am #400057

I also think we need something more friendly.
July 22, 2011 at 12:59 pm #402540

robots.txt is used to tell Google which pages you want crawled and which you do not. Everyone should use a robots.txt file for their website, even if they want every page crawled.
July 23, 2011 at 1:37 pm #402687

We understand the purpose, but this is about using a simpler form of robots.txt: only granting access to pages that should be accessed, rather than listing those that shouldn't. By listing all the folders you don't want crawled, you hand a list of folders to potential hackers. There must be a better way to write a good robots.txt.
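For what it's worth, one hedged sketch of an allow-list style file for crawlers that honour wildcard patterns (Googlebot does; the original robots.txt standard does not) might look like the following. The `/products/` path is purely hypothetical, and `Allow: /$` permits only the site root itself:

```
User-agent: *
Allow: /$
Allow: /sitemap.xml
Allow: /products/
Disallow: /

Sitemap: http://www.example.com/sitemap.xml
```

Note the trade-off: this hides your folder structure from casual inspection, but as the original poster found, every SEF URL you want crawled must match one of the Allow patterns, or the final `Disallow: /` blocks it.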
September 27, 2011 at 1:27 am #415172

Where should I put this robots.txt, and how do I tell Google that I put it there?
November 20, 2011 at 10:57 am #425471

robots.txt should always be in the root directory, and there is no need to submit it; crawlers look for it automatically at the root of the domain (e.g. http://www.example.com/robots.txt).
November 22, 2011 at 2:12 pm #425900

<em>@taniasharma 254962 wrote:</em><blockquote>robots.txt is used to tell Google which pages you want crawled and which you do not. Everyone should use a robots.txt file for their website, even if they want every page crawled.</blockquote>
I agree with you. Using robots.txt will help a lot with this.
November 22, 2011 at 3:57 pm #425927

My original post was about a better way to write a robots.txt file. Most examples I've seen require listing what should not be crawled, rather than what should be. I would think the allow-list approach would be better and easier to maintain over the long term, but I can't seem to figure out a way to do it when using SEF links.
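One practical workaround for the maintenance problem above: rather than hand-checking each SEF link, audit the sitemap URLs against the robots.txt programmatically. This is a sketch with a hypothetical rule set and URL list; in practice the URLs would come from parsing the real sitemap.xml.

```python
import urllib.robotparser

# Hypothetical allow-list robots.txt to audit against.
rules = """\
User-agent: *
Allow: /index.php
Allow: /home
Disallow: /
"""

# Hypothetical URLs taken from the sitemap.
sitemap_urls = [
    "http://www.example.com/home",
    "http://www.example.com/sitelink",  # SEF link not on the allow list
]

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# Any sitemap URL the rules block will never be crawled, no matter
# that it is listed in the sitemap.
blocked = [u for u in sitemap_urls if not rp.can_fetch("*", u)]
print(blocked)  # ['http://www.example.com/sitelink']
```

Running a check like this after each site change would catch SEF links that the allow-list silently blocks, which is exactly the failure the first post described.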
This topic contains 8 replies, has 6 voices, and was last updated by aardcom 13 years, 1 month ago.