  • aardcom Friend
    #165971

    I came across a suggestion for an alternative robots.txt, but it resulted in losing some indexed pages. The suggested alternative was as follows:

    Sitemap: http://www.example.com/sitemap.xml
    User-agent: *
    Allow: /index.php?option=com_xmap&sitemap=1&view=xml
    Allow: /sitemap.xml
    Allow: /index.php
    Allow: /index.html
    Allow: /index.htm
    Allow: /home
    Disallow: /

    User-agent: Googlebot-Image
    Disallow: /

    The idea is that Google would index the sitemap links, and one doesn’t need to worry about Google linking to pages you do not want indexed. It seems like a good concept. In practice, though, we found that pages such as http://www.example.com/sitelink did not get indexed, as they appear to be blocked by the robots.txt even though they are present in the sitemap.
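
    To illustrate: for crawlers that follow longest-match precedence (Googlebot and Bingbot do), the most specific rule wins, so adding one explicit line to the whitelist above should take priority over the catch-all Disallow, for example:

        Allow: /sitelink

    But of course that means listing every SEF page we do want crawled, which is exactly what we were hoping to avoid.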

    Does anyone have suggestions or alternatives to the standard robots.txt when using SEO-friendly URLs? We’re trying to avoid listing all the links we do not want indexed by allowing only those we do. Any other suggestions are welcome as well.

    We realize the Googlebot-Image disallow rule is a matter of preference (some want images indexed and others do not), but we’re more concerned with the main issue.

    adela01 Friend
    #400057

    I also think we need something more friendly.

    taniasharma Friend
    #402540

    robots.txt is used to tell Google which pages you want crawled and which you don’t. Everyone should use a robots.txt on their own website, even if they want all the pages of their site to be crawled.
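
    For example, the simplest possible robots.txt, which tells every crawler it may crawl everything, is just:

        User-agent: *
        Disallow:

    (an empty Disallow value means nothing is blocked).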

    aardcom Friend
    #402687

    We understand the purpose, but the question is about using a simpler form of robots.txt that only grants access to the pages that should be accessed, rather than listing those that shouldn’t. By listing all the folders you don’t want crawled, you hand potential hackers a list of folders to look at. There must be another way to write a good robots.txt.

    mammam Friend
    #415172

    Where should I put this robots.txt file, and how do I tell Google that I put it there?

    thedigger Friend
    #425471

    robots.txt should always be in the root directory, and there is no need to submit it.
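
    In other words, it should be reachable at http://www.example.com/robots.txt (to reuse the example domain from this thread). You can verify it in a browser or, for example, with:

        curl http://www.example.com/robots.txt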

    portstracking Friend
    #425900

    @taniasharma 254962 wrote:

        robots.txt is used to tell Google which pages you want crawled and which you don’t. Everyone should use a robots.txt on their own website, even if they want all the pages of their site to be crawled.

    I agree with you. Using robots.txt helps a lot with this.

    aardcom Friend
    #425927

    My original post was in regard to a better way to write a robots.txt file. Most I’ve seen require listing what should not be crawled, as opposed to what should be. I would think the latter would be better and easier to maintain over the long term, but I can’t seem to figure out a way to do so when using SEF links.
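
    The closest thing we have to a whitelist sketch at the moment (untested, and the /products and /blog prefixes are hypothetical placeholders for whatever your SEF menu aliases actually are) looks like this:

        Sitemap: http://www.example.com/sitemap.xml

        User-agent: *
        # whitelist the SEF sections we do want crawled
        Allow: /products
        Allow: /blog
        Allow: /sitemap.xml
        Allow: /index.php
        # everything else stays blocked
        Disallow: /

    It keeps the file short and avoids advertising the folders we don’t want crawled, but it only works with crawlers that honour Allow and longest-match precedence (Googlebot and Bingbot do), and every new SEF section has to be added to the list.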


This topic contains 8 replies, has 6 voices, and was last updated by  aardcom 12 years, 12 months ago.
