When I set up multilingual Joomla websites (using Joomla's native multilanguage feature), I ran into high CPU load even with the system cache plugin enabled. Looking into the logs, I found a number of bot entries like these:

141.8.132.62 - - [12/Feb/2016:16:22:52 +0200] "GET /component/k2/itemlist/user/7243?lang=en HTTP/1.1" 404 25270 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
141.8.132.62 - - [12/Feb/2016:16:22:54 +0200] "GET /en/component/k2/itemlist/user/918 HTTP/1.1" 404 25270 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
178.154.189.28 - - [12/Feb/2016:16:22:56 +0200] "GET /component/k2/itemlist/user/9753?lang=ru HTTP/1.1" 404 25911 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
141.8.132.62 - - [12/Feb/2016:16:22:58 +0200] "GET /en/component/k2/itemlist/user/67 HTTP/1.1" 404 25270 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
141.8.132.62 - - [12/Feb/2016:16:23:01 +0200] "GET /ru/component/k2/itemlist/user/7363 HTTP/1.1" 404 25911 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
141.8.132.62 - - [12/Feb/2016:16:23:02 +0200] "GET /ru/component/k2/itemlist/user/13355 HTTP/1.1" 404 25911 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
178.154.189.28 - - [12/Feb/2016:16:23:04 +0200] "GET /en/component/k2/itemlist/user/3032 HTTP/1.1" 404 25270 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
141.8.132.62 - - [12/Feb/2016:16:23:07 +0200] "GET /ru/component/k2/itemlist/user/437\\\\\\\\\\\\\\' HTTP/1.1" 404 25911 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
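
To check whether a site gets the same kind of traffic, the 404 responses can be tallied per bot and per language straight from the access log. Below is a minimal sketch, assuming Python 3 and the Apache/Nginx "combined" log format used in the excerpt above. The function name `count_bot_404s`, the crude bot detection (a user-agent token ending in "bot") and the two-letter language codes are illustrative assumptions, not part of Joomla or any log tool.

```python
import re
from collections import Counter

# Combined log format: ip ident user [time] "request" status bytes "referer" "agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)
BOT_NAME = re.compile(r"(\w+bot)/", re.IGNORECASE)  # e.g. "YandexBot/3.0"
LANG_PREFIX = re.compile(r"^/([a-z]{2})/")          # e.g. "/en/component/..."
LANG_PARAM = re.compile(r"[?&]lang=([a-z]{2})")     # e.g. "...?lang=ru"


def count_bot_404s(lines):
    """Count 404 responses per (bot name, language) pair.

    Lines that do not parse as combined-format entries are skipped.
    Requests from agents without a "...bot/" token are counted as "other",
    and paths without a language prefix or lang= parameter as "-".
    """
    counts = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if not m or m.group("status") != "404":
            continue
        bot = BOT_NAME.search(m.group("agent"))
        path = m.group("path")
        lang = LANG_PREFIX.match(path) or LANG_PARAM.search(path)
        key = (bot.group(1) if bot else "other", lang.group(1) if lang else "-")
        counts[key] += 1
    return counts
```

Passing it an open log file (for example `open("access.log")`, where the file name is a placeholder) returns a `Counter`; for the eight lines above it counts four YandexBot 404s for `en` and four for `ru`.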

I didn't use K2, but that didn't matter. What mattered was that the stock Joomla robots.txt did not stop bots from crawling folder-like URLs once a language prefix such as /en or /ru appeared at the start of the URL. So I updated my robots.txt as shown below (I didn't use wildcards; maybe that would have been better). The bots disappeared from the logs and CPU usage returned to normal. The updated robots.txt for English and Russian was:

# If the Joomla site is installed within a folder such as at
# e.g. www.example.com/joomla/ the robots.txt file MUST be
# moved to the site root at e.g. www.example.com/robots.txt
# AND the joomla folder name MUST be prefixed to the disallowed
# path, e.g. the Disallow rule for the /administrator/ folder
# MUST be changed to read Disallow: /joomla/administrator/
#
# For more information about the robots.txt standard, see:
# http://www.robotstxt.org/orig.html
#
# For syntax checking, see:
# http://tool.motoricerca.info/robots-checker.phtml

User-agent: *
Disallow: /administrator/
Disallow: /bin/
Disallow: /cache/
Disallow: /cli/
Disallow: /components/
Disallow: /includes/
Disallow: /installation/
Disallow: /language/
Disallow: /layouts/
Disallow: /libraries/
Disallow: /logs/
Disallow: /modules/
Disallow: /plugins/
Disallow: /tmp/

Disallow: /ru/administrator/
Disallow: /ru/bin/
Disallow: /ru/cache/
Disallow: /ru/cli/
Disallow: /ru/components/
Disallow: /ru/includes/
Disallow: /ru/installation/
Disallow: /ru/language/
Disallow: /ru/layouts/
Disallow: /ru/libraries/
Disallow: /ru/logs/
Disallow: /ru/modules/
Disallow: /ru/plugins/
Disallow: /ru/tmp/

Disallow: /en/administrator/
Disallow: /en/bin/
Disallow: /en/cache/
Disallow: /en/cli/
Disallow: /en/components/
Disallow: /en/includes/
Disallow: /en/installation/
Disallow: /en/language/
Disallow: /en/layouts/
Disallow: /en/libraries/
Disallow: /en/logs/
Disallow: /en/modules/
Disallow: /en/plugins/
Disallow: /en/tmp/
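
Repeating the whole folder list for every language gets tedious as languages are added. Below is a minimal sketch, assuming Python 3, that generates the same rules for any set of language codes and checks a few paths with the standard library's `urllib.robotparser`, which uses plain prefix matching and does not understand wildcards. The helper name `build_robots_txt` is hypothetical. It emits a single group without blank lines, because some parsers (Python's `urllib.robotparser` among them) treat a blank line as the end of a record.

```python
from urllib.robotparser import RobotFileParser

# Core Joomla folders, as listed in the default robots.txt above.
JOOMLA_DIRS = (
    "administrator", "bin", "cache", "cli", "components", "includes",
    "installation", "language", "layouts", "libraries", "logs",
    "modules", "plugins", "tmp",
)


def build_robots_txt(lang_codes, dirs=JOOMLA_DIRS):
    """Return robots.txt text that disallows every folder at the site root
    and again under each language prefix (e.g. /ru/tmp/, /en/tmp/)."""
    lines = ["User-agent: *"]
    for prefix in [""] + [f"/{code}" for code in lang_codes]:
        for d in dirs:
            lines.append(f"Disallow: {prefix}/{d}/")
    # One group with no blank lines: legacy parsers end a record at a blank line.
    return "\n".join(lines) + "\n"


robots_txt = build_robots_txt(["ru", "en"])

# Check some paths with the standard-library parser (plain prefix matching).
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
for path in ("/en/administrator/index.php", "/ru/tmp/x",
             "/en/component/k2/itemlist/user/918", "/en/"):
    print(path, parser.can_fetch("YandexBot", path))
```

The check prints `False` for the first two paths and `True` for the last two. Note the `True` for `/en/component/k2/itemlist/user/918`: under prefix matching, `Disallow: /en/components/` (the `components` folder) does not match Joomla's singular `/component/...` route seen in the log, so that route would need a rule of its own, such as `Disallow: /en/component/`, if it is meant to be blocked.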

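On the wildcard question: RFC 9309 defines `*` (any sequence of characters) and a trailing `$` (end of the path) in rule paths, and major crawlers such as Googlebot and YandexBot support them, so one rule like `Disallow: /*/administrator/` could stand in for every language-prefixed copy. Python's `urllib.robotparser` does not implement these characters, so the sketch below uses a small hypothetical matcher (`rule_to_regex`, `is_disallowed`) to show which paths such a rule covers. It is an illustration under simplifying assumptions: it handles Disallow rules only and ignores Allow rules and RFC 9309's longest-match precedence.

```python
import re


def rule_to_regex(rule):
    """Translate a robots.txt path rule into a regex: '*' matches any run of
    characters and a trailing '$' pins the end of the path. Without '$' the
    rule matches any path that starts with it."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape everything literally, then turn each escaped '*' back into '.*'.
    pattern = re.escape(rule).replace(r"\*", ".*")
    return re.compile(pattern + (r"\Z" if anchored else ""))


def is_disallowed(path, disallow_rules):
    """True if any Disallow rule matches the path (Allow rules not handled)."""
    return any(rule_to_regex(rule).match(path) for rule in disallow_rules)


rules = ["/administrator/", "/*/administrator/"]
for path in ("/administrator/index.php", "/en/administrator/index.php",
             "/ru/administrator/", "/en/about-us"):
    print(path, is_disallowed(path, rules))
```

Since `/*/administrator/` needs at least one path segment before `administrator`, the root rule `/administrator/` is still listed separately.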
