Category: Joomla FAQ

When I set up a multilingual Joomla website (using the native language feature), I ran into high CPU load even with the system cache plugin enabled. Looking into the logs, I found a large number of bot entries like these:

141.8.132.62 - - [12/Feb/2016:16:22:52 +0200] "GET /component/k2/itemlist/user/7243?lang=en HTTP/1.1" 404 25270 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
141.8.132.62 - - [12/Feb/2016:16:22:54 +0200] "GET /en/component/k2/itemlist/user/918 HTTP/1.1" 404 25270 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
178.154.189.28 - - [12/Feb/2016:16:22:56 +0200] "GET /component/k2/itemlist/user/9753?lang=ru HTTP/1.1" 404 25911 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
141.8.132.62 - - [12/Feb/2016:16:22:58 +0200] "GET /en/component/k2/itemlist/user/67 HTTP/1.1" 404 25270 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
141.8.132.62 - - [12/Feb/2016:16:23:01 +0200] "GET /ru/component/k2/itemlist/user/7363 HTTP/1.1" 404 25911 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
141.8.132.62 - - [12/Feb/2016:16:23:02 +0200] "GET /ru/component/k2/itemlist/user/13355 HTTP/1.1" 404 25911 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
178.154.189.28 - - [12/Feb/2016:16:23:04 +0200] "GET /en/component/k2/itemlist/user/3032 HTTP/1.1" 404 25270 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
141.8.132.62 - - [12/Feb/2016:16:23:07 +0200] "GET /ru/component/k2/itemlist/user/437\\\\\\\\\\\\\\' HTTP/1.1" 404 25911 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
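To see how much of the traffic comes from such requests, the access log can be summarized with a few lines of Python. This is only a minimal sketch: it assumes the log uses the Apache/Nginx combined format shown above, and the names LOG_LINE and count_404s_by_agent are made up for this illustration.

```python
import re
from collections import Counter

# Combined log format (assumed):
# ip - - [time] "METHOD path PROTOCOL" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def count_404s_by_agent(lines):
    """Count 404 responses per user-agent string; unparsable lines are skipped."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group("status") == "404":
            counts[match.group("agent")] += 1
    return counts


if __name__ == "__main__":
    sample = [
        '141.8.132.62 - - [12/Feb/2016:16:22:54 +0200] '
        '"GET /en/component/k2/itemlist/user/918 HTTP/1.1" 404 25270 "-" '
        '"Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"',
    ]
    for agent, hits in count_404s_by_agent(sample).most_common():
        print(hits, agent)
```

For a real log, pass an open file object instead of the sample list; the path to the log file depends on the server setup.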

I didn't use K2, but that didn't matter. What mattered was that the stock Joomla robots.txt didn't stop bots from crawling URLs that look like system folders, because the language prefix /en or /ru at the start of the URL meant the existing Disallow rules no longer matched. So I updated my robots.txt as shown below (I didn't use wildcards; maybe that would be better). The bots disappeared from the logs, and CPU usage returned to a normal level. The updated robots.txt for English and Russian was:

# If the Joomla site is installed within a folder such as at
# e.g. www.example.com/joomla/ the robots.txt file MUST be
# moved to the site root at e.g. www.example.com/robots.txt
# AND the joomla folder name MUST be prefixed to the disallowed
# path, e.g. the Disallow rule for the /administrator/ folder
# MUST be changed to read Disallow: /joomla/administrator/
#
# For more information about the robots.txt standard, see:
# http://www.robotstxt.org/orig.html
#
# For syntax checking, see:
# http://tool.motoricerca.info/robots-checker.phtml

User-agent: *
Disallow: /administrator/
Disallow: /bin/
Disallow: /cache/
Disallow: /cli/
Disallow: /components/
Disallow: /includes/
Disallow: /installation/
Disallow: /language/
Disallow: /layouts/
Disallow: /libraries/
Disallow: /logs/
Disallow: /modules/
Disallow: /plugins/
Disallow: /tmp/
# Same folders under the /ru language prefix
Disallow: /ru/administrator/
Disallow: /ru/bin/
Disallow: /ru/cache/
Disallow: /ru/cli/
Disallow: /ru/components/
Disallow: /ru/includes/
Disallow: /ru/installation/
Disallow: /ru/language/
Disallow: /ru/layouts/
Disallow: /ru/libraries/
Disallow: /ru/logs/
Disallow: /ru/modules/
Disallow: /ru/plugins/
Disallow: /ru/tmp/
# Same folders under the /en language prefix
Disallow: /en/administrator/
Disallow: /en/bin/
Disallow: /en/cache/
Disallow: /en/cli/
Disallow: /en/components/
Disallow: /en/includes/
Disallow: /en/installation/
Disallow: /en/language/
Disallow: /en/layouts/
Disallow: /en/libraries/
Disallow: /en/logs/
Disallow: /en/modules/
Disallow: /en/plugins/
Disallow: /en/tmp/
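
Rather than copying the list by hand for every language, the same rules can be generated and then checked with Python's standard urllib.robotparser module. This is only a sketch under a few assumptions: the folder list is taken from the file above, the helper names build_robots_txt and is_allowed are made up for this example, and urllib.robotparser does plain prefix matching (no wildcards), which may differ from how a particular crawler interprets the file.

```python
from urllib.robotparser import RobotFileParser

# Folders from the stock Joomla robots.txt shown above.
FOLDERS = [
    "administrator", "bin", "cache", "cli", "components", "includes",
    "installation", "language", "layouts", "libraries", "logs",
    "modules", "plugins", "tmp",
]


def build_robots_txt(prefixes):
    """Disallow every folder at the site root and under each language prefix."""
    lines = ["User-agent: *"]
    # No blank lines inside the group: some parsers, including
    # urllib.robotparser, treat a blank line as the end of a record.
    for prefix in [""] + [f"/{p}" for p in prefixes]:
        lines += [f"Disallow: {prefix}/{folder}/" for folder in FOLDERS]
    return "\n".join(lines)


def is_allowed(robots_txt, path, agent="YandexBot"):
    """Return True if the given agent may fetch path under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    robots = build_robots_txt(["ru", "en"])
    print(robots)
    print(is_allowed(robots, "/en/components/com_content/"))
    print(is_allowed(robots, "/en/component/k2/itemlist/user/918"))
```

One thing the check makes visible: with plain prefix matching, a rule for /en/components/ does not cover the singular /en/component/... paths seen in the log, so the second check reports that URL as still allowed. Covering those paths would presumably need an extra rule for /component/ under each prefix; that is my assumption and not part of the file above, and it should be tested first, since it could also block component URLs that are meant to be indexed.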

