Adding 'robots.txt' for webMethods API Portal

What is a robots.txt file?

Robots.txt is a text file webmasters create to instruct search engine robots how to crawl pages on a website. The robots.txt file is part of the robots exclusion protocol (REP), a group of web standards that regulate how robots crawl the web and access and index content.

In practice, robots.txt files indicate whether certain user agents can or cannot crawl parts of a website. These crawl instructions are specified by “disallowing” or “allowing” the behavior of certain (or all) user agents.

Basic format:

User-agent: [user-agent name]
Disallow: [URL string not to be crawled]
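
For example, here is a minimal sketch using made-up bot and path names (not related to API Portal):

User-agent: Bingbot
Disallow: /private/

User-agent: *
Disallow:

This blocks a bot identifying itself as "Bingbot" from anything under /private/, while the empty Disallow leaves all other bots free to crawl the entire site.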

Where does robots.txt go on a site?

In order to be found, a robots.txt file must be placed in a website's top-level directory (/). Robots.txt is case sensitive: the file must be named "robots.txt" (not Robots.txt, robots.TXT, or otherwise). The /robots.txt file is publicly available.
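
For example, for a portal reachable at https://api.xyz.com (the example hostname used later in this tutorial), crawlers will only look for the file at:

https://api.xyz.com/robots.txt

A file placed at, say, https://api.xyz.com/pages/robots.txt would be ignored by crawlers.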


How to add robots.txt to webMethods API Portal?

As part of this tutorial, we will add the below sample robots.txt content to API Portal:

User-agent: Google
Disallow:

User-agent: *
Disallow: /

 

The above robots.txt configuration allows a single bot named "Google" (the empty Disallow value permits it to crawl everything) and disallows all other bots from the entire API Portal content.

Step I

Add the below snippet of configuration to httpd-custom.conf / httpd-ssl-custom.conf:

# Load the required modules, if not already loaded by the stock configuration
LoadModule alias_module modules/mod_alias.so
LoadModule rewrite_module modules/mod_rewrite.so

# Map the "/docs" URL path to the directory holding robots.txt
Alias "/docs" "<directoryContainingRobotsFile>"

# Rewrite requests for /robots.txt so they are served from the alias
RewriteEngine On
RewriteCond %{REQUEST_URI} ^/robots\.txt$
RewriteRule "^/robots\.txt$" "/docs/robots.txt" [PT,L]

The Alias directive in httpd allows documents to be served from parts of the local filesystem other than the DocumentRoot (usually <SoftwareAG installation directory>/API_Portal/server/bin/agentLocalRepo/.unpacked/httpd-run-prod-*-runnable.zip/httpd/htdocs). Here we are creating an alias "/docs" which points to a directory containing robots.txt. Make sure the directory does not contain any other sensitive files, since everything under the alias becomes publicly accessible.
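
As a minimal sketch, assuming a hypothetical directory /opt/softwareag/robots (any directory outside the DocumentRoot that httpd can read will do):

# Create a dedicated directory holding nothing but robots.txt
mkdir -p /opt/softwareag/robots
cp /path/to/robots.txt /opt/softwareag/robots/robots.txt

With this layout, the Alias line above would read: Alias "/docs" "/opt/softwareag/robots"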

We then create a rewrite rule so that, when the request URI matches /robots.txt, the file is served from the alias created above.
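
Under these directives, a request is processed roughly as follows (a schematic sketch, not literal httpd log output):

GET /robots.txt
  -> RewriteRule matches and the URI becomes /docs/robots.txt ([PT] hands it back for further mapping)
  -> Alias maps /docs/robots.txt to <directoryContainingRobotsFile>/robots.txt
  -> httpd returns the file contents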

The changes have to be applied in both the httpd-custom.conf and httpd-ssl-custom.conf files in order for robots.txt to be served in both SSL and non-SSL contexts.

Step II

Open the ACC console and reconfigure the loadbalancer runnable as below:

reconfigure <loadbalancerInstanceId> HTTPD.modjk.exclude.cop="apidocs","docs"

loadbalancerInstanceId could be loadbalancer_s, loadbalancer_m, or loadbalancer_l depending on your installation, as in the example below.
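
For example, on a medium-sized installation the command would be:

reconfigure loadbalancer_m HTTPD.modjk.exclude.cop="apidocs","docs"

Note that "docs" must match the alias name configured in Step I.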

Step III

Now restart the loadbalancer runnable; you will then see robots.txt when you access https://api.xyz.com/robots.txt. A quick check is shown below.
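
You can verify from the command line (curl's -k flag skips certificate validation, which may be necessary if the portal uses a self-signed certificate):

curl -k https://api.xyz.com/robots.txt

The response should contain the robots.txt content configured earlier:

User-agent: Google
Disallow:

User-agent: *
Disallow: /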

 
