Google is crawling random concrete files (and flagging them as not mobile-friendly). Can I stop the bots from crawling everything?

I was notified by Google that my site has “mobile usability” issues (text too small to read, etc.), but when I looked into it, all the actual pages of my site are fine. It flagged 16 random concrete files (e.g., …/concrete/src/File/StorageLocation/Configuration/LocalConfiguration.php).

Why is Google even crawling these back-end files, and can I stop it? All of the flagged files also show a PHP error when accessed directly, though I don’t know if that’s just because they aren’t meant to be accessed that way.

Side note: when looking into this I also noticed that Concrete directories on my site appear to be directly browsable—that shouldn’t be the case, should it?

At a guess, the .htaccess is missing or incorrect and maybe your robots.txt is missing. Both these files should be in the web root.

Good guess, but both appear to be there. I suppose I could just add more “Disallow” rules to the robots.txt file?

Every Concrete code file that is not a class (and many that are) begins with a guard:
defined('C5_EXECUTE') or die('Access Denied.');
So anything that tries to run one of those files directly just gets that error.
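For context, here is a minimal sketch of how such a file typically starts. The namespace and class name are hypothetical, not taken from the flagged files; only the guard line itself is the real Concrete convention:

```php
<?php
// Hypothetical example of a Concrete code file.
// If the file is requested directly over HTTP, C5_EXECUTE is not defined,
// so the guard stops execution before any real code runs and the visitor
// (or crawler) only sees "Access Denied."
defined('C5_EXECUTE') or die('Access Denied.');

namespace Application\Example;

class ExampleHelper
{
    public function hello(): string
    {
        return 'Hello from inside Concrete';
    }
}
```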

You will need to dig deeper into the web server configuration, because it shouldn’t be serving the code files to anyone. Try your host’s technical support for help.

Most of the time that will not be an issue (the C5_EXECUTE-or-die guard takes care of it), but if your configuration data in /application/config is exposed, that could reveal your database login.


I added a “deny from all” rule via .htaccess files in the /concrete/ and /application/ directories, which largely broke my website (and has not resulted in Google marking the files as mobile-friendly).

Any better suggestion for how to deny direct access to these files in a concrete-friendly way?

Have a look at the .htaccess the core generates for pretty URLs, and make sure you don’t undo any of that. For example, denying all in /application/ would break images/files and anything cached, such as aggregated CSS and JS.
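As a rough, untested sketch (assuming Apache 2.4 and that all requests are routed through the root index.php, so no PHP file in those directories needs to be fetched over HTTP), you could block direct requests to PHP files only, leaving images, CSS and JS untouched:

```apacheconf
# Hypothetical /concrete/.htaccess (and similarly /application/.htaccess):
# refuse direct HTTP requests for .php files, but keep serving static assets.
<FilesMatch "\.php$">
    Require all denied
</FilesMatch>
```

Test this on a staging copy first; if your Concrete version still serves any PHP endpoints directly from those directories, a blanket rule like this would break them.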

Your problem is likely a server and PHP configuration issue, not a Concrete issue, and probably something that goes wrong before Apache even reads your .htaccess.

@JohntheFish definitely has the right idea: if Google is your problem, it can generally be handled with Disallow rules in robots.txt. You might be seeing some lag time after adding those rules, so you can also use Google Search Console (formerly Webmaster Tools) to expedite the process by telling Google not to index that path.
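For example, a minimal robots.txt addition might look like this. The paths are illustrative only; disallow just what Google is actually flagging, since blocking core CSS/JS paths can itself hurt how Google renders your pages:

```
# robots.txt in the web root - hypothetical additional rules
User-agent: *
Disallow: /concrete/src/
Disallow: /application/config/
```

Keep in mind robots.txt only asks well-behaved crawlers not to fetch those URLs; it does not secure the files, so the server-side fixes discussed above still matter.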