OneSignal Index Issue


(Serdar) #1

Hi,

I’m using OneSignal on my website, and Google is showing indexed OneSignal URLs from my site.

How can I remove them from Google?


(Serdar) #2

Should I add the following line to robots.txt?

Disallow: /wp-content/plugins/onesignal-free-web-push-notifications


({ Beautiful Code!; }) #3

Your issue is similar to this thread! Check it out!


#4

Hi dear @Turk

According to Google Webmaster Guidelines, to noindex any URL the first basic rule is

  • Never block it via robots.txt

Why? Because robots.txt prevents crawling. If Googlebot cannot crawl a URL, it cannot see the HTTP header or parse the HTML to find the noindex directive added via a meta tag.

Disallow: / 

is not equal to

<meta name="robots" content="noindex">

or

<meta name="googlebot" content="noindex">

Or HTTP header

X-Robots-Tag: noindex

Or explicitly

HTTP/1.1 200 OK
Date: Tue, 25 May 2010 21:42:43 GMT
(…)
X-Robots-Tag: googlebot: nofollow
X-Robots-Tag: otherbot: noindex, nofollow
(…)

Why??

Because the noindex directive explicitly tells bots not to index a specific URL.

Unfortunately, some misinformation circulating on the Internet has made people believe that robots.txt prevents indexing.

I see webmasters block a URL with the Disallow directive in robots.txt, which only blocks crawling. The URL can still be visible in the SERPs.

Solution
To noindex any URL,

  • First, don’t block that URL via robots.txt
  • Second, use the noindex directive
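
To illustrate the point about crawling, here is a minimal Python sketch using the standard library's urllib.robotparser. It only shows that a Disallow rule stops a compliant crawler from fetching the URL at all, so the crawler never gets to read a noindex directive on it. The host example.com and the helper name crawler_can_see_noindex are placeholders for illustration, not something from this thread.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt rule proposed earlier in the thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-content/plugins/onesignal-free-web-push-notifications
"""

# Hypothetical URL on a placeholder host.
BLOCKED_URL = (
    "https://example.com/wp-content/plugins/"
    "onesignal-free-web-push-notifications/readme.txt"
)


def crawler_can_see_noindex(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True when robots.txt lets the crawler fetch the URL.

    Only a fetched URL gives the crawler a chance to read a noindex
    meta tag or X-Robots-Tag header.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    print(crawler_can_see_noindex(ROBOTS_TXT, BLOCKED_URL))
```

With the Disallow rule in place the function returns False for the plugin path, which is why the rule has to go before the noindex header can have any effect.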

Notable Reference (which most people HATE reading)


#5
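// Build the requested URL from the host and the request path.
$url = "//{$_SERVER['HTTP_HOST']}{$_SERVER['REQUEST_URI']}";
// Send a noindex header for any request under the OneSignal plugin directory.
if (preg_match('#/onesignal-free-web-push-notifications/#', $url))
{
	header('X-Robots-Tag: noindex, follow', true);
}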

To noindex the OneSignal path in question, add the above code to functions.php.


({ Beautiful Code!; }) #6

Will the above code work for my case as well? My blog runs on a Vultr Cloud VPS with a LEMP stack. No Apache server is installed (so no .htaccess file).


#7

Assuming you are using WordPress and a theme that has a functions.php file, just add that code at the end of that file. Alternatively, you can add it via the Code Snippet plugin.


(Serdar) #8

Thank you bro :slight_smile:
I have added your code to functions.php of my child theme.

And my robots.txt now looks like this, following your recommendations:

User-agent: *
Disallow: /?s=*
Disallow: /wp-admin/
Disallow: /xmlrpc.php
Allow: /wp-admin/admin-ajax.php
Allow: /wp-admin/images/
Allow: /wp-admin/css/
Allow: /wp-admin/js/

Sitemap: https://kriptobyte.com/sitemap_index.xml

#9

You’re welcome!

Everything is good but

Disallow: /xmlrpc.php

This should not be there. I don’t think I ever recommended it.


(Serdar) #10

You’re right, sorry!

I added the Disallow: /xmlrpc.php line following Perishable Press.

But I can remove that :slight_smile:


#11

Once that functions.php code is added, please make sure to verify the HTTP response headers. They should include the line

x-robots-tag => noindex, follow

Ref: screenshot (an example) from when I tested the path :arrow_right: https://www.gulshankumar.net/wp-content/plugins/onesignal-free-web-push-notifications/readme.txt
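
If you prefer to check the header from a script, here is a rough Python sketch. The helper names has_noindex and fetch_headers are made up for illustration. It assumes the plain form of the header (for example "noindex, follow") and does not handle bot-specific values such as "otherbot: noindex".

```python
from urllib.request import Request, urlopen


def has_noindex(headers) -> bool:
    """Return True if any X-Robots-Tag header lists a noindex directive.

    `headers` is an iterable of (name, value) pairs. Bot-specific
    prefixes (e.g. "googlebot: ...") are NOT handled in this sketch.
    """
    for name, value in headers:
        if name.lower() != "x-robots-tag":
            continue
        directives = [part.strip().lower() for part in value.split(",")]
        if "noindex" in directives:
            return True
    return False


def fetch_headers(url: str):
    """Send a HEAD request and return the response headers as pairs."""
    request = Request(url, method="HEAD")
    with urlopen(request) as response:
        return list(response.headers.items())


if __name__ == "__main__":
    # Offline example using the header line expected above.
    print(has_noindex([("x-robots-tag", "noindex, follow")]))
```

To test a live page, pass the result of fetch_headers(your_url) to has_noindex.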


#12

What does it mean?
The next time Googlebot crawls a path related to OneSignal, it will see the noindex directive and apply it. It may take some time.

So, this is the proper way to do it. Any questions? Please feel free to ask. Thanks


(Serdar) #13

I also set wp-content pages to return 403.

Is that a bad idea?


#14

403 is a vague kind of HTTP response for Googlebot (Ref: 1). Until this issue is resolved, you should avoid it.

  1. Tweet by a Google representative (Webmaster Trends Analyst)

(Serdar) #15

Thank you so much :slight_smile:

I have now realized that readme.txt files cannot be opened because of the following code:

# SECURE LOOSE FILES 
<IfModule mod_alias.c>
	RedirectMatch 403 (?i)(^#.*#|~)$
	RedirectMatch 403 (?i)/readme\.(html|txt)
	RedirectMatch 403 (?i)\.(ds_store|well-known)
	RedirectMatch 403 (?i)/wp-config-sample\.php
	RedirectMatch 403 (?i)\.(7z|bak|bz2|com|conf|dist|fla|git|inc|ini|log|old|psd|rar|tar|tgz|save|sh|sql|svn|swo|swp)$
</IfModule>

#16

Yes, that is true. You can try checking some other file in the OneSignal directory.

Here’s one I just found. It is a plain CSS file, which should be helpful for debugging.

/wp-content/plugins/onesignal-free-web-push-notifications/views/css/icons.css
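
As a rough check of why readme.txt returns 403 while icons.css should not, here is a small Python sketch. It runs the RedirectMatch patterns quoted above against both paths. It assumes Python's re module behaves like Apache's regex engine for these particular patterns, and the helper name is_forbidden is made up for illustration.

```python
import re

# Patterns copied from the RedirectMatch 403 rules quoted above.
BLOCK_PATTERNS = [
    r"(?i)(^#.*#|~)$",
    r"(?i)/readme\.(html|txt)",
    r"(?i)\.(ds_store|well-known)",
    r"(?i)/wp-config-sample\.php",
    r"(?i)\.(7z|bak|bz2|com|conf|dist|fla|git|inc|ini|log|old|psd|rar"
    r"|tar|tgz|save|sh|sql|svn|swo|swp)$",
]

PLUGIN_DIR = "/wp-content/plugins/onesignal-free-web-push-notifications"


def is_forbidden(path: str) -> bool:
    """Return True if any of the 403 patterns matches the URL path."""
    return any(re.search(pattern, path) for pattern in BLOCK_PATTERNS)


if __name__ == "__main__":
    for path in (PLUGIN_DIR + "/readme.txt", PLUGIN_DIR + "/views/css/icons.css"):
        print(path, is_forbidden(path))
```

The readme.txt path is caught by the second pattern, while the icons.css path matches none of them, so it is a better file for testing the header.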


(Serdar) #17

Although I put the code in the functions.php of my child theme, the header looks like this:

I don’t know why :open_mouth:

https://kriptobyte.com/wp-content/plugins/onesignal-free-web-push-notifications/views/css/icons.css


#18

Yes, I see. Can you please try adding the same snippet using the plugin below? Make sure to save and activate it.


(Amit Tiwari) #19

Why should search results be blocked in robots.txt?

  1. Disallow: /*?s=
  2. Disallow: /?s=*

What is the difference between 1 and 2?

#20

The first rule blocks URLs of this type:

http://example.com/anything_comes_here?s=

That doesn’t match the WordPress permalink structure. As far as I know, WordPress doesn’t use this kind of permalink with a query string.

The second rule is supposed to block crawling of the search results page. For example, this one:
https://www.gulshankumar.net/?s=WordPress
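
To make the difference between the two rules concrete, here is a small Python sketch of Google-style robots.txt matching, where "*" matches any run of characters, a trailing "$" anchors the end, and everything else is a prefix match. The function name robots_pattern_matches is made up for illustration, and this is a simplified approximation rather than Google's actual parser.

```python
import re


def robots_pattern_matches(pattern: str, path: str) -> bool:
    """Approximate Google-style matching of a robots.txt path pattern.

    '*' matches any sequence of characters, a trailing '$' anchors the
    end of the path, and without '$' the pattern is a prefix match.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' for each '*'.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    return re.match(regex, path) is not None


if __name__ == "__main__":
    for rule in ("/*?s=", "/?s=*"):
        for path in ("/anything_comes_here?s=", "/?s=WordPress"):
            print(rule, path, robots_pattern_matches(rule, path))
```

Under these assumptions, /*?s= matches both paths, while /?s=* matches only the search URL at the site root. The trailing * in the second rule adds nothing, since rules are prefix matches anyway.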