How to exclude Unpublished Pages from SearchIndex?

concrete 8.5.7

I have news items as pages with collectionDatePublic (a core attribute), which can be set to a future date to schedule publishing.

However, if the URL is guessed, the page is reachable even before the public date. Even worse, the page shows up in the search long before it reaches its public date.

How can I fix this? Is this intended behaviour?

I tried to hack concrete/src/Page/Controller/PageController::getSearchableContent() to return an empty string, but that changed nothing. (The text that the search finds is in the page description or the page title; I can already exclude blocks from the search index.)

You could schedule view permissions on the page instead - I think that would stop it showing up in the search index / being reachable via direct URL.

Thanks for answering Evan.

I do not think so. In my experience (just tested with a bare 9.1.3 install), pages that are not yet published are picked up by the autonav and the search.

From your reply I guess that is not intended.

Update from me:

I tried creating news pages with a publish date in the future as drafts. They then do not get indexed and do not show up in the search, BUT they do not leave the draft state when the publish date is reached, so they stay inaccessible to visitors. So that is of no use.

After creating a page with a publish date in the future, I checked the following:

Page::getCurrentPage()->getPageIndexScore()
0
Page::getCurrentPage()->getPageIndexContent()
""

But the page still shows up in the search even though its index score is 0 and its index content is empty, so that leaves me with questions.

(For reference, the page is created like this:)

$newsListPage = Page::getCurrentPage();
$newsItemPT = PageType::getByHandle('news_item');
$template = PageTemplate::getByHandle('full');

$news_site = $newsListPage->add($newsItemPT, array(
  'cName' => $_POST['news_title'],
  'cDescription' => trim($_POST['news_description']),
  'cDatePublic' => "{$_POST['news_pub_date_dt']} {$_POST['news_pub_date_h']}:{$_POST['news_pub_date_m']}",
), $template);
$news_site->setAttribute('thumbnail', File::getByID($_POST['news_image_id']));
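A side note on the snippet above: the 'cDatePublic' string is assembled straight from three form fields, so a malformed value could end up as the publish date. Below is a minimal sketch of validating it first. The function name buildPublicDate is hypothetical (not part of Concrete), and it assumes PHP 7.1+ and that the date field arrives as YYYY-MM-DD; adjust the format string if the form sends something else.

```php
<?php
// Hypothetical helper (not a Concrete API): combine the three form fields
// into a 'Y-m-d H:i:s' string, or return null if they do not form a valid date.
// Assumption: the date field is formatted as 'YYYY-MM-DD'.
function buildPublicDate(string $date, string $hour, string $minute): ?string
{
    $raw = sprintf('%s %02d:%02d', trim($date), (int) $hour, (int) $minute);

    // The leading '!' resets all fields that are not parsed (e.g. seconds) to zero.
    $parsed = DateTimeImmutable::createFromFormat('!Y-m-d H:i', $raw);

    // Round-trip check: createFromFormat() can roll over out-of-range values
    // (2024-02-30 becomes 2024-03-01), so compare the re-formatted string.
    if ($parsed === false || $parsed->format('Y-m-d H:i') !== $raw) {
        return null;
    }

    return $parsed->format('Y-m-d H:i:s');
}
```

The result could then be passed as 'cDatePublic', with the request rejected when the helper returns null.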

I just want the client to be able to schedule news pages that do not appear in the search until they are public…

Another update:

I found a fix for my problem. I can change the two “pagesToQueue()” functions in jobs/index_search.php and jobs/index_search_all.php to only include approved and public pages:

        // Find all pages that need indexing
        $query = $qb
            ->select('p.cID')
            ->from('Pages', 'p')
            ->leftJoin('p', 'CollectionSearchIndexAttributes', 'a', 'p.cID = a.cID')
            ->leftJoin('p', 'Collections', 'c', 'p.cID = c.cID')
            ->leftJoin('p', 'PageSearchIndex', 's', 'p.cID = s.cID')
            ->leftJoin('p', 'CollectionVersions', 'cv', 'p.cID = cv.cID')   // new: needed for the two cv.* conditions below
            ->where('cIsActive = 1')
            ->andWhere('cv.cvIsApproved = 1')                    // new: only approved versions
            ->andWhere('cv.cvDatePublic < NOW()')                // new: only versions whose public date has passed
            ->andWhere($qb->expr()->orX(
                'a.ak_exclude_search_index is null',
                'a.ak_exclude_search_index = 0'
            ))
            ->andWhere($qb->expr()->orX(
                'cDateModified > s.cDateLastIndexed',
                "(UNIX_TIMESTAMP(NOW()) - UNIX_TIMESTAMP(s.cDateLastIndexed) > {$timeout})",
                's.cID is null',
                's.cDateLastIndexed is null'
            ))->execute();
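For the other half of the problem (the page being reachable by a guessed URL), the same two conditions could also be checked in PHP before rendering. This is only a sketch in plain PHP: isPublishedNow is a hypothetical name, not a Concrete function, and it assumes PHP 7.1+ and that the approval flag and the public date string have already been read from the page.

```php
<?php
// Hypothetical helper (not a Concrete API) mirroring the two added WHERE
// conditions: the version must be approved and its public date must have passed.
function isPublishedNow(bool $isApproved, ?string $datePublic, ?DateTimeImmutable $now = null): bool
{
    // A missing date is treated as "not published", like NULL < NOW() in SQL.
    if (!$isApproved || $datePublic === null || $datePublic === '') {
        return false;
    }

    $now = $now ?? new DateTimeImmutable();

    try {
        $public = new DateTimeImmutable($datePublic);
    } catch (Exception $e) {
        return false; // unparseable date: treat as not published
    }

    return $public < $now;
}
```

In Concrete the inputs would probably come from something like the page's getCollectionDatePublic() and its version object's isApproved(), but I have not verified that against 8.5.7, so treat that part as an assumption.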

So that leaves me frustrated and with an open question: is it intended that unpublished pages are indexed for search?

@donat I don’t imagine that’s the intended behavior - probably worth opening an issue in the core project to get some attention / discussion on that because I agree that doesn’t seem desirable.