Including the Document Library block in the search index

Does anyone know if there’s a reason the Document Library doesn’t by default have a getSearchableContent function, and therefore isn’t included in the page search index?

I’m wondering if it’s a performance concern, or whether it’s just not something that’s been considered.

I whipped up a solution for this earlier via an override, which looks like this:

<?php
namespace Application\Block\DocumentLibrary;

use Concrete\Block\DocumentLibrary\Controller as DocumentLibraryBlockController;
use Concrete\Core\File\File;
use Concrete\Core\File\FolderItemList;

class Controller extends DocumentLibraryBlockController
{
    public function getSearchableContent()
    {
        $list = new FolderItemList();
        $list = $this->setupFolderFileSetFilter($list);
        $list = $this->setupFolderFileFolderFilter($list);
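        // Index every matching item regardless of viewer permissions, and
        // raise the page size so everything comes back in one page of results.
        $list->ignorePermissions();
        $list->setItemsPerPage(100000);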

        $list = $this->setupFolderAdvancedSearch($list);

        $pagination = $list->getPagination();
        $results = $pagination->getCurrentPageResults();

        $output = '';

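        foreach ($results as $f) {
            // Only file nodes carry a file ID. A folder's tree node ID is not
            // a file ID, so skip anything that is not a file node.
            if (!$f instanceof \Concrete\Core\Tree\Node\Type\File) {
                continue;
            }

            $file = File::getByID($f->getTreeNodeFileID());

            if ($file) {
                // Add each file's title and description to the indexed text.
                $output .= $file->getTitle() . ' ' . $file->getDescription() . ' ';
            }
        }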

        return $output;
    }
}

I’m not aware of any reason, but if you submit a pull request to add that and there is a reason, you’ll hear about it 😛

The above is a reasonable solution for some sites in some circumstances, but is not a sufficiently broad solution for all applications.

With a lot of documents, it would create a big spike in processing whenever the block is saved, and again within a single slice of the indexing job (or task), which could then run out of resources.
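One way to soften that spike would be to cap how much gets indexed per block. A rough sketch in plain PHP follows. The function name, the array shape, and both limits are made up for illustration; nothing here is part of the core API, and in the override you would feed it the titles and descriptions you pull from each file:

```php
<?php
/**
 * Hypothetical helper: join title/description pairs into one string,
 * stopping as soon as either the document cap or the length cap is hit.
 * Lengths are counted in bytes; a multibyte-aware version might use mb_substr().
 */
function buildCappedSearchableContent(iterable $documents, int $maxDocuments = 500, int $maxLength = 20000): string
{
    $output = '';
    $count = 0;

    foreach ($documents as $document) {
        if ($count >= $maxDocuments || strlen($output) >= $maxLength) {
            break;
        }

        $output .= trim($document['title'] . ' ' . $document['description']) . ' ';
        $count++;
    }

    // Trim any overshoot from the last document that was appended.
    return substr($output, 0, $maxLength);
}
```

The trade-off is that documents past the cap are simply not searchable, so this only bounds the cost rather than solving the problem.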

Bundling the searchable content for all documents into the index for one page that only shows 10 or 20 documents at a time could be misleading when the document in question is on the 30th page of the block’s pagination. So the solution would also need the search results to link directly to the relevant page of the block.
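To illustrate the linking part: if you know a document’s position in the list and the block’s items per page, working out the page is simple arithmetic. This is only a sketch. Both function names are hypothetical, and the paging query parameter is passed in rather than hard-coded because the name the block actually uses would need checking:

```php
<?php
/**
 * Hypothetical helper: 1-based page number for a document at a
 * zero-based position in the full list.
 */
function pageForDocument(int $position, int $itemsPerPage): int
{
    if ($position < 0 || $itemsPerPage < 1) {
        throw new InvalidArgumentException('Position must be >= 0 and items per page >= 1.');
    }

    return intdiv($position, $itemsPerPage) + 1;
}

/**
 * Hypothetical helper: append the paging parameter to the page URL.
 * $pagingParameter is an assumption; use whatever the block really expects.
 */
function linkToDocumentPage(string $pageUrl, int $position, int $itemsPerPage, string $pagingParameter): string
{
    $separator = strpos($pageUrl, '?') === false ? '?' : '&';

    return $pageUrl . $separator . rawurlencode($pagingParameter) . '=' . pageForDocument($position, $itemsPerPage);
}
```

The harder part is getting the search results to know which document matched in the first place, since the index only stores one blob of text per page.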

I had these thoughts as well, and perhaps these are the reasons why we don’t have it in the core by default.

For what my client needs and has on their site, it works fine, but yes, I can easily see how it would create enough processing overhead to max out the indexing.