While looking for a flat-file calendaring agent, I came across LuxFind.
LuxFind is very good at finding matches quickly and recursively, though it dumps full URLs and matching lines (great for debugging, but I ultimately wanted neither in production).
"Search" in HTMLy only queries the file basename (tags & slug), not the content. Totally fantastic! But this doesn't really let me drill down to a particular context.
Since I had already been trying to work content search into HTMLy blog posts, this piqued my interest...
So I thought: can I adapt the ideas in LuxFind to search the contents of all my HTMLy blog posts?
I modified the original script to match whole words only, use the slug as the title, and pull the year and month out individually, so the resulting link is href='site/year/month/the-respective-blog-post'.
[Here is my modified search in all its full-screen glory.]
My rewrite is very particular to searching for specific, whole-word content in HTMLy.
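To make the filename-to-link step concrete, here is a minimal sketch. It assumes the date_tags_slug.md naming that the code in this post relies on; the helper name postUrlFromFile, the example.com address, and the sample filename are all made up for illustration.

```php
<?php
// Hypothetical helper (not part of LuxFind or HTMLy): build the
// site/year/month/slug link from a post's file path.
function postUrlFromFile(string $site, string $file): ?string
{
    // Filename without directory or the .md extension
    $fileName = pathinfo($file, PATHINFO_FILENAME);

    // The filename is assumed to start with YYYY-MM
    if (!preg_match('/^(\d{4})-(\d{2})/', $fileName, $m)) {
        return null; // no date prefix, so no link can be built
    }

    // Everything after the last underscore is the slug
    $slug = preg_replace('~^.*_~', '', $fileName);

    return rtrim($site, '/') . "/{$m[1]}/{$m[2]}/{$slug}";
}

echo postUrlFromFile(
    'https://example.com/blog/',
    './content/user/blog/2025-01-28-09-30-00_php,search_content-search.md'
);
// prints https://example.com/blog/2025/01/content-search
```

Returning null for a file without a date prefix is my own choice here; it keeps a stray file from producing a half-built link.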
There are two important sections.
A) The functions:
$site = "https://andyrew.info/blog/"; // where is the blog?
$rootDir = "./content/andyrew/blog/"; // folder from which the files should be searched
function getFilePaths($dir) { // get all files to be searched
global $filePaths;
if ($items = preg_grep("~^[^.]~", scandir($dir) ?: [])) { // get folders and files, skipping dot entries
foreach($items as $item) {
if (is_dir($dir."/".$item)) { // is folder?
getFilePaths($dir."/".$item);
} elseif ($item !== basename(__FILE__) and preg_match("~\.md$~i",$item)) { // valid, individual files only
$filePaths[] = "$dir/$item";
}
}
}
}
function searchFile($file, $findRx) { // searches $file with find regex
global $counter, $site;
$findRxWhole = '\b' . str_replace('~', '\~', $findRx) . '\b'; // $findRx is already escaped by the caller, so only escape the ~ delimiter, then add word boundaries (whole-words only)
$findRx = "~{$findRxWhole}~iu"; // Case-insensitive and UTF-8 mode
$lines = file($file) ?: []; // load the file as an array of lines (empty if unreadable)
$hit = 0; // set to 1 once any line matches
$fileName = pathinfo($file, PATHINFO_FILENAME); // filename without folder or extension
$year = $month = ''; // reset for every file so nothing leaks from the previous one
$date = '/^(\d{4})-(\d{2})/'; // isolate the first XXXX-YY (YEAR-MO) group from the fileName
if (preg_match($date, $fileName, $matches)) {
$year = $matches[1]; // XXXX
$month = $matches[2]; // YY
}
$title = preg_replace('~^.*_~', '', $fileName); // remove date-timestamp_tags_ from filename
foreach ($lines as $line) {
if (preg_match($findRx, $line)) { // one hit is enough, since the matching lines are not displayed
$hit = 1;
break;
}
}
if ($hit) {
$counter++;
showHits($site, $file, $title, $year, $month); // search hits
}
}
function showHits($site, $file, $title, $year, $month) { // displays the hit(s)
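$safeTitle = htmlspecialchars($title, ENT_QUOTES); // escape before echoing into HTML
$fLink = ($title ? "<span class='fTitle'>{$safeTitle}</span>\n\n" : '');
$hRef = htmlspecialchars($site.$year.'/'.$month.'/'.$title, ENT_QUOTES); // specific to HTMLy site/year/month/slug format
echo "<p class='openLink' title='{$hRef}'><a href='{$hRef}' target='_blank' rel='noopener'>{$fLink}</a></p>\n\n";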
}
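The whole-word behaviour comes from wrapping the escaped search term in \b anchors. A minimal sketch of just that step follows; the helper name wholeWordRegex is mine, not LuxFind's.

```php
<?php
// Hypothetical helper: turn a plain search term into a
// case-insensitive, UTF-8, whole-word regex.
function wholeWordRegex(string $term): string
{
    // preg_quote() escapes regex metacharacters; passing '~' also
    // escapes the delimiter used below.
    return '~\b' . preg_quote($term, '~') . '\b~iu';
}

$rx = wholeWordRegex('cat');

var_dump(preg_match($rx, 'The Cat sat.') === 1);    // bool(true): whole word, any case
var_dump(preg_match($rx, 'A category page') === 1); // bool(false): only part of a word
```

Without the \b anchors, the second line would also count as a hit, which is exactly the noise I wanted to avoid.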
B) The Main Program:
$url = (($_SERVER['HTTPS'] ?? '') == 'on') ? 'https://' : 'http://'; // scheme (HTTPS may be unset on plain HTTP)
$url .= $_SERVER['SERVER_NAME'] . $_SERVER['REQUEST_URI']; // host + path = full URL
$find = trim($_REQUEST['s'] ?? ''); // get form input
$rootDir = rtrim($rootDir,'/'); // strip the trailing slash
$webPath = dirname($url); // base URL of this script
$filePaths = []; // file paths to search
echo "<div class='head'>
<form name='form' action='' method='post'>
<p>\n";
echo "<input type='text' name='s' placeholder='Search blog...' value='".htmlspecialchars($find,ENT_QUOTES)."' title='please type search text...'>
</p>
</form>
</div>\n";
$counter = 0;
if (isset($_REQUEST['s'])) {
if (strlen($find) >= 3) {
$findRx = str_replace(['\?','\*'],['.','.+?'],preg_quote($find)); // prepare find regex (escape spec. chars and substitute wildcards)
echo "<p class='head'>Search Results:</p><br />\n"; // Starts searching
echo "<div id='scroll' class='hitSet'>\n";
getFilePaths($rootDir); // get files to be searched
natcasesort($filePaths); // natural sort file list
foreach ($filePaths as $file) {
searchFile($file,$findRx);
}
echo "</div>\n";
echo "<br><div class='results'>Blog Posts Found: {$counter}</div>\n";
} else {
echo "<br><div class='head hilite'>Enter a search text of at least three characters!</div>";
}
}
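The wildcard line in the main program escapes the input first and then turns the escaped ? and * back into regex tokens. Here is that idea in isolation, as a sketch. The name wildcardToRegex is hypothetical, and this version passes the ~ delimiter to preg_quote() and adds the word boundaries in one place, which is an assumption about how the pieces should fit together.

```php
<?php
// Hypothetical helper: escape the user's text, then re-enable two
// wildcards: ? (any one character) and * (one or more characters, lazy).
function wildcardToRegex(string $find): string
{
    $escaped = preg_quote($find, '~');                               // "p?st" becomes "p\?st"
    $body = str_replace(['\?', '\*'], ['.', '.+?'], $escaped);       // "p\?st" becomes "p.st"
    return '~\b' . $body . '\b~iu';
}

var_dump(preg_match(wildcardToRegex('p?st'), 'a post here') === 1); // bool(true)
var_dump(preg_match(wildcardToRegex('a.b'), 'axb') === 1);          // bool(false): the dot stays literal
```

The order matters: escaping has to happen exactly once, before the wildcard substitution, or the backslashes themselves get escaped and the pattern stops matching.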
With all this in mind, I pulled the "search" function out of functions.php:
// Return search page.
function get_keyword($keyword, $page, $perpage)
{
$posts = get_blog_posts();
$tmp = array();
$words = explode(' ', $keyword);
foreach ($posts as $index => $v) {
$arr = explode('_', $v['basename']);
$filter = $arr[1] . ' ' . $arr[2];
foreach ($words as $word) {
if (stripos($filter, $word) !== false) {
if (!in_array($v, $tmp)) {
$tmp[] = $v;
}
}
}
}
if (empty($tmp)) {
return false;
}
return $tmp = get_posts($tmp, $page, $perpage);
}
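To see why this stock search never touches post content, it helps to look at what $filter actually holds. The snippet below repeats the explode() step from get_keyword() on a made-up basename (the filename and the helper name basenameFilter are assumptions for illustration).

```php
<?php
// Hypothetical helper repeating the $filter step from get_keyword().
function basenameFilter(string $basename): string
{
    $arr = explode('_', $basename);   // [date-time, tags, slug.md]
    return $arr[1] . ' ' . $arr[2];   // tags and slug only
}

$filter = basenameFilter('2025-01-28-09-30-00_php,search_content-search.md');

echo $filter, "\n";                              // php,search content-search.md
var_dump(stripos($filter, 'search') !== false);  // bool(true): the word is in the tags and slug
var_dump(stripos($filter, 'regex') !== false);   // bool(false): words from the body are never seen
```

So a post can discuss "regex" at length and still be invisible to the search unless the word happens to be in its tags or slug.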
I'm coming into all this with a basic, layman’s understanding of PHP (and programming languages in general), so it's a lot to take in and try to understand.
Some relevant code I found in functions.php that might help glue these ideas together:
foreach ($posts as $index => $v) {
$post = new stdClass;
$filepath = $v['dirname'] . '/' . $v['basename'];
$post->file = $filepath;
$content = file_get_contents($filepath);
// Get the contents and convert it to HTML
$post->body = MarkdownExtra::defaultTransform(remove_html_comments($content));
// or
$postContent = MarkdownExtra::defaultTransform(remove_html_comments($content));
So, since I now have a few functions and parameters that seem to relate to my quest, I can start to merge these ideas and produce some measurable results.
Update: So, it is so. Here are the rewritten functions:
// Return search page.
function get_keyword($keyword, $page, $perpage)
{
$posts = get_blog_posts();
$tmp = array();
foreach ($posts as $index => $v) {
$filepath = $v['dirname'] . '/' . $v['basename'];
$findRxWhole = '\b' . preg_quote($keyword, '~') . '\b'; // Add word boundaries (find whole-words only)
$findRx = "~{$findRxWhole}~iu"; // Case-insensitive and UTF-8 mode
$lines = file($filepath);
foreach ($lines as $line) {
if (preg_match ($findRx, $line)) {
if (!in_array($v, $tmp)) {
$tmp[] = $v;
}
}
}
}
if (empty($tmp)) {
return false;
}
return $tmp = get_posts($tmp, $page, $perpage);
}
// Return search result count
function keyword_count($keyword)
{
$posts = get_blog_posts();
$tmp = array();
foreach ($posts as $index => $v) {
$filepath = $v['dirname'] . '/' . $v['basename'];
$findRxWhole = '\b' . preg_quote($keyword, '~') . '\b'; // Add word boundaries (find whole-words only)
$findRx = "~{$findRxWhole}~iu"; // Case-insensitive and UTF-8 mode
$lines = file($filepath);
foreach ($lines as $line) {
if (preg_match ($findRx, $line)) {
if (!in_array($v, $tmp)) {
$tmp[] = $v;
}
}
}
}
$tmp = array_unique($tmp, SORT_REGULAR);
return count($tmp);
}
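Both functions above repeat the same file check, so one possible tidy-up is to move it into a shared helper. This is only a sketch of that idea, not what went into HTMLy: the name file_contains_word is hypothetical, and it tests the whole file in one go instead of line by line.

```php
<?php
// Hypothetical helper: does the file contain $keyword as a whole word?
function file_contains_word(string $filepath, string $keyword): bool
{
    $content = @file_get_contents($filepath);
    if ($content === false) {
        return false; // unreadable file counts as "no match"
    }
    // Same pattern as above: whole word, case-insensitive, UTF-8
    $rx = '~\b' . preg_quote($keyword, '~') . '\b~iu';
    return preg_match($rx, $content) === 1;
}

// Usage with a throwaway file
$tmp = tempnam(sys_get_temp_dir(), 'post');
file_put_contents($tmp, "A flat-file blog.\nSearching inside posts.\n");

var_dump(file_contains_word($tmp, 'posts')); // bool(true)
var_dump(file_contains_word($tmp, 'post'));  // bool(false): "posts" is a different word

unlink($tmp);
```

With something like this in place, both get_keyword() and keyword_count() could call the helper once per post, and the regex would be built in a single spot.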
[edit: 20250128]
My PR was accepted by danpros, but I forgot to limit the search $keyword to at least three characters, so I wrapped the body of the search function in a strlen() check:
// Return search result count
function keyword_count($keyword)
{
if (strlen($keyword) >= 3) { // three-character minimum
// ...
}
}
Without a baseline limit, one could search for just "t", which goes against the grain of what a specific keyword search is all about.
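As a standalone illustration of that guard, here is a small sketch. The helper name keyword_long_enough and the trim() call are my additions for the example; strlen() counts bytes, so a multi-byte character counts as more than one.

```php
<?php
// Hypothetical helper: is the keyword long enough to be worth searching?
function keyword_long_enough(string $keyword, int $min = 3): bool
{
    // trim() so that padding with spaces does not satisfy the minimum
    return strlen(trim($keyword)) >= $min;
}

var_dump(keyword_long_enough('t'));     // bool(false)
var_dump(keyword_long_enough('  t  ')); // bool(false)
var_dump(keyword_long_enough('php'));   // bool(true)
```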
The additional, modified PR was accepted, and I now consider this endeavour Closed ;)
In the process of discovery, I also worked out how to modify the title in the $is_search block of main.html.php so that it labels the search and shows $search->count:
<?php if (isset($is_search)):?>
<!-- main.html.php -->
<div class="row justify-content-center" style="padding-top: 3rem;">
<div class="col-md-12 text-center">
<h2 class="mt-0">Search: <span style='color: #628B48;'><?php echo $search->title;?></span> (<?php echo $search->count;?>)</h2>
<form><input type="search" name="search" class="form-control is-search" placeholder="<?php echo i18n('Type_to_search');?>"></form>
</div>
</div>
<?php endif;?>
fini