
We wrote this function back in 2009 and, since then, it's found its way onto hundreds, if not thousands, of other websites. The function was initially written to extract suggested keywords from a string of text, which the user could then optionally apply to a post. We haven't modified it since migrating it from Internoetics, so we will likely update it soon.

Creating the Keywords/Tags

Given the text body, we wanted to accomplish the following:

  • I wanted every word in the text to be considered as a potential keyword.
  • I wanted a ‘blacklist’ of $commonWords that would be removed from the returned keyword array.
  • I wanted to compare the extracted keywords (the $words array) with an array of permitted keywords, so that only keywords that also appear in the $allowedWords array are returned.
  • I wanted to restrict keyword output to words of at least ‘n’ characters in length.
  • I wanted to restrict keywords to those that appear at least ‘n’ times in the submitted text.
  • I wanted to specify the maximum number of keywords returned.
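The requirements above map onto a fairly simple pipeline. The block below is a minimal sketch of that pipeline, not the downloadable function: sketch_extract_keywords() is a hypothetical name, and the parameter order, the regex tokenizer and the alphabetical tie-break are assumptions. It requires PHP 7.4 or later for arrow functions.

```php
<?php
// Hypothetical sketch of the steps listed above; NOT the downloadable
// extract_keywords() function. Requires PHP 7.4+ (arrow functions).
function sketch_extract_keywords(
    string $text,
    array $commonWords,
    array $allowedWords = [],
    bool $restrict = true,
    int $min_word_length = 3,
    int $min_word_occurrence = 2,
    int $max_words = 5,
    bool $as_array = false
) {
    // Lowercase the text and split it into words (letters, digits, apostrophes).
    preg_match_all("/[a-z0-9']+/", strtolower($text), $matches);
    $words = $matches[0];

    // Ignore words shorter than $min_word_length characters.
    $words = array_filter($words, fn($w) => strlen($w) >= $min_word_length);

    // Remove the blacklisted $commonWords.
    $words = array_diff($words, $commonWords);

    // When restricting, keep only words that also appear in $allowedWords.
    if ($restrict) {
        $words = array_intersect($words, $allowedWords);
    }

    // Count each word and keep those seen at least $min_word_occurrence times.
    $counts = array_count_values($words);
    $counts = array_filter($counts, fn($n) => $n >= $min_word_occurrence);

    // Most frequent first (ties broken alphabetically, an assumption),
    // then keep at most $max_words keywords.
    $keywords = array_map('strval', array_keys($counts));
    usort($keywords, fn($a, $b) => ($counts[$b] <=> $counts[$a]) ?: strcmp($a, $b));
    $keywords = array_slice($keywords, 0, $max_words);

    // Return an array or a comma-separated string.
    return $as_array ? $keywords : implode(', ', $keywords);
}

$text = "Ice on the engine. The engine needs ice protection. Engine start.";
echo sketch_extract_keywords($text, ['the', 'on', 'needs'], ['ice', 'engine', 'protection']), "\n"; // engine, ice
```

With restriction on, "start" is dropped because it is not in the allowed list, and "protection" is dropped because it appears only once.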

Example

Take, for example, the following block of text (extracted from another blog post):

Many systems that traditionally relied on the pneumatic system have been transitioned to the electrical architecture. They include engine start, APU start, wing ice protection, hydraulic pumps and cabin pressurisation. The only remaining bleed system on the 787 is the anti-ice system for the engine inlets. In fact, Boeing claims that the move to electrical systems has reduced the load on engines (from pneumatic-hungry systems) by up to 35 percent (not unlike today's electrically powered flight simulators that use 20% of the electricity consumed by the older, hydraulically actuated flight sims).

Usage:

echo extract_keywords($text);

Output: ice, pneumatic, engine, electrical

The extracted keywords aren’t ideal… but they are a good starting point for ‘suggested’ tags that the end user can refine.

The PHP Function

You can download the PHP function below. The $commonWords array is several hundred words long, so it wasn't practical to reproduce it in this post.

Usage is as follows:

Notes on Usage

In the above example, if $restrict were set to false (rather than true), the tags returned would be system, systems, engine, start, ice. This is because only the $commonWords are removed from the result, and every other word is considered. The result is less accurate than comparing against a preferred keyword array.

The most accurate results are obtained from refining the $allowedWords array and including as many subject-specific words as possible to cover all preferred tags.
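To illustrate the difference between the two modes, the hypothetical helper candidate_words() below applies the $commonWords blacklist and, when restricting, the $allowedWords list to a small, made-up word list. It is a sketch for illustration only, not part of the downloadable code.

```php
<?php
// Hypothetical helper (not part of the downloadable code) comparing the
// blacklist-only mode with the restricted (allowed-words) mode.
function candidate_words(array $words, array $commonWords, array $allowedWords, bool $restrict): array
{
    // Blacklist: common words are always removed.
    $kept = array_diff($words, $commonWords);

    // Allowlist: when restricting, keep only subject-specific words.
    if ($restrict) {
        $kept = array_intersect($kept, $allowedWords);
    }

    // De-duplicate and re-index so the result is easy to read.
    return array_values(array_unique($kept));
}

// Made-up, already tokenised and lowercased input.
$words = ['the', 'bleed', 'system', 'on', 'the', '787', 'systems', 'engine', 'start', 'ice'];
$common = ['the', 'on'];
$allowed = ['ice', 'engine', 'pneumatic', 'electrical'];

echo implode(', ', candidate_words($words, $common, $allowed, false)), "\n"; // bleed, system, 787, systems, engine, start, ice
echo implode(', ', candidate_words($words, $common, $allowed, true)), "\n";  // engine, ice
```

Anything the blacklist doesn't name slips through in unrestricted mode, which is why a well-populated $allowedWords array gives tighter suggestions.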

$min_word_length sets the minimum length a word must have to be considered. In our case, any word shorter than 3 characters is ignored.

$min_word_occurrence determines how many times a word must appear in the text before it can be considered for inclusion in the returned keywords.
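A minimal sketch of how these two thresholds might be applied, assuming the text has already been split into lowercase words. apply_thresholds() is a hypothetical helper, not part of the downloadable function, and it requires PHP 7.4 or later.

```php
<?php
// Hypothetical helper (not part of the downloadable code): applies the
// length and occurrence thresholds and returns word => count pairs.
// Requires PHP 7.4+ for arrow functions.
function apply_thresholds(array $words, int $min_word_length, int $min_word_occurrence): array
{
    // Ignore words shorter than $min_word_length characters.
    $long = array_filter($words, fn(string $w): bool => strlen($w) >= $min_word_length);

    // Count how many times each remaining word appears.
    $counts = array_count_values($long);

    // Keep only words that appear at least $min_word_occurrence times.
    return array_filter($counts, fn(int $n): bool => $n >= $min_word_occurrence);
}

// Made-up, already lowercased word list.
$words = ['ice', 'on', 'engine', 'ice', 'on', 'engine', 'engine', 'inlets'];
$kept = apply_thresholds($words, 3, 2); // returns ['ice' => 2, 'engine' => 3]
```

Here "on" fails the length test and "inlets" fails the occurrence test.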

$as_array specifies whether the keywords are returned as a comma-separated string or as an array.

$max_words sets the maximum number of keywords returned.
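These last two settings only shape the final list. The hypothetical format_keywords() helper below sketches one way to apply them, assuming the word counts have already been computed. Sorting by frequency with an alphabetical tie-break is an assumption; the original function may order tied words differently. The counts in the example are made up.

```php
<?php
// Hypothetical helper (not part of the downloadable code): ranks words by
// count, applies $max_words, and returns an array or a string per $as_array.
// Requires PHP 7.4+ for arrow functions.
function format_keywords(array $counts, int $max_words, bool $as_array)
{
    // strval() guards against numeric words (e.g. "787") becoming int keys.
    $keywords = array_map('strval', array_keys($counts));

    // Most frequent first; equal counts fall back to alphabetical order.
    usort($keywords, fn(string $a, string $b): int => ($counts[$b] <=> $counts[$a]) ?: strcmp($a, $b));

    // Keep at most $max_words keywords.
    $keywords = array_slice($keywords, 0, $max_words);

    // Return an array, or a comma-separated string.
    return $as_array ? $keywords : implode(', ', $keywords);
}

// Made-up word counts.
$counts = ['wing' => 4, 'pump' => 2, 'cabin' => 3, 'bleed' => 2, 'inlet' => 1];
echo format_keywords($counts, 4, false), "\n"; // wing, cabin, bleed, pump
```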

Download

The second half of this article (previously shared on Internoetics) will be published soon.

Title: Create Keyword Tags from Text With PHP
Description: Create Keyword Tags from Text With PHP.
Download: PHP Code (V0.2) | Plugin Page
