
A number of years ago I shared a PHP function that I used to truncate strings to a character count. Because it sat in my little library, I used it many times without considering alternative ways of accomplishing the same thing… and there are easier ways. I was just as careless using the function in WordPress, which has a built-in function that does everything my little function did and more.

This article will look at a number of different ways of shortening strings to either a word count or character count… with and without the trailing characters (usually dots…). Most of the functions on this page are used simply to demonstrate the excellent library of string manipulation functions available in PHP, and how they might have been used to do what I’ve just described.

For all our examples we’ll use the following string, and we’ll assume we want to truncate the 187 characters down to 120 for Twitter.

Lorem ipsum velit inceptos posuere augue amet sagittis augue sapien gravida vestra nulla non hac ac luctus imperdiet pulvinar ligula hac elit molestie vestibulum fusce porttitor lacinia.

In our example download we’ll also include a second string of 55 characters so you may examine the returned result using a shorter string.

Function Zero

Following is the first function I wrote years ago, and used countless times. It was originally published on Internoetics as a standalone article in 2009.

In looking into alternatives I got stuck in my own loop and experimented with the following.

mb_strimwidth()

mb_strimwidth will return a truncated string with a specific width, and append trailing characters as specified in the function. The width you pass includes the trailing characters. Because it ships with PHP’s mbstring extension, its use should be considered before anything else. Usage is easy:
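A minimal sketch of the call, using the sample string from the top of the article. The variable names are ours. Note that mb_strimwidth() measures display width rather than a strict character count, so East Asian full-width characters count as two; for plain Latin text the two are the same.

```php
<?php
// The sample string used throughout this article.
$string = 'Lorem ipsum velit inceptos posuere augue amet sagittis augue sapien gravida vestra nulla non hac ac luctus imperdiet pulvinar ligula hac elit molestie vestibulum fusce porttitor lacinia.';

// Start at position 0 and allow a total width of 120, trim marker included.
// The '...' is only appended when the string actually had to be cut.
$short = mb_strimwidth($string, 0, 120, '...', 'UTF-8');

echo $short, PHP_EOL;
```

A string that already fits inside the width comes back unchanged, without the dots.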

The function considers a space as a character – as it should – but this means that there will be a white space before the trailing dots if that’s where the character count ends. If that’s the case, you might want to trim the string then add the dots yourself. Consider the following:
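One way this might look, as a sketch only: we cut without a trim marker, leave room for the dots, strip any trailing white space with rtrim(), and then append the dots ourselves. The variable names are assumptions for illustration.

```php
<?php
$string = 'Lorem ipsum velit inceptos posuere augue amet sagittis augue sapien gravida vestra nulla non hac ac luctus imperdiet pulvinar ligula hac elit molestie vestibulum fusce porttitor lacinia.';
$length = 120;
$trimmarker = '...';

// Cut with an empty trim marker, reserving room for the dots we add below.
$room = $length - mb_strlen($trimmarker, 'UTF-8');
$cut = mb_strimwidth($string, 0, $room, '', 'UTF-8');

// Remove a trailing space (if any), then append the dots.
$short = rtrim($cut) . $trimmarker;

echo $short, PHP_EOL;
```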

The modified code above will add the trimmarker (dots) indiscriminately regardless of whether the string was or wasn’t truncated. To correct this, we’ll count the string and then only add the dots if the original string did in fact undergo the cut. For example:
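A sketch of that check wrapped in a function. The name truncate_chars() and its defaults are hypothetical; the idea is simply to return the string untouched when it already fits, and only otherwise cut, trim, and add the dots.

```php
<?php
/**
 * Hypothetical helper: shorten $string to at most $length characters,
 * trim marker included, adding the marker only when a cut was made.
 */
function truncate_chars(string $string, int $length = 120, string $trimmarker = '...'): string
{
    // Nothing to do if the string already fits.
    if (mb_strlen($string, 'UTF-8') <= $length) {
        return $string;
    }

    // Reserve room for the trim marker, cut, and drop any trailing space.
    $room = max(0, $length - mb_strlen($trimmarker, 'UTF-8'));
    $cut = mb_strimwidth($string, 0, $room, '', 'UTF-8');

    return rtrim($cut) . $trimmarker;
}

echo truncate_chars('A short string that is left alone.'), PHP_EOL;
```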

When sending messages to Twitter and other character-sensitive locations, every character counts… and this function will occasionally save you a blank space!

mb_substr()

PHP’s mb_substr will “get part of a string”. It performs a multi-byte safe substr() operation based on number of characters. Position is counted from the beginning of str. First character’s position is 0. Second character position is 1, and so on.
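As a quick illustration (variable names are ours), the following takes the first 120 characters of the sample string. No trailing characters are added and the cut may land in the middle of a word.

```php
<?php
$string = 'Lorem ipsum velit inceptos posuere augue amet sagittis augue sapien gravida vestra nulla non hac ac luctus imperdiet pulvinar ligula hac elit molestie vestibulum fusce porttitor lacinia.';

// Start at position 0 (the first character) and take 120 characters.
$short = mb_substr($string, 0, 120, 'UTF-8');

echo $short, PHP_EOL;
```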

To add trailing dots (or any trailing characters), we can modify the first function we looked at. Not unlike the first function, we’ll also trim the string before adding the $trimmarker to ensure we don’t waste a blank space.
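A sketch of how that might look with mb_substr(). The function name truncate_substr() is hypothetical, and the structure is an assumption that mirrors the approach described for mb_strimwidth().

```php
<?php
/**
 * Hypothetical helper: character-count truncation built on mb_substr().
 */
function truncate_substr(string $string, int $length = 120, string $trimmarker = '...'): string
{
    // Return short strings as they are, without the trim marker.
    if (mb_strlen($string, 'UTF-8') <= $length) {
        return $string;
    }

    // Keep enough room for the trim marker inside $length.
    $room = max(0, $length - mb_strlen($trimmarker, 'UTF-8'));

    // Cut, trim the trailing space (if any), then add the marker.
    return rtrim(mb_substr($string, 0, $room, 'UTF-8')) . $trimmarker;
}

echo truncate_substr(str_repeat('word ', 40), 30), PHP_EOL;
```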

mb_substr(), substr(), and mb_strcut()

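mb_substr(), substr(), and mb_strcut() are just a few functions similar to those just described, with the main difference being how they deal with multibyte character sets (Chinese etc). substr() counts bytes, so it can cut a multibyte character in half; mb_substr() counts characters; mb_strcut() counts bytes but won’t split a multibyte character.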
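To see the difference, here is a small sketch with a made-up Japanese sample string (not part of the article’s download). Each of its characters takes three bytes in UTF-8.

```php
<?php
// Hypothetical sample: seven characters, three bytes each in UTF-8.
$text = '日本語テキスト';

// 4 bytes: the whole first character plus one byte of the second.
$bytes = substr($text, 0, 4);

// 4 characters, regardless of how many bytes each one uses.
$chars = mb_substr($text, 0, 4, 'UTF-8');

// At most 4 bytes, but never a partial character.
$safe = mb_strcut($text, 0, 4, 'UTF-8');

echo $chars, PHP_EOL, $safe, PHP_EOL;
```

The substr() result is no longer valid UTF-8, which is why it is best kept for single-byte text.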

If you’re after a basic example that’ll output truncated text to the nearest word based on a character count (but without trailing dots), use the following:
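A minimal sketch of one way to do it, assuming words are separated by single spaces: cut hard at the character limit, then step back to the last space so no word is left half-finished. Variable names are ours.

```php
<?php
$string = 'Lorem ipsum velit inceptos posuere augue amet sagittis augue sapien gravida vestra nulla non hac ac luctus imperdiet pulvinar ligula hac elit molestie vestibulum fusce porttitor lacinia.';
$length = 120;

$short = $string;
if (mb_strlen($string, 'UTF-8') > $length) {
    // Hard cut at the character limit first...
    $cut = mb_substr($string, 0, $length, 'UTF-8');

    // ...then step back to the last space in that cut.
    $lastSpace = mb_strrpos($cut, ' ', 0, 'UTF-8');
    $short = ($lastSpace === false) ? $cut : mb_substr($cut, 0, $lastSpace, 'UTF-8');
}

echo $short, PHP_EOL;
```

One trade-off of this simple version: if the hard cut happens to land exactly at the end of a word, that last word is still dropped.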

Converted into a function, and this time with the option of adding ellipses, use the following:
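A sketch of such a function. The name truncate_to_word() and its parameters are hypothetical; passing an empty $trimmarker gives the plain version with no ellipsis.

```php
<?php
/**
 * Hypothetical helper: truncate to the nearest whole word within $length
 * characters, optionally followed by a trim marker.
 */
function truncate_to_word(string $string, int $length = 120, string $trimmarker = ''): string
{
    if (mb_strlen($string, 'UTF-8') <= $length) {
        return $string;
    }

    // Reserve room for the optional trim marker.
    $room = max(0, $length - mb_strlen($trimmarker, 'UTF-8'));
    $cut = mb_substr($string, 0, $room, 'UTF-8');

    // Step back to the last space so we end on a whole word.
    $lastSpace = mb_strrpos($cut, ' ', 0, 'UTF-8');
    if ($lastSpace !== false) {
        $cut = mb_substr($cut, 0, $lastSpace, 'UTF-8');
    }

    return rtrim($cut) . $trimmarker;
}

echo truncate_to_word('one two three four', 10), PHP_EOL;
echo truncate_to_word('one two three four', 10, '...'), PHP_EOL;
```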

preg_match()

You’ll rarely find advocates for regular expressions when there are so many excellent PHP functions that can be used out-of-the-box. However, here’s a function that’ll truncate a string to a whole word given a defined number of characters from the start of the string, up until a word boundary. Unlike the other functions described so far, we’ll snip the string at a full word.

The function described

The function accepts three parameters: $string, $length, and $trimmarker (the dots or characters that come after the string).

Line 10

The first thing we do is check the length of the string. If the string is shorter than the defined $length then we’ll just return it.

Line 12

The mb_substr function forces a break at $length if there is no word (space) boundary. If we passed a 500 character string and there were no spaces, then the whole string would otherwise be returned (because the preg_match function wouldn’t find a word boundary). By snipping it first, we make sure the whole thing can’t come back as the result. It’s not a feature… it’s there to correct for input errors.

Lines 13, 14, and 15

If the length of our string is greater than the maximum length defined as a parameter in our function, we’ll perform a preg_match() regular expression match to return the portion of the string up to $length characters that ends on a word boundary ('/^.{1,$length}\b/s'). The caret anchors the match to the start of the string. The period means any character except a new line (\n). The curly braces define the quantifier that determines how many characters are permitted… so {1,$length} means between 1 and $length characters. The \b means that the pattern will match on a word boundary, so the match can only end where a word starts or finishes. Finally, the trailing s is a pattern modifier (not a character class): it lets the period match new lines as well, so line breaks in the string won’t stop the match.

Because we don’t want our returned string to exceed $length, the upper character count in our preg_match function needs to be the maximum length minus the length of the $trimmarker – we account for this.

We then return either the truncated string or the original string if it didn’t exceed our length limit.
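The steps above can be sketched as follows. This is a reconstruction under our own assumptions rather than the original listing, so the line numbers quoted above will not match it. The function name truncate_regex() is hypothetical, and we’ve added the u modifier so the period counts UTF-8 characters rather than bytes.

```php
<?php
/**
 * Hypothetical helper: truncate at a word boundary using preg_match().
 */
function truncate_regex(string $string, int $length = 120, string $trimmarker = '...'): string
{
    // Short strings are returned as they are.
    if (mb_strlen($string, 'UTF-8') <= $length) {
        return $string;
    }

    // Leave room for the trim marker inside $length.
    $max = max(1, $length - mb_strlen($trimmarker, 'UTF-8'));

    // Forced break: used when no word boundary exists (e.g. no spaces at all).
    $cut = mb_substr($string, 0, $max, 'UTF-8');

    // Greedy match of 1..$max characters that must end on a word boundary.
    if (preg_match('/^.{1,' . $max . '}\b/su', $string, $match)) {
        $cut = $match[0];
    }

    return rtrim($cut) . $trimmarker;
}

echo truncate_regex('one two three four five', 12), PHP_EOL;
```

The rtrim() matters here because a word boundary also sits between a space and the start of the next word, so the match itself can end with a space.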

strrpos()

The strrpos() function finds the position of the last occurrence of a substring in a string. The function returns the position where the needle exists relative to the beginning of the haystack string. Note also that string positions start at 0, not 1, so we account for this in the function by adding 1 to the length when applying strrpos().
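A sketch of how this might be put together. The name truncate_strrpos() is hypothetical, and the function is byte-based, so we’re assuming single-byte text (the mb_ equivalents would be needed otherwise). Looking at one character more than we intend to keep means a word that ends exactly on the limit is not thrown away.

```php
<?php
/**
 * Hypothetical helper: truncate to a whole word using strrpos().
 */
function truncate_strrpos(string $string, int $length = 120, string $trimmarker = '...'): string
{
    if (strlen($string) <= $length) {
        return $string;
    }

    // Room left for the text once the trim marker is accounted for.
    $room = max(0, $length - strlen($trimmarker));

    // Search one character past the limit: if that extra character is a
    // space, the last whole word still fits.
    $lastSpace = strrpos(substr($string, 0, $room + 1), ' ');

    // No space at all: fall back to a hard cut.
    $cut = ($lastSpace === false)
        ? substr($string, 0, $room)
        : substr($string, 0, $lastSpace);

    return rtrim($cut) . $trimmarker;
}

echo truncate_strrpos('one two three four', 10), PHP_EOL;
```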

wordwrap()

Using wordwrap is another way that you can truncate a string, although it’s not very effective and somewhat of a poor choice (unless circumstances required it). Wordwrap will wrap a string to a given number of characters using a string break character… and used with PHP’s explode function we can build an array of each line of text. We determine if the $trimmarker (trailing dots) is required simply by querying whether the second array value is empty (if empty, the line didn’t wrap).

Setting the cut parameter to true means that the string is always wrapped at or before the specified width. To ensure that we keep under the maximum $length we do the dodgy and just force the string to a value that accounts for the length of the $trimmarker.
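Something along these lines, as a sketch only. The name truncate_wordwrap() is hypothetical, and we’re assuming the input is a single line of single-byte text, since wordwrap() counts bytes and treats existing line breaks as wrap points.

```php
<?php
/**
 * Hypothetical helper: truncate by wrapping the text and keeping line one.
 */
function truncate_wordwrap(string $string, int $length = 120, string $trimmarker = '...'): string
{
    if (strlen($string) <= $length) {
        return $string;
    }

    // Wrap a little early so the trim marker still fits inside $length.
    $width = max(1, $length - strlen($trimmarker));

    // cut = true also breaks words that are longer than a whole line.
    $lines = explode("\n", wordwrap($string, $width, "\n", true));

    // A second line means the text wrapped, so the marker is required.
    return isset($lines[1]) ? $lines[0] . $trimmarker : $lines[0];
}

echo truncate_wordwrap('one two three four', 10), PHP_EOL;
```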

str_split()

The str_split function may be used in the above function to convert the string to an array (not that different to wordwrap without the fancy parameters). str_split won’t split at a complete word… but it will keep the string truncated to exactly 120 characters.
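A sketch of the same idea with PHP’s str_split(). The function name is hypothetical. str_split() works on bytes, so we’re assuming single-byte text; mb_str_split() (PHP 7.4 and later) is the multibyte counterpart.

```php
<?php
/**
 * Hypothetical helper: truncate to exactly $length using str_split().
 */
function truncate_str_split(string $string, int $length = 120, string $trimmarker = '...'): string
{
    if (strlen($string) <= $length) {
        return $string;
    }

    // Chunk size that leaves room for the trim marker.
    $width = max(1, $length - strlen($trimmarker));

    // Split into fixed-size chunks and keep only the first one.
    $chunks = str_split($string, $width);

    return $chunks[0] . $trimmarker;
}

echo truncate_str_split(str_repeat('ab ', 100), 20), PHP_EOL;
```

We don’t trim the chunk here, so a truncated result is always exactly $length long, at the cost of an occasional space before the dots.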

Truncate to a Word Count

Following is an example of exploding a string into an array by space (or words)… not that different to what we’ve already done. We then implode() the trimmed array back into a string with the number of words defined by our $limit. We add the $trimmarker (…) if our limit is less than the array word count.
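A minimal sketch of that approach. The function name and the default limit of 20 words are our own assumptions.

```php
<?php
/**
 * Hypothetical helper: keep only the first $limit words of a string.
 */
function truncate_word_count(string $string, int $limit = 20, string $trimmarker = '...'): string
{
    // One array element per space-separated word.
    $words = explode(' ', $string);

    // Already within the limit: return the string untouched.
    if (count($words) <= $limit) {
        return $string;
    }

    // Keep the first $limit words and join them back together.
    return implode(' ', array_slice($words, 0, $limit)) . $trimmarker;
}

echo truncate_word_count('one two three four five', 3), PHP_EOL;
```

Because explode() splits on every single space, consecutive spaces produce empty "words" that count towards the limit.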

strtok()

Using a combination of strtok() and wordwrap() we can use a very short but effective function that’ll truncate to length. As shown below it won’t truncate with consideration to the $length + $trimmarker, but it’s handy if you’re not overly concerned about the length of the returned data.
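A sketch of how the two might be combined (the function name is hypothetical). wordwrap() inserts the trim marker plus a new line at each wrap point, and strtok() then returns everything before the first new line.

```php
<?php
/**
 * Hypothetical helper: wrap at $length, then keep the first line only.
 * The trim marker is added on top of $length, so the result can be
 * longer than $length by the length of the marker.
 */
function truncate_strtok(string $string, int $length = 120, string $trimmarker = '...'): string
{
    // The cast guards against strtok() returning false for an empty string.
    return (string) strtok(wordwrap($string, $length, $trimmarker . "\n"), "\n");
}

echo truncate_strtok('one two three four', 7), PHP_EOL;
```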

Trim Words in WordPress

I’ve stupidly used my own functions to truncate text and words in WordPress when it has its own built-in functions. To return trimmed words in WordPress use wp_trim_words(). It is often used together with wp_strip_all_tags() to clean the text before it is processed. Of course, there are excerpt functions that serve a similar purpose.

Considerations

We literally could have written up hundreds of examples, but we had to stop somewhere. If nothing else, the code on this page demonstrates the usage of various string functions that are at the core of the PHP language. While we’ve generally avoided regular expression based solutions we may address them in the future.

In a number of examples we’ve returned the $trimmarker as three dots. You could optionally return the HTML entity for an ellipsis by using &hellip; (…). I personally prefer three periods.

Download

Title: Truncate Strings
Description: Truncate (Shorten) Strings to the Nearest Whole Word or Character Count with Trailing Dots using PHP Functions.
Download: PHP Code (V0.2) | Plugin Page
