Calculate Differences Between Strings of Text With PHP


If you’ve ever dealt with quality and version management, you will appreciate the need to record every change made to a given text for quality assurance purposes. Some time ago I was working on a small project that required the following:

  • Record specific changes to the published version of a document.
  • Provide archived versions of all published data.
  • Render all changes to the text on screen before they are saved to the database.
  • Provide a “roll-back” feature to the last saved version.

Below are a few options I investigated before implementing my basic solution.

PHP’s array_diff() Function

My first instinct was to reach for PHP’s native array_diff() function. It compares two or more arrays and returns an array containing the entries (keys and values) of the first array whose values are not present in any of the other arrays.

Example:
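The original listing is not available here, so the following is only a minimal sketch. It assumes two hypothetical sample sentences that are split into words with explode() so that array_diff() can compare them word by word.

```php
<?php
// Hypothetical sample sentences (not from the original article),
// split into words so array_diff() can compare them.
$old = explode(' ', 'The quick brown fox jumped over the lazy dog');
$new = explode(' ', 'The slow yellow cat walked over the lazy dog');

// Values from $old that do not appear anywhere in $new.
$removed = array_diff($old, $new);

print_r($removed);
// The original keys 1 to 4 are kept:
// [1 => 'quick', 2 => 'brown', 3 => 'fox', 4 => 'jumped']
```

Note that array_diff() only looks at values, so it reports which words disappeared but not where in the sentence they were.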

Keep in mind that array_diff() preserves the keys of the first array: it returns a new array, but the surviving values keep their original indexes, so the result is not numbered sequentially from zero. PHP’s array_values() function is the direct way to reindex it; passing the result through array_merge() also renumbers numeric keys from zero.

You can print out the values as text (or do anything else with them) with code similar to the following:
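As the original snippet is missing, here is one possible way to do it. This sketch assumes the same kind of hypothetical sample sentences, reindexes the result with array_values(), and joins the words back into a string with implode().

```php
<?php
// Hypothetical sample sentences (not from the original article).
$old = explode(' ', 'The quick brown fox jumped over the lazy dog');
$new = explode(' ', 'The slow yellow cat walked over the lazy dog');

// array_values() renumbers the surviving entries from zero.
$removed = array_values(array_diff($old, $new));

// Join the words back into a readable string and print it.
$text = implode(' ', $removed);
echo $text, PHP_EOL; // quick brown fox jumped
```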

The array_diff() function was a nice start, but it only reports which values are missing, not where in the text they changed, and I was somewhat clueless about how to recursively compare each element of the array and apply the result to the requirements listed above. Thankfully, others had already addressed the problem.

Simple Diff Algorithm in PHP

The following function compares two strings and reports the differences between them:

Usage:

Result:

The <del>quick brown fox jumped</del> <ins>slow yellow cat walked</ins> over the lazy <del>dog.</del> <ins>fox.</ins>

The function works by “finding the longest sequence of words common to both strings, and recursively finding the longest sequences of the remainders of the string until the substrings have no words in common. At this point it adds the remaining new words as an insertion and the remaining old words as a deletion”.
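The description above can be sketched roughly as follows. This is not necessarily the original function: the names wordDiff() and renderDiff(), the <del>/<ins> markup, and the two sample sentences are assumptions made for illustration. The sketch needs PHP 7.4 or later.

```php
<?php
/**
 * Word-level diff sketch. Returns a list whose items are either a common
 * word (string) or an array ['d' => deleted words, 'i' => inserted words].
 */
function wordDiff(array $old, array $new): array
{
    $runs = [];   // $runs[$o][$n] = length of the common run ending at $old[$o] / $new[$n]
    $maxLen = 0;  // length of the longest common run found so far
    $oStart = 0;  // where that run starts in $old
    $nStart = 0;  // where that run starts in $new

    foreach ($old as $o => $word) {
        // Visit every position in $new that holds the same word.
        foreach (array_keys($new, $word, true) as $n) {
            $runs[$o][$n] = ($runs[$o - 1][$n - 1] ?? 0) + 1;
            if ($runs[$o][$n] > $maxLen) {
                $maxLen = $runs[$o][$n];
                $oStart = $o + 1 - $maxLen;
                $nStart = $n + 1 - $maxLen;
            }
        }
    }

    // No words in common: all old words are deletions, all new words insertions.
    if ($maxLen === 0) {
        return [['d' => $old, 'i' => $new]];
    }

    // Keep the longest common run and recurse on the text before and after it.
    return array_merge(
        wordDiff(array_slice($old, 0, $oStart), array_slice($new, 0, $nStart)),
        array_slice($new, $nStart, $maxLen),
        wordDiff(array_slice($old, $oStart + $maxLen), array_slice($new, $nStart + $maxLen))
    );
}

/** Renders the diff of two strings with <del> and <ins> tags (assumed markup). */
function renderDiff(string $old, string $new): string
{
    $esc = fn(array $words): string => htmlspecialchars(implode(' ', $words));
    $ops = wordDiff(
        preg_split('/\s+/', $old, -1, PREG_SPLIT_NO_EMPTY),
        preg_split('/\s+/', $new, -1, PREG_SPLIT_NO_EMPTY)
    );

    $parts = [];
    foreach ($ops as $op) {
        if (is_array($op)) {
            if ($op['d']) { $parts[] = '<del>' . $esc($op['d']) . '</del>'; }
            if ($op['i']) { $parts[] = '<ins>' . $esc($op['i']) . '</ins>'; }
        } else {
            $parts[] = htmlspecialchars($op);
        }
    }
    return implode(' ', $parts);
}

// Assumed sample input, chosen to match the result shown earlier.
echo renderDiff(
    'The quick brown fox jumped over the lazy dog.',
    'The slow yellow cat walked over the lazy fox.'
), PHP_EOL;
// The <del>quick brown fox jumped</del> <ins>slow yellow cat walked</ins> over the lazy <del>dog.</del> <ins>fox.</ins>
```

The comparison is case-sensitive and treats punctuation as part of a word, which is why "dog." and "fox." are reported as a deletion and an insertion. The run table grows with the product of the two word counts, so a sketch like this suits short passages better than whole documents.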