Counting words looks easy: just split the string on spaces. And on punctuation. But not on hyphens, and only sometimes on quotes. And what about the Spanish tilde, French accents, German umlauts... It soon gets confusing.

In fact, it is easy: the str_word_count() function does the dirty work. It has three output formats: the default (0) simply counts the words, format 1 returns the list of words, and format 2 returns the same list indexed by each word's offset in the string.
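As a minimal sketch of the three formats, the snippet below runs str_word_count() on an illustrative sentence. The sentence and the variable names are assumptions made for this example; the text is plain ASCII so the result does not depend on the locale.

```php
<?php
// Illustrative sentence (not from the article); ASCII only.
$sentence = 'The quick brown fox';

// Format 0 (the default): the number of words.
$count = str_word_count($sentence);

// Format 1: a plain list of the words.
$words = str_word_count($sentence, 1);

// Format 2: the same words, keyed by their byte offset in the string.
$offsets = str_word_count($sentence, 2);

var_dump($count);   // int(4)
print_r($words);    // The, quick, brown, fox
print_r($offsets);  // keys 0, 4, 10, 16
```

Format 2 is the one to reach for when the words have to be located again in the original string.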

str_word_count() works by grouping: it identifies words by gathering consecutive letters. Letters are the alphabetic characters of the current locale, upper and lower case, plus the hyphen and the apostrophe in most positions.
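A short sketch of that grouping, using illustrative ASCII sentences that are not from the article: an apostrophe or hyphen inside a word keeps the word in one piece, while digits are not letters and are dropped unless they are listed in the optional third argument of str_word_count().

```php
<?php
// Apostrophes and hyphens inside a word do not split it.
$contractions = str_word_count("Don't split well-known words, OK?", 1);

// Digits are not letters, so the "8" disappears by default...
$withoutDigits = str_word_count('PHP 8 rocks', 1);

// ...unless the third argument lists extra characters to treat as letters.
$withDigits = str_word_count('PHP 8 rocks', 1, '0123456789');

print_r($contractions);   // Don't, split, well-known, words, OK
print_r($withoutDigits);  // PHP, rocks
print_r($withDigits);     // PHP, 8, rocks
```

The same third argument can carry accented characters when the locale does not already classify them as letters.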

preg_split() can also slice a sentence on word boundaries, using the \b assertion. In the end, though, it is not as precise as str_word_count(): it also returns the spaces and the punctuation, and it cuts hyphenated words apart.

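<?php

// Using French, to show off locales.
// Assumes the fr_FR locale is installed and this file is saved in a
// single-byte encoding such as ISO-8859-1, so that 'é' is one byte.
setlocale(LC_ALL, 'fr_FR');
$chaine = 'Ceci est une phrase capilo-tractée.';

// Format 2: words indexed by their offset in the string.
print_r(str_word_count($chaine, 2));
// Split on every word boundary.
print_r(preg_split('#\b#', $chaine));

?>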
 
 
 
// Note that each index is the word's offset in the string,
// which makes replacement with substr() or substr_replace() easy
Array
(
    [0] => Ceci
    [5] => est
    [9] => une
    [13] => phrase
    [20] => capilo-tractée
)
Array
(
    [0] => 
    [1] => Ceci
    [2] =>  
    [3] => est
    [4] =>  
    [5] => une
    [6] =>  
    [7] => phrase
    [8] =>  
    [9] => capilo
    [10] => -
    [11] => tractée
    [12] => .
)
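The offsets returned by format 2 can drive in-place replacement. The sketch below is one possible way to do it, on an illustrative ASCII sentence: it wraps every word longer than four characters in brackets with substr_replace(). The sentence, the length threshold and the bracket markup are assumptions made for this example.

```php
<?php
// Illustrative sentence; ASCII only, so byte and character offsets coincide.
$sentence = 'This is a far-fetched sentence.';

// Format 2 keys every word by its byte offset in the original string.
$words = str_word_count($sentence, 2);

// Walk from the last word to the first: a replacement changes the length
// of the string, but only after the offsets that are still to be used.
$result = $sentence;
foreach (array_reverse($words, true) as $offset => $word) {
    if (strlen($word) > 4) {
        $result = substr_replace($result, '[' . $word . ']', $offset, strlen($word));
    }
}

echo $result, PHP_EOL; // This is a [far-fetched] [sentence].
```

Passing true to array_reverse() preserves the integer keys, which here are the offsets.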

To be noted
