Image pour le titre du contenu

Ce document est aussi disponible en français fr 

With SQL, there are a number of convinient ways to extract stats or to get distinct values from a table's column : DISTINCT, for the latter and GROUP BY with COUNT for the first.

mysql> select distinct col from table;
mysql> select col, count(*) from table group by col;

Though, data are not always given as SQL table : sometimes, you get time in PHP, as an array and the same treatement as above is needed. Going through the database would be overkill. To get the same functionnalites within PHP, there are two functions that will do the job.

array_unique() realise the dedoubling work. It takes an array, and returns a list of unique instances. Even better, the associated key is the one for the first encountered value.

$x = array('a','a','b','c','a',1, '1', 1.00, array('1'), array('1'), array('2'),array(2), array(2,2));

Ce qui donne :

    [0] => a
    [2] => b
    [3] => c
    [5] => 1
    [8] => Array
            [0] => 1

Please, note a few specific behaviors :
for 'a', the first key was 0, and it was preserved. The second key for 'a', 1, was lost. Besides that, the overall sorting of keys and values has been preserved.

1, 1.000 and '1' (the string) were all converted into the same integer. So, you should be careful with arrays containing heterogenous values, though it may not be wise to make stats on those data.

It is even worse with arrays, as only the first array is kept, and all subsequent one were destroyed. array_unique() isn't handling recursively.

If you need to count the distinct values, there is the array_count_values() function. This fonction will take an array, omis all values that are not integers or strings, and count the occurrences.

$x = array('a','a','b','c','a',1, '1', 1.00, array('1'), array('1'), array('2'),array(2), array(2,2));

This script will print
    [a] => 3
    [b] => 1
    [c] => 1
    [1] => 2

Once again, strings and integers are converted into integers.

Strangely enough, array_unique(), who is bringing less information, is far less fast than array_count_values(), who is doing more. It is then more efficient to use it with array_keys() to get the distinct values.

Compared to SQL, those functions are faster as long as the table or the number of values are small. Up to a million, filling the table, and calculating the index will not be a solution. On the other hand, if the SQL table exists, and has the right index, it will be faster.

Keep in mind


Vous pouvez ajouter votre commentaire!

Vous devez vous connecter pour commenter