PHP - preg_replace and UTF-8


I'm retrieving data from a database. I receive an array that contains a 'title' index with a UTF-8 encoded value. I'd like to use that value as the name of the file I'm saving, so I'm doing this:

file_put_contents($filename, $content); 

where $filename is:

'-' . $category['root'] . '-articles-' . $category['id'] . '-' . $this->urlize($category['category']) 

Here is the code of "urlize":

private function urlize($value)
{
    if ($value != null && trim($value) != '')
    {
        $value = preg_replace('/([\[\(].*[\]\)])/i', '', $value);
        $value = preg_replace('/[\s]/i', '-', $value);
        $value = preg_replace('/[,!?.;:\"\'&+\/]/i', '-', $value);
        $value = preg_replace('/[-]+/i', '-', $value);
        $value = preg_replace('/(^-)/i', '', $value);
        $value = preg_replace('/-$/i', '', $value);
        $value = preg_replace('/[éèê]/i', 'e', $value);
        $value = preg_replace('/[âà]/i', 'a', $value);
        $value = preg_replace('/[öô]/i', 'o', $value);
        $value = preg_replace('/[ûùü]/i', 'u', $value);
        $value = preg_replace('/[îïíì]/i', 'i', $value);
        $value = preg_replace('/[#]/i', 'sharp', $value);
        $value = preg_replace('/[<>]/i', '-', $value);

        if ($value[strlen($value) - 1] == '-')
        {
            $value = substr($value, 0, strlen($value) - 1);
        }
    }

    return strtolower($value);
}

My issue: if the title is "théorie générale", I get "theeorie-geeneerale"; the "e" is doubled. I guess it's related to the charset, but I cannot find a way to avoid it. Of course, I'd like to get "theorie-generale".

Thanks.

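You need to use the Unicode modifier, u, when your regex contains Unicode characters. Without it, PCRE works on raw bytes: "é" is two bytes in UTF-8, so a character class such as [éèê] matches each of those bytes on its own, and each byte of "é" is replaced by a separate "e".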
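The doubling can be reproduced in a few lines. This is a minimal sketch that assumes the PHP source file itself is saved as UTF-8 (so the literals 'é' and 'théorie' are UTF-8 bytes, like the database value in the question); it only inspects bytes and runs the pattern without the u modifier.

```php
<?php
// Assumes this file is saved as UTF-8, so 'é' below is the two bytes 0xC3 0xA9.
$e = 'é';
echo strlen($e), "\n";    // 2: "é" is two bytes in UTF-8
echo bin2hex($e), "\n";   // c3a9

// Without u, [éèê] is a class of single bytes (c3, a9, c3, a8, c3, aa),
// so each of the two bytes of "é" is replaced by its own "e".
echo preg_replace('/[éèê]/', 'e', 'théorie'), "\n";   // theeorie
```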

So try:

function urlize($value)
{
    if ($value != null && trim($value) != '')
    {
        $value = preg_replace('/([\[\(].*[\]\)])/i', '', $value);
        $value = preg_replace('/[\s]/i', '-', $value);
        $value = preg_replace('/[,!?.;:\"\'&+\/]/i', '-', $value);
        $value = preg_replace('/[-]+/i', '-', $value);
        $value = preg_replace('/(^-)/i', '', $value);
        $value = preg_replace('/-$/i', '', $value);
        $value = preg_replace('/[éèê]/iu', 'e', $value);
        $value = preg_replace('/[âà]/iu', 'a', $value);
        $value = preg_replace('/[öô]/iu', 'o', $value);
        $value = preg_replace('/[ûùü]/ui', 'u', $value);
        $value = preg_replace('/[îïíì]/ui', 'i', $value);
        $value = preg_replace('/[#]/i', 'sharp', $value);
        $value = preg_replace('/[<>]/i', '-', $value);

        if ($value[strlen($value) - 1] == '-')
        {
            $value = substr($value, 0, strlen($value) - 1);
        }
    }
    return strtolower($value);
}

echo urlize('théorie générale');

Demo: http://sandbox.onlinephpfunctions.com/code/3b7e5985dc23ac71a6298783d2dad646d875d3c8

Output:

theorie-generale
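One thing worth knowing about the u modifier: it makes PCRE decode the subject as UTF-8, so the subject must actually be valid UTF-8. The sketch below assumes a UTF-8 source file and uses a hypothetical Latin-1 string ("é" as the single byte 0xE9) to stand in for data that arrived in the wrong encoding, for example from a database connection not set to UTF-8. In that case preg_replace returns null rather than a string, and preg_last_error() says why.

```php
<?php
// Valid UTF-8: with u, each accented letter is one character, replaced once.
$title = 'théorie générale';
echo preg_replace('/[éèê]/u', 'e', $title), "\n";      // theorie generale

// Hypothetical mis-encoded input: "é" as the lone Latin-1 byte 0xE9,
// which is not valid UTF-8. In u mode preg_replace gives up and returns null.
$latin1 = "th\xE9orie";
var_dump(preg_replace('/[éèê]/u', 'e', $latin1));      // NULL
var_dump(preg_last_error() === PREG_BAD_UTF8_ERROR);   // bool(true)
```

If urlize() ever receives such input, every later preg_replace call gets null as its subject, so it can be worth checking the encoding of the data before building the filename.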

You could also use | (or) grouping to reduce the number of preg_replace calls you have, although it might make the regexes a bit harder to read. Another option is to pass arrays of finds and replaces to preg_replace. Here's the first approach:

function urlize($value)
{
    if ($value != null && trim($value) != '') {
        $value = preg_replace('/(([\[\(].*[\]\)])|(^-)|-$)/i', '', $value);
        // The + applies to the whole group, so a run such as ", " becomes a single hyphen
        $value = preg_replace('/([,!?.;:\"\'&+\/]|[\s]|-|[<>])+/i', '-', $value);
        $value = preg_replace('/[éèê]/iu', 'e', $value);
        $value = preg_replace('/[âà]/iu', 'a', $value);
        $value = preg_replace('/[öô]/iu', 'o', $value);
        $value = preg_replace('/[ûùü]/ui', 'u', $value);
        $value = preg_replace('/[îïíì]/ui', 'i', $value);
        $value = preg_replace('/[#]/i', 'sharp', $value);
        // trim() strips leading and trailing hyphens and is safe on an empty string
        $value = trim($value, '-');
    }
    return strtolower($value);
}

echo urlize('théorie générale');
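The array idea mentioned above could look roughly like this. preg_replace() accepts an array of patterns and an array of replacements and applies them in order, pairing them by index. This is a sketch, not a drop-in replacement: the name urlizeArrays is hypothetical, it assumes UTF-8 input, and it folds all the separator rules into one pattern so that each run of separators becomes a single hyphen.

```php
<?php
// Hypothetical helper: same rules as urlize(), expressed as two parallel arrays.
function urlizeArrays(string $value): string
{
    if (trim($value) === '') {
        return '';
    }

    $patterns = [
        '/[\[\(].*[\]\)]/u',          // drop bracketed text
        '/[\s,!?.;:"\'&+\/<>-]+/u',   // each run of separators -> one hyphen
        '/[éèê]/iu',
        '/[âà]/iu',
        '/[öô]/iu',
        '/[ûùü]/iu',
        '/[îïíì]/iu',
        '/#/',
    ];
    $replacements = ['', '-', 'e', 'a', 'o', 'u', 'i', 'sharp'];

    // Patterns run in order; preg_replace returns null if $value is not valid UTF-8.
    $value = preg_replace($patterns, $replacements, $value) ?? '';

    // Strip hyphens left at either end, then lowercase.
    return strtolower(trim($value, '-'));
}

echo urlizeArrays('théorie générale'), "\n";   // theorie-generale
```

Keeping the patterns and replacements side by side makes it easy to add another accent rule without adding another preg_replace call.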
