PHP - preg_replace and UTF-8
I'm retrieving data from a database. I receive an array that contains a 'title' index whose value is UTF-8 encoded. I'd like to use this value as the name of the file it is saved in, so I'm doing this:
file_put_contents($filename, $content);
where $filename is:
'-' . $category['root'] . '-articles-' . $category['id'] . '-' . $this->urlize($category['category'])
Here is the code of "urlize":
private function urlize($value)
{
    if ($value != null && trim($value) != '') {
        $value = preg_replace('/([\[\(].*[\]\)])/i', '', $value);
        $value = preg_replace('/[\s]/i', '-', $value);
        $value = preg_replace('/[,!?.;:\"\'&+\/]/i', '-', $value);
        $value = preg_replace('/[-]+/i', '-', $value);
        $value = preg_replace('/(^-)/i', '', $value);
        $value = preg_replace('/-$/i', '', $value);
        $value = preg_replace('/[éèê]/i', 'e', $value);
        $value = preg_replace('/[âà]/i', 'a', $value);
        $value = preg_replace('/[öô]/i', 'o', $value);
        $value = preg_replace('/[ûùü]/i', 'u', $value);
        $value = preg_replace('/[îïíì]/i', 'i', $value);
        $value = preg_replace('/[#]/i', 'sharp', $value);
        $value = preg_replace('/[<>]/i', '-', $value);
        if ($value[strlen($value) - 1] == '-') {
            $value = substr($value, 0, strlen($value) - 1);
        }
    }
    return strtolower($value);
}
My issue: when the title is "théorie générale", I get "theeorie-geeneerale"; the "e" is doubled. I guess it's related to the charset, but I cannot find a way to avoid it. Of course, I'd like to get "theorie-generale".
Thanks.
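You need to use the Unicode modifier, u, when using Unicode in a regex. Without it, PCRE works on bytes, so the class [éèê] is treated as a set of the individual UTF-8 bytes of those letters, and each of the two bytes of "é" is replaced by its own "e".
So try: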
function urlize($value)
{
    if ($value != null && trim($value) != '') {
        $value = preg_replace('/([\[\(].*[\]\)])/i', '', $value);
        $value = preg_replace('/[\s]/i', '-', $value);
        $value = preg_replace('/[,!?.;:\"\'&+\/]/i', '-', $value);
        $value = preg_replace('/[-]+/i', '-', $value);
        $value = preg_replace('/(^-)/i', '', $value);
        $value = preg_replace('/-$/i', '', $value);
        $value = preg_replace('/[éèê]/iu', 'e', $value);
        $value = preg_replace('/[âà]/iu', 'a', $value);
        $value = preg_replace('/[öô]/iu', 'o', $value);
        $value = preg_replace('/[ûùü]/ui', 'u', $value);
        $value = preg_replace('/[îïíì]/ui', 'i', $value);
        $value = preg_replace('/[#]/i', 'sharp', $value);
        $value = preg_replace('/[<>]/i', '-', $value);
        if ($value[strlen($value) - 1] == '-') {
            $value = substr($value, 0, strlen($value) - 1);
        }
    }
    return strtolower($value);
}
echo urlize('théorie générale');
Demo: http://sandbox.onlinephpfunctions.com/code/3b7e5985dc23ac71a6298783d2dad646d875d3c8
Output:
theorie-generale
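To see why the original version doubled the "e", here is a minimal sketch that runs the same replacement with and without /u. It assumes the PHP source file itself is saved as UTF-8, like the database value; $title, $withoutU and $withU are illustrative names, not part of the code above.

```php
<?php
// Assumes this file is saved as UTF-8, like the value coming from the database.
$title = 'théorie';

// "é" is two bytes in UTF-8 (0xC3 0xA9), so strlen() reports 2, not 1.
echo strlen('é'), ' ', bin2hex('é'), "\n";   // 2 c3a9

// Without /u, [éèê] is a class of single bytes (c3, a9, c3, a8, c3, aa),
// so both bytes of "é" are replaced separately: "é" -> "ee".
$withoutU = preg_replace('/[éèê]/', 'e', $title);
echo $withoutU, "\n";                         // theeorie

// With /u, pattern and subject are read as UTF-8 characters,
// so "é" is matched once: "é" -> "e".
$withU = preg_replace('/[éèê]/u', 'e', $title);
echo $withU, "\n";                            // theorie
```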
You could also use | (or) grouping to reduce the number of preg_replace calls you have, though it might make the regexes a bit harder to read. You could also use arrays for the finds and replaces. Here's a first approach:
function urlize($value)
{
    if ($value != null && trim($value) != '') {
        // Remove "(...)" / "[...]" parts first.
        $value = preg_replace('/[\[\(].*[\]\)]/', '', $value);
        // Turn runs of whitespace, punctuation, "<", ">" and "-" into a single dash.
        $value = preg_replace('/([\s,!?.;:\"\'&+\/<>]|-)+/', '-', $value);
        // Trim leading/trailing dashes only after separators became dashes.
        $value = preg_replace('/^-|-$/', '', $value);
        $value = preg_replace('/[éèê]/iu', 'e', $value);
        $value = preg_replace('/[âà]/iu', 'a', $value);
        $value = preg_replace('/[öô]/iu', 'o', $value);
        $value = preg_replace('/[ûùü]/ui', 'u', $value);
        $value = preg_replace('/[îïíì]/ui', 'i', $value);
        $value = preg_replace('/[#]/i', 'sharp', $value);
    }
    return strtolower($value);
}
echo urlize('théorie générale');
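The arrays idea mentioned above can be sketched like this: preg_replace() accepts an array of patterns and a matching array of replacements and applies them in order. The function name urlizeWithArrays() is hypothetical, and the rules only mirror the question's; treat it as a starting point rather than a complete slug generator.

```php
<?php
// Hypothetical helper: the same cleanup as urlize(), expressed as one
// preg_replace() call over parallel arrays of patterns and replacements.
function urlizeWithArrays(string $value): string
{
    if (trim($value) === '') {
        return '';
    }

    $patterns = [
        '/[\[\(].*[\]\)]/u',          // drop "(...)" / "[...]" parts
        '/[\s,!?.;:"\'&+\/<>-]+/u',   // runs of separators -> single dash
        '/^-+|-+$/u',                 // trim leading/trailing dashes
        '/[éèê]/iu',
        '/[âà]/iu',
        '/[öô]/iu',
        '/[ûùü]/iu',
        '/[îïíì]/iu',
        '/#/u',
    ];
    $replacements = ['', '-', '', 'e', 'a', 'o', 'u', 'i', 'sharp'];

    // With /u, preg_replace() returns null if $value is not valid UTF-8.
    return strtolower(preg_replace($patterns, $replacements, $value) ?? '');
}

echo urlizeWithArrays('théorie générale'), "\n"; // theorie-generale
echo urlizeWithArrays('C# (beta) tips!'), "\n";  // csharp-tips
```

Because the patterns run in order, the dash-trimming rule has to come after the separator rule; otherwise a title ending in a space or "!" would keep a trailing dash.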