python - Match a backreference repeated a given number of times and no more -


In a simplified case, I want to extract each digit that is repeated exactly 3 times in the input string: 3 times, no more.

# Match the backreference (\d here) 2 more times.
# 11222(333)34445 is matched and consumed,
# then the current position moves to 11222333^34445.
In [3]: re.findall(r'(\d)\1{2}','1122233334445')
Out[3]: ['2', '3', '4']

# Try to exclude 11222(333)34445 by adding (?!\1)
# as a negative lookahead assertion. It skips the match at
# 11222^(333)34445, but the run is captured at the next position:
# 112223^(333)4445.
In [4]: re.findall(r'(\d)\1{2}(?!\1)','1122233334445')
Out[4]: ['2', '3', '4']

# A backreference cannot go before the group it references.
In [5]: re.findall(r'(?!\1)(\d)\1{2}(?!\1)','1122233334445')
---------------------------------------------------------------------------
error                                     Traceback (most recent call last)
<ipython-input-5-a5837badf5bb> in <module>()
----> 1 re.findall(r'(?!\1)(\d)\1{2}(?!\1)','1122233334445')

/usr/lib/python2.7/re.pyc in findall(pattern, string, flags)
    179
    180     empty matches included in result."""
--> 181     return _compile(pattern, flags).findall(string)
    182
    183 if sys.hexversion >= 0x02020000:

/usr/lib/python2.7/re.pyc in _compile(*key)
    249         p = sre_compile.compile(pattern, flags)
    250     except error, v:
--> 251         raise error, v # invalid expression
    252     if not bypass_cache:
    253         if len(_cache) >= _maxcache:

error: bogus escape: '\\1'

But I expect ['2', '4'].

Thank you.

You'd need a backreference inside a lookbehind to find the borders between different digits before matching the sequence, without consuming any characters. That is supported by few regex flavors. (\d)(?<!\1.)\1{2}(?!\1) works in .NET but, obviously, not in Python.

One idea is to use the great trick that @hwnd commented. It performs well, with the downside of returning dispensable elements. Another idea is to find the boundary between 2 different digits, which requires a capture inside a lookbehind:

(?:^|(?<=(\d))(?!\1))(\d)\2{2}(?!\2) 
  • (?:^|(?<=(\d))(?!\1)) is the lookbehind part. It finds the boundaries between different digits, or the start of the string.
  • (\d)\2{2}(?!\2): the 2nd capture group captures a digit as \2. It must be followed by the same digit 2 more times, and the negative lookahead ensures it is not followed by the same digit again.
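As a minimal sketch of how the pattern above could be used from Python: because the pattern has two capture groups, re.findall would return tuples, so this example iterates with re.finditer and keeps only group 2. The helper name exact_triples is hypothetical and not part of the original answer.

```python
import re

# Pattern copied from the answer above; group 1 only marks the boundary,
# group 2 holds the digit we actually want.
PATTERN = re.compile(r'(?:^|(?<=(\d))(?!\1))(\d)\2{2}(?!\2)')


def exact_triples(text):
    """Return each digit that forms a run of exactly three (hypothetical helper)."""
    return [m.group(2) for m in PATTERN.finditer(text)]


print(exact_triples('1122233334445'))  # ['2', '4']
```

The run 3333 is skipped entirely: the boundary check only lets a match start at the first 3, and there the trailing (?!\2) fails.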

This should give accurate matches but requires more steps from the parser. See the test at regex101.
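For comparison, here is a rough sketch of the alternation trick mentioned earlier. The exact pattern from @hwnd's comment is not reproduced in this post, so the pattern below is an assumption: the first alternative matches and consumes any run of four or more identical digits, and the second captures runs of exactly three. The helper name exact_triples_trick is hypothetical.

```python
import re

# Assumed form of the trick: consume unwanted runs (4 or more) in group 1,
# capture the wanted runs (exactly 3) in group 2.
TRICK = re.compile(r'(\d)\1{3,}|(\d)\2{2}')


def exact_triples_trick(text):
    """Keep only matches that came from the second alternative (hypothetical helper)."""
    pairs = TRICK.findall(text)  # one (group1, group2) tuple per match
    return [wanted for _unwanted, wanted in pairs if wanted]


print(TRICK.findall('1122233334445'))        # [('', '2'), ('3', ''), ('', '4')]
print(exact_triples_trick('1122233334445'))  # ['2', '4']
```

The ('3', '') tuple is the kind of dispensable element that has to be filtered out afterwards.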

