python - Match a backreference repeated exactly a given number of times and no more
In a simplified case, I want to extract every digit that is repeated exactly 3 times in the input string: 3 times, no more.
    # match the backreference (\d here) 2 more times
    # 11222(333)34445 is matched and consumed,
    # then the current position moves to 11222333^34445
    In [3]: re.findall(r'(\d)\1{2}','1122233334445')
    Out[3]: ['2', '3', '4']

    # try to exclude 11222(333)34445 by adding (?!\1)
    # as a negative lookahead assertion; this skips the match at
    # 11222^(333)34445, but it is captured at the next position
    # 112223^(333)4445
    In [4]: re.findall(r'(\d)\1{2}(?!\1)','1122233334445')
    Out[4]: ['2', '3', '4']

    # a backreference cannot go before the group it refers to
    In [5]: re.findall(r'(?!\1)(\d)\1{2}(?!\1)','1122233334445')
    ---------------------------------------------------------------------------
    error                                     Traceback (most recent call last)
    <ipython-input-5-a5837badf5bb> in <module>()
    ----> 1 re.findall(r'(?!\1)(\d)\1{2}(?!\1)','1122233334445')

    /usr/lib/python2.7/re.pyc in findall(pattern, string, flags)
        179
        180     Empty matches are included in the result."""
    --> 181     return _compile(pattern, flags).findall(string)
        182
        183 if sys.hexversion >= 0x02020000:

    /usr/lib/python2.7/re.pyc in _compile(*key)
        249         p = sre_compile.compile(pattern, flags)
        250     except error, v:
    --> 251         raise error, v # invalid expression
        252     if not bypass_cache:
        253         if len(_cache) >= _maxcache:

    error: bogus escape: '\\1'
But I expect ['2', '4'].

Thank you.
You'd need a backreference inside a lookbehind to find the borders between different digits before matching the sequence, without consuming anything. That is little supported among regex flavors:
(\d)(?<!\1.)\1{2}(?!\1)
This works in .NET, but obviously not in Python.
An idea is to use the great trick that @hwnd commented. Another idea, with the downsides of lower performance and of getting dispensable elements in the result, is to find the boundary between two different digits, which requires a capture inside a lookbehind:
(?:^|(?<=(\d))(?!\1))(\d)\2{2}(?!\2)
The (?:^|(?<=(\d))(?!\1)) part matches either the start of the string or, using a lookbehind, a boundary between two different digits.
In (\d)\2{2}(?!\2), the second capture group captures a digit into \2. It must be followed by the same digit two more times, and the negative lookahead ensures it is not followed by the same digit again.
This should give accurate matches but requires more steps from the parser. See the test at regex101.
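As a minimal sketch of how the boundary pattern above could be used from Python 3 (the question itself ran Python 2.7), the snippet below keeps only the second capture group. The function name `digits_repeated_exactly_three_times` is made up for illustration and is not part of the original answer.

```python
import re

# Pattern from the answer. Group 1 is only a helper captured inside the
# lookbehind; group 2 holds the digit that repeats exactly three times.
PATTERN = re.compile(r'(?:^|(?<=(\d))(?!\1))(\d)\2{2}(?!\2)')


def digits_repeated_exactly_three_times(text):
    """Return each digit that forms a run of exactly three in text.

    Hypothetical helper name, chosen for this sketch only.
    """
    # finditer lets us pick group 2 and ignore the helper group 1.
    return [m.group(2) for m in PATTERN.finditer(text)]


print(digits_repeated_exactly_three_times('1122233334445'))  # ['2', '4']
```

Calling `re.findall` with this pattern would return two-element tuples that also contain the helper group (the "dispensable elements" mentioned above), which is why the sketch uses `finditer` and `group(2)` instead.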