c# - Most effective way to lookup a substring C of string B in string A in LINQ -
having 2 strings like:
string = "attagacctgccggaa"; string b = "gccggaatac"; i delete part common in both strings , rest concatenate it. have tell need delete left matched part get
input
attagacctgccggaa gccggaatac output
attagacctgccggaatac firstly thought use pattern , seacrh it, not possible not know pattern in advance (the length of matched chars variable)
then thought on search whole string b in a if had no succes, delete char in string a (last 1 since want preserve left unmatched string) , loop until have no more chars in b like
string = "attagacctgccggaa"; string b = "gccggaatac"; int times = b.length; string wantedstring = string.empty; string auxstring = b; while (times > 0) { if (!a.contains(auxstring)) { //save last char , delete auxstring wantedstring += auxstring[auxstring.length - 1]; auxstring = auxstring.trimend(auxstring[auxstring.length - 1]); } else break; times--; } //reverse string char[] reversedtoappend = wantedstring.tochararray(); array.reverse(reversedtoappend); string toappend = new string(reversedtoappend); so answer a + toappend ;
there way make more efficient? (maybe in linq?)
edit
as @lavin points out correctly c can occur anywhere in a, while being prefix of b. instance if a=aat , b=aag, code should return aatg. reason because common string starting on left c=aa. delete b , a=aat resulting g
aat aag resulting
aatg other example be:
a=atttgggccgcgcgcgaaaaccccgcg b= aaccccgcgcgca here
c= aaccccgcg so result should be
result = atttgggccgcgcgcgaaaaccccgcgcgca
(all arrays , strings 0 based in answer)
first want point out op's problem confusing. assume c common part of a , b, op's example of input , output suggest c needs suffix of a, , prefix of b @ same time. see of answers above adopted understanding of problem.
however, original implementation provided op suggests that, c can occur anywhere in a, while being prefix of b, because using of a.contains(auxstring). means a=aat , b=aag, code return aatg. other people's answers return aataag.
so there 2 possible interpretation of problem. please clarify.
second, assuming size of first string a n, , second string b m, unlike o(n*m) solution provided in original solution , existing answers, o(n+m) algorithm can achieved using of following: kmp, suffix array, suffix tree, z-algorithm.
i'll briefly describe how use z-algorithm solve problem here, since seems less mentioned on stackoverflow compared others.
about details of z-algorithm, see http://www.cs.umd.edu/class/fall2011/cmsc858s/lec02-zalg.pdf
basically string s of length l, calculates array z of length l, in z[i] equals longest common prefix of s , s[i:] (s[i:] means substring of s starting position i).
for problem, combine strings a , b d=b+a (b in front of a), , calculates z array of combined string d. using z array, can figure out longest prefix of b occurs in a.
for possible interpretation 1 of problem, in c needs suffix of a , prefix of b:
max_prefix = 0 in range(m, n+m): if z[i] == n+m - i: if z[i] > max_prefix: max_prefix = z[i] and answer be:
a+b[max_prefix:] for possible interpretation 2 of problem, in c needs prefix of b, , can anywhere in a:
max_prefix = 0 in range(m, n+m): if z[i] > max_prefix: max_prefix = z[i] again answer be:
a+b[max_prefix:] the difference in 2 cases line:
if z[i] == n+m-i: to understand line, remember z[i] longest common prefix of strings d , d[i:], then:
- note
d=b+a - we enumerate
imm+n-1, that's range ofaind.d[i:]equala[i-m:]. , length ofa[i-m:]n-(i-m)=n+m-i. - since
dstartsb, checking ifz[i]equaln+m-i, checking ifa[i-m:]prefix ofb. if indeed equal, found common stringc, prefix ofb, , suffix ofa. - without line, know found string
cprefix ofb, , occurs inastarting positioni, , not guaranteed reach end ofa.
Comments
Post a Comment