c++ - Finding most common sextuplets in a huge set of numbers -
i have matrix 6000x20, containing numbers range 1-80. need find sextuplets appear in matrix. need efficient , fastest solution. current solution steps: 1.i take first row matrix , generate sextuplets (38600 in 1 row) 2. compare each of sextuple other 5999 rows , count them 3. write them file because memory gets full pretty fast 4. take 2nd row , generate sextuplets , steps again
this algorithm pretty bad , im aware of because have 38600x6000 comparisons , possible file writing, , have lots of sextuplets repetition. cant know because cant use variables of size.
i need algorithmic solution can write in matlab/java/c++/python
since values range 1-80, total number of possible sextuplets "only" tad on 300 million (300,500,200 precise). since there 6000 rows in matrix, maximum count sextuplet 6000, count comfortably fit in two-byte integer (uint16_t
, assuming exists in c++ implementation). 3 hundred million two-byte integers total 600 megabytes, have available.
so simple algorithm create count vector, initialized zeros, , iterate on rows in matrix; each row, iterate on 38,760 sextuplets, , each sextuplet increment corresponding count.
the trick figuring out element in count vector corresponds given set of 6 numbers. happens, not difficult long numbers in sextuplet in order smallest largest. (that's not limitation, since need have canonical order sextuplet, , sorted in order simple canonical ordering.)
to see how generate indices, consider how (hypothetically) enumerate 300,500,200 combinations of 6 integers set {1..80}. first, enumerate combinations start 1, , continue 5 integers set {2..80}. then, enumerate combinations start 2, , continue 5 integers set {3..80}. then, enumerate combinations start 3 , continue 5 integers set {4..80}. , on. inside enumeration of each starting point, apply same algorithm recursively.
now, let's turn enumeration on head. suppose have sextuplet {a,b,c,d,e,f}
. let's ask how many sextuplets come after sextuplet?
first, sextuplets start value greater
a
. since sextuplets ordered, if sextuplet starts value greatera
, of values greatera
, means combination of 6 values set{a+1..80}
, of there 80-ac6.then, there sextuplets start
a
, continue quintuplet first value greaterb
. same logic in above point, number of such sextuplets 80-bc5.then, there sextuplets start
a,b
, continue quartet first value greaterc
: total of 80-cc4etc.
so total number of sextuplets following {a,b,c,d,e,f}
precisely:
80-ac6+80-bc5+80-cc4+80-dc3+80-ec2+80-fc1
the interesting above equation there no interaction between 6 variables. compute value creating 6 lookup tables values of 80-xci values of x 1 80, , values of 1 6. can compute (inverse) index of sextuplets doing 6 lookups , adding values together. (if needed to, subtract inverse index total number of combinations forward index. in case, need bijection combinations integers, , inverse index work fine.)
at end of algorithm, necessary turn count indices sextuplets. can done using same lookup tables, doing succession of binary searches: first, find value of a
binary searching in lookup table i==6, subtract corresponding index , continue search remainder in lookup table i==5, etc. (in practice, since lookup table small, might turn out linear search faster binary search. doesn't make difference.)
Comments
Post a Comment