Daryl Stultz
2018-10-05 13:28:31 UTC
Hello,
I am exploring Redis as a solution to a filtering challenge.
Suppose it is your job to pull chips off a conveyor belt. The chips come in
many different shapes and colors. If you are assigned to pull circles and
squares in red or blue I could make a set that has all the combinations. If
the chip's shape-color combination is in the set, you pull it. For 100
shapes and 100 colors this would be 10,000 rows in the set. Better I think
would be a set of shapes and a set of colors. If the shape is in the shape
set and the color is in the color set you pull it. That's only 200 total
rows.
Now suppose you are to also pull diamonds and triangles that are blue or
yellow. Note that you are not allowed to pull diamonds that are red as you
can pull squares that are red. The single-set solution would cover this,
every valid shape-color combination would be present but it would be too
slow. To use the 2-set approach would require multiple pairs of sets per
"rule". So one pair of shape-color sets (a "rule") for "circles and squares
in red or blue" and another pair of sets for "diamonds and triangles that
are blue or yellow". In order to determine if a chip is to be pulled we now
have to check each shape-color pair until we find a match or exhaust the
data. I don't know if this is a good approach nor what Redis provides to
support this. I'm not sure how I could keep track of all the rules. There's
no "iterating" that I can see in Redis, so maybe that approach isn't valid
anyway.
Using a document search "inverted index" approach I could create a set for
each shape and the contents of the set would be colors. So the circle set
would contain red and blue. The square set would also have red and blue in
it. So duplication of data but probably pretty fast. Is this the smartest
way to do this?
I've done some basic performance testing. If I have just one rule (one set
of shapes and one set of colors) with 100 entries each, I can do 1000 tests
(2000 calls total) in 18ms if I use multi/exec and 35ms if not. I can pull
all the data from both sets in 9ms. With a client-side memory model I can
do 1000 tests in less than 1ms, but of course I have the data structure
taking up memory. If I'm trying to filter a collection of 100 chips,
hitting Redis which each test probably isn't a big deal, but if I've got to
filter a million chips, probably not good to make a call to Redis for each
chip.
So thought appreciated on how best to tackle this problem. Thanks.
/Daryl
I am exploring Redis as a solution to a filtering challenge.
Suppose it is your job to pull chips off a conveyor belt. The chips come in
many different shapes and colors. If you are assigned to pull circles and
squares in red or blue I could make a set that has all the combinations. If
the chip's shape-color combination is in the set, you pull it. For 100
shapes and 100 colors this would be 10,000 rows in the set. Better I think
would be a set of shapes and a set of colors. If the shape is in the shape
set and the color is in the color set you pull it. That's only 200 total
rows.
Now suppose you are to also pull diamonds and triangles that are blue or
yellow. Note that you are not allowed to pull diamonds that are red as you
can pull squares that are red. The single-set solution would cover this,
every valid shape-color combination would be present but it would be too
slow. To use the 2-set approach would require multiple pairs of sets per
"rule". So one pair of shape-color sets (a "rule") for "circles and squares
in red or blue" and another pair of sets for "diamonds and triangles that
are blue or yellow". In order to determine if a chip is to be pulled we now
have to check each shape-color pair until we find a match or exhaust the
data. I don't know if this is a good approach nor what Redis provides to
support this. I'm not sure how I could keep track of all the rules. There's
no "iterating" that I can see in Redis, so maybe that approach isn't valid
anyway.
Using a document search "inverted index" approach I could create a set for
each shape and the contents of the set would be colors. So the circle set
would contain red and blue. The square set would also have red and blue in
it. So duplication of data but probably pretty fast. Is this the smartest
way to do this?
I've done some basic performance testing. If I have just one rule (one set
of shapes and one set of colors) with 100 entries each, I can do 1000 tests
(2000 calls total) in 18ms if I use multi/exec and 35ms if not. I can pull
all the data from both sets in 9ms. With a client-side memory model I can
do 1000 tests in less than 1ms, but of course I have the data structure
taking up memory. If I'm trying to filter a collection of 100 chips,
hitting Redis which each test probably isn't a big deal, but if I've got to
filter a million chips, probably not good to make a call to Redis for each
chip.
So thought appreciated on how best to tackle this problem. Thanks.
/Daryl
--
You received this message because you are subscribed to the Google Groups "Redis DB" group.
To unsubscribe from this group and stop receiving emails from it, send an email to redis-db+***@googlegroups.com.
To post to this group, send email to redis-***@googlegroups.com.
Visit this group at https://groups.google.com/group/redis-db.
For more options, visit https://groups.google.com/d/optout.
You received this message because you are subscribed to the Google Groups "Redis DB" group.
To unsubscribe from this group and stop receiving emails from it, send an email to redis-db+***@googlegroups.com.
To post to this group, send email to redis-***@googlegroups.com.
Visit this group at https://groups.google.com/group/redis-db.
For more options, visit https://groups.google.com/d/optout.