[redis-db] Filtering design

Discussion:

Daryl Stultz

2018-10-05 13:28:31 UTC

Hello,

I am exploring Redis as a solution to a filtering challenge.

Suppose it is your job to pull chips off a conveyor belt. The chips come in
many different shapes and colors. If you are assigned to pull circles and
squares in red or blue I could make a set that has all the combinations. If
the chip's shape-color combination is in the set, you pull it. For 100
shapes and 100 colors this would be 10,000 rows in the set. Better I think
would be a set of shapes and a set of colors. If the shape is in the shape
set and the color is in the color set you pull it. That's only 200 total
rows.

Now suppose you are to also pull diamonds and triangles that are blue or
yellow. Note that you are not allowed to pull diamonds that are red as you
can pull squares that are red. The single-set solution would cover this,
every valid shape-color combination would be present but it would be too
slow. To use the 2-set approach would require multiple pairs of sets per
"rule". So one pair of shape-color sets (a "rule") for "circles and squares
in red or blue" and another pair of sets for "diamonds and triangles that
are blue or yellow". In order to determine if a chip is to be pulled we now
have to check each shape-color pair until we find a match or exhaust the
data. I don't know if this is a good approach nor what Redis provides to
support this. I'm not sure how I could keep track of all the rules. There's
no "iterating" that I can see in Redis, so maybe that approach isn't valid
anyway.

Using a document search "inverted index" approach I could create a set for
each shape and the contents of the set would be colors. So the circle set
would contain red and blue. The square set would also have red and blue in
it. So duplication of data but probably pretty fast. Is this the smartest
way to do this?

I've done some basic performance testing. If I have just one rule (one set
of shapes and one set of colors) with 100 entries each, I can do 1000 tests
(2000 calls total) in 18ms if I use multi/exec and 35ms if not. I can pull
all the data from both sets in 9ms. With a client-side memory model I can
do 1000 tests in less than 1ms, but of course I have the data structure
taking up memory. If I'm trying to filter a collection of 100 chips,
hitting Redis which each test probably isn't a big deal, but if I've got to
filter a million chips, probably not good to make a call to Redis for each
chip.

So thought appreciated on how best to tackle this problem. Thanks.

/Daryl

--
You received this message because you are subscribed to the Google Groups "Redis DB" group.
To unsubscribe from this group and stop receiving emails from it, send an email to redis-db+***@googlegroups.com.
To post to this group, send email to redis-***@googlegroups.com.
Visit this group at https://groups.google.com/group/redis-db.
For more options, visit https://groups.google.com/d/optout.

ddorian43

2018-10-06 09:08:39 UTC

Permalink

Smartest way is to explain your needs with simple sql strings.
Fastest way is a search-engine / inverted-index. There is redisearch or
other search-engine tools that do it.

Post by Daryl Stultz
Hello,
I am exploring Redis as a solution to a filtering challenge.
Suppose it is your job to pull chips off a conveyor belt. The chips come
in many different shapes and colors. If you are assigned to pull circles
and squares in red or blue I could make a set that has all the
combinations. If the chip's shape-color combination is in the set, you pull
it. For 100 shapes and 100 colors this would be 10,000 rows in the set.
Better I think would be a set of shapes and a set of colors. If the shape
is in the shape set and the color is in the color set you pull it. That's
only 200 total rows.
Now suppose you are to also pull diamonds and triangles that are blue or
yellow. Note that you are not allowed to pull diamonds that are red as you
can pull squares that are red. The single-set solution would cover this,
every valid shape-color combination would be present but it would be too
slow. To use the 2-set approach would require multiple pairs of sets per
"rule". So one pair of shape-color sets (a "rule") for "circles and squares
in red or blue" and another pair of sets for "diamonds and triangles that
are blue or yellow". In order to determine if a chip is to be pulled we now
have to check each shape-color pair until we find a match or exhaust the
data. I don't know if this is a good approach nor what Redis provides to
support this. I'm not sure how I could keep track of all the rules. There's
no "iterating" that I can see in Redis, so maybe that approach isn't valid
anyway.
Using a document search "inverted index" approach I could create a set for
each shape and the contents of the set would be colors. So the circle set
would contain red and blue. The square set would also have red and blue in
it. So duplication of data but probably pretty fast. Is this the smartest
way to do this?
I've done some basic performance testing. If I have just one rule (one set
of shapes and one set of colors) with 100 entries each, I can do 1000 tests
(2000 calls total) in 18ms if I use multi/exec and 35ms if not. I can pull
all the data from both sets in 9ms. With a client-side memory model I can
do 1000 tests in less than 1ms, but of course I have the data structure
taking up memory. If I'm trying to filter a collection of 100 chips,
hitting Redis which each test probably isn't a big deal, but if I've got to
filter a million chips, probably not good to make a call to Redis for each
chip.
So thought appreciated on how best to tackle this problem. Thanks.
/Daryl

Daryl Stultz

2018-10-07 13:23:57 UTC

Permalink

Post by ddorian43
Smartest way is to explain your needs with simple sql strings.

Given a shape, I might want to know what colors I can work with. So:

select Colors.* from Colors
join Rules on Rule.colorId = Colors.colorId
where Rules.shapeId = 1234;

Also, the inverse, given a color, what shapes can I work with.

/Daryl

Dorian Hoxha

2018-10-07 20:15:10 UTC

Permalink

Can you remove the joins ? By denormalizing the data ?

Post by Daryl Stultz

Post by ddorian43
Smartest way is to explain your needs with simple sql strings.

select Colors.* from Colors
join Rules on Rule.colorId = Colors.colorId
where Rules.shapeId = 1234;
Also, the inverse, given a color, what shapes can I work with.
/Daryl
--
You received this message because you are subscribed to a topic in the
Google Groups "Redis DB" group.
To unsubscribe from this topic, visit
https://groups.google.com/d/topic/redis-db/l_I40jebQAI/unsubscribe.
To unsubscribe from this group and all its topics, send an email to
Visit this group at https://groups.google.com/group/redis-db.
For more options, visit https://groups.google.com/d/optout.

Daryl Stultz

2018-10-08 00:25:19 UTC

Permalink

Post by Dorian Hoxha
Can you remove the joins ? By denormalizing the data ?

The way this problem is presently solved is using views in the (PostgreSQL)
database where such a Cartesian is produced, all Shape-Color pairs that are
valid are present. The data has grown so much that performance is a big
problem. Yes I could do that but the dataset would be very large and thus
searching it would be slow. 100 shapes and 100 colors would produce a data
set of 10,000 elements. Unless I'm not understanding how one would take
advantage of Redis to denormalize.

Presently I'm creating Sets in (Java) memory, encoding into byte arrays
(Kryo) and storing them as a plain String. Fetching from Redis (localhost),
decoding and searching in memory is pretty fast, 5-20ms, so it's very
workable, especially if I put a near cache in front of it.

I was just hoping to learn how problems like this can be solved with Redis
data structures instead of just using it as a cache.

Thanks.

/Daryl

Continue reading on narkive:

Search results for '[redis-db] Filtering design' (Questions and Answers)

replies

Google's SafeSearch filtering?

started 2008-10-30 18:18:02 UTC

google

replies

questions about video on PC (refresh rate, anisotropic filtering, etc.)?

started 2008-09-06 18:24:34 UTC

replies

my school has blocked all the tunneling sites that i use to get past light speed internet filtering.?

started 2009-08-26 07:55:19 UTC