Discussion:
[redis-db] small key cost long running time
Romber Li
2018-11-14 09:14:08 UTC
Hi all,

We are facing a problem where an operation on a very small key (even 100 bytes) can take over 100 ms, sometimes even 1000 ms.

Not every operation takes that long, but it happens tens or hundreds of times a day.

For example, one key took 11.325 ms, and it is only a 495-byte string key.



We use a 48-core/128 GB server as the host machine, on which we run two 4-core/32 GB virtual machines as Redis servers,

and we run two Redis instances on each virtual machine; each instance is limited to 12 GB of RAM.

We have also disabled THP and swap, and no heavy services run on the virtual machines other than monitoring agents such as the Zabbix agent, a devops agent, the PMM agent (Percona Monitoring and Management), etc.



The workload on the Redis servers is not heavy; ops are below 10,000.

Both RDB and AOF are disabled;

we only dump an RDB file every night on the slave node in order to analyze key sizes.

We use Redis server 4.0.8 with volatile-lfu as the eviction policy,

and the eviction policy is the only cause I can guess at, though I have no proof.
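
If eviction were the culprit, the evicted_keys counter in INFO stats should climb around the slow operations. Below is a minimal Jedis sketch for checking it (the class name and host are placeholders; the port matches the instance above):

    import redis.clients.jedis.Jedis;

    public class EvictionStats {
        public static void main(String[] args) {
            // placeholder host; connect to one node directly
            try (Jedis jedis = new Jedis("redis-host", 7001)) {
                // INFO stats exposes evicted_keys and expired_keys; if volatile-lfu
                // eviction were stalling commands, evicted_keys should be rising
                // around the time of the slow operations
                for (String line : jedis.info("stats").split("\r\n")) {
                    if (line.startsWith("evicted_keys") || line.startsWith("expired_keys")) {
                        System.out.println(line);
                    }
                }
            }
        }
    }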



Any help is welcome; thanks in advance.



Here is the output of the INFO command:

# Server

redis_version:4.0.8

redis_git_sha1:00000000

redis_git_dirty:0

redis_build_id:2bedfbe6a617df94

redis_mode:cluster

os:Linux 3.10.0-862.9.1.el7.x86_64 x86_64

arch_bits:64

multiplexing_api:epoll

atomicvar_api:atomic-builtin

gcc_version:4.8.5

process_id:8201

run_id:7949cca0c8746f22cc799fd63f396b20864a280b

tcp_port:7001

uptime_in_seconds:510290

uptime_in_days:5

hz:10

lru_clock:15452891

executable:/data/xxx/xxx/redis-server

config_file:/data/xxx/xxx/redis.conf



# Clients

connected_clients:307

client_longest_output_list:0

client_biggest_input_buf:0

blocked_clients:0



# Memory

used_memory:1614857128

used_memory_human:1.50G

used_memory_rss:1727164416

used_memory_rss_human:1.61G

used_memory_peak:1665041808

used_memory_peak_human:1.55G

used_memory_peak_perc:96.99%

used_memory_overhead:220949002

used_memory_startup:1442768

used_memory_dataset:1393908126

used_memory_dataset_perc:86.39%

total_system_memory:33567760384

total_system_memory_human:31.26G

used_memory_lua:37888

used_memory_lua_human:37.00K

maxmemory:12884901888

maxmemory_human:12.00G

maxmemory_policy:volatile-lfu

mem_fragmentation_ratio:1.07

mem_allocator:jemalloc-4.0.3

active_defrag_running:0

lazyfree_pending_objects:0



I uploaded some screenshots of monitoring data; you can see that the workload is not high.
h***@gmail.com
2018-11-14 16:56:51 UTC
You've described your server in a lot of detail, but you haven't described
your client.

Does your client software open connections to the Redis server and keep
them open, sending new commands and getting the responses through the
already opened TCP connection? Or does your client open a new connection
for each command/response and close the connection afterward?

Keeping connections open is an excellent way to avoid the latency caused by
opening a new TCP connection. Opening connections for each command and
closing them again is a very common source of latency - particularly
inconsistent latency like the kind you're seeing.
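
As a minimal sketch of the pooled pattern with Jedis (class name, host, and pool sizes here are just placeholders), each command borrows an already-open connection and hands it back instead of paying for a new TCP handshake:

    import redis.clients.jedis.Jedis;
    import redis.clients.jedis.JedisPool;
    import redis.clients.jedis.JedisPoolConfig;

    public class PooledClient {
        public static void main(String[] args) {
            JedisPoolConfig config = new JedisPoolConfig();
            config.setMaxTotal(64);   // placeholder pool sizing
            config.setMaxIdle(16);

            // placeholder host/port
            JedisPool pool = new JedisPool(config, "redis-host", 7001);
            try (Jedis jedis = pool.getResource()) {
                // getResource() hands back an already-established connection;
                // close() returns it to the pool instead of closing the socket
                jedis.set("greeting", "hello");
                System.out.println(jedis.get("greeting"));
            }
            pool.close();
        }
    }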

See the discussion on Reddit started by Trivago's blog post about their
experiences with Redis, one of which was a latency issue like this:
https://www.reddit.com/r/redis/comments/5q5ddr/learn_redis_the_hard_way_in_production/
Their blog post also talks about several other design choices that can
limit the performance of Redis server, so it's something of a roadmap of
common mistakes that you can see and avoid.
Romber Li
2018-11-15 01:32:06 UTC
Hi hvarzan,

First of all, thanks for your reply.

We use a Redis connection pool, specifically JedisPool.

I'm afraid this kind of latency has nothing to do with the client side,

because the time recorded in the slowlog is only the time Redis spent processing the command;

it does not include connection time or queueing time.

So this figure means that Redis itself took a long time to process this small key.
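
To make the distinction concrete, here is a minimal sketch (class name, host, and key are placeholders; the Slowlog helper lives in redis.clients.util on older Jedis 2.x releases) that times the full round trip on the client and reads back the server-side slowlog figure:

    import java.util.List;
    import redis.clients.jedis.Jedis;
    import redis.clients.jedis.util.Slowlog;   // redis.clients.util.Slowlog on Jedis 2.x

    public class LatencyCompare {
        public static void main(String[] args) {
            // placeholder host/port and key name
            try (Jedis jedis = new Jedis("redis-host", 7001)) {
                long start = System.nanoTime();
                jedis.get("some:small:key");
                long roundTripUs = (System.nanoTime() - start) / 1_000;

                // SLOWLOG entries record pure execution time in microseconds,
                // excluding network transfer and time spent queued behind other
                // clients, so a large gap between the two numbers points away
                // from the Redis command itself
                List<Slowlog> entries = jedis.slowlogGet(1);
                long serverUs = entries.isEmpty() ? 0 : entries.get(0).getExecutionTime();

                System.out.printf("client round trip: %d us, newest slowlog entry: %d us%n",
                        roundTripUs, serverUs);
            }
        }
    }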



Xingbo Wang
2018-11-14 16:35:35 UTC
Hi Romber:

I can think of two potential reasons.

1. You mentioned your Redis process is running inside a 4-core virtual
machine on a 48-core physical machine. The physical machine looks pretty
over-provisioned. You mentioned the virtual machine load is very light,
but how is the load on the physical machine? If the physical machine is
busy, the virtual machine could be paused, driving up P100 latency. Could
you run the same experiment on a lightly loaded physical machine and see
whether the problem still exists?

2. Redis performs incremental rehashing during command execution, which
can increase P100 latency. Do you have any metric on the total number of
keys in your Redis instance? A rehash usually happens when the number of
items grows beyond the number of hash buckets, or shrinks to less than
10% of it.
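
As a minimal sketch (class name and host are placeholders), the key count can be read with DBSIZE or INFO keyspace and graphed over time; the main dictionary is sized in powers of two, so crossing one of those boundaries is when incremental rehashing starts:

    import redis.clients.jedis.Jedis;

    public class KeyspaceSize {
        public static void main(String[] args) {
            // placeholder host/port
            try (Jedis jedis = new Jedis("redis-host", 7001)) {
                // DBSIZE returns the key count of the selected database
                System.out.println("keys: " + jedis.dbSize());

                // INFO keyspace lists keys/expires per database, handy for graphing
                System.out.println(jedis.info("keyspace"));
            }
        }
    }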

Finally, I think this link is quite helpful for troubleshooting high-latency
issues: https://redis.io/topics/latency
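
That page also covers the built-in latency monitor. A minimal sketch for turning it on so spikes are recorded server-side (class name and host are placeholders, and the 100 ms threshold is only an example):

    import redis.clients.jedis.Jedis;

    public class EnableLatencyMonitor {
        public static void main(String[] args) {
            // placeholder host/port
            try (Jedis jedis = new Jedis("redis-host", 7001)) {
                // record any event slower than 100 ms (value is in milliseconds);
                // spikes can then be inspected with LATENCY LATEST / LATENCY HISTORY
                jedis.configSet("latency-monitor-threshold", "100");
                System.out.println(jedis.configGet("latency-monitor-threshold"));
            }
        }
    }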

Thanks,
Shawn
--
Best Regards,

Xingbo