Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PartitionSelectorException during refresh of Partitions #2178

Closed
tadashiya opened this issue Aug 9, 2022 · 4 comments · Fixed by #2179
Closed

PartitionSelectorException during refresh of Partitions #2178

tadashiya opened this issue Aug 9, 2022 · 4 comments · Fixed by #2179
Labels
type: regression A regression from a previous release
Milestone

Comments

@tadashiya
Copy link
Contributor

tadashiya commented Aug 9, 2022

Bug Report

Related to #2146, I encountered PartitionSelectorException.
When I use enablePeriodicRefresh(), in some case, masterCache = EMPTY and PooledClusterConnectionProvider fails to get a master node.
There is a time gap between invalidateCache() and this.masterCache = masterCache in Partition.java.
https://github.com/lettuce-io/lettuce-core/blob/a60560e693b3f73acc3bc6947fa78735dc0dd547/src/main/java/io/lettuce/core/cluster/models/partitions/Partitions.java#L163-L167

Current Behavior

In some case, PartitionSelectorException occurs, when I use enablePeriodicRefresh().

Stack trace
Exception in thread "main" io.lettuce.core.cluster.PartitionSelectorException: Cannot determine a partition for slot 4241.
	at io.lettuce.core.cluster.PooledClusterConnectionProvider.getWriteConnection(PooledClusterConnectionProvider.java:164)
	at io.lettuce.core.cluster.PooledClusterConnectionProvider.getConnectionAsync(PooledClusterConnectionProvider.java:149)
	at io.lettuce.core.cluster.ClusterDistributionChannelWriter.doWrite(ClusterDistributionChannelWriter.java:170)
	at io.lettuce.core.cluster.ClusterDistributionChannelWriter.write(ClusterDistributionChannelWriter.java:103)
	at io.lettuce.core.RedisChannelHandler.dispatch(RedisChannelHandler.java:218)
	at io.lettuce.core.cluster.StatefulRedisClusterConnectionImpl.dispatch(StatefulRedisClusterConnectionImpl.java:216)
	at io.lettuce.core.AbstractRedisAsyncCommands.dispatch(AbstractRedisAsyncCommands.java:676)
	at io.lettuce.core.AbstractRedisAsyncCommands.setex(AbstractRedisAsyncCommands.java:1700)
	at jdk.internal.reflect.GeneratedMethodAccessor41.invoke(Unknown Source)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
	at io.lettuce.core.cluster.ClusterFutureSyncInvocationHandler.handleInvocation(ClusterFutureSyncInvocationHandler.java:122)
	at io.lettuce.core.internal.AbstractInvocationHandler.invoke(AbstractInvocationHandler.java:80)

Input Code

Input Code
@SpringBootApplication
class DemoApplication

fun main(args: Array<String>) {
    val context = runApplication<DemoApplication>(*args)

    @Suppress("UNCHECKED_CAST")
    val syncCommands =
        context.getBean(RedisClusterCommands::class.java) as RedisClusterCommands<String, String>
    for (i in 1..10000) {
        syncCommands.setex("key$i", 10, "value$i")
        if (i % 100 == 0) {
            println(i)
        }
    }
}

@Configuration
@EnableConfigurationProperties(RedisClientProperties::class)
class RedisClientConfiguration(
    private val props: RedisClientProperties
) {

    @Bean
    fun getCommands(): RedisClusterCommands<String, String> {
        return build().connect().sync()
    }

    private fun build(): RedisClusterClient {
        val redisURIs =
            props.nodes.map {
                val (host, port) = it.split(":")
                RedisURI.builder()
                    .withHost(host)
                    .withPassword(props.password as CharSequence)
                    .withAuthentication(props.userName, props.password as CharSequence)
                    .withPort(port.toInt())
                    .build()
            }
        val redisClusterClient = RedisClusterClient.create(redisURIs)
        val topologyRefreshOptions = ClusterTopologyRefreshOptions.builder()
            .enablePeriodicRefresh(Duration.ofSeconds(2)) // Make it happen easily.
            .build()
        val clientOptions = ClusterClientOptions
            .builder()
            .topologyRefreshOptions(topologyRefreshOptions)
            .build()
        redisClusterClient.setOptions(clientOptions)
        return redisClusterClient
    }
}

Expected behavior/code

In Partitions#updateCache(), masterCache shouldn't be empty.

Environment

  • Lettuce version(s): 6.2.0.RELEASE
  • Redis version: 6.2.7, 4.0.10

Possible Solution

Move invalidateCache() to if clause following.

@be-hase
Copy link
Contributor

be-hase commented Sep 7, 2022

I had the same issue in my environment.

Please check when you have time. @mp911de 🙏 🙏

@mp911de mp911de added the type: feature A new feature label Oct 7, 2022
@mp911de mp911de changed the title PartitionSelectorException in PooledClusterConnectionProvider when using enablePeriodicRefresh PartitionSelectorException during refresh of Partitions Oct 7, 2022
@mp911de
Copy link
Collaborator

mp911de commented Oct 7, 2022

Thanks for the report. It is indeed a subtle change, but you're right, there is a gap between the invalidation of the topology cache and the cache rebuild. It makes sense to close that gap.

@mp911de mp911de added this to the 6.2.1.RELEASE milestone Oct 7, 2022
mp911de pushed a commit that referenced this issue Oct 7, 2022
Original pull request: #2179.
Co-authored-by: Tadashi Yamasaki <[email protected]>
mp911de pushed a commit that referenced this issue Oct 7, 2022
Original pull request: #2179.
Co-authored-by: Tadashi Yamasaki <[email protected]>
@mp911de mp911de linked a pull request Oct 7, 2022 that will close this issue
4 tasks
@mp911de mp911de closed this as completed Oct 7, 2022
@tadashiya
Copy link
Contributor Author

Thank you for your work.

@mp911de mp911de added type: regression A regression from a previous release and removed type: feature A new feature labels Oct 7, 2022
@mp911de
Copy link
Collaborator

mp911de commented Oct 7, 2022

Reclassifying this as regression because in 6.1.x, we had the exact same behavior as the fix you provided.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: regression A regression from a previous release
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants