
Potential load pressure and write-consistency issues when using a k8s ConfigMap as the persistence strategy #89

Closed
3 tasks done
24kpure opened this issue Nov 13, 2024 · 11 comments · Fixed by #93

Comments

@24kpure
Contributor

24kpure commented Nov 13, 2024

Describe the bug (not entirely certain it is one)

The updateConfigMap method in KubernetesManager.java persists configuration into a ConfigMap. If a massive number of pods subscribe to the same configuration, then even with the compare-before-write check, a large number of nodes may still issue update requests at the same time:

  • The instantaneous burst puts extra load on the k8s API server.
  • A flood of 409 responses follows; even after a random sleep, readNamespacedConfigMap is not guaranteed to return the latest value, so retrying also cannot fully guarantee (though the probability is low) that persistence succeeds. A sketch of this loop follows below.
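
For concreteness, here is a minimal sketch of the optimistic compare-then-write loop described above. This is not the actual KubernetesManager code: the class name, MAX_RETRIES, and the backoff range are assumptions, and the exact replaceNamespacedConfigMap parameter list varies slightly across kubernetes-client/java versions.

```java
import io.kubernetes.client.openapi.ApiException;
import io.kubernetes.client.openapi.apis.CoreV1Api;
import io.kubernetes.client.openapi.models.V1ConfigMap;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;

class ConfigMapWriteSketch {
    private static final int MAX_RETRIES = 5; // hypothetical bound

    V1ConfigMap persistWithRetry(CoreV1Api api, String ns, String name,
                                 Map<String, String> data) throws Exception {
        for (int attempt = 0; attempt < MAX_RETRIES; attempt++) {
            // Re-read the latest ConfigMap; under heavy contention this read can
            // already be stale again by the time the replace call goes out.
            V1ConfigMap current = api.readNamespacedConfigMap(name, ns, null);
            Map<String, String> merged = current.getData() == null
                    ? new HashMap<>() : new HashMap<>(current.getData());
            merged.putAll(data); // overlay this pod's entries on the existing ones
            current.setData(merged);
            try {
                // The replace carries the resourceVersion read above; a concurrent
                // writer invalidates it and the API server answers HTTP 409.
                return api.replaceNamespacedConfigMap(name, ns, current,
                        null, null, null, null);
            } catch (ApiException e) {
                if (e.getCode() != 409) throw e;
                Thread.sleep(ThreadLocalRandom.current().nextLong(50, 500)); // random backoff
            }
        }
        throw new IllegalStateException("still conflicting after " + MAX_RETRIES + " attempts");
    }
}
```

Under heavy contention every retry races against the other writers' replace calls, which is why the discussion below converges on limiting the number of writers.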

Expected behavior

Only a small number of nodes (ideally just one) should handle the persistence task, reducing the k8s load and the potential write-consistency problem.

@nobodyiam
Member

@dyx1234 @shoothzj do you have any suggestions on this issue?

@hezhangjian
Member

hezhangjian commented Nov 16, 2024

I think that with 200+ pods watching the same config, the current ConfigMap caching mechanism is indeed probably not a great idea, although it should run fine with around 100 pods. How many pods do you have in mind when you say "massive"?

> Only a small number of nodes (ideally just one) should handle the persistence task, reducing the k8s load and the potential write-consistency problem.

This is hard to negotiate among clients holding long connections. If we go that way, I would lean toward what we discussed earlier: add a dedicated workload that persists to the ConfigMap, while all other pods stay read-only. cc @nobodyiam

@24kpure
Contributor Author

24kpure commented Nov 18, 2024

In my view the core point is how to pick, out of all the pods, a single pod that every pod agrees on to perform the write; essentially this is a coordination/consistency problem. Since the scenario is a local cache, we cannot rely on Apollo itself for coordination, so coordination has to go through k8s. Here is one proposal for everyone to evaluate:
Obtain the application name from the pod, list all pods of the same application, sort them by creation time, and take the earliest-created pod (any other deterministic ordering would also work; the point is that every node derives the same answer), so that all nodes agree on which pod performs the writes. When the elected pod is destroyed, the time-based ordering automatically yields a new one. Pseudocode below:

```java
// HOSTNAME inside the container is the pod name; read this pod's own metadata
// and take the application name from its "app" label.
String podName = System.getenv("HOSTNAME");
V1Pod localPod = coreV1Api.readNamespacedPod(podName, k8sNamespace, null);
String appName = localPod.getMetadata().getLabels().get("app");
String labelSelector = "app=" + appName;

// List all pods that belong to the same application.
V1PodList v1PodList = coreV1Api.listNamespacedPod(k8sNamespace, null, null,
        null, null, labelSelector,
        null, null, null, null, null);
List<V1Pod> pods = v1PodList.getItems();

// Deterministic ordering: every node computes the same writer.
pods.sort(Comparator.comparing(pod -> pod.getMetadata().getCreationTimestamp()));
V1Pod selectedPod = pods.get(0);

// Only the earliest-created pod goes on to persist.
if (!podName.equals(selectedPod.getMetadata().getName())) {
    return;
}

// do persist
```

@24kpure
Contributor Author

24kpure commented Nov 25, 2024

@nobodyiam @shoothzj @dyx1234 CC

@nobodyiam
Member

This is essentially a leader-election style idea; it could be one way to solve it.
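
For reference, the official kubernetes-client/java ships a leader-election helper in its extended module, built on a coordination.k8s.io Lease. A minimal sketch of electing the writer that way (the lease name apollo-configmap-writer and the timings are made-up values, and the pods would need RBAC permission on Lease objects):

```java
import io.kubernetes.client.extended.leaderelection.LeaderElectionConfig;
import io.kubernetes.client.extended.leaderelection.LeaderElector;
import io.kubernetes.client.extended.leaderelection.resourcelock.LeaseLock;
import io.kubernetes.client.openapi.ApiClient;
import io.kubernetes.client.util.Config;
import java.time.Duration;

public class WriterElectionSketch {
    public static void main(String[] args) throws Exception {
        ApiClient client = Config.defaultClient();

        // A single Lease object acts as the lock; the holder identity is the pod name.
        String identity = System.getenv("HOSTNAME");
        LeaseLock lock = new LeaseLock("default", "apollo-configmap-writer", identity, client);

        LeaderElectionConfig config = new LeaderElectionConfig(
                lock,
                Duration.ofSeconds(15),  // lease duration
                Duration.ofSeconds(10),  // renew deadline
                Duration.ofSeconds(2));  // retry period

        try (LeaderElector elector = new LeaderElector(config)) {
            elector.run(
                    () -> System.out.println("became writer: persist to the ConfigMap here"),
                    () -> System.out.println("lost leadership: fall back to read-only"));
        }
    }
}
```

Compared with sorting pods by creation time, the Lease carries a renew deadline, so a wedged but not-yet-deleted writer eventually loses the lock; it still does not make the scheme strictly consistent, which is the concern raised in the next comment.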

@hezhangjian
Member

It is difficult to make this approach complete; in theory you can hardly achieve a CA result, and what you would most likely end up with is an AP-style leader election. For a configuration center, that is not great semantics.

@24kpure
Contributor Author

24kpure commented Nov 26, 2024

> It is difficult to make this approach complete; in theory you can hardly achieve a CA result, and what you would most likely end up with is an AP-style leader election. For a configuration center, that is not great semantics.

For a configuration center, configuration does lean toward CA. But persisting the configuration locally is itself an AP-leaning solution, so perhaps we need not split hairs over CA versus AP?
OK, let's carefully restate what this requirement is for: reducing the k8s load and the potential write-consistency problem.

  • Reducing the load is uncontroversial: limit the number of writer nodes.
  • As for write consistency, the original implementation already allows inconsistency in theory; limiting the number of writers mitigates it to some degree (fewer collisions).

With the requirement restated, limiting the number of writer nodes is a direction that satisfies it. What remains is implementation detail: restrict writes to exactly one node in one step (leader election), or allow a small number of nodes (say 2-3) to write each time; a sketch of the latter variant follows after this list.

  • The leader-election scheme is simple.
  • The multi-writer scheme is better for data consistency.
    If the config data is a full snapshot, I personally lean toward the leader-election scheme; the members may want to weigh this further.
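
A sketch of the "small number of writers" variant mentioned above, generalizing the earlier creation-time pseudocode (isWriter and allowedWriters are hypothetical names; allowedWriters = 1 degenerates to the single-leader case):

```java
import io.kubernetes.client.openapi.ApiException;
import io.kubernetes.client.openapi.apis.CoreV1Api;
import io.kubernetes.client.openapi.models.V1Pod;
import java.util.Comparator;
import java.util.List;

class WriterQuorumSketch {
    // True if this pod is among the N earliest-created pods of its application.
    static boolean isWriter(CoreV1Api api, String k8sNamespace, String appName,
                            int allowedWriters) throws ApiException {
        String labelSelector = "app=" + appName;
        List<V1Pod> pods = api.listNamespacedPod(k8sNamespace, null, null, null, null,
                labelSelector, null, null, null, null, null).getItems();
        pods.sort(Comparator.comparing(pod -> pod.getMetadata().getCreationTimestamp()));
        String self = System.getenv("HOSTNAME");
        return pods.stream()
                .limit(allowedWriters)
                .anyMatch(pod -> self.equals(pod.getMetadata().getName()));
    }
}
```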

btw, a question: in com.ctrip.framework.apollo.kubernetes.KubernetesManager#updateConfigMap, if finalExistingData is a superset of data, can that cause the persisted configuration to diverge from the server-side configuration?

```java
public interface RepositoryChangeListener {
  /**
   * Invoked when config repository changes.
   * @param namespace the namespace of this repository change
   * @param newProperties the properties after change
   */
  void onRepositoryChange(String namespace, Properties newProperties);
}
```
```java
// Determine if the data contains its own kv and de-weight it
Map<String, String> finalExistingData = existingData;
// Besides comparing all the entries, should we also compare the sizes?
boolean containsEntry = data.entrySet().stream()
        .allMatch(entry -> entry.getValue().equals(finalExistingData.get(entry.getKey())));

if (containsEntry) {
    logger.info("Data is identical or already contains the entry, no update needed.");
    return true;
}
```

@dyx1234
Contributor

dyx1234 commented Nov 26, 2024

@24kpure Regarding the updateConfigMap question below: it is designed this way because the keys of the key-value pairs in the ConfigMap are of the form cluster___namespace, which means one ConfigMap may store the configuration of multiple namespaces, i.e. finalExistingData may contain other namespaces' configuration.
So it being a superset of data is by design; we only need to ensure that the entries from data are up to date within finalExistingData. (See the illustration below.)
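
To illustrate the layout just described (the keys follow the cluster___namespace convention from the comment above; the values are made up): one ConfigMap aggregates several namespaces, so what is read back is naturally a superset of the single entry being written.

```java
// Hypothetical contents of the shared ConfigMap's data section:
// key = cluster___namespace, value = that namespace's serialized configuration.
Map<String, String> existingData = Map.of(
        "default___application",  "timeout=3000\nbatchSize=2000",
        "default___FX.apollo",    "someSwitch=on",
        "cluster1___application", "timeout=1000");

// A pod persisting default___application supplies only that entry; the other
// keys belong to other namespaces and must be preserved, so existingData ⊇ data.
Map<String, String> data = Map.of("default___application", "timeout=5000");
```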

@24kpure
Contributor Author

24kpure commented Nov 26, 2024

> @24kpure Regarding the updateConfigMap question below: it is designed this way because the keys of the key-value pairs in the ConfigMap are of the form cluster___namespace, which means one ConfigMap may store the configuration of multiple namespaces, i.e. finalExistingData may contain other namespaces' configuration. So it being a superset of data is by design; we only need to ensure that the entries from data are up to date within finalExistingData.

Thank you for the reply. Sorry, my question wasn't clear. What I meant to ask is: if the server side removes a key, finalExistingData will still contain the deleted key; how does it get removed in that case? (A sketch of the cleanup I have in mind follows below.)
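
A minimal sketch of the cleanup the question points at (hypothetical, not current KubernetesManager behavior; ownedKeys, the set of cluster___namespace keys this client is responsible for, is an assumed input): entries this client owns that are absent from the fresh data would have to be dropped explicitly, while other namespaces' entries stay untouched.

```java
// Hypothetical cleanup pass before writing the ConfigMap back.
// ownedKeys: the cluster___namespace keys managed by this client (assumed input).
Map<String, String> merged = new HashMap<>(finalExistingData);
for (String key : ownedKeys) {
    if (!data.containsKey(key)) {
        merged.remove(key); // the namespace was deleted on the server side
    }
}
merged.putAll(data); // then overlay the fresh entries
```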

@24kpure
Contributor Author

24kpure commented Dec 3, 2024

If no one has anything further to add, I will raise a PR shortly. Does that work?
@nobodyiam @shoothzj @dyx1234

@nobodyiam
Member

Sounds good to me, though as @shoothzj said, there is some complexity in the implementation; we can take a look at the PR when it is ready.

24kpure added a commit to 24kpure/apollo-java that referenced this issue Dec 9, 2024
24kpure added a commit to 24kpure/apollo-java that referenced this issue Dec 10, 2024
24kpure added a commit to 24kpure/apollo-java that referenced this issue Jan 9, 2025
hezhangjian pushed a commit that referenced this issue Jan 13, 2025
## What's the purpose of this PR

fix #89

## Which issue(s) this PR fixes:
fix #89

## Brief changelog

Reduce conflicts when updating the ConfigMap in k8s.

Follow this checklist to help us incorporate your contribution quickly and easily:

- [x] Read the [Contributing Guide](https://github.com/apolloconfig/apollo/blob/master/CONTRIBUTING.md) before making this pull request.
- [x] Write a pull request description that is detailed enough to understand what the pull request does, how, and why.
- [x] Write necessary unit tests to verify the code.
- [x] Run `mvn clean test` to make sure this pull request doesn't break anything.
- [x] Update the [`CHANGES` log](https://github.com/apolloconfig/apollo-java/blob/master/CHANGES.md).


## Summary by CodeRabbit

- **New Features**
	- Introduced a mechanism to reduce conflicts when updating ConfigMap in Kubernetes.
	- Enhanced access key secret retrieval for applications.
	- Added pod write permission controls for ConfigMap management.

- **Improvements**
	- Refined configuration utility for better property handling.
	- Improved Kubernetes resource management logic.

- **Testing**
	- Updated test suite to improve coverage of Kubernetes-related functionality.