RackAware placement policy can not ensure new ensemble selection succeed when one rack goes down. #23

Open
hangc0276 opened this issue Oct 28, 2022 · 0 comments

BUG REPORT

Describe the bug

Consider a cluster with the following rack-to-bookie mapping:

r1 -> bk1, bk2
r2 -> bk3
r3 -> bk4

EnforceMinNumRacksPerWriteQuorum is enabled and minNumRacksPerWriteQuorumConfValue=2.

When bk3 or bk4 is quarantined, new ensemble selection should still succeed, because the remaining three bookies span two racks, which satisfies the minimum of 2 racks per write quorum.
However, the new ensemble selection sometimes fails.
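
The expectation above can be checked with a small standalone sketch. It does not use any BookKeeper classes; `RackFeasibilityCheck`, `topology`, and `isFeasible` are hypothetical names introduced only for illustration. The check assumes ensembleSize equals writeQuorumSize (3 in the reproduction test), so the only write quorum is the whole ensemble and counting distinct racks among the available bookies is sufficient.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class RackFeasibilityCheck {

    // Topology from this report: r1 -> bk1, bk2; r2 -> bk3; r3 -> bk4.
    static Map<String, String> topology() {
        Map<String, String> rackOf = new LinkedHashMap<>();
        rackOf.put("bk1", "r1");
        rackOf.put("bk2", "r1");
        rackOf.put("bk3", "r2");
        rackOf.put("bk4", "r3");
        return rackOf;
    }

    // Hypothetical helper (not BookKeeper API): returns true if the bookies that
    // are not excluded are numerous enough for the ensemble and span at least
    // minRacks racks. Assumes ensembleSize == writeQuorumSize.
    static boolean isFeasible(Map<String, String> rackOf, Set<String> excluded,
                              int ensembleSize, int minRacks) {
        int available = 0;
        Set<String> racks = new HashSet<>();
        for (Map.Entry<String, String> e : rackOf.entrySet()) {
            if (!excluded.contains(e.getKey())) {
                available++;
                racks.add(e.getValue());
            }
        }
        return available >= ensembleSize && racks.size() >= minRacks;
    }

    public static void main(String[] args) {
        Map<String, String> rackOf = topology();
        // bk4 quarantined: bk1, bk2 (r1) and bk3 (r2) remain -> prints true
        System.out.println(isFeasible(rackOf, Collections.singleton("bk4"), 3, 2));
        // bk3 quarantined: bk1, bk2 (r1) and bk4 (r3) remain -> prints true
        System.out.println(isFeasible(rackOf, Collections.singleton("bk3"), 3, 2));
    }
}
```

Both cases are feasible on paper, so a failure from newEnsemble in either case would point at the selection logic rather than at the topology.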

Running the following unit test in TestRackawareEnsemblePlacementPolicy.java reproduces the bug.

final int minNumRacksPerWriteQuorumConfValue = 2;
conf.setMinNumRacksPerWriteQuorum(minNumRacksPerWriteQuorumConfValue);
conf.setEnforceMinNumRacksPerWriteQuorum(true);

    @Test
    public void testNewEnsemblePolicyWithMultipleRacksV2() throws Exception {
        BookieSocketAddress addr1 = new BookieSocketAddress("127.0.0.1", 3181);
        BookieSocketAddress addr2 = new BookieSocketAddress("127.0.0.2", 3181);
        BookieSocketAddress addr3 = new BookieSocketAddress("127.0.0.3", 3181);
        BookieSocketAddress addr4 = new BookieSocketAddress("127.0.0.4", 3181);
        BookieSocketAddress addr5 = new BookieSocketAddress("127.0.0.5", 3181);
        // update dns mapping
        StaticDNSResolver.addNodeToRack(addr1.getHostName(), "/default-region/r1");
        StaticDNSResolver.addNodeToRack(addr2.getHostName(), "/default-region/r1");
        StaticDNSResolver.addNodeToRack(addr3.getHostName(), "/default-region/r2");
        StaticDNSResolver.addNodeToRack(addr4.getHostName(), "/default-region/r3");
        //StaticDNSResolver.addNodeToRack(addr5.getHostName(), "/default-region/r1");
        // Update cluster
        Set<BookieId> addrs = new HashSet<BookieId>();
        addrs.add(addr1.toBookieId());
        addrs.add(addr2.toBookieId());
        addrs.add(addr3.toBookieId());
        //addrs.add(addr5.toBookieId());
        addrs.add(addr4.toBookieId());
        repp.onClusterChanged(addrs, new HashSet<BookieId>());

        try {
            int ensembleSize = 3;
            int writeQuorumSize = 3;
            int ackQuorumSize = 2;

            Set<BookieId> excludeBookies = new HashSet<>();
            excludeBookies.add(addr4.toBookieId());
            //excludeBookies.add(addr3.toBookieId());

            for (int i = 0; i < 50; ++i) {
                EnsemblePlacementPolicy.PlacementResult<List<BookieId>> ensembleResponse =
                    repp.newEnsemble(ensembleSize, writeQuorumSize,
                        ackQuorumSize, null, excludeBookies);
                List<BookieId> ensemble = ensembleResponse.getResult();

                ensemble.forEach(t -> {
                    LOG.info("[hangc] {}", t);
                });
                LOG.info("==========");
            }
        } catch (Exception e) {
            LOG.error("failed ", e);
        }
    }
@hangc0276 changed the title from "RackAware placement policy can not ensure" to "RackAware placement policy can not ensure new ensemble selection succeed when one rack goes down." on Oct 28, 2022