Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SOLR-17359: Move Zk Arg parsing into Java Code #2593

Merged
merged 33 commits into from
Aug 14, 2024
Merged

Conversation

epugh
Copy link
Contributor

@epugh epugh commented Jul 24, 2024

https://issues.apache.org/jira/browse/SOLR-17359

Description

Migrate arg parsing for ZK commands into Java code

Solution

Had to add a ZkToolHelp to handle some of the global help output.

Tests

Please describe the tests you've developed or run to confirm this patch implements the feature or solves the problem.

Checklist

Please review the following and check all that apply:

  • I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
  • I have created a Jira issue and added the issue ID to my pull request title.
  • I have given Solr maintainers access to contribute to my PR branch. (optional but recommended)
  • I have developed this patch against the main branch.
  • I have run ./gradlew check.
  • I have added tests for my changes.
  • I have added documentation for the Reference Guide

@github-actions github-actions bot added documentation Improvements or additions to documentation docker Docker image labels Jul 25, 2024
@epugh epugh marked this pull request as ready for review July 25, 2024 18:56
@epugh epugh requested review from janhoy and gerlowskija July 25, 2024 18:58
@epugh
Copy link
Contributor Author

epugh commented Jul 25, 2024

Okay, I changed the short cut "-s" to mean Solr URL instead of "shards". however, in 9x "-s" means shards.. I am thinking that maybe I change allt he tests to use "--shards", and when I backport to 9x, I don't make "-s" mean solr url, but leave it as shards parameter?

@epugh
Copy link
Contributor Author

epugh commented Jul 31, 2024

FYI, for folks tracking this, @rahulgoswami has started testing this PR on Windows! Thanks @rahulgoswami ... He has validated start, stop and restart and then will start digging into the ZK commands.

@rahulgoswami
Copy link
Contributor

rahulgoswami commented Aug 2, 2024

Update so far:

  1. Commands that work as expected: solr zk ls , solr zk upconfig

Yet to test: zk rm, zk mv, zk mkroot, and some more of zk cp . That should cover everything talked about in the documentation (https://solr.apache.org/guide/solr/latest/deployment-guide/solr-control-script-reference.html). There are more if I dig into SolrCLI.java. Do those need testing as well?

  1. Running the solr start -e cloud example fails.
    Reason: Use of --cloud instead of -cloud in RunExampleTool.java

  2. solr zk -help goes into infinite recursion (stack overflow).
    Reason: getUsage() method in ZkToolHelp.java calls itself with no base condition

Exception in thread "main" java.lang.StackOverflowError
at org.apache.solr.cli.ZkToolHelp.getUsage(ZkToolHelp.java:59)
at org.apache.solr.cli.ZkToolHelp.getUsage(ZkToolHelp.java:59)
at org.apache.solr.cli.ZkToolHelp.getUsage(ZkToolHelp.java:59)
at org.apache.solr.cli.ZkToolHelp.getUsage(ZkToolHelp.java:59)
at org.apache.solr.cli.ZkToolHelp.getUsage(ZkToolHelp.java:59)

  1. Executing a command incorrectly gives a message which is not clear. Eg:
    >solr zk ls
    You must invoke this subcommand using the zk command. bin/solr zk ls.

  2. solr zk downconfig command fails to download given configset. Downloads an empty "conf" directory

solr zk downconfig -z localhost:9983 -n testzkcommand -d "C:\Work\Git\command-testing"
WARN - 2024-08-01 23:10:37.548; org.apache.solr.common.cloud.SolrZkClient; Using default ZkCredentialsInjector. ZkCredentialsInjector is not secure, it creates an empty list of credentials which leads to 'OPEN_ACL_UNSAFE' ACLs to Zookeeper nodes
WARN - 2024-08-01 23:10:38.842; org.apache.solr.common.cloud.SolrZkClient; Using default ZkACLProvider. DefaultZkACLProvider is not secure, it creates 'OPEN_ACL_UNSAFE' ACLs to Zookeeper nodes
Option 'n''confname': Deprecated for removal since 9.7: Use --conf-name instead
Downloading configset null from ZooKeeper at localhost:9983 to directory C:\Work\Git\command-testing\conf

ERROR: Error downloading files from zookeeper path /configs/null to C:\Work\Git\command-testing\conf

  1. solr zk cp
    Copy from Zk to local drive => works for file, but fails to copy a directory with the below error:

solr zk cp zk:/configs/techproducts/params.json "C:\Work\Git\command-testing" -z localhost:9983 ==> This works
Copying from 'zk:/configs/techproducts/params.json' to 'C:\Work\Git\command-testing'. ZooKeeper at localhost:9983
WARN - 2024-08-01 23:32:11.569; org.apache.solr.common.cloud.SolrZkClient; Using default ZkCredentialsInjector. ZkCredentialsInjector is not secure, it creates an empty list of credentials which leads to 'OPEN_ACL_UNSAFE' ACLs to Zookeeper nodes
WARN - 2024-08-01 23:32:11.733; org.apache.solr.common.cloud.SolrZkClient; Using default ZkACLProvider. DefaultZkACLProvider is not secure, it creates 'OPEN_ACL_UNSAFE' ACLs to Zookeeper nodes

====

solr zk cp -r zk:/configs/techproducts "C:\Work\Git\command-testing" -z localhost:9983 ==> This fails

ERROR: Index 1 out of bounds for length 1

@epugh
Copy link
Contributor Author

epugh commented Aug 2, 2024

I'm going to try out some of the examples on my Mac, just to see which are OS specific and which are just part of the code movement not being right. I'll also take a stab at adding more bats style tests to duplicate what you've added.

I think the more testing on Solr on Windows, the better! There has been a LOT of churn.

@epugh
Copy link
Contributor Author

epugh commented Aug 3, 2024

Okay, just fixed the downconfig and upconfig commands!

@epugh
Copy link
Contributor Author

epugh commented Aug 3, 2024

@rahulgoswami would you mind giving another pass through! I fixed a BUNCH of bugs that you found, thank you for the testing. Plus, I updated this branch with other changes that have happened in main that I think fix the RunExampleTool bugs you saw....

@rahulgoswami
Copy link
Contributor

rahulgoswami commented Aug 3, 2024

Pulled in the latest changes on the branch SOLR-17359 and rebuilt. This covers testing of all commands listed in documentation (https://solr.apache.org/guide/solr/latest/deployment-guide/solr-control-script-reference.html#zookeeper-operations). Will try to play with a few more not listed in the documentation, but that show up under help.

  1. zk downconfig has been fixed. upconfig worked for me the first time as well.

  2. solr zk -help now says "-help is not a valid command!". I do see it listed in documentation though(https://solr.apache.org/guide/solr/latest/deployment-guide/solr-control-script-reference.html#zookeeper-operations). Do we want to support it for backward compatibility with the documentation ? solr zk -h works fine.

  3. Executing a command incorrectly now lists the help for the command. Nice!

  4. solr start -e cloud example still fails to bring up the cluster.
    Needs change on line https://github.com/epugh/solr/blob/SOLR-17359/solr/core/src/java/org/apache/solr/cli/RunExampleTool.java#L618 from "--cloud " to "-cloud "

  5. Commands that work fine for me: zk ls, zk upconfig, zk mkroot, zk mv

  6. "solr zk cp" doesn't work. We have 4 cases: file/folder copy to/from zk  
    i) File copy from zk to local drive ==> this is the only cp command that worked for me

    ii) Folder copy from zk to local drive fails with below error
    solr zk cp -r zk:/configs/techproducts "C:\Work\Git\command-testing" -z localhost:9983    
    ERROR: Index 1 out of bounds for length 1

    iii) File copy from local drive to zk fails . It doesn't print an error message but the command completes without copying anything (when I try to check via solr zk ls
    >solr zk cp "C:\Work\Git\command-testing\test-file.txt" zk:/uploads/data -z localhost:9983
    Copying from 'C:\Work\Git\command-testing\test-file.txt' to 'zk:/uploads/data'. ZooKeeper at localhost:9983
    WARN  - 2024-08-02 22:39:04.304; org.apache.solr.common.cloud.SolrZkClient; Using default ZkCredentialsInjector. ZkCredentialsInjector is not secure, it creates an empty list of credentials which leads to 'OPEN_ACL_UNSAFE' ACLs to Zookeeper nodes
    WARN  - 2024-08-02 22:39:04.471; org.apache.solr.common.cloud.SolrZkClient; Using default ZkACLProvider. DefaultZkACLProvider is not secure, it creates 'OPEN_ACL_UNSAFE' ACLs to Zookeeper nodes

    iv) Folder copy from local drive to zk fails. I created a znode "/uploads/data" using the solr zk mkroot command prior to running the below command.
    >solr zk cp -r "C:\Work\Git\command-testing\conf" zk:/uploads/data -z localhost:9983
    Neither --zk-host or --solr-url parameters provided so assuming solr url is http://localhost:8983.

ERROR: Index 0 out of bounds for length 0

  1. zk rm works while trying to delete a file. Fails to delete a directory.
    >solr zk rm -r /uploads/data -z localhost:9983

ERROR: Index 0 out of bounds for length 0

>solr zk ls /uploads -z localhost:9983  ==> Command to show the znode "/uploads/data" exists
WARN  - 2024-08-02 23:31:46.793; org.apache.solr.common.cloud.SolrZkClient; Using default ZkCredentialsInjector. ZkCredentialsInjector is not secure, it creates an empty list of credentials which leads to 'OPEN_ACL_UNSAFE' ACLs to Zookeeper nodes
WARN  - 2024-08-02 23:31:48.120; org.apache.solr.common.cloud.SolrZkClient; Using default ZkACLProvider. DefaultZkACLProvider is not secure, it creates 'OPEN_ACL_UNSAFE' ACLs to Zookeeper nodes
data

@rahulgoswami
Copy link
Contributor

rahulgoswami commented Aug 6, 2024

Submitted a PR for 4 against your branch. Will work on 6 and 7 tonight.

@rahulgoswami
Copy link
Contributor

Submitted PR #8 for issues 6 and 7.
6.iii is the only known open issue for now (File copy from local drive to zk fails). Will tackle soon!

epugh added 4 commits August 8, 2024 13:21
SOLR-17359: Resolve recursive option (-r) failure (eg: in 'zk cp -r' and 'zk rm -r')
Recurse is never provided as a "false", and we want these to be simpler, so let's make the existence of -r as default to apply recursion true...!
@epugh
Copy link
Contributor Author

epugh commented Aug 9, 2024

@rahulgoswami thanks for your work, I was able to take what you did and find a really nice refactoring on the -r option! Looking forward to the next pr...

@rahulgoswami
Copy link
Contributor

rahulgoswami commented Aug 12, 2024

Ok, so 6.iii) is a non-issue. The zk destination was missing a trailing '/' and the zk cp command works fine for a file once the destination ends in a '/' to indicate a parent znode.
solr zk cp "C:\Work\Git\command-testing\test-file.txt" zk:/uploads/data/ -z localhost:9983 ==> Works

Although in traditional filesystems I would expect the command to work if the destination is a directory despite missing a file separator at the end, in case of Zk getting this to work might be tricky. Given the fact that there is no concept of file and directory in Zookeeper, and everything is just a znode, I could not find a reliable check to establish whether an empty znode destination should be classified as a "file" or "directory". This is where the trailing '/' helps clarify things imo.

@rahulgoswami
Copy link
Contributor

rahulgoswami commented Aug 12, 2024

The only zk commands in the "solr zk --help" list that I have not been able to test are "zk linkconfig" and "zk updateacls". I am not sure how these are supposed to work/how to test them. The current documentation doesn't list the usage either (https://solr.apache.org/guide/solr/latest/deployment-guide/solr-control-script-reference.html#zookeeper-operations).

@rahulgoswami
Copy link
Contributor

rahulgoswami commented Aug 12, 2024

Minor hiccup with solr zk --help command. "downconfig" is being displayed twice. Submitted a tiny PR #9.

rahulgoswami and others added 4 commits August 12, 2024 02:27
@epugh
Copy link
Contributor Author

epugh commented Aug 12, 2024

The only zk commands in the "solr zk --help" list that I have not been able to test are "zk linkconfig" and "zk updateacls". I am not sure how these are supposed to work/how to test them. The current documentation doesn't list the usage either (https://solr.apache.org/guide/solr/latest/deployment-guide/solr-control-script-reference.html#zookeeper-operations).

Ok, so 6.iii) is a non-issue. The zk destination was missing a trailing '/' and the zk cp command works fine for a file once the destination ends in a '/' to indicate a parent znode. solr zk cp "C:\Work\Git\command-testing\test-file.txt" zk:/uploads/data/ -z localhost:9983 ==> Works

Although in traditional filesystems I would expect the command to work if the destination is a directory despite missing a file separator at the end, in case of Zk getting this to work might be tricky. Given the fact that there is no concept of file and directory in Zookeeper, and everything is just a znode, I could not find a reliable check to establish whether an empty znode destination should be classified as a "file" or "directory". This is where the trailing '/' helps clarify things imo.

Do you have any thoughts on ref guide text or maybe expanding the text in bin/solr zk cp -h ? Thought that text is kind of wordy already, it may need another edit!

@epugh
Copy link
Contributor Author

epugh commented Aug 12, 2024

Minor hiccup with solr zk --help command. "downconfig" is being displayed twice. Submitted a tiny PR #9.

Thanks! I also saw that the getUsage needed updating, so I did that. This apporach of hard coding getUsage text is rather brittle and we really need to improve how things work underlying. (getUsage() is being overriden because we need bin/solr zk blah-blah and the default does bin/solr blah-blah...

epugh added 2 commits August 12, 2024 07:33
Should all this content someday just go in this referenced page?  we kind of duplicate!
@epugh
Copy link
Contributor Author

epugh commented Aug 12, 2024

@rahulgoswami I updated the ref guide docs to cross link to an EXISTING page for the upconfig/downconfig etc commands... Someday we'll reorg the ref guide page to make the solrcli easier to understand!

@epugh
Copy link
Contributor Author

epugh commented Aug 12, 2024

I believe the one failing unit test doesn't have anything to do with this PR... This is ready for review and merging for 9.8!

@epugh epugh merged commit 463e093 into apache:main Aug 14, 2024
4 of 5 checks passed
epugh added a commit that referenced this pull request Sep 28, 2024
* Move Zk Arg parsing into Java Code from the bin/solr shell and bin/solr.cmd command scripts, making testing life easier on Windows.

* Improved the output of -h to make usage information look nicer

* restructure control flow so that we don't check for each script command by name and THEN call the java code, instead only look for the three exceptions, start, stop, restart.

* Resolve Executing a command incorrectly gives a message which is not clear. Eg: >solr zk ls

---------

Co-authored-by: Rahul Goswami <[email protected]>
(cherry picked from commit 463e093)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cat:cli docker Docker image documentation Improvements or additions to documentation start-scripts tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants