Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SOLR-16835: Generate Python client from OpenAPI spec #1681

Merged

Conversation

gerlowskija
Copy link
Contributor

@gerlowskija gerlowskija commented Jun 2, 2023

https://issues.apache.org/jira/browse/SOLR-16835

Description

Currently, Solr only offers one "first-party" client, SolrJ, which can only be used within JVM environments. This is obviously limiting for our users.

Historically, keeping client bindings in multiple languages up to date has been too daunting for the project to undertake. But now that Solr produces an OpenAPI spec for its upcoming "v2" API, we have the ability to generate client bindings from that for a variety of languages with very little additional maintenance cost.

Solution

This PR uses the 'openapi-generator' gradle plugin to create a Python client based on our OpenAPI spec. A few details:

  • ./gradlew solr:api:buildPythonClient to generate the Python client
  • Of course, the generated client only supports those APIs that are represented in our OpenAPI spec - i.e. the v2 APIs that have been converted to JAX-RS. Users that cannot find an API they'd like should consider migrating the code for that API to JAX-RS.
  • Generation uses a "moustache" template that we can modify if we'd like to. (Currently we're not modifying that template at all.)
  • Once created, client is available at solr/api/build/generated/python and can be installed to the user's Python environment with python setup.py install --user
  • Generated code uses urllib3 to send requests (though that's abstracted away from users afaict)
  • The organization (and to an extent, quality) of the generated code depends in part on some OpenAPI annotations that we don't use across-the-board currently. This PR adds those annotations to a few Java classes so the generated client is a little easier to demo. We already have other JIRA tickets for doing this more comprehensively though.

See below for a snippet that uses the generated client to create a collection:

import solr
from pprint import pprint
from solr.api import collections_api
from solr.model.create_collection_request_body import CreateCollectionRequestBody

configuration = solr.Configuration(host = "http://localhost:8983/api")
with solr.ApiClient(configuration) as api_client:
    api_instance = collections_api.CollectionsApi(api_client)
    create_params = CreateCollectionRequestBody(
            name = "somecollection2",
            num_shards = 1 
    )   
    try:
        api_response = api_instance.create_collection(create_collection_request_body = create_params)
        pprint(api_response)
    except solr.ApiException as e:
        print("Exception when calling CollectionApi->create_collection: %s\n" % e)

Tests

The code-generator has an option to generate tests for the generated client code. I've disabled that currently, trusting the generator to do its thing, but we can enable that if it's something we'd feel more comfortable using.

Other than those generated tests, this has mostly been manual testing on my part.

Checklist

Please review the following and check all that apply:

  • I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
  • I have created a Jira issue and added the issue ID to my pull request title.
  • I have given Solr maintainers access to contribute to my PR branch. (optional but recommended)
  • I have developed this patch against the main branch.
  • I have run ./gradlew check.
  • I have added tests for my changes.
  • I have added documentation for the Reference Guide

@gerlowskija
Copy link
Contributor Author

I've left this as a "draft" to signal that I don't intend to merge it imminently, but this is largely ready to go if anyone wants to review or pull it down to try out. I'd really love some feedback if anyone has a chance to "kick the tires" on their own.

In terms of "timeline" for merging: there's another ticket similar to this one that's targeting Java code generation, and I'd like to make a little progress there before I merge here. (I don't want to do anything in this PR that makes Java-generation harder, or vice versa.)

solr/core/build.gradle Outdated Show resolved Hide resolved
@gerlowskija
Copy link
Contributor Author

Quick update on the status of this PR:

The Java client generation I mentioned above has been merged for some time now. So there's no "hard" blocker for getting this merged.

My main concern at this point is that I don't know the Python ecosystem well enough to weigh in on what folks want from a Python client. Maybe OpenAPI's default Python template is good enough without any tweaking, but I'm reluctant to assume that and commit absent feedback from someone that actually knows and uses Python.

If anyone out there has Python experience or currently uses a Python client, could really use your input!

@epugh
Copy link
Contributor

epugh commented Oct 27, 2023

i wonder if there is anything we could do to "dogfood" the python client? Is there anything we wanted to build, like a script that interacts with Solr to do X that could be written in python, using this client library, to validate it....

A crazy thought would be to try and write a script that demonstrates "Doing something complex with Solr apis that are currently available in v2" using the client? Like, this file, but where appropriate we use the Python client?

https://github.com/apache/solr/blob/4e54090d2a10638dfd24311428084da56a043c89/solr/packaging/test/test_opennlp.bats

Here is another thought... What if we tried generating a Javascript client and using THAT in our admin UI?

@janhoy
Copy link
Contributor

janhoy commented Oct 29, 2023

Here is another thought... What if we tried generating a Javascript client and using THAT in our admin UI?

Very good ideas Eric. Using a generated JS client in AdminUI would give us a way to cut over from v1 to v2, and could be the start of a client that could be released as an independent artifact.

@madrob
Copy link
Contributor

madrob commented Oct 30, 2023

Would a python client make any of our bats tests easier possibly instead of complex curl commands?

Generation currently uses the default openapi-generator template for
Python, and occurs in the 'api' module following OAS generation.
@gerlowskija gerlowskija force-pushed the SOLR-16835-python-client-generation branch from 172754a to 36bd814 Compare October 30, 2023 16:20
@gerlowskija
Copy link
Contributor Author

i wonder if there is anything we could do to "dogfood" the python client?

Yeah, it's still not ideal IMO but dogfooding might get us enough confidence to merge at least. If anyone has a good idea where dog-fooding might make sense, that'd be awesome. Curious to hear what @epugh thinks about @madrob 's suggestion above. Up to this point I kindof thought the BATS framework was targeting bash specifically, but you two are the most familiar so if you think it'd be a decent fit I'm happy to give it a shot...

@HoustonPutman
Copy link
Contributor

Up to this point I kindof thought the BATS framework was targeting bash specifically, but you two are the most familiar so if you think it'd be a decent fit I'm happy to give it a shot...

Yeah, there is a niceness about just using bash in the BATS framework, and I'm not sure any of the curl stuff is that bad... I can definitely see some awesome integration tests using python, but that would target other stuff maybe?

@epugh
Copy link
Contributor

epugh commented Oct 30, 2023

Up to this point I kindof thought the BATS framework was targeting bash specifically, but you two are the most familiar so if you think it'd be a decent fit I'm happy to give it a shot...

Yeah, there is a niceness about just using bash in the BATS framework, and I'm not sure any of the curl stuff is that bad... I can definitely see some awesome integration tests using python, but that would target other stuff maybe?

Up to this point I kindof thought the BATS framework was targeting bash specifically, but you two are the most familiar so if you think it'd be a decent fit I'm happy to give it a shot...

Yeah, there is a niceness about just using bash in the BATS framework, and I'm not sure any of the curl stuff is that bad... I can definitely see some awesome integration tests using python, but that would target other stuff maybe?

I could see a whole separate set of tests that demonstrate complex multi step processes (and are probably going to end up being brittle ;-( ) that we could add. For example, I added a .bats test that demonstrated running two SEPARATE SolrCloud instances, and using classic Replication to move data between them ;-). There are lots of things like splitting collections or using the REINDEXCOLLECTION command that might benefit by a separate "integration" test approach both to validate the code works, but also to show folks how to do these things. Heck, #1999 features a complex multi step bats test just to help me figure out how it would all work together ;-).

@epugh
Copy link
Contributor

epugh commented Oct 30, 2023

Here is another thought... What if we tried generating a Javascript client and using THAT in our admin UI?

Very good ideas Eric. Using a generated JS client in AdminUI would give us a way to cut over from v1 to v2, and could be the start of a client that could be released as an independent artifact.

I've been putting off migrating anything to v2 in the AdminUI on the hope that somethign like a client would come along...

@anshumg
Copy link
Contributor

anshumg commented Oct 30, 2023

I think we should call this client the 'Solr Admin Client' considering that's what it enables users to do.

@gerlowskija
Copy link
Contributor Author

I think we should call this client the 'Solr Admin Client' considering that's what it enables users to do.

The APIs currently covered by our OAS are all admin APIs, sure. But that's a point-in-time reality that's likely to change as more APIs are converted to JAX-RS and added to our OAS. I definitely don't want folks that come here today to be confused when they can't (yet) find the Python method to run a query, etc. But I'm also pretty leery of over-specifying and then being stuck with a name that's misleading in the other direction.

Wdyt about calling out the current limitations with a big bold message at the top of the client's README, or some other doc-based solution?

@gerlowskija
Copy link
Contributor Author

I've been putting off migrating anything to v2 in the AdminUI on the hope that somethign like a client would come along...

Be careful what you wish for @epugh 😛

@MarcusSorealheis
Copy link
Contributor

I think we should call this client the 'Solr Admin Client' considering that's what it enables users to do.

In the package name?

@gerlowskija
Copy link
Contributor Author

gerlowskija commented Nov 14, 2023

In the package name?

I hesitate to put words in his mouth, but I interpreted Anshum's comment as being more about documentation and messaging around the client in general. As it stands today, folks that come in expecting a full querying/indexing client are going to be disappointed unless we convey the limitations more clearly than we do currently.


A small general update on this:

With the JS client PR (#2050) merged, I've got more time and attention to hopefully move this forward. My first step is to find a way to dogfood the client, ideally in Python code we already have. We have a surprising number of small Python scripts in our codebase (see below), if any of these are making Admin APIs calls they might be a good place to get some dog-fooding in.

$ find . -name "*.py" | grep -v nodejs | grep -v build
./solr/example/films/vectors/create_dataset.py
./solr/example/films/vectors/films.py
./solr/example/films/vectors/create_model.py
./solr/modules/ltr/example/libsvm_formatter.py
./solr/modules/ltr/example/train_and_upload_demo_model.py
./dev-tools/scripts/githubPRs.py
./dev-tools/scripts/smokeTestRelease.py
./dev-tools/scripts/refguide/gen-refguide-redirects.py
./dev-tools/scripts/addVersion.py
./dev-tools/scripts/scriptutil.py
./dev-tools/scripts/scaffoldNewModule.py
./dev-tools/scripts/releasedJirasRegex.py
./dev-tools/scripts/releaseWizard.py
./dev-tools/scripts/addDepsToChanges.py
./dev-tools/scripts/checkJavadocLinks.py
./dev-tools/scripts/reproduceJenkinsFailures.py
./dev-tools/scripts/create_line_file_docs.py

The files look like a mix of scripts for developers, and scripts that actually ship with Solr for examples and other purposes.
Adding dogfooding to the development scripts would require script-users to call ./gradlew to generate the Python client prior to running the script. That's a bit of a burden, but maybe it'd be reasonable given the intended audience.

Adding the Python client to the various scripts used in Solr's "examples" seems trickier since these ship with Solr. The Python client library would have to be included in the Solr tgz in some way. But maybe there's prior-art for this if the 'example' scripts rely on any 3rd-party libraries already?

Anyone have thoughts/suggestions?

@gerlowskija
Copy link
Contributor Author

Probably our best bet for dogfooding the Python client is the ./solr/modules/ltr/example/train_and_upload_demo_model.py script, which uses Python to send simple queries. This requires that the generated clients have some rudimentary support for querying, so I've taken a crack at adding this in #2079. That PR will block this one, unless others have thoughts around ways to validate or QA the generated Python client.

@gerlowskija gerlowskija marked this pull request as ready for review November 30, 2023 18:25
@gerlowskija
Copy link
Contributor Author

Alright! #2079 has been merged, giving our generated clients a way to make rudimentary queries. With that in place, this PR now switches the ltr train_and_upload_demo_model.py example script to use the new client. The resulting Python code looks really nice IMO!

Now that we've finally gotten some dog-fooding around the new client, I've taken this PR out of "Draft" mode. Hopefully seeing the client in use will help generate some ideas and feedback. Pending additional feedback though, I'll aim to merge this sometime next week!

Copy link
Contributor

@epugh epugh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Maybe the tiniest nit on the pushd/popd as "maybe a command most people don't know about'.. the python changes looked pretty natural!

solr/modules/ltr/example/README.md Outdated Show resolved Hide resolved
def setupSolrClient(host, port):
'''Configures the Solr client with the specified Solr host/port'''
solr_client_config = solr.Configuration(
host = "http://" + host + ":" + str(port) + "/api"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i guess at some point the /api will get dropped right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's the plan eventually, yep.

@gerlowskija gerlowskija merged commit b0d767c into apache:main Dec 5, 2023
2 checks passed
@gerlowskija gerlowskija deleted the SOLR-16835-python-client-generation branch December 5, 2023 15:24
gerlowskija added a commit that referenced this pull request Dec 7, 2023
This commit adds build code to generate a Python client (using
the OpenAPI Generator's 'python' template) and adds the
necessary plumbing to bundle the client into an example directory
used in Solr's LTR module.

Note that nothing in this commit adds this client as a release
artifact, publishes it to pip, etc.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants