-
Notifications
You must be signed in to change notification settings - Fork 14.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use SSH to authenticate GitDagBundle #44976
base: main
Are you sure you want to change the base?
Conversation
3806221
to
9537677
Compare
This uses SSH hook to authenticate GitDagBundle when provided.
9537677
to
7a3629f
Compare
repo_url: str, | ||
tracking_ref: str, | ||
subdir: str | None = None, | ||
ssh_conn_kwargs: dict[str, str] | None = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ssh_conn_kwargs: dict[str, str] | None = None, | |
ssh_hook_kwargs: dict[str, str] | None = None, |
These are really kwargs for SSHHook
. docstring is wrong too.
|
||
def initialize(self) -> None: | ||
if self.ssh_conn_kwargs: | ||
if not self.repo_url.startswith("git@") and not self.repo_url.endswith(".git"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if not self.repo_url.startswith("git@") and not self.repo_url.endswith(".git"): | |
if not self.repo_url.startswith("git@") or not self.repo_url.endswith(".git"): |
We need to check that they are there individually.
def test_repo_url_starts_with_git_when_using_ssh_conn_id(self): | ||
repo_url = "https://github.com/apache/airflow" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def test_repo_url_starts_with_git_when_using_ssh_conn_id(self): | |
repo_url = "https://github.com/apache/airflow" | |
@pytest.mark.parametrize( | |
"repo_url", | |
[ | |
pytest.param("https://github.com/apache/airflow", id="https_url"), | |
pytest.param("airflow@example:apache/airflow.git", id="does_not_start_with_git_at"), | |
pytest.param("git@example:apache/airflow", id="does_not_end_with_dot_git"), | |
] | |
) | |
def test_repo_url_starts_with_git_when_using_ssh_conn_id(self, repo_url): |
Or similar. This better tests that we enforce it both starts with and ends with git.
self.bare_repo.remotes.origin.fetch("+refs/heads/*:refs/heads/*") | ||
self.repo.remotes.origin.pull() | ||
if self.ssh_hook: | ||
with self.ssh_hook.get_conn(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't this create a session to the (a?) remote ssh host here? Not sure we want that. Not even sure where this would connect, since the remote host likely wouldn't match the repo_url
.
Unless we can somehow tell git
to use that socket/session?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From what I experimented, once the ssh connection is established, it will be used by git to clone the repo. The remote host is where the ssh will establish the connection and not the repo url. While testing in breeze, I set localhost as the remote host and cloned a private repo. Also, in the breeze ssh config, I put it to use private key:
Host *
IdentityFile ~/.ssh/id_ed25519
Maybe @potiuk can help here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have not paid too much attention to those changes so far but happy to help. Not sure exactly what is the problem, but if I understand correctly the question is about the repo_url
and host
connection established in SSHHook via the SSH conn.
Yeah - generally git
can use SSH protocol (And I think it should be generally default for most "production" setup configurations.
https://git-scm.com/book/ms/v2/Git-on-the-Server-The-Protocols
And in this case indeed we need to somehow make repo_url and SSHHook host match. because if they don't then indeed SSHHook.get_conn()
uses the host from SSH conn to establish connection and they connection is reused later.
I believe (from what I see and earlier comments) we are not trying to use existing ssh_conn_id - which I find a bit strange, because Connection (and/or secrets) is where we can securely store authentication information, so i think we should pass conn_id - to retrieve the authentication information from the Connection.
I can imagine - for example that:
Connection(
login = "git"
host = "github.com"
extra = {... private_key etc ..)
Where repo_url = :
[email protected]:apache/airflow.git
There is a bit of a problem here indeed because the question is how to make host part of the repo_url and connection host match. We don't have really a place to store the "repo" portion of the github URL in SSH Connection
, so in the current setup we have a bit of a strange mixup between connection and repo_url where part of the connection definition is in the hook and part in the repo_url.
IMHO - the cleanest solution would be to create a GItSSHHook
and Git SSH Connection
extending from SSHHook
. We could add repository
extra to Git SSH Connection
on top of what SSH Connection
already has - and use it exclusively for git SSH
connectivity/authentication information.
In this case repo_url
and ssh_conn_id
would be mutually exclusive. This would avoid a lot of confusion. We have very similar case where we mixed configuration of emails sent by Airflow - where part of the email configuration is taken from the email conn id and part from the configuraiton and it ain't pretty.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I worked towards a separate hook earlier but later changed to using the ssh conn_id. I will revisit it. I think the issue I had was populating the connection type field with the new hook since the hook is not yet in the providers package.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pushed a change to use githook. Not manually tested for now
This uses SSH hook to authenticate GitDagBundle when provided.