Pivot Role nested stack failing due to already present Lake formation service linked role #993

TejasRGitHub · 2024-01-22T18:51:45Z

Describe the bug

When migrating or bootstrapping environment with auto created pivot role ( dataallPivotRole-cdk ), the Pivot Role nested stack fails due to an already present lake formation service-linked role ( AWSServiceRoleForLakeFormationDataAccess ). This role is created when a S3 bucket is registered in Lake formation ( https://docs.aws.amazon.com/lake-formation/latest/dg/service-linked-roles.html ).

When boostrapping an AWS account into data.all it is certainly possible that this role is being used already.

How to Reproduce

Register a S3 bucket and mention this role for registering the S3 bucket into Lake formation
Create a data.all environment with that AWS account with auto create pivot role setting ON.
The environment creation should fail due to dataallPivotRole stack not able to create the AWSServiceRoleForLakeFormationDataAccess role as it is already present.

Expected behavior

The environment bootstrap should complete properly with the dataallPivotRole-cdk properly created.

Your project

No response

Screenshots

No response

OS

Mac

Python version

3.9

AWS data.all version

2.2

Additional context

No response

The text was updated successfully, but these errors were encountered:

noah-paige · 2024-01-23T15:52:50Z

I don't think we need to create this role as part of pivot role auto create because we use dataset role to register LF locations now. We could potentially remove this piece of code but first need to investigate if any unforeseen errors could come up with with pre-existing environments using this role in data.all or with setting admins in lakeformation using this role.

If we need to still create role and just check if it exists before can use .from_role_arn() (or .from_role_name()) similar to how we do with importing pivot role in env stack:

self.pivot_role = iam.Role.from_role_arn(
     self,
     f'PivotRole{self._environment.environmentUri}',
     pivot_role_stack.pivot_role.role_arn,
)

FOLLOW UP - from discussion with @dlpzx
When we create the data.all Environment, we register the pivot role as a data lake administrator, without the existence of the LF service role this step would fail. At the time data.all was first developed we had to create this LF service role. Alternatively, customers could manually open Lake Formation in the AWS Console and the service role was created automatically. I think we should:

check if it is still necessary: does the role exist in brand new AWS accounts? can we create an environment without creating it explicitly?
- If it is still necessary ---> add check if exists and import with .from_role_arn() or .from_role_name() . Careful with the update stack behaviour
- if it is not necessary ---> removing would cause problems, I would still do number 2 and have a conditional import

TejasRGitHub · 2024-01-23T22:09:30Z

@noah-paige , I wasn't able to find any problems with dataset import, dataset sharing without having the LF servicelinked role.

As pointed out by @dlpzx , removing the role is tricky and can cause problems. As this role is AWS managed service linked role and any aws account would have this created and they could be using this role, for existing data.all environments, ( which use the auto created pivot role ) and have this role in the cdk generated CF templates , if this role is removed, then the role will be deleted from the AWS accounts. This could cause issue with environment which use this role to do any operations on S3 buckets through lake formation.

In order to successfully remove this role, without letting this role get deleted, one way would be to apply the removal policy with RemovalPolicy.RETAIN.

Ways of doing this -

This service role deletion could be done over versions of data.all. For example, in v2.3, we add the code to apply the removal policy. Then in v2.4 , we could change the code to not include this role in pivot role stack. This will remove the role from CF of the pivot role in the update but won't delete the actual role.

Other options - Somehow access the CF template which is deployed for the pivot role and check if the role is present in the Pivot Role CF template. If the role is present, then add the RemovalPolicy.RETAIN policy. In the next update ( whenever the update-stacker runs ) , check if the CF template has the role and also has the removal policy, if yes then remove the role.

class PivotRole(NestedStack):
.. 
.. 
     template = CFTemplate.getTemplate(id: "")
     if template contains LakeFormationSLR and not RemovalPolicy statment:
        self.lf_service_role = iam.CfnServiceLinkedRole(
                self, 'LakeFormationSLR', aws_service_name='lakeformation.amazonaws.com'
        )
        self.lf_service_role.apply_removal_policy(RemovalPolicy.RETAIN)

I don't know if the above is possible, but would be a potential option if that works.

…d Role as its not needed

TejasRGitHub · 2024-01-24T22:26:33Z

PR for fix - #999

noah-paige · 2024-01-24T22:26:47Z

Would be good to add a warning that the LF Service Role will be released on the upcomming release

Can re-use this issue to track maintenance UI view #998 - out of scope for this PR but a good to have for upcoming release to include Notice

…m Pivot Role Nested Stack (#999) ### Feature or Bugfix - Bugfix ### Detail - Updating code to remove `AWSServiceRoleForLakeFormationDataAccess` role which is created in the pivot role nested stack ### Testing 1. Deployed code and checked all the environment are in proper state and that this role is removed from AWS account - ✅ 2. Imported a dataset ( with and without KMS key ) ✅ 3. Created a dataset ✅ 4. Created shares for both type of datasets ✅ 5. Onboarded another environment successfully ✅ ### Relates - #993 ### Security Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/). - Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? N/A - Is the input sanitized? - What precautions are you taking before deserializing the data you consume? - Is injection prevented by parametrizing queries? - Have you ensured no `eval` or similar functions are used? - Does this PR introduce any functionality or component that requires authorization? N/A - How have you ensured it respects the existing AuthN/AuthZ mechanisms? - Are you logging failed auth attempts? - Are you using or adding any cryptographic features? N/A - Do you use a standard proven implementations? - Are the used keys controlled by the customer? Where are they stored? - Are you introducing any new policies/roles/users? Removing Role - Have you used the least-privilege principle? How? By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

TejasRGitHub added a commit to TejasRGitHub/aws-dataall that referenced this issue Jan 24, 2024

data-dot-allGH-993 - Removing AWS Manged Lake Formation Service Linke…

30905cb

…d Role as its not needed

noah-paige added effort: low type: bug Something isn't working labels Jan 24, 2024

TejasRGitHub mentioned this issue Jan 24, 2024

[GH-993] - Removing AWS Manged Lake Formation Service Linked Role from Pivot Role Nested Stack #999

Merged

dlpzx closed this as completed Jan 26, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Pivot Role nested stack failing due to already present Lake formation service linked role #993

Pivot Role nested stack failing due to already present Lake formation service linked role #993

TejasRGitHub commented Jan 22, 2024

noah-paige commented Jan 23, 2024

TejasRGitHub commented Jan 23, 2024

TejasRGitHub commented Jan 24, 2024

noah-paige commented Jan 24, 2024

Pivot Role nested stack failing due to already present Lake formation service linked role #993

Pivot Role nested stack failing due to already present Lake formation service linked role #993

Comments

TejasRGitHub commented Jan 22, 2024

Describe the bug

How to Reproduce

Expected behavior

Your project

Screenshots

OS

Python version

AWS data.all version

Additional context

noah-paige commented Jan 23, 2024

TejasRGitHub commented Jan 23, 2024

TejasRGitHub commented Jan 24, 2024

noah-paige commented Jan 24, 2024