Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consistent get_<DATA_ASSET> permissions - S3_Datasets #1727

Merged
merged 16 commits into from
Dec 10, 2024
Merged
Show file tree
Hide file tree
Changes from 10 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 6 additions & 0 deletions backend/dataall/modules/s3_datasets/api/dataset/resolvers.py
Original file line number Diff line number Diff line change
Expand Up @@ -113,6 +113,12 @@ def get_dataset_statistics(context: Context, source: S3Dataset, **kwargs):
return DatasetService.get_dataset_statistics(source)


def get_dataset_restricted_information(context: Context, source: S3Dataset, **kwargs):
if not source:
return None
return DatasetService.get_dataset_restricted_information(uri=source.datasetUri, dataset=source)


@is_feature_enabled('modules.s3_datasets.features.aws_actions')
def get_dataset_assume_role_url(context: Context, source, datasetUri: str = None):
return DatasetService.get_dataset_assume_role_url(uri=datasetUri)
Expand Down
45 changes: 22 additions & 23 deletions backend/dataall/modules/s3_datasets/api/dataset/types.py
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,7 @@
get_dataset_statistics,
get_dataset_glossary_terms,
resolve_dataset_stack,
get_dataset_restricted_information,
)
from dataall.core.environment.api.enums import EnvironmentPermission

Expand All @@ -23,6 +24,22 @@
],
)

DatasetRestrictedInformation = gql.ObjectType(
name='DatasetRestrictedInformation',
fields=[
gql.Field(name='AwsAccountId', type=gql.String),
gql.Field(name='region', type=gql.String),
gql.Field(name='S3BucketName', type=gql.String),
gql.Field(name='GlueDatabaseName', type=gql.String),
Comment on lines +32 to +33
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: A call out that these 2 fields (S3BucketName and GlueDatabaseName) are a part of index in opensearch (backend/dataall/modules/s3_datasets/indexers/dataset_indexer.py) - not sure really how "restricted" they are treated if that information can be found elsewhere

gql.Field(name='IAMDatasetAdminRoleArn', type=gql.String),
gql.Field(name='KmsAlias', type=gql.String),
gql.Field(name='importedS3Bucket', type=gql.Boolean),
gql.Field(name='importedGlueDatabase', type=gql.Boolean),
gql.Field(name='importedKmsKey', type=gql.Boolean),
gql.Field(name='importedAdminRole', type=gql.Boolean),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did we also want to include the following share expiration / config related information as part of restricted?

        gql.Field(name='autoApprovalEnabled', type=gql.Boolean),
        gql.Field(name='enableExpiration', type=gql.Boolean),
        gql.Field(name='expirySetting', type=gql.String),
        gql.Field(name='expiryMinDuration', type=gql.Integer),
        gql.Field(name='expiryMaxDuration', type=gql.Integer),

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initially yes, but this information is fetched when opening a share request. At the end the user needs it for the share request creation

],
)

Dataset = gql.ObjectType(
name='Dataset',
fields=[
Expand All @@ -35,29 +52,13 @@
gql.Field(name='created', type=gql.String),
gql.Field(name='updated', type=gql.String),
gql.Field(name='admins', type=gql.ArrayType(gql.String)),
gql.Field(name='AwsAccountId', type=gql.String),
gql.Field(name='region', type=gql.String),
gql.Field(name='S3BucketName', type=gql.String),
gql.Field(name='GlueDatabaseName', type=gql.String),
gql.Field(name='GlueCrawlerName', type=gql.String),
gql.Field(name='GlueCrawlerSchedule', type=gql.String),
gql.Field(name='GlueProfilingJobName', type=gql.String),
gql.Field(name='GlueProfilingTriggerSchedule', type=gql.String),
gql.Field(name='IAMDatasetAdminRoleArn', type=gql.String),
gql.Field(name='KmsAlias', type=gql.String),
gql.Field(name='bucketCreated', type=gql.Boolean),
gql.Field(name='glueDatabaseCreated', type=gql.Boolean),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: making note we can remove the following from data model if we wanted to clean up sometime in the future

  • bucketCreated
  • glueDatabaseCreated
  • iamAdminRoleCreated
  • lakeformationLocationCreated
  • bucketPolicyCreated

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not want to add a migration script because backfilling it in 2.6.2 was going to be a pain; but totally agree on the cleanup

gql.Field(name='iamAdminRoleCreated', type=gql.Boolean),
gql.Field(name='lakeformationLocationCreated', type=gql.Boolean),
gql.Field(name='bucketPolicyCreated', type=gql.Boolean),
gql.Field(name='SamlAdminGroupName', type=gql.String),
gql.Field(name='businessOwnerEmail', type=gql.String),
gql.Field(name='businessOwnerDelegationEmails', type=gql.ArrayType(gql.String)),
gql.Field(name='importedS3Bucket', type=gql.Boolean),
gql.Field(name='importedGlueDatabase', type=gql.Boolean),
gql.Field(name='importedKmsKey', type=gql.Boolean),
gql.Field(name='importedAdminRole', type=gql.Boolean),
gql.Field(name='imported', type=gql.Boolean),
gql.Field(
name='restricted',
type=DatasetRestrictedInformation,
resolver=get_dataset_restricted_information,
),
gql.Field(
name='environment',
type=gql.Ref('EnvironmentSimplified'),
Expand Down Expand Up @@ -130,8 +131,6 @@
name='GlueCrawler',
fields=[
gql.Field(name='Name', type=gql.ID),
gql.Field(name='AwsAccountId', type=gql.String),
gql.Field(name='region', type=gql.String),
gql.Field(name='status', type=gql.String),
],
)
Original file line number Diff line number Diff line change
Expand Up @@ -51,6 +51,12 @@ def resolve_dataset(context, source: DatasetStorageLocation, **kwargs):
return d


def get_folder_restricted_information(context: Context, source: DatasetStorageLocation, **kwargs):
if not source:
return None
return DatasetLocationService.get_folder_restricted_information(uri=source.locationUri, folder=source)


def resolve_glossary_terms(context: Context, source: DatasetStorageLocation, **kwargs):
if not source:
return None
Expand Down
48 changes: 10 additions & 38 deletions backend/dataall/modules/s3_datasets/api/storage_location/types.py
Original file line number Diff line number Diff line change
@@ -1,5 +1,9 @@
from dataall.base.api import gql
from dataall.modules.s3_datasets.api.storage_location.resolvers import resolve_glossary_terms, resolve_dataset
from dataall.modules.s3_datasets.api.storage_location.resolvers import (
resolve_glossary_terms,
resolve_dataset,
get_folder_restricted_information,
)

DatasetStorageLocation = gql.ObjectType(
name='DatasetStorageLocation',
Expand All @@ -11,13 +15,15 @@
gql.Field(name='owner', type=gql.String),
gql.Field(name='created', type=gql.String),
gql.Field(name='updated', type=gql.String),
gql.Field(name='region', type=gql.String),
gql.Field(name='tags', type=gql.ArrayType(gql.String)),
gql.Field(name='AwsAccountId', type=gql.String),
gql.Field(name='S3BucketName', type=gql.String),
gql.Field(name='S3Prefix', type=gql.String),
gql.Field(name='locationCreated', type=gql.Boolean),
gql.Field(name='dataset', type=gql.Ref('Dataset'), resolver=resolve_dataset),
gql.Field(
name='restricted',
type=gql.Ref('DatasetRestrictedInformation'),
resolver=get_folder_restricted_information,
),
gql.Field(name='userRoleForStorageLocation', type=gql.Ref('DatasetRole')),
gql.Field(name='environmentEndPoint', type=gql.String),
gql.Field(
Expand All @@ -40,37 +46,3 @@
gql.Field(name='hasPrevious', type=gql.Boolean),
],
)


DatasetAccessPoint = gql.ObjectType(
name='DatasetAccessPoint',
fields=[
gql.Field(name='accessPointUri', type=gql.ID),
gql.Field(name='location', type=DatasetStorageLocation),
gql.Field(name='dataset', type=gql.Ref('Dataset')),
gql.Field(name='name', type=gql.String),
gql.Field(name='description', type=gql.String),
gql.Field(name='owner', type=gql.String),
gql.Field(name='created', type=gql.String),
gql.Field(name='updated', type=gql.String),
gql.Field(name='region', type=gql.String),
gql.Field(name='AwsAccountId', type=gql.String),
gql.Field(name='S3BucketName', type=gql.String),
gql.Field(name='S3Prefix', type=gql.String),
gql.Field(name='S3AccessPointName', type=gql.String),
],
)


DatasetAccessPointSearchResult = gql.ObjectType(
name='DatasetAccessPointSearchResult',
fields=[
gql.Field(name='count', type=gql.Integer),
gql.Field(name='page', type=gql.Integer),
gql.Field(name='pageSize', type=gql.Integer),
gql.Field(name='pages', type=gql.Integer),
gql.Field(name='hasNext', type=gql.Integer),
gql.Field(name='hasPrevious', type=gql.Integer),
gql.Field(name='nodes', type=gql.ArrayType(DatasetAccessPoint)),
],
)
2 changes: 1 addition & 1 deletion backend/dataall/modules/s3_datasets/api/table/mutations.py
Original file line number Diff line number Diff line change
Expand Up @@ -28,7 +28,7 @@
syncTables = gql.MutationField(
name='syncTables',
args=[gql.Argument(name='datasetUri', type=gql.NonNullableType(gql.String))],
type=gql.Ref('DatasetTableSearchResult'),
type=gql.Integer,
resolver=sync_tables,
)

Expand Down
6 changes: 6 additions & 0 deletions backend/dataall/modules/s3_datasets/api/table/resolvers.py
Original file line number Diff line number Diff line change
Expand Up @@ -72,3 +72,9 @@ def list_table_data_filters(context: Context, source, tableUri: str = None, filt
if not filter:
filter = {'page': 1, 'pageSize': 5}
return DatasetTableDataFilterService.list_table_data_filters(uri=tableUri, data=filter)


def get_dataset_table_restricted_information(context: Context, source: DatasetTable, **kwargs):
if not source:
return None
return DatasetTableService.get_table_restricted_information(uri=source.tableUri, table=source)
17 changes: 13 additions & 4 deletions backend/dataall/modules/s3_datasets/api/table/types.py
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,7 @@
resolve_dataset,
get_glue_table_properties,
resolve_glossary_terms,
get_dataset_table_restricted_information,
)

TablePermission = gql.ObjectType(
Expand All @@ -21,6 +22,15 @@
gql.Field(name='nodes', type=gql.ArrayType(TablePermission)),
],
)
DatasetTableRestrictedInformation = gql.ObjectType(
name='DatasetTableRestrictedInformation',
fields=[
gql.Field(name='AwsAccountId', type=gql.String),
gql.Field(name='GlueDatabaseName', type=gql.String),
gql.Field(name='GlueTableName', type=gql.String),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

similar to call out above that GlueDatabaseName is a part of index in opensearch (backend/dataall/modules/s3_datasets/indexers/table_indexer.py) - not sure really how "restricted" they are treated if that information can be found elsewhere

++ to note I would think best if anything would be to restrict on open search and keep the other server side restricted logic the same

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this might be the solution, restricting it also in OpenSearch

gql.Field(name='S3Prefix', type=gql.String),
],
)

DatasetTable = gql.ObjectType(
name='DatasetTable',
Expand All @@ -35,12 +45,11 @@
gql.Field(name='created', type=gql.String),
gql.Field(name='updated', type=gql.String),
gql.Field(name='admins', type=gql.ArrayType(gql.String)),
gql.Field(name='AwsAccountId', type=gql.String),
gql.Field(name='GlueDatabaseName', type=gql.String),
gql.Field(name='GlueTableName', type=gql.String),
gql.Field(name='LastGlueTableStatus', type=gql.String),
gql.Field(name='S3Prefix', type=gql.String),
gql.Field(name='GlueTableConfig', type=gql.String),
gql.Field(
name='restricted', type=DatasetTableRestrictedInformation, resolver=get_dataset_table_restricted_information
),
gql.Field(
name='GlueTableProperties',
type=gql.String,
Expand Down
Original file line number Diff line number Diff line change
@@ -1,9 +1,10 @@
from sqlalchemy import inspect
dlpzx marked this conversation as resolved.
Show resolved Hide resolved
from dataall.modules.s3_datasets.indexers.dataset_indexer import DatasetIndexer
from dataall.base.context import get_context
from dataall.core.permissions.services.resource_policy_service import ResourcePolicyService
from dataall.core.permissions.services.tenant_policy_service import TenantPolicyService
from dataall.modules.catalog.db.glossary_repositories import GlossaryRepository
from dataall.base.db.exceptions import ResourceShared, ResourceAlreadyExists
from dataall.base.db.exceptions import ResourceUnauthorized, ResourceAlreadyExists
dlpzx marked this conversation as resolved.
Show resolved Hide resolved
from dataall.modules.s3_datasets.services.dataset_service import DatasetService
from dataall.modules.s3_datasets.aws.s3_location_client import S3LocationClient
from dataall.modules.s3_datasets.db.dataset_location_repositories import DatasetLocationRepository
Expand Down Expand Up @@ -59,7 +60,6 @@ def list_dataset_locations(uri: str, filter: dict = None):
return DatasetLocationRepository.list_dataset_locations(session=session, uri=uri, data=filter)

@staticmethod
@ResourcePolicyService.has_resource_permission(GET_DATASET_FOLDER)
def get_storage_location(uri):
with get_context().db_engine.scoped_session() as session:
return DatasetLocationRepository.get_location_by_uri(session, uri)
Expand Down Expand Up @@ -135,3 +135,10 @@ def _delete_dataset_folder_read_permission(session, dataset: S3Dataset, location
}
for group in permission_group:
ResourcePolicyService.delete_resource_policy(session=session, group=group, resource_uri=location_uri)

@staticmethod
@ResourcePolicyService.has_resource_permission(GET_DATASET_FOLDER)
def get_folder_restricted_information(uri: str, folder: DatasetStorageLocation):
context = get_context()
with context.db_engine.scoped_session() as session:
return DatasetRepository.get_dataset_by_uri(session, folder.datasetUri)
Original file line number Diff line number Diff line change
@@ -1,6 +1,7 @@
import os
import json
import logging
from sqlalchemy import inspect
dlpzx marked this conversation as resolved.
Show resolved Hide resolved
from typing import List
from dataall.base.aws.quicksight import QuicksightClient
from dataall.base.db import exceptions
Expand Down Expand Up @@ -39,6 +40,7 @@
DATASET_READ,
IMPORT_DATASET,
DATASET_TABLE_ALL,
GET_DATASET,
)
from dataall.modules.datasets_base.services.dataset_list_permissions import LIST_ENVIRONMENT_DATASETS
from dataall.modules.s3_datasets.db.dataset_repositories import DatasetRepository
Expand Down Expand Up @@ -338,6 +340,11 @@ def get_dataset_statistics(dataset: S3Dataset):
'upvotes': count_upvotes or 0,
}

@staticmethod
@ResourcePolicyService.has_resource_permission(GET_DATASET)
def get_dataset_restricted_information(uri: str, dataset: S3Dataset):
return dataset

@staticmethod
@TenantPolicyService.has_tenant_permission(MANAGE_DATASETS)
@ResourcePolicyService.has_resource_permission(CREDENTIALS_DATASET)
Expand Down Expand Up @@ -391,8 +398,6 @@ def start_crawler(uri: str, data: dict = None):

return {
'Name': dataset.GlueCrawlerName,
'AwsAccountId': dataset.AwsAccountId,
'region': dataset.region,
'status': crawler.get('LastCrawl', {}).get('Status', 'N/A'),
}

Expand Down
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
import logging

from sqlalchemy import inspect
dlpzx marked this conversation as resolved.
Show resolved Hide resolved
from dataall.base.context import get_context
from dataall.core.permissions.services.resource_policy_service import ResourcePolicyService
from dataall.core.permissions.services.tenant_policy_service import TenantPolicyService
Expand Down Expand Up @@ -44,6 +44,11 @@ def get_table(uri: str):
with get_context().db_engine.scoped_session() as session:
return DatasetTableRepository.get_dataset_table_by_uri(session, uri)

@staticmethod
@ResourcePolicyService.has_resource_permission(GET_DATASET_TABLE)
def get_table_restricted_information(uri: str, table: DatasetTable):
return table

@staticmethod
@TenantPolicyService.has_tenant_permission(MANAGE_DATASETS)
@ResourcePolicyService.has_resource_permission(UPDATE_DATASET_TABLE, parent_resource=_get_dataset_uri)
Expand Down Expand Up @@ -127,11 +132,7 @@ def sync_tables_for_dataset(cls, uri):
DatasetTableIndexer.upsert_all(session=session, dataset_uri=dataset.datasetUri)
DatasetTableIndexer.remove_all_deleted(session=session, dataset_uri=dataset.datasetUri)
DatasetIndexer.upsert(session=session, dataset_uri=dataset.datasetUri)
return DatasetRepository.paginated_dataset_tables(
session=session,
uri=uri,
data={'page': 1, 'pageSize': 10},
)
return DatasetRepository.count_dataset_tables(session, dataset.datasetUri)

@staticmethod
def sync_existing_tables(session, uri, glue_tables=None):
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -41,7 +41,7 @@ def validate_dataset_share_selector_input(data):


def list_shared_tables_by_env_dataset(context: Context, source, datasetUri: str, envUri: str):
return S3ShareService.list_shared_tables_by_env_dataset(datasetUri, envUri)
return S3ShareService.list_shared_tables_by_env_dataset(dataset_uri=datasetUri, uri=envUri)


@is_feature_enabled('modules.s3_datasets.features.aws_actions')
Expand Down
2 changes: 1 addition & 1 deletion frontend/src/modules/Folders/components/FolderOverview.js
Original file line number Diff line number Diff line change
Expand Up @@ -28,7 +28,7 @@ export const FolderOverview = (props) => {
<Grid item lg={4} xl={3} xs={12}>
<ObjectMetadata
environment={folder.dataset.environment}
region={folder.dataset.region}
region={folder.restricted?.region || 'UNAUTHORIZED_INFO'}
dlpzx marked this conversation as resolved.
Show resolved Hide resolved
organization={folder.dataset.environment.organization}
owner={folder.owner}
admins={folder.dataset.SamlAdminGroupName || '-'}
Expand Down
12 changes: 8 additions & 4 deletions frontend/src/modules/Folders/components/FolderS3Properties.js
Original file line number Diff line number Diff line change
Expand Up @@ -19,31 +19,35 @@ export const FolderS3Properties = (props) => {
S3 URI
</Typography>
<Typography color="textPrimary" variant="body2">
{`s3://${folder.dataset.S3BucketName}/${folder.S3Prefix}/`}
{`s3://${folder.restricted?.S3BucketName || 'UNAUTHORIZED_INFO'}/${
folder.S3Prefix
}/`}
</Typography>
</CardContent>
<CardContent>
<Typography color="textSecondary" variant="subtitle2">
S3 ARN
</Typography>
<Typography color="textPrimary" variant="body2">
{`arn:aws:s3:::${folder.dataset.S3BucketName}/${folder.S3Prefix}/`}
{`arn:aws:s3:::${
folder.restricted?.S3BucketName || 'UNAUTHORIZED_INFO'
}/${folder.S3Prefix}/`}
</Typography>
</CardContent>
<CardContent>
<Typography color="textSecondary" variant="subtitle2">
Region
</Typography>
<Typography color="textPrimary" variant="body2">
{folder.dataset.region}
{folder.restricted?.region || 'UNAUTHORIZED_INFO'}
</Typography>
</CardContent>
<CardContent>
<Typography color="textSecondary" variant="subtitle2">
Account
</Typography>
<Typography color="textPrimary" variant="body2">
{folder.dataset.AwsAccountId}
{folder.restricted?.AwsAccountId || 'UNAUTHORIZED_INFO'}
</Typography>
</CardContent>
</Card>
Expand Down
Loading
Loading