Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DRAFT] fix: Use CTE in Enrollment analytics queries [DHIS-16705] #19519

Draft
wants to merge 30 commits into
base: master
Choose a base branch
from

Conversation

luciano-fiandesio
Copy link
Contributor

@luciano-fiandesio luciano-fiandesio commented Dec 18, 2024

WIP

Changes

This PR addresses the generation of SQL queries for Enrollment analytics

Problem

Currently, the Enrollment queries are structured so that sub-queries are used to fetch events values from the analytics_event_* tables.
For instance:

select
    enrollment,
    ...
    ax."ou",
    (select
         "fyjPqlHE7Dn"
     from
	 analytics_event_M3xtLkYBlKI
     where
	 analytics_event_M3xtLkYBlKI.eventstatus != 'SCHEDULE'
	 and analytics_event_M3xtLkYBlKI.enrollment = ax.enrollment
	 and ps = 'CWaAcQYKVpq'
limit 1 ) as "CWaAcQYKVpq[-1].fyjPqlHE7Dn",
from
     analytics_enrollment_m3xtlkyblki as ax
...

The above query works in Postgres but does not work in Doris, because correlation with outer layers of the parent query is not supported.

Mitigation

The current approach is trying to refactor the Enrollment query so that the subqueries (both as select and as where conditions) are "moved" into CTE (Common Table Expressions).
Common Table Expressions are temporary result sets in SQL, defined within a WITH clause, that simplify complex queries by improving readability and modularizing logic. They are particularly useful for recursive queries and can be referenced multiple times within the main query.

The above query can be rewritten like so:

with ps_cwaacqykvpq_fyjpqlhe7dn as (
select
	distinct on
	(enrollment) enrollment,
	"fyjPqlHE7Dn" as value
from
	analytics_event_M3xtLkYBlKI
where
	eventstatus != 'SCHEDULE'
	and ps = 'CWaAcQYKVpq'
	
order by
	enrollment,
	occurreddate desc,
	created desc)
select
	ax.enrollment,
	...
	ps_cwaacqykvpq_fyjpqlhe7dn.value as "CWaAcQYKVpq.fyjPqlHE7Dn"
from
	analytics_enrollment_m3xtlkyblki as ax
left join ps_cwaacqykvpq_fyjpqlhe7dn on
	ax.enrollment = ps_cwaacqykvpq_fyjpqlhe7dn.enrollment
where
	(((enrollmentdate >= '2021-01-01'
	and enrollmentdate < '2022-01-01')))
	and (ax."uidlevel1" = 'ImspTQPwCqd' )
...

The above query structure is compatible with Doris and it makes the execution of the query faster.
As a comparison:

Original query with sub-select

image

Refactored query with CTEs

image

Testing strategy

I am using the e2e project and aim at having 100% green tests on both Postgres and Doris.


void contributeCTE(
ProgramIndicator programIndicator,
RelationshipType relationshipType,

Check notice

Code scanning / CodeQL

Useless parameter Note

The parameter 'relationshipType' is never used.
void contributeCTE(
ProgramIndicator programIndicator,
RelationshipType relationshipType,
AnalyticsType outerSqlEntity,

Check notice

Code scanning / CodeQL

Useless parameter Note

The parameter 'outerSqlEntity' is never used.
return fromClause.toString();
}

protected String getSortClause(EventQueryParams params) {

Check notice

Code scanning / CodeQL

Missing Override annotation Note

This method overrides
AbstractJdbcEventAnalyticsManager.getSortClause
; it is advisable to add an Override annotation.
return "";
}

protected String getSqlFilter(QueryFilter filter, QueryItem item) {

Check notice

Code scanning / CodeQL

Missing Override annotation Note

This method overrides
AbstractJdbcEventAnalyticsManager.getSqlFilter
; it is advisable to add an Override annotation.
@luciano-fiandesio luciano-fiandesio force-pushed the DHIS-16705_ENROLLMENT_WITH_CTE branch from 8c33952 to 3e070ce Compare December 18, 2024 16:56
@larshelge larshelge changed the title DHIS-16705 Use CTE in Enrollment analytics queries [DRAFT] Use CTE in Enrollment analytics queries [DHIS-16705] Dec 19, 2024
@larshelge larshelge changed the title [DRAFT] Use CTE in Enrollment analytics queries [DHIS-16705] [DRAFT] fix: Use CTE in Enrollment analytics queries [DHIS-16705] Dec 19, 2024
@luciano-fiandesio luciano-fiandesio force-pushed the DHIS-16705_ENROLLMENT_WITH_CTE branch from 5170115 to d8119c0 Compare December 20, 2024 15:19
Copy link

Quality Gate Failed Quality Gate failed

Failed conditions
34 New issues
34 New Code Smells (required ≤ 0)

See analysis details on SonarQube Cloud

Catch issues before they fail your Quality Gate with our IDE extension SonarQube for IDE

this.programStageUid = programStageUid;
this.cteDefinition = cteDefinition;
this.offset = offset;
this.alias = new RandomStringGenerator.Builder().withinRange('a', 'z').build().generate(5);

Check notice

Code scanning / CodeQL

Deprecated method or constructor invocation Note

Invoking
Builder.build
should be avoided because it has been deprecated.
this.programIndicatorUid = programIndicatorUid;
this.programStageUid = null;
this.offset = -999;
this.alias = new RandomStringGenerator.Builder().withinRange('a', 'z').build().generate(5);

Check notice

Code scanning / CodeQL

Deprecated method or constructor invocation Note

Invoking
Builder.build
should be avoided because it has been deprecated.
this.isProgramIndicator = true;
}

public CteDefinitionWithOffset(String cteFilterName, String cteDefinition, boolean isFilter) {

Check notice

Code scanning / CodeQL

Useless parameter Note

The parameter 'cteFilterName' is never used.
this.programIndicatorUid = null;
this.programStageUid = null;
this.offset = -999;
this.alias = new RandomStringGenerator.Builder().withinRange('a', 'z').build().generate(5);

Check notice

Code scanning / CodeQL

Deprecated method or constructor invocation Note

Invoking
Builder.build
should be avoided because it has been deprecated.
/**
* Returns a select SQL clause for the given query.
*
* @param params the {@link EventQueryParams}.
*/
protected abstract String getSelectClause(EventQueryParams params);

protected abstract String getColumnWithCte(QueryItem item, String suffix, CTEContext cteContext);

Check notice

Code scanning / CodeQL

Useless parameter Note

The parameter 'suffix' is never used.
@@ -579,6 +711,40 @@
return ColumnAndAlias.EMPTY;
}

protected String getColumnWithCte(QueryItem item, String suffix, CTEContext cteContext) {

Check notice

Code scanning / CodeQL

Missing Override annotation Note

This method overrides
AbstractJdbcEventAnalyticsManager.getColumnWithCte
; it is advisable to add an Override annotation.

for (QueryItem item : params.getItems()) {

String itemId = item.getItem().getUid();

Check notice

Code scanning / CodeQL

Unread local variable Note

Variable 'String itemId' is never read.
CTEContext cteContext) {

// Generate a unique CTE name for this program indicator
String cteName = "pi_" + programIndicator.getUid().toLowerCase();

Check notice

Code scanning / CodeQL

Unread local variable Note

Variable 'String cteName' is never read.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant