Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IBX-8562: Command to remove duplicated entries after faulty IBX-5388 fix #410

Merged
merged 2 commits into from
Oct 22, 2024

Conversation

Nattfarinn
Copy link
Contributor

@Nattfarinn Nattfarinn commented Aug 14, 2024

Do not merge: TEMP commit inside (ref. 46e0e4a ) (commit removed)

This PR is not meant to be merged. Code will be moved to documentation once review and QA is done.

Branch base is set to 98b7b50 commit to make QA life a bit easier.

Question Answer
JIRA issue IBX-8562
Type improvement
Target Ibexa version v3.3
BC breaks no

Command removes duplicated database entries after faulty IBX-5388 fix (base commit for this branch).

Usage:

  ibexa:content:remove-duplicate-fields [options]

Options:

  -b, --batch-size[=BATCH-SIZE]          Number of attributes affected per iteration [default: 10000]
  -i, --max-iterations[=MAX-ITERATIONS]  Max iterations count (default or -1: unlimited) [default: -1]
  -s, --sleep[=SLEEP]                    Wait between iterations, in milliseconds [default: 0]
  -f, --force                            Force operation (implies non-interactive mode)

Code is compatible with PHP 7.3+. There should be no need to provide separate command for upper DXP versions (4.6 and above) with code adjusted to newer PHP versions and coding standards.

@micszo micszo self-assigned this Aug 16, 2024
@mnocon mnocon added the Doc needed The changes require some documentation label Aug 19, 2024
@vidarl
Copy link
Member

vidarl commented Aug 20, 2024

I think it is a bad idea not to merge and release the script in the product.
User would need to download it and place it in src/Command/ then?

Running this command would be a one-off, so we can make the command hidden : https://symfony.com/doc/current/console/hide_commands.html

That will prevent the command from being displayed by console list command and confuse users later.. Command could be removed from 5.0 again IMO

@adamwojs

@mnocon
Copy link
Member

mnocon commented Aug 20, 2024

I didn't know about the hidden Command feature - and I like Vidar's suggestion a lot.

It would greatly improve developer experience for people executing the update (no manual steps needed) without polluting the ibexa command namespace (which we wanted to avoid by having the Command in the doc)

@alongosz
Copy link
Member

The problem with that is that it doesn't scale properly. Command might be hidden, true, but if we wanted for every upgrade to have such command, the DI container would get inflated with commands. We already have a lot of them. We need a streamlined solution that is able to execute such updates via e.g. ibexa:upgrade. Planned and postponed for ages.

@mnocon
Copy link
Member

mnocon commented Aug 20, 2024

The problem with that is that it doesn't scale properly. Command might be hidden, true, but if we wanted for every upgrade to have such command, the DI container would get inflated with commands. We already have a lot of them. We need a streamlined solution that is able to execute such updates via e.g. ibexa:upgrade. Planned and postponed for ages.

You mention the DI container getting inflated with Commands as the main argument against it - but what is the real cost of it? The container taking miliseconds longer to build?

Also we're not discussing every future update here - only this specific case where @Nattfarinn decided that it's better to use a Command than a SQL script.

For me the hidden command is a good tradeoff, because we gain a lot in developer experience for the Partners - they don't have to go through all the cumbersome steps (create files manually, add service definitions) to fix an issue we're responsible for.

@alongosz
Copy link
Member

alongosz commented Aug 20, 2024

You mention the DI container getting inflated with Commands as the main argument against it - but what is the real cost of it? The container taking miliseconds longer to build?
Also we're not discussing every future update here - only this specific case

I'm talking about the scale. You cannot say that this is only for this case, because it sets precedence. Or rather in this case continues anti-pattern we've used before. We already have a lot of commands that could be wrapped into one streamlined solution.

However, if you really want it in the product, it should follow patterns existing in product:

  1. Move all SQL to a Gateway.
  2. Add integration tests against that gateway.

This adds significant amount of work to it which could be better spent on a generic upgrade solution. But yes, still much less work to fix this individually than work on a generic one.

For the Official Documentation the solution was fine due to its compactness.

So, to sum up, I'd be fine with this command being hidden and in the product, but it still needs to have layer separation in that case.

@Steveb-p
Copy link
Contributor

Steveb-p commented Aug 20, 2024

Theoretical SQL fix for the issue:

PostgreSQL:

DELETE FROM ezcontentobject_attribute
WHERE ezcontentobject_attribute.id IN (
    SELECT ea_duplicate.id FROM ezcontentobject_attribute AS ea_duplicate
        JOIN ezcontentobject_attribute AS ea ON
            ea.version = ea_duplicate.version
            AND ea.language_id = ea_duplicate.language_id
            AND ea.contentclassattribute_id = ea_duplicate.contentclassattribute_id
            AND ea.contentobject_id = ea_duplicate.contentobject_id
        WHERE ea.id < ea_duplicate.id
)

MySQL:

DELETE ea_duplicate FROM ezcontentobject_attribute AS ea_duplicate
     JOIN ezcontentobject_attribute AS ea ON
        ea.version = ea_duplicate.version
        AND ea.language_id = ea_duplicate.language_id
        AND ea.contentclassattribute_id = ea_duplicate.contentclassattribute_id
        AND ea.contentobject_id = ea_duplicate.contentobject_id
     WHERE ea.id < ea_duplicate.id

(Note: for databases it does not matter if condition is part of JOIN or WHERE)

Providing this in case it's valuable, and so it does not get lost in private messages.

Yes, I've tested this manually on databases, and SQL is correct (with their respective language differences). It should give the same result as the above command, but I haven't checked as I don't currently have an instance of Ibexa DXP with that particular issue.

@vidarl
Copy link
Member

vidarl commented Aug 20, 2024

You cannot say that this is only for this case, because it sets precedence.

Well, in that case we already have precedence as this is what we have done in the past already. For instance : ibexa/fieldtype-richtext#77

And yes, there we have the gateways, tests etc as you mentioned is needed

@mnocon mnocon removed their request for review August 23, 2024 07:54
Copy link
Member

@alongosz alongosz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adamwojs we need to make a decision here, either:

  1. we decide to go with the in-product hidden command and refactor it to have separate gateway layer at least and integration coverage
  2. we go with a regular upgrade script, assuming the SQL provided a few comments above does the job
  3. we put this after final review in doc only (there are IMO valid objections in the comments above).

@micszo
Copy link
Member

micszo commented Aug 28, 2024

ad.2.
A downside of this approach is no support for batches and iterations. Which could be an issue with a significant number of duplicates (in best case, no monitoring of progress).

Update: In my tests with 10, 100, 1000 duplicates there is no problematic difference in time between command and direct SQL. With 10k duplicates the command takes seconds and direct SQL several minutes. With 100k duplicates I started the direct SQL in the afternoon and interrupted execution in the morning. With the command 100k is still a matter of seconds.

(In theory this could be related to individual DB config. But then we would have to assume partners would cover tweaking that.)

@micszo
Copy link
Member

micszo commented Sep 3, 2024

Summing up last meeting. Got following error on a big DB from a live project (14M records):

  An exception occurred while executing 'DELETE FROM ezcontentobject_attribute WHERE id IN ()':

  SQLSTATE[42000]: Syntax error or access violation: 1064 You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near ')' at line 1

with stack trace:

at /Users/michalszoltysek/Downloads/Ibexa-project_latest_repo_22_august_take1/vendor/doctrine/dbal/lib/Doctrine/DBAL/Driver/AbstractMySQLDriver.php:98
 Doctrine\DBAL\Driver\AbstractMySQLDriver->convertException() at /Users/michalszoltysek/Downloads/Ibexa-project_latest_repo_22_august_take1/vendor/doctrine/dbal/lib/Doctrine/DBAL/DBALException.php:182
 Doctrine\DBAL\DBALException::wrapException() at /Users/michalszoltysek/Downloads/Ibexa-project_latest_repo_22_august_take1/vendor/doctrine/dbal/lib/Doctrine/DBAL/DBALException.php:159
 Doctrine\DBAL\DBALException::driverExceptionDuringQuery() at /Users/michalszoltysek/Downloads/Ibexa-project_latest_repo_22_august_take1/vendor/doctrine/dbal/lib/Doctrine/DBAL/Connection.php:2226
 Doctrine\DBAL\Connection->handleExceptionDuringQuery() at /Users/michalszoltysek/Downloads/Ibexa-project_latest_repo_22_august_take1/vendor/doctrine/dbal/lib/Doctrine/DBAL/Connection.php:1538
 Doctrine\DBAL\Connection->executeStatement() at /Users/michalszoltysek/Downloads/Ibexa-project_latest_repo_22_august_take1/vendor/doctrine/dbal/lib/Doctrine/DBAL/Query/QueryBuilder.php:216
 Doctrine\DBAL\Query\QueryBuilder->execute() at /Users/michalszoltysek/Downloads/Ibexa-project_latest_repo_22_august_take1/vendor/ezsystems/ezplatform-kernel/src/bundle/Core/Command/VirtualFieldDuplicateFixCommand.php:254
 Ibexa\Bundle\Core\Command\VirtualFieldDuplicateFixCommand->deleteAttributes() at /Users/michalszoltysek/Downloads/Ibexa-project_latest_repo_22_august_take1/vendor/ezsystems/ezplatform-kernel/src/bundle/Core/Command/VirtualFieldDuplicateFixCommand.php:119
 Ibexa\Bundle\Core\Command\VirtualFieldDuplicateFixCommand->execute() at /Users/michalszoltysek/Downloads/Ibexa-project_latest_repo_22_august_take1/vendor/symfony/console/Command/Command.php:298
 Symfony\Component\Console\Command\Command->run() at /Users/michalszoltysek/Downloads/Ibexa-project_latest_repo_22_august_take1/vendor/symfony/console/Application.php:1058
 Symfony\Component\Console\Application->doRunCommand() at /Users/michalszoltysek/Downloads/Ibexa-project_latest_repo_22_august_take1/vendor/symfony/framework-bundle/Console/Application.php:96
 Symfony\Bundle\FrameworkBundle\Console\Application->doRunCommand() at /Users/michalszoltysek/Downloads/Ibexa-project_latest_repo_22_august_take1/vendor/symfony/console/Application.php:301
 Symfony\Component\Console\Application->doRun() at /Users/michalszoltysek/Downloads/Ibexa-project_latest_repo_22_august_take1/vendor/symfony/framework-bundle/Console/Application.php:82
 Symfony\Bundle\FrameworkBundle\Console\Application->doRun() at /Users/michalszoltysek/Downloads/Ibexa-project_latest_repo_22_august_take1/vendor/symfony/console/Application.php:171
 Symfony\Component\Console\Application->run() at /Users/michalszoltysek/Downloads/Ibexa-project_latest_repo_22_august_take1/vendor/symfony/runtime/Runner/Symfony/ConsoleApplicationRunner.php:54
 Symfony\Component\Runtime\Runner\Symfony\ConsoleApplicationRunner->run() at /Users/michalszoltysek/Downloads/Ibexa-project_latest_repo_22_august_take1/vendor/autoload_runtime.php:35
 require_once() at /Users/michalszoltysek/Downloads/Ibexa-project_latest_repo_22_august_take1/bin/console:11

@vidarl
Copy link
Member

vidarl commented Sep 11, 2024

We have several customers waiting for this fix. Could we get some progress on it?
@adamwojs, please see andrew's comment #410 (review) too

@mateuszbieniek

@Nattfarinn Nattfarinn force-pushed the performance-fix-command branch from bdeae0f to 46e0e4a Compare October 1, 2024 08:21
Copy link
Member

@micszo micszo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue found on provided database has been resolved.

I have not found other issues.

@tischsoic tischsoic force-pushed the performance-fix-command branch from ea5186c to b552a54 Compare October 22, 2024 07:49
@tischsoic
Copy link

CI green before removing temp commit. ✅

Copy link

@Steveb-p Steveb-p merged commit b19be8d into 1.3 Oct 22, 2024
24 checks passed
@Steveb-p Steveb-p deleted the performance-fix-command branch October 22, 2024 08:46
@mnocon mnocon removed the Doc needed The changes require some documentation label Nov 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

8 participants