Test randomness #125

IonBazan · 2020-12-09T07:14:21Z

Summary

This topic was brought up by @krsriq in #82 and partially addressed in #90. This issue is to discuss possible solutions to make sure our tests are properly checking the library behavior.

Versions

| | Version |
|---|---|
| PHP | ALL |
| `fakerphp/faker` | `main` |

Self-enclosed code snippet for reproduction

Tests are now seeded with 1 to ensure consistent results, but that opens another potential issue:

Faker/test/Faker/Provider/PaymentTest.php, lines 36 to 39 in 0d72e9f:

    public function testCreditCardTypeReturnsValidVendorName()
    {
        self::assertContains($this->faker->creditCardType, ['Visa', 'Visa Retired', 'MasterCard', 'American Express', 'Discover Card']);
    }

In this code, $this->faker->creditCardType will always return MasterCard because the generator is seeded with 1 before each test method runs. Removing the last element from the array would therefore still let the test pass every time.
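The effect can be reproduced without the library. The following is a minimal sketch, not the project's code: `pickVendor()` is a hypothetical stand-in for the provider, and `mt_srand(1)` inside the loop mimics the reseeding done in setUp.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for a provider that picks a random vendor name.
function pickVendor(array $vendors): string
{
    return $vendors[mt_rand(0, count($vendors) - 1)];
}

$vendors = ['Visa', 'Visa Retired', 'MasterCard', 'American Express', 'Discover Card'];

$seen = [];
for ($i = 0; $i < 100; $i++) {
    mt_srand(1);                        // mimics setUp() reseeding before every test
    $seen[pickVendor($vendors)] = true; // record which vendor came back
}

// Only one distinct vendor is ever observed, so the other expected
// values are never exercised by an assertContains() check.
echo count($seen), " distinct vendor(s) seen\n";
```

Which vendor comes back depends on the random engine, but it is the same one on every iteration, so the remaining entries of the expected list are dead weight in the assertion.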

Possible solutions

While testing random data generation in a reproducible way is difficult, we should make sure our tests are working properly.

Retry tests

I experimented with PHPUnit's --repeat 100 flag to run each test several times, but setUp (and therefore seed(1)) is also called before each repetition, so every repetition produces the same values.

Mark tests that require seeding

Another approach would be to introduce a custom @seed <int> annotation for test methods or classes that explicitly require seeding to get specific results, such as:
https://github.com/FakerPHP/Faker/blob/main/test/Faker/Provider/ja_JP/InternetTest.php and https://github.com/FakerPHP/Faker/blob/main/test/Faker/Provider/uk_UA/PersonTest.php

This approach, combined with the --repeat 100 flag in one job of our test matrix, should let us verify that the tests actually check that generated data is always correct.
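One way the annotation lookup could work is sketched below. This is an assumption about a possible implementation, not an existing PHPUnit or Faker feature: `resolveSeed()` and `ExampleTest` are hypothetical names, and the tag is read from docblocks with PHP's reflection API.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: reads an optional "@seed <int>" tag from a method's
// docblock, falling back to the class docblock. Returns null when absent.
function resolveSeed(string $class, string $method): ?int
{
    $docs = [
        (new ReflectionMethod($class, $method))->getDocComment(),
        (new ReflectionClass($class))->getDocComment(),
    ];

    foreach ($docs as $doc) {
        if ($doc !== false && preg_match('/@seed\s+(\d+)/', $doc, $m) === 1) {
            return (int) $m[1];
        }
    }

    return null;
}

// Hypothetical test class used only to demonstrate the lookup.
final class ExampleTest
{
    /** @seed 1 */
    public function testReturnsSpecificValue(): void
    {
    }

    public function testReturnsValidValue(): void
    {
    }
}

$seeded   = resolveSeed(ExampleTest::class, 'testReturnsSpecificValue');
$unseeded = resolveSeed(ExampleTest::class, 'testReturnsValidValue');

var_dump($seeded, $unseeded);
```

A base test case's setUp could call such a helper with the current test method name and seed the generator only when the result is not null, leaving all other tests unseeded.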

Other notes

How to make sure that each element is returned at least once after N repetitions?
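One possible answer is to draw repeatedly from an unseeded (or randomly seeded) generator and track which expected values are still missing. The sketch below assumes equally likely values; `missingAfterDraws()` is a hypothetical helper and the closure stands in for a provider call.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: draws from $generate until every expected value has
// appeared or $maxDraws is reached, then returns the values never seen.
function missingAfterDraws(callable $generate, array $expected, int $maxDraws): array
{
    $missing = array_fill_keys($expected, true);

    for ($i = 0; $i < $maxDraws && $missing !== []; $i++) {
        unset($missing[$generate()]);
    }

    return array_keys($missing);
}

$vendors = ['Visa', 'Visa Retired', 'MasterCard', 'American Express', 'Discover Card'];

// A random seed keeps runs varied; printing it would let a failure be replayed.
$seed = random_int(0, 0x7FFFFFFF);
mt_srand($seed);

$missing = missingAfterDraws(
    static fn (): string => $vendors[mt_rand(0, count($vendors) - 1)],
    $vendors,
    1000
);

echo 'seed ', $seed, ': ', count($missing), " value(s) never returned\n";
```

With k equally likely values, the chance that one particular value is still missing after N draws is (1 - 1/k)^N, so N can be chosen to make a false failure negligibly unlikely, though never impossible.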


stale · 2021-01-07T08:21:09Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 1 week if no further activity occurs. Thank you for your contributions.

pimjansen · 2021-01-07T09:20:02Z

@localheinz also noted that we could use the unique generator to exhaust the resource pool. The problem is that this would consume a lot of RAM.

I'm not a fan of just looping hundreds of times. In fact, we know for sure that we cannot guarantee this 100% at this point; however, the risk of issues is also pretty low, right?

localheinz · 2023-09-06T08:27:22Z

@IonBazan

Any test that asserts that a provider, given a specific seed of the generator, returns a specific value is bound to fail when the randomization engine changes; see #691, for example.

The question is, what are we going to do when the randomization engine changes? Fix all the failing tests by adjusting the expectations? Seems painful.

For example, I do not see any value in asserting that $faker->email() returns a specific email address. I would not recommend to anyone using fakerphp/faker to rely on $faker->email() to return a specific value. All that should matter to them is that the returned value is semantically correct, that is, that the returned value is an email address.

On a separate note, the --repeat option has been removed from phpunit/phpunit:10.0.0.

curry684 · 2024-01-10T13:43:48Z

You should differentiate between what you are testing for in a library like this. Because indeed:

> For example, I do not see any value in asserting that $faker->email() returns a specific email address.

Not just no value, it's downright wrong to test like that, as you are testing something that, when it fails, does not imply anything is wrong with the code. If it returns one address today and a different address tomorrow, the code is still working fine: it returned an email address. The only proper test for the email function is $this->assertSame($email, filter_var($email, FILTER_VALIDATE_EMAIL));

It does, however, make sense to test repeatable seeded determinism. email is NOT required to always return the same email address between runs, on different computers or operating systems. It is, however, required to return the same email address when run with the same seed in the same run in the same environment. So yes, it does make sense to get a random seed value during testing and use that to test repeated calls.
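A minimal sketch of that idea follows, assuming a hypothetical `fakeEmail()` stand-in driven by `mt_rand()` rather than the real generator. It picks a random seed, produces the same sequence twice within one run, and checks both determinism and semantic validity.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for a seeded generator method such as email().
function fakeEmail(): string
{
    $users = ['alice', 'bob', 'carol'];
    $hosts = ['example.com', 'example.org', 'example.net'];

    return $users[mt_rand(0, 2)] . mt_rand(1, 999) . '@' . $hosts[mt_rand(0, 2)];
}

// Reseeds the engine, then generates $count addresses.
function emailsForSeed(int $seed, int $count): array
{
    mt_srand($seed);

    $emails = [];
    for ($i = 0; $i < $count; $i++) {
        $emails[] = fakeEmail();
    }

    return $emails;
}

// Random seed per run: no expectation is tied to one engine's output.
$seed   = random_int(0, 0x7FFFFFFF);
$first  = emailsForSeed($seed, 10);
$second = emailsForSeed($seed, 10);

// Same seed, same run, same environment: the sequences must match.
$sameSequence = ($first === $second);

// Every value must be a syntactically valid email address.
$valid = array_filter(
    $first,
    static fn (string $email): bool => filter_var($email, FILTER_VALIDATE_EMAIL) === $email
);
$allValid = ($valid === $first);

var_dump($sameSequence, $allValid);
```

Neither check names a concrete address, so a change of randomization engine would not break them. Against the real library, the reseeding step would presumably go through the generator's own seed method instead of `mt_srand()`.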

IonBazan mentioned this issue Dec 11, 2020: test seeding and repeat #134 (Closed)

pimjansen added the bug label Dec 24, 2020

IonBazan mentioned this issue Jan 4, 2021: [Core] Improve checks in blood tests #245 (Merged)

stale bot added the lifecycle/stale label Jan 7, 2021

pimjansen added pinned and removed lifecycle/stale labels Jan 7, 2021
