When running integration tests with TEST_RUN_UNTIL_FAILURE=true, the integration test framework does not stop once a failure happens. It keeps re-running the tests, which causes two problems:
- Important test artifacts left by the failed test for debugging are removed from the VM.
- We might miss that a test has failed if we're not watching the test run closely.
The specific command I was using:
TEST_RUN_UNTIL_FAILURE=true AGENT_VERSION="9.4.0-SNAPSHOT" TEST_PLATFORMS="windows/amd64" mage -v integration:single TestFilebeatReceiverLogAsFilestream
Some logs showing the failed test, 0 successful tests, and another test run starting:
>>> (windows-amd64-2022-default) Test output (non-sudo) (stderr): Running target: Integration:TestOnRemote
>>> (windows-amd64-2022-default) Test output (non-sudo) (stderr): exec: go "list" "-m"
>>> (windows-amd64-2022-default) Test output (non-sudo) (stderr): Found Elastic Beats dir at C:\Users\windows\agent
>>> (windows-amd64-2022-default) Test output (non-sudo) (stderr): Running dependency: main.Build.TestFakeComponent
>>> (windows-amd64-2022-default) Test output (non-sudo) (stderr): exec: go "build" "-v" "-o" "C:\\Users\\windows\\agent\\pkg\\component\\fake\\component\\component.exe" "C:\\Users\\windows\\agent\\pkg\\component\\fake\\component"
>>> (windows-amd64-2022-default) Test output (non-sudo) (stderr): Running dependency: github.com/elastic/elastic-agent/dev-tools/mage.InstallGoTestTools
>>> (windows-amd64-2022-default) Test output (non-sudo) (stderr): exec: go "install" "gotest.tools/gotestsum"
>>> (windows-amd64-2022-default) Test output (non-sudo) (stdout): >> go test: remote-windows-amd64-2022-default.ess Testing
>>> (windows-amd64-2022-default) Test output (non-sudo) (stdout): >> ARGS: remote-windows-amd64-2022-default.ess Command: gotestsum --no-color --junitfile-hide-skipped-tests -f standard-verbose --junitfile build/TEST-go-remote-windows-amd64-2022-default.ess.xml --jsonfile build/TEST-go-remote-windows-amd64-2022-default.ess.out.json -- -tags integration -test.shuffle on -test.timeout 2h0m0s -test.run ^(TestFilebeatReceiverLogAsFilestream)$ github.com/elastic/elastic-agent/testing/integration/ess
>>> (windows-amd64-2022-default) Test output (non-sudo) (stderr): exec: gotestsum --no-color --junitfile-hide-skipped-tests -f standard-verbose --junitfile build/TEST-go-remote-windows-amd64-2022-default.ess.xml --jsonfile build/TEST-go-remote-windows-amd64-2022-default.ess.out.json -- -tags integration -test.shuffle on -test.timeout 2h0m0s -test.run ^(TestFilebeatReceiverLogAsFilestream)$ github.com/elastic/elastic-agent/testing/integration/ess
>>> (windows-amd64-2022-default) Test output (non-sudo) (stdout): exec: gotestsum --no-color --junitfile-hide-skipped-tests -f standard-verbose --junitfile build/TEST-go-remote-windows-amd64-2022-default.ess.xml --jsonfile build/TEST-go-remote-windows-amd64-2022-default.ess.out.json -- -tags integration -test.shuffle on -test.timeout 2h0m0s -test.run ^(TestFilebeatReceiverLogAsFilestream)$ github.com/elastic/elastic-agent/testing/integration/ess
>>> (windows-amd64-2022-default) Test output (non-sudo) (stdout): -test.shuffle 1775682694358783700
>>> (windows-amd64-2022-default) Test output (non-sudo) (stdout): === RUN TestFilebeatReceiverLogAsFilestream
>>> (windows-amd64-2022-default) Test output (non-sudo) (stdout): fixture.go:394: Extracting artifact elastic-agent-9.4.0-SNAPSHOT-windows-x86_64.zip to C:\Users\windows\AppData\Local\Temp\TestFilebeatReceiverLogAsFilestream1675914859
>>> (windows-amd64-2022-default) Test output (non-sudo) (stdout): fixture.go:412: Completed extraction of artifact elastic-agent-9.4.0-SNAPSHOT-windows-x86_64.zip to C:\Users\windows\AppData\Local\Temp\TestFilebeatReceiverLogAsFilestream1675914859
>>> (windows-amd64-2022-default) Test output (non-sudo) (stdout): fixture.go:1184: Components were not modified from the fetched artifact
>>> (windows-amd64-2022-default) Test output (non-sudo) (stdout): otel_log_as_filestream_test.go:194:
>>> (windows-amd64-2022-default) Test output (non-sudo) (stdout): Error Trace: C:/Users/windows/agent/testing/integration/ess/otel_log_as_filestream_test.go:71
>>> (windows-amd64-2022-default) Test output (non-sudo) (stdout): C:/Program Files/Go/src/runtime/asm_amd64.s:1693
>>> (windows-amd64-2022-default) Test output (non-sudo) (stdout): Error: Not equal:
>>> (windows-amd64-2022-default) Test output (non-sudo) (stdout): expected: 100
>>> (windows-amd64-2022-default) Test output (non-sudo) (stdout): actual : 150
>>> (windows-amd64-2022-default) Test output (non-sudo) (stdout): Messages: expecting 100 events, got 150
>>> (windows-amd64-2022-default) Test output (non-sudo) (stdout): otel_log_as_filestream_test.go:194:
>>> (windows-amd64-2022-default) Test output (non-sudo) (stdout): Error Trace: C:/Users/windows/agent/testing/integration/ess/otel_log_as_filestream_test.go:60
>>> (windows-amd64-2022-default) Test output (non-sudo) (stdout): C:/Users/windows/agent/testing/integration/ess/otel_log_as_filestream_test.go:194
>>> (windows-amd64-2022-default) Test output (non-sudo) (stdout): Error: Condition never satisfied
>>> (windows-amd64-2022-default) Test output (non-sudo) (stdout): Test: TestFilebeatReceiverLogAsFilestream
>>> (windows-amd64-2022-default) Test output (non-sudo) (stdout): Messages: did not find the expected number of events
>>> (windows-amd64-2022-default) Test output (non-sudo) (stdout): fixture.go:1602: Temporary directory "C:\\Users\\windows\\AppData\\Local\\Temp\\TestFilebeatReceiverLogAsFilestream1675914859" preserved for investigation/debugging
>>> (windows-amd64-2022-default) Test output (non-sudo) (stdout): fs.go:101: Temporary directory saved: C:\Users\windows\agent\build\TestFilebeatReceiverLogAsFilestream2504705962
>>> (windows-amd64-2022-default) Test output (non-sudo) (stdout): --- FAIL: TestFilebeatReceiverLogAsFilestream (109.06s)
>>> (windows-amd64-2022-default) Test output (non-sudo) (stdout): FAIL
>>> (windows-amd64-2022-default) Test output (non-sudo) (stdout): FAIL github.com/elastic/elastic-agent/testing/integration/ess 109.252s
>>> (windows-amd64-2022-default) Test output (non-sudo) (stdout): === Failed
>>> (windows-amd64-2022-default) Test output (non-sudo) (stdout): === FAIL: testing/integration/ess TestFilebeatReceiverLogAsFilestream (109.06s)
>>> (windows-amd64-2022-default) Test output (non-sudo) (stdout): fixture.go:394: Extracting artifact elastic-agent-9.4.0-SNAPSHOT-windows-x86_64.zip to C:\Users\windows\AppData\Local\Temp\TestFilebeatReceiverLogAsFilestream1675914859
>>> (windows-amd64-2022-default) Test output (non-sudo) (stdout): fixture.go:412: Completed extraction of artifact elastic-agent-9.4.0-SNAPSHOT-windows-x86_64.zip to C:\Users\windows\AppData\Local\Temp\TestFilebeatReceiverLogAsFilestream1675914859
>>> (windows-amd64-2022-default) Test output (non-sudo) (stdout): fixture.go:1184: Components were not modified from the fetched artifact
>>> (windows-amd64-2022-default) Test output (non-sudo) (stdout): otel_log_as_filestream_test.go:194:
>>> (windows-amd64-2022-default) Test output (non-sudo) (stdout): Error Trace: C:/Users/windows/agent/testing/integration/ess/otel_log_as_filestream_test.go:71
>>> (windows-amd64-2022-default) Test output (non-sudo) (stdout): C:/Program Files/Go/src/runtime/asm_amd64.s:1693
>>> (windows-amd64-2022-default) Test output (non-sudo) (stdout): Error: Not equal:
>>> (windows-amd64-2022-default) Test output (non-sudo) (stdout): expected: 100
>>> (windows-amd64-2022-default) Test output (non-sudo) (stdout): actual : 150
>>> (windows-amd64-2022-default) Test output (non-sudo) (stdout): Messages: expecting 100 events, got 150
>>> (windows-amd64-2022-default) Test output (non-sudo) (stdout): otel_log_as_filestream_test.go:194:
>>> (windows-amd64-2022-default) Test output (non-sudo) (stdout): Error Trace: C:/Users/windows/agent/testing/integration/ess/otel_log_as_filestream_test.go:60
>>> (windows-amd64-2022-default) Test output (non-sudo) (stdout): C:/Users/windows/agent/testing/integration/ess/otel_log_as_filestream_test.go:194
>>> (windows-amd64-2022-default) Test output (non-sudo) (stdout): Error: Condition never satisfied
>>> (windows-amd64-2022-default) Test output (non-sudo) (stdout): Test: TestFilebeatReceiverLogAsFilestream
>>> (windows-amd64-2022-default) Test output (non-sudo) (stdout): Messages: did not find the expected number of events
>>> (windows-amd64-2022-default) Test output (non-sudo) (stdout): fixture.go:1602: Temporary directory "C:\\Users\\windows\\AppData\\Local\\Temp\\TestFilebeatReceiverLogAsFilestream1675914859" preserved for investigation/debugging
>>> (windows-amd64-2022-default) Test output (non-sudo) (stdout): fs.go:101: Temporary directory saved: C:\Users\windows\agent\build\TestFilebeatReceiverLogAsFilestream2504705962
>>> (windows-amd64-2022-default) Test output (non-sudo) (stdout): DONE 1 tests, 1 failure in 109.255s
>>> (windows-amd64-2022-default) Test output (non-sudo) (stderr): Error: go test returned a non-zero value: exit status 1
>>> (windows-amd64-2022-default) non-sudo tests failed: Process exited with status 1
>>> Testing completed (0 successful)
>>> Console output written here: build/TEST-go-integration.out
>>> Console JSON output written here: build/TEST-go-integration.out.json
>>> JUnit XML written here: build/TEST-go-integration.xml
>>> Diagnostic output (if present) here: build/diagnostics
exec: go "list" "-m"
Found Elastic Beats dir at /home/tiago/devel/elastic-agent
>>> Creating zip archive of repo to send to remote hosts
>>> (windows-amd64-2022-default) Starting SSH; connect with `ssh -i /home/foo/bar/elastic-agent/.integration-cache/id_rsa windows@1.2.3.4`
>>> (windows-amd64-2022-default) Connected over SSH
>>> (windows-amd64-2022-default) Copying repo
>>> (windows-amd64-2022-default) Running make mage and prepareOnRemote