Skip to content

Commit c459a9e

Browse files
ianksChrisBr
authored andcommitted
Don't count requeued test failures toward circuit breaker
report_failure! was called before the requeue check, so successfully requeued tests incremented the consecutive failure counter. With max_consecutive_failures=N and N+ deterministic failures that are all requeued, the circuit breaker fired prematurely and the worker exited, stranding requeued tests in the queue with no worker to process them. Now report_failure!/report_success! is only called when a test genuinely fails or passes. Requeued tests are invisible to the circuit breaker — they don't increment or reset the consecutive failure counter.
1 parent e4d29d4 commit c459a9e

File tree

4 files changed

+58
-7
lines changed

4 files changed

+58
-7
lines changed

ruby/lib/minitest/queue.rb

Lines changed: 4 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -180,14 +180,14 @@ def handle_test_result(reporter, example, result)
180180
# When we do a bisect, we don't care about the result other than the test we're running the bisect on
181181
result.mark_as_flaked!
182182
failed = false
183-
elsif failed
184-
queue.report_failure!
185-
else
186-
queue.report_success!
187183
end
188184

189185
if failed && CI::Queue.requeueable?(result) && queue.requeue(example)
190186
result.requeue!
187+
elsif failed
188+
queue.report_failure!
189+
else
190+
queue.report_success!
191191
end
192192
reporter.record(result)
193193
end

ruby/test/ci/queue/redis_test.rb

Lines changed: 21 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -270,6 +270,27 @@ def test_initialise_from_rediss_uri
270270
assert_instance_of CI::Queue::Redis::Worker, queue
271271
end
272272

273+
def test_circuit_breaker_does_not_count_requeued_failures
274+
# Bug: report_failure! was called before the requeue check, so successfully
275+
# requeued tests incremented the consecutive failure counter. With
276+
# max_consecutive_failures=3 and 3+ deterministic failures that are all
277+
# requeued, the circuit breaker fired prematurely and the worker exited,
278+
# stranding requeued tests in the queue with no worker to process them.
279+
queue = worker(1, max_requeues: 5, requeue_tolerance: 1.0, max_consecutive_failures: 3)
280+
281+
# All tests fail (deterministic failures that get requeued)
282+
tests_processed = poll(queue, false)
283+
284+
# With 4 tests and max_requeues=5, all tests should be processed multiple
285+
# times (requeued after each failure). The circuit breaker should NOT fire
286+
# because requeued failures are transient, not consecutive "real" failures.
287+
assert tests_processed.size > TEST_LIST.size,
288+
"Expected tests to be requeued and re-processed, but only #{tests_processed.size} " \
289+
"tests were processed (circuit breaker likely fired prematurely). " \
290+
"Circuit breaker open? #{queue.config.circuit_breakers.any?(&:open?)}"
291+
end
292+
293+
273294
private
274295

275296
def shuffled_test_list

ruby/test/integration/minitest_redis_test.rb

Lines changed: 27 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -296,9 +296,35 @@ def test_circuit_breaker
296296
)
297297
end
298298

299+
# Requeued failures don't count toward the circuit breaker. The worker
300+
# processes more tests than the old behavior (which stopped at 3).
301+
# Exact count depends on queue ordering, but must be > 3.
302+
output = normalize(out.lines.last.strip)
303+
ran_count = output.match(/Ran (\d+) tests/)[1].to_i
304+
assert ran_count > 3,
305+
"Expected more than 3 tests to run (requeues shouldn't trip breaker), got: #{output}"
306+
end
307+
308+
def test_circuit_breaker_without_requeues
309+
out, err = capture_subprocess_io do
310+
system(
311+
@exe, 'run',
312+
'--queue', @redis_url,
313+
'--seed', 'foobar',
314+
'--build', '1',
315+
'--worker', '1',
316+
'--timeout', '1',
317+
'--max-requeues', '0',
318+
'--max-consecutive-failures', '3',
319+
'-Itest',
320+
'test/failing_test.rb',
321+
chdir: 'test/fixtures/',
322+
)
323+
end
324+
299325
assert_equal "This worker is exiting early because it encountered too many consecutive test failures, probably because of some corrupted state.\n", err
300326
output = normalize(out.lines.last.strip)
301-
assert_equal 'Ran 3 tests, 3 assertions, 0 failures, 0 errors, 0 skips, 3 requeues in X.XXs', output
327+
assert_equal 'Ran 3 tests, 3 assertions, 3 failures, 0 errors, 0 skips, 0 requeues in X.XXs', output
302328
end
303329

304330
def test_redis_runner

ruby/test/support/queue_helpers.rb

Lines changed: 6 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -9,8 +9,12 @@ def poll(queue, success = true)
99
test_order << test
1010
failed = !(success.respond_to?(:call) ? success.call(test) : success)
1111
if failed
12-
queue.report_failure!
13-
queue.requeue(test) || queue.acknowledge(test.id)
12+
if queue.requeue(test)
13+
# Requeued — don't report to circuit breaker
14+
else
15+
queue.report_failure!
16+
queue.acknowledge(test.id)
17+
end
1418
else
1519
queue.report_success!
1620
queue.acknowledge(test.id)

0 commit comments

Comments
 (0)