The geometric sequence of limits will continue on forever, so eventually when you get tired of waiting (about a billion on my computer), you can just stop it. Another efficiency trick of note: we avoid passing the limit in as a parameter to the inner helper routine, but instead supply the limit via the lexical scope. Likewise, the accumulators are referenced lexically, so only the triples themselves need to be passed downward, and nothing needs to be returned.
Here is an alternate version that avoids naming any scalars that can be handled by vector processing instead. Using vectorized ops allows a bit more potential for parallelization in theory, but techniques like the use of complex numbers to add two numbers in parallel, and the use of gather/take generate so much overhead that this version runs 70-fold slower than the previous one.