@frm.mwz, thanks for your reply. I've messaged you all the details, and we're running new campaigns.
Following up on the switching/quota errors: in another test, with several DS all set to the same limit and the cron set to match that limit, the server was not switched when that limit was reached. Instead it overran the quota and switched significantly later, at seemingly random points.
It seems to set "giveups" as 'sent' "yes" (bug) and to miscount them toward quota usage (bug). It then keeps using the same DS even though it gets 4xx errors a couple hundred more times, and only then switches (randomly) between the other available DS.
Perhaps it would be good to handle the 4xx errors by quickly dropping the DS they came from (marking it as used up for the current period), categorizing those messages correctly as 'sent' "no", keeping them in the queue, and sending them again with the next cron run after the whole campaign is through, with another DS if available. (I thought it already did that, but could not see it.)
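To make the idea concrete, here is a rough sketch in Python of what I have in mind. This is not mwz code: `DeliveryServer`, `Message`, `send_pass` and the `smtp_send` callback are all names I made up for illustration, and one call to `send_pass` stands in for one cron run.

```python
from dataclasses import dataclass, field


@dataclass
class DeliveryServer:
    name: str
    quota: int               # allowed sends per period
    used: int = 0            # only accepted (2xx) sends are counted
    exhausted: bool = False  # "used up for the current period"

    def available(self) -> bool:
        return not self.exhausted and self.used < self.quota


@dataclass
class Message:
    recipient: str
    sent: bool = False       # stays False for giveups
    attempts: list = field(default_factory=list)


def pick_server(servers):
    """Return the first DS that still has quota, or None."""
    return next((s for s in servers if s.available()), None)


def send_pass(messages, servers, smtp_send):
    """One run over the queue; returns the messages to retry on the next run.

    smtp_send(server, message) is a hypothetical callback returning the
    numeric SMTP reply code.
    """
    deferred = []
    for msg in messages:
        server = pick_server(servers)
        if server is None:
            deferred.append(msg)      # no DS left this period, keep queued
            continue
        code = smtp_send(server, msg)
        msg.attempts.append((server.name, code))
        if 200 <= code < 300:
            msg.sent = True
            server.used += 1          # quota counts real sends only
        elif 400 <= code < 500:
            server.exhausted = True   # drop this DS right away
            deferred.append(msg)      # 'sent' stays "no", retried later
        # other codes (e.g. 5xx) are left unsent; not covered by this sketch
    return deferred
```

The point is that a 4xx reply neither marks the message as sent nor counts toward the quota, and the DS that produced it is skipped for the rest of the period. Resetting `exhausted` when a new period starts is left out here.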
Further differentiation by 4xx error might be useful.
# "421-Service unavailable" / "421 Too many connections." could be used to reduce the number of connections, if possible (perhaps very hard to do while pcntl is running).
# "450-Requested mail action not taken: mailbox unavailable" / "450 Mail send limit exceeded." could be used to mark the DS as "unavailable" since the quota is used up for that period (this seems like the easiest, but could still be hard while pcntl is running).
# "451 Requested action aborted: local error in processing" could be used to slow down as well, since it is often related to DNS overload.
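The differentiation above could look roughly like this, again as a Python sketch rather than mwz code. The `Action` names and `classify_reply` are my own invention; the function only looks at the three-digit code at the start of the reply line, not at the text after it.

```python
import re
from enum import Enum


class Action(Enum):
    REDUCE_CONNECTIONS = "reduce_connections"  # 421
    MARK_UNAVAILABLE = "mark_unavailable"      # 450
    SLOW_DOWN = "slow_down"                    # 451
    REQUEUE = "requeue"                        # any other 4xx
    NONE = "none"                              # not a 4xx reply


# Three-digit code followed by "-", " " or end of line
_REPLY = re.compile(r"^(\d{3})(?:[ -]|$)")


def classify_reply(line: str) -> Action:
    """Map one SMTP reply line to a suggested action for the DS."""
    match = _REPLY.match(line.strip())
    if not match:
        return Action.NONE
    code = int(match.group(1))
    if code == 421:
        return Action.REDUCE_CONNECTIONS
    if code == 450:
        return Action.MARK_UNAVAILABLE
    if code == 451:
        return Action.SLOW_DOWN
    if 400 <= code < 500:
        return Action.REQUEUE
    return Action.NONE
```

For example, `classify_reply("450 Mail send limit exceeded.")` gives `Action.MARK_UNAVAILABLE`, which would be the signal to stop using that DS for the current period.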
While the above may not seem like a mwz problem, in the end it is something the mwz user needs to cope with: the switching needs to work under various circumstances (not just the ones above), so it would really be good to resolve this step by step.
Any better ideas/solutions most welcome!
Thx
Might amend this post with further observations/suggestions.