Segmenting Performance - Large List

eggerda

Member
Hello! We are about to start migrating our data to MailWizz for a go-live. A question before we do this.

Segmenting is extremely important to our business - and I want to make sure performance will not be impacted too much the way I plan to use segmenting.

One list will be the "catch all" for all subscribers, and it has a custom text field which is updated each time a subscriber moves to a different list, etc. I've defined standards for how the data in this field populates/changes - all for segmenting purposes.

Here's an example of a segment I've defined:

CUSTOM FIELD: "Mastering Customer" NOT CONTAINS Value "YES"
CUSTOM FIELD: "Current List" CONTAINS Value "2_smm_broadcast"
CUSTOM FIELD: "Current List" NOT CONTAINS Value "_IN_"

You see - there are 2 rules using portions of the same field.

The question: with 1 million subscribers in this "catch all" list, will this segment work so that all emails go out within a couple of hours maximum? (Reading the MailWizz forums, I'm worried segmenting may hurt performance drastically - but those are old posts.)

It would be bad to migrate over and find out that segmenting this way would crash MailWizz and/or that sending would take 2 days!

Can anyone speak to the performance of segmenting in this way? Are we "OK" here?

Thank you!

Dan
 
Segmentation in MailWizz is not as slow as it used to be a few years back; we constantly make improvements to it. But it is still slow compared to other operations, unfortunately.

Now, how slow it will be depends entirely on the data set and the combined conditions you apply to the segment.

If you filter by one column and that one column happens to be the email address, then there will be no performance hit, as we run the query on the list subscriber table, which is heavily indexed and fast.
But if you include other custom fields, then MailWizz has to filter down in the list_field_value table, which is one of the largest tables in the application, if not the largest, and searching hundreds of millions of values will be slow no matter how you put it. Also, each filter you apply increases this slowness exponentially (see the sketch below).
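
To make that concrete, here is a rough sketch of the kind of SQL a segment like yours can translate into. The table layout, column names, field IDs and list ID are assumptions for illustration only; the real MailWizz schema and generated queries differ:

```sql
-- Fast path: a rule on the email address hits the indexed subscriber
-- table directly.
SELECT subscriber_id
FROM list_subscriber
WHERE list_id = 1
  AND email = 'someone@example.com';

-- Slow path: every custom-field rule means another pass over the huge
-- key/value table (assumed columns: subscriber_id, field_id, value).
-- The field_id values 10 and 11 are hypothetical.
SELECT s.subscriber_id
FROM list_subscriber s
JOIN list_field_value mc
  ON mc.subscriber_id = s.subscriber_id AND mc.field_id = 10  -- "Mastering Customer"
JOIN list_field_value cl
  ON cl.subscriber_id = s.subscriber_id AND cl.field_id = 11  -- "Current List"
WHERE s.list_id = 1
  AND mc.value NOT LIKE '%YES%'
  AND cl.value LIKE '%2\_smm\_broadcast%'
  AND cl.value NOT LIKE '%\_IN\_%';
```

Note that CONTAINS / NOT CONTAINS rules become LIKE '%...%' patterns with a leading wildcard, which ordinary B-tree indexes cannot help with, so each extra rule is effectively another scan over that very large table.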

This is a trade-off of the application: it allows you to quickly manage custom fields at the price of a slower search. Unfortunately, this is so deeply rooted in the application that it is not something we will be able to change in the near future.

How can you alleviate this? There are a few things you can do, in this order:
1. Use the latest version of the database server (MariaDB, MySQL, Percona Server).
2. Use a separate server dedicated to the database, in the same network as the web server.
3. Make sure your database server has plenty of processing power and memory.
4. Properly configure the database server to actually make use of those resources (a sketch of such settings follows this list).
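
As an illustration of point 4, these are the kinds of settings that usually matter most for this workload. The values are placeholders, not recommendations; size them to your own hardware:

```sql
-- Placeholder values only; adjust to your hardware and persist the
-- equivalent settings in my.cnf so they survive a restart.

-- The single biggest knob: the InnoDB buffer pool should hold as much
-- of the working set as possible (often ~70-80% of RAM on a dedicated
-- database server). 24 GB here is just an example.
SET GLOBAL innodb_buffer_pool_size = 24 * 1024 * 1024 * 1024;

-- Allow more background I/O if the server runs on SSDs.
SET GLOBAL innodb_io_capacity = 2000;

-- Larger in-memory temporary tables help big sorting/grouping work
-- stay out of on-disk temp tables.
SET GLOBAL tmp_table_size = 256 * 1024 * 1024;
SET GLOBAL max_heap_table_size = 256 * 1024 * 1024;
```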

A few words related to the amount of RAM you need.
Your database server must have a lot of RAM, so that the whole dataset fits in memory while the database server performs the search.
How much RAM? This again depends on your dataset. You might be okay with 16GB or 32GB, or you might need 256GB or more; you won't know until you actually have so much data that your current setup cannot handle it. The query below can give you a rough estimate from your own data.
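
To get a rough idea of how big your dataset actually is, you can ask the server itself. This is a standard information_schema query; 'mailwizz' as the schema name is an assumption, so replace it with yours:

```sql
-- Approximate on-disk size (data + indexes) per table, largest first.
SELECT table_name,
       ROUND((data_length + index_length) / 1024 / 1024 / 1024, 2) AS size_gb
FROM information_schema.TABLES
WHERE table_schema = 'mailwizz'
ORDER BY (data_length + index_length) DESC;
```

If the total is well above your InnoDB buffer pool size, searches over list_field_value will keep hitting disk and slow down sharply.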

While the above may seem a bit pessimistic, it helps to be realistic about what to expect if you rely on segmentation.
 