Should we split the campaign tables into multiple smaller ones?

Status
Not open for further replies.

twisted1919

Administrator
Staff member
Hey guys,

Now as i am polishing the 1.3.7 version, which will contain small fixes, i am thinking that in next big release, 1.3.8 we should actually start doing some massive changes as in how the campaign stats and logs are stored.

As you know, right now the biggest problem is the campaign_delivery_log table which can grow to few hundred GB's of data in some cases which makes mysql queries damn slow and which means selecting emails for delivery will get slower too.

I am thinking we could have each campaign have it's own delivery table, this way we can speed up things a lot i believe.
This will introduce some challenges/complexity into system in order to have it implemented properly but i think we can do it and the end result will be a good one.

We will start with the campaign_delivery_log table and after you will install the updates, all newly created campaigns will have their own separate tables. If this all works good, we will then do the same for the opens and click tracking tables, and finally, later, for list subscribers as well.

Please note that this is a huge change but i intend to keep backward compatibility so that future upgrades will not affect existing campaigns/lists/etc.

I am open to discussions about this as your feedback always matters as you already know.
Thank you.
 

pradeep sharma

Active Member
@twisted1919

I Would like you to Consider following

1. The Option of putting campaign delivery log & delivery server usage log to a new table is for each campaign is a very good idea. but if we can give admin an option of putting delivery server log tables and usage logs table to a new DATABase servers (if wanted) will also be a another good idea which will further divide the load in 2 DB servers. and also we we can add and enable a new DB server and for new logs and would keep old delivery server for read query only no new records will be added to inactive servers. so this will give more flexibility of adding more DATAbase Storage over time.

2. Would be good if we can give an option/settings
2.1 To delete delivery server logs tables & Usage Table which are older than 3 months/6 months (just an example). this will also delete all data of those campaign but we can add a new flat Table which will keep all important statics in just a single record for each campaigns we can include fields like openrate/count, bounce rate/count etc into this table for just a record keeping purpose. As an ESP We would not like to keep any list subscribers data like openers/clickers etc which older than 6 months. But we can keep a statistics of each campaign in new table before deleting its data. this data will also useful at the time of billing/reporting. This would solve our problem of loosing important stats if we loos its data.
2.2 Just to reduce load on a single database servers if we can give inbuilt facility to split read & write query.

2.3 Throughput is also a major concern in terms of scaleability. Can We consider of dividing the load on multiple servers for send-campaign-cron. We tested our 2 Quadcore CPU (8 Core ) DDR3 16GB we reached maximum at 10K per minute but consumed all resources(CPU)
We would like you to kindly consider to include or built a functionality to distribute entire load on multiple servers at the time of sending campaigns.
2.4 We tested our Mailmerge feature of PowerMTA with PHP and Python today and results are surprising .We got a throughput of 20K per minute with 1% of CPU Usage if we send it via Mailmerge feature and tomorrow i ll test pickup directory method to increase my throughput. We strongly recommends you to include a native mailmerge feature of PowerMTA to generate a final merged message file. its relatively easy to implement. We can offer our PowerMTA servers for testing this feature. all parsing can be done by powerMTA and PHP/CPU would be free to do more other stuffs.

3. Mailwizz is designed in for single install purpose but what if some1 wanted to run more mailwizz install(1 for one big customer and another for another big customer) in production with many PowerMTA servers or Postfix servers.

Problem 1 : How we will identify bounces of each mailwizz install if both mailwizz have common sending & bounces servers.
Problem 2 : How we will identify feedback loop of each mailwizz install. in feedback ISP generally send feedback to single email for all kind of mailling.
Problem 3 : tracking domain (solved) as we need to put tracking domain different for each mailwizz install.
Problem 4 : how to identify regular reply message from bounces

Possible solutions: is if we can include the mailwizz host URL to base encode and also include it in x-header. and at bounce processing & feedback loop base decode the URL to original state to identify the install location for throwing bounces/feedbackloop.
We are interested in purchasing more licences but due to above problem we are not able to do that.
So i would suggest you to make a another branch of Mailwizz which may be called Mailwizz-Cluster with more licence fee and advance features like controlling/operating/managing distributed mailwizz install and distributed database and using 1 single interface.

4. Its about reporting : More detailed reporting is anticipated in next major version (We can wait for it but its required)
4.1. ISP based reporting of a campaigns and a list for clickers/openers/abuse/unsubscribes including Pie charts, morris charts. i recommends to consider zoom chart
4.2 Time pattern based reporting in 24 hours range for entire specified date range to guess best time of campaign performance.
4.3 Device based reporting/ OS based reporting/Browser based reporting
4.4 add to compare feature to compare 2 or more campaigns.

5. Solve a very earlier reported problem .i.e if we have a millions of records all dashboards loads very slow(customers/admin/listes) in one mailwizz.

Regards
 
  • Like
Reactions: VVT

pradeep sharma

Active Member
@twisted1919 More Improvement can be done for logging..

1. When we send a campaign we found that logging is done for each subscriber and for each subscriber a SQL Querry is issued .
My Suggestion is if you can improve it and loging can be done for entire emails relased for each batch or per change of delivery server etc.
to be more clear we are issueing if we are relayign 10000 Email per minute we are actually issue 10000+ SQL Querry to insert 10000 records information in SQL. we can do it in Batches of 100, or 1000, or batch size of email defined in send campaign settings, or for each change of delivery server.
it would be in the same way as suggested by @VVT for email per connetion and emails per connetion setting in our latest version have improved delivery almost twice.

@VVT Kindly Comments and suggest and join this discussion..
 

frm.mwz

Well-Known Member
@twisted1919 More Improvement can be done for logging..

1. When we send a campaign we found that logging is done for each subscriber and for each subscriber a SQL Querry is issued .
My Suggestion is if you can improve it and loging can be done for entire emails relased for each batch or per change of delivery server etc.
to be more clear we are issueing if we are relayign 10000 Email per minute we are actually issue 10000+ SQL Querry to insert 10000 records information in SQL. we can do it in Batches of 100, or 1000, or batch size of email defined in send campaign settings, or for each change of delivery server.
it would be in the same way as suggested by @VVT for email per connetion and emails per connetion setting in our latest version have improved delivery almost twice.

@VVT Kindly Comments and suggest and join this discussion..

It would be generally advisable to minimize any operations that on large scale will slow down mwz speed.
While at the beginning of the app it was more testing and adoption, now seems perhaps a good time to do such optimizations as the app has grown and keeps growing and so does its usage...
 

frm.mwz

Well-Known Member
email per connetion and emails per connetion setting
It is probably worth reading that whole thread to see that the setting was there already but hidden and with a default. Most importantly, only the discussion brought about some more clarity and usage options. But please see for yourself, as the value is in the details ;)
 

VVT

Active Member
@pradeep sharma First off, thanks for the invitation :D

I didn't comment here because I'm not a dev and I don't have much idea about optimizing db queries. Basically I work on server side and I make the optimizations there. So, I'm not sure if combining a few deliveries into a single query would improve things or if it's practically possible or if it ends up in any data integrity violations. On the other thread, we were talking about sending x no.of emails under a "single open session" to avoid any repeated reconnect/authenticate/handshake signals again and again. Here also, similar theory may apply if MW doesn't persist a db conn, but I guess it does. Again, I'm not an expert in this area :).

From an ESP standpoint on a long term vision, what matters (to me at least) most is archiving and deleting old campaigns/logs from the system but keeping the basic statistics (like total sent, open, clicks etc without susbcriber details ) optionally in the system. I was also thinking about using a secondary db server to keep the archived logs, I second your suggestion there :). An archiving mechanism is already in place (now it goes to a diff table on same server), but not sure how confident @twisted1919 is about it and I didn't want to try that in prod yet. It would be really nice if the archiving logs would fly to a different db server. We can dump this data in some cheap hardware and this will speed up the main db operations due to its smaller size. And finally when we decide to delete some aging campaigns from the archive db server, some basic data can be kept w/o subscriber details as an option.

For the clustering part, here is a nice link that actually helps in distributing the load across multiple servers in the lightest way possible. Single point of failure is the only downside I see there. Jfyi.

@twisted1919 So, by the new method, a new table will be created for every new campaign, does that mean I will have 1000 diff tables for 1000 different campaigns ? and if I have 10 customers and each has 1000 campaigns (including test), will that contribute to 10k new tables in the db ? It will look pretty messy and difficult to troubleshoot if that's the case (?). Just a thought. But if the result it's going to produce can nullify these maintenance issues, let's go for it.
 
Last edited:

twisted1919

Administrator
Staff member
So, by the new method, a new table will be created for every new campaign, does that mean I will have 1000 diff tables for 1000 different campaigns ? and if I have 10 customers and each has 1000 campaigns (including test), will that contribute to 10k new tables in the db ? It will look pretty messy and difficult to troubleshoot if that's the case (?). Just a thought. But if the result it's going to produce can nullify these maintenance issues, let's go for it.
Yeah, it would mean that many number of tables, BUT this is fine with mysql as it does not have a table/database limit so in terms of database, this is not an issue. An issue would be when for example you have to add a new column to these tables, then it will take a while to alter each table. Another issue is to get combined stats, since you'd need to join all the tables together to get overall stats.
As i said earlier, this has it's own downsides, BUT i think we can work around them, i just need to get the code going to see what type if issues i will encounter.
 

frm.mwz

Well-Known Member
Yeah, it would mean that many number of tables, BUT this is fine with mysql as it does not have a table/database limit so in terms of database, this is not an issue. An issue would be when for example you have to add a new column to these tables, then it will take a while to alter each table. Another issue is to get combined stats, since you'd need to join all the tables together to get overall stats.
As i said earlier, this has it's own downsides, BUT i think we can work around them, i just need to get the code going to see what type if issues i will encounter.

as one can set mysql settings to have a file per table, maybe the whole thing can be simplified for you @twisted1919

also, the complicated/demanding stats could be put into the background with low priority (if the app needs resources) or run when the server is less strained
 

twisted1919

Administrator
Staff member
as one can set mysql settings to have a file per table, maybe the whole thing can be simplified for you @twisted1919
This would still be problematic. if your tracking table has 10G it doesn't matter if it's on one file or not, access to it will be slower anyway. The idea is that instead of having a 10G table, we have 10 of 1G for example with faster access to them.

also, the complicated/demanding stats could be put into the background with low priority (if the app needs resources) or run when the server is less strained
Yes, that's something i take into consideration as well.
 

OptiBiz1

Active Member
I am also a bit concerned about performance, not that I have any signs of limitations of that today since I sofar mainly use my license for plugin development.

We will introduce a whole concept which, if performance is good, will potentially enable MW to get enterprise customers. It's partially a new admin level with resellers that are a special kind of sub-admins based on "privilege templates" where the resellers will have different rights (potentially all settings in MW + our plugins (e.g. enable a reseller to offer their customers access to our A/B testing capability), except creating new reseller plans and resellers (but there are a lot of other capabilities too with these Resellers)).
This might perhaps introduce performance issues (since there might be more customers and campaigns per server), so your ambition to optimize MW is greatly appreciated. Also the idea of introducing a special bad ass license for clusters/clouds is a good idea. Not only to us, but also to you since you (twisted) will get a bigger piece of what is to come. That would be more fair.

Thank you.
 
Last edited:

Lakjin

Active Member
>>I am thinking we could have each campaign have it's own delivery table, this way we can speed up things a lot i believe.
This will introduce some challenges/complexity into system in order to have it implemented properly but i think we can do it and the end result will be a good one.

This would create a *LOT* of tables. I'm not sure this is the way to go. We send over a million emails per day, and we are fine with the current system with the addition of the archive log. If you occasionally archive the campaign_delivery_log, it has no major impact on performance.
 

darkmag16

New Member
I think the main problem is about that we are using SQL. Even if we will store every campaign delivery log into different table we will still have performance issues after 500 campaign which was sent to different 20K list. So why I think the next db upgrade should be NoSQL. We can use MongoDB or Maria. Both of them can speak well with Yii Framework. We can get significant performance increase after using these.
 

Lakjin

Active Member
I think the main problem is about that we are using SQL. Even if we will store every campaign delivery log into different table we will still have performance issues after 500 campaign which was sent to different 20K list. So why I think the next db upgrade should be NoSQL. We can use MongoDB or Maria. Both of them can speak well with Yii Framework. We can get significant performance increase after using these.
And then you'd make MailWizz unusable for 95% of people who use it.
 

frm.mwz

Well-Known Member
And then you'd make MailWizz unusable for 95% of people who use it.
To make mwz faster, it is highly recommended to use MariaDB (with optimized mysql, innodb, etc settings). It makes my mwz fly so that it has nearly zero lag time between pre-processing and sending (and I've seen servers where a non-optimized 5.1 mysql takes minutes before sending).
InnoDB and MariaBD are highly compatible with MySQL, in fact, I never had any error message so far from mwz dbf usage or crm or wp or ...

For background reading
https://seravo.fi/2015/10-reasons-to-migrate-to-mariadb-if-still-using-mysql
https://mariadb.com/blog/why-should-you-migrate-mysql-mariadb
 
Last edited:

darkmag16

New Member
To make mwz faster, it is highly recommended to use MariaDB (with optimized mysql, innodb, etc settings). It makes my mwz fly so that it has nearly zero lag time between pre-processing and sending (and I've seen servers where a non-optimized 5.1 mysql takes minutes before sending).
InnoDB and MariaBD are highly compatible with MySQL, in fact, I never had any error message so far from mwz dbf usage or crm or wp or ...
Can you please explain how exactly u r using MariaDB ?
Also this is of course perfext idea.
 

Lakjin

Active Member
To make mwz faster, it is highly recommended to use MariaDB (with optimized mysql, innodb, etc settings). It makes my mwz fly so that it has nearly zero lag time between pre-processing and sending (and I've seen servers where a non-optimized 5.1 mysql takes minutes before sending).
InnoDB and MariaBD are highly compatible with MySQL, in fact, I never had any error message so far from mwz dbf usage or crm or wp or ...

For background reading
https://seravo.fi/2015/10-reasons-to-migrate-to-mariadb-if-still-using-mysql
https://mariadb.com/blog/why-should-you-migrate-mysql-mariadb
You are confusing two different things. MariaDB and MySQL are not mutually exclusive. Anyone can upgrade from MySQL to MariaDB and MailWizz will work just fine. I was referring to the proposal to move to NoSQL, which would kill compatibility for most people.
 

Lakjin

Active Member
NoSQL is a no go, and MongoDB has serious issues with keeping data intact, just see https://www.google.ro/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#q=mongodb horror stories
I would recommend ironing out the bugs in the archiving method of campaign delivery log and making archiving a feature activated for most people. For example, archive all campaigns older than 30 days. This would not solve the problem totally, but it would greatly help. I don't think creating hundreds of tables will be a solution.
 
Status
Not open for further replies.
Top