What is your opinion about MailWizz offering a global blacklist service?

twisted1919 · Jan 2, 2020

Hello everyone,

A while back one of the forum members reached out with an interesting idea for handling bad emails better in MailWizz.

In broad strokes, the idea is that since we have lots of active MailWizz installs, their owners could opt in to send their blacklist info to a central service. This would of course be anonymous, and each email address would be sent as an md5 or sha1 hash (we will call these hashes from now on), so the actual email address is never sent, only a hash of it, and no privacy rule is breached.
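To make the hashing step concrete, here is a minimal Python sketch of what an install could do before sending anything. The function name `email_hash` and the normalization rule (trim and lowercase) are assumptions for illustration, not part of MailWizz. The only point is that every install would have to hash in exactly the same way, otherwise the same address produces different hashes.

```python
import hashlib


def email_hash(email: str, algorithm: str = "sha1") -> str:
    """Return the hex digest of a normalized email address."""
    # Assumption: trimming and lowercasing is the agreed normalization rule.
    normalized = email.strip().lower()
    return hashlib.new(algorithm, normalized.encode("utf-8")).hexdigest()


# The same address yields the same hash regardless of case or padding.
print(email_hash(" User@Example.com ") == email_hash("user@example.com"))
```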

Using certain algorithms, we can grade each hash, so if a hash comes in from various sources, we can decide whether it should really be marked as blacklisted or not.
Each hash would expire after a certain period of time because, as we know, an email can be invalid today but valid tomorrow.
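One possible way to picture grading and expiry is sketched below. It is only an illustration: the class name `HashGrader`, the rule "grade = number of distinct sources with an unexpired report", and the `ttl_seconds` / `min_sources` parameters are all assumptions standing in for the "certain algorithms" mentioned above.

```python
class HashGrader:
    """Grades hashes by how many distinct sources reported them recently."""

    def __init__(self, ttl_seconds: float, min_sources: int) -> None:
        self.ttl_seconds = ttl_seconds
        self.min_sources = min_sources
        # hash -> {source id -> time of that source's latest report}
        self._reports: dict[str, dict[str, float]] = {}

    def report(self, email_hash: str, source_id: str, now: float) -> None:
        # A repeated report from the same source only refreshes its timestamp.
        self._reports.setdefault(email_hash, {})[source_id] = now

    def grade(self, email_hash: str, now: float) -> int:
        # Count only the sources whose report has not expired yet.
        reports = self._reports.get(email_hash, {})
        return sum(1 for seen in reports.values() if now - seen < self.ttl_seconds)

    def is_blacklisted(self, email_hash: str, now: float) -> bool:
        return self.grade(email_hash, now) >= self.min_sources
```

Passing `now` in explicitly keeps the sketch easy to test; a real service would more likely read the clock itself and purge expired rows in the background.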

So to sum it up, lots of MailWizz installs can opt in to send their blacklisted emails to a central service, but as hashes, never the actual email address. This service would then store these hashes for a certain period of time and, based on various algorithms, would decide if these are bad emails or not. This means we will have a huge, continuously changing database of bad emails in the form of hashes.

The same MailWizz installs would then have access to this service to check their existing email lists against the stored hashes. In theory this would help a lot: with access to a huge database of bad email hashes, you can compare them against your subscriber lists, find emails that should be blacklisted ahead of time, and take action against them.
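The lookup side could be as simple as the sketch below: hash each subscriber locally and test membership in the set of bad hashes received from the service. The function name `flag_subscribers` is hypothetical, and sha1 with trim-and-lowercase normalization is an assumption that would have to match whatever the reporting installs use.

```python
import hashlib


def flag_subscribers(subscribers: list[str], bad_hashes: set[str]) -> list[str]:
    """Return the subscribers whose hash appears in the bad-hash set."""
    flagged = []
    for email in subscribers:
        # Hash locally, so plain addresses never leave the install.
        digest = hashlib.sha1(email.strip().lower().encode("utf-8")).hexdigest()
        if digest in bad_hashes:
            flagged.append(email)
    return flagged
```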

Now, I don't know the exact specifics of how this will work. I only have the general idea I shared above, and I wanted to post it here to see if it is worth exploring and eventually implementing. Of course, this would imply a lot of work, but it has the potential to change things a lot as far as delivery goes.

We welcome any ideas / thoughts on this.
Best.

CaLViN · Jan 2, 2020

Well,
this would actually be great to have. Maybe this could even enable a list-check feature that cleans bounces from a list automatically without sending an email.
BUT, do you think you can handle such a load? You will receive billions of emails/hashes and you will need to store, check, verify, and send them all.
Considering that MailWizz is very cheap, this feature might cost you some.

twisted1919 · Jan 2, 2020

CaLViN said:
You will receive billions of emails/hashes and you will need to store, check, verify, and send them all.
Considering that MailWizz is very cheap, this feature might cost you some.

That's a good catch!

We would most likely need to run a few servers, around 5 I'd think off the top of my head, but it could be more.
So it would involve some cost indeed. We as a company could support some of the costs; the rest could maybe be covered by introducing some kind of subscription, small enough to keep the service running. I don't know if people are willing to pay for this service, but hopefully this thread will answer that.

nadworks · Jan 3, 2020

+1
Ace idea.

Vpul Shah · Jan 4, 2020

Nice idea Twisted, actually needed.

Add this.

MadMan · Jan 4, 2020

Hi there,

In regard to your concerns I can give you a few proposals.

As far as I know, each license for the platform is connected to a particular IP. Thus, the platform knows which are the licensed versions and where they are as well as how long the license is valid.

Given the above, and in order to avoid sending all traffic and requests to one place, the blacklists could be decentralized.
In my opinion, the most appropriate way to do so is to add a new extension, BlackList Sync for example. It should synchronize the blacklists of all licensed platforms on the principle of either blockchain or torrent systems. For each blacklisted email, the number of platforms that list it should be counted, and if it's in more than 6 lists it should be transferred to Backend -> Email Blacklist, for example. If the count falls to less than 6, it could be automatically erased from Backend -> Email Blacklist until it gets into more than 6 lists again. The number of listings (whether 6 or more/less) could be set in the platform settings.
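A rough sketch of that threshold rule, purely as an illustration: `apply_threshold` is a hypothetical name, and `list_counts` is assumed to map each blacklisted entry (or its hash) to the number of platforms currently listing it. As described above, an entry listed on exactly 6 platforms is neither added nor removed.

```python
def apply_threshold(
    list_counts: dict[str, int],
    local_blacklist: set[str],
    threshold: int = 6,
) -> set[str]:
    """Return the local blacklist after applying the listing threshold."""
    updated = set(local_blacklist)
    for entry, count in list_counts.items():
        if count > threshold:
            updated.add(entry)       # listed on more than `threshold` platforms
        elif count < threshold:
            updated.discard(entry)   # dropped below the threshold
        # count == threshold: leave the entry as it is
    return updated
```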

Thus the traffic will be decreased and synchronization will be done not only to and from one server but also between all licensed online platforms. In this regard, everyone who wants to have synchronized blacklists will need to renew the license. Besides, this way the platform owner will have more income and will be interested to improve it and pay the licence.

If the platform is used for bulk sending, everyone will be interested in having a license in order to avoid IP reputation loss, etc.

It's good to be done as an extension so it could be used by interested platform owners because there might be people who use the platform for personal subscribers only and such people may not be interested in such synchronization of blacklists.

A question may arise: What if someone decides to sabotage all the platforms by submitting a huge list of blacklisted emails in order to overload the system? Good logic should be thought out to ignore platforms that only disrupt the others' work.
Some ideas in this direction might be:
- if all the emails in the blacklist follow a pattern such as aaa@, aab@, aac@, aad@, etc., it's obvious such emails are automatically generated and the platform is a threat to the system
- if no emails are sent from the platform but its blacklist increases daily
If such platforms are found, synchronization from their IPs should be stopped.
Other ideas could be added as well for preventing system overload and blocking synchronization of such blacklists.
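The two checks listed above could look roughly like this. Everything here is a hypothetical sketch: the function names, the 0.8 ratio, and the inputs are assumptions, and the first check only works where plain addresses are visible, which would not be the case if only hashes are exchanged.

```python
def next_local_part(s: str) -> str:
    """Alphabetical successor of a lowercase string: 'aaa' -> 'aab', 'aaz' -> 'aba'."""
    chars = list(s)
    i = len(chars) - 1
    while i >= 0:
        if chars[i] != "z":
            chars[i] = chr(ord(chars[i]) + 1)
            return "".join(chars)
        chars[i] = "a"  # carry over to the previous character
        i -= 1
    return "a" + "".join(chars)


def looks_generated(emails: list[str], min_ratio: float = 0.8) -> bool:
    """True if most local parts form a run like aaa@, aab@, aac@, ..."""
    local_parts = sorted({e.split("@", 1)[0].lower() for e in emails})
    if len(local_parts) < 2:
        return False
    pairs = list(zip(local_parts, local_parts[1:]))
    sequential = sum(1 for a, b in pairs if next_local_part(a) == b)
    return sequential / len(pairs) >= min_ratio


def is_suspicious(emails: list[str], emails_sent: int, blacklist_growth: int) -> bool:
    """Combine both ideas: generated-looking entries, or growth without sending."""
    return looks_generated(emails) or (emails_sent == 0 and blacklist_growth > 0)
```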

As for the size of the databases and the traffic, I don't see any problem here. A list of 3,345,366 blacklisted emails is only 71.8 MB in size. Given current connection speeds and the continuously developing internet, I don't think this size would be a problem for synchronization.
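For anyone who wants to scale that figure to their own list, a small back-of-the-envelope helper is sketched below. It assumes the size grows linearly with the number of records and that "MB" means decimal megabytes; both are assumptions, and the function name is made up for the example.

```python
SAMPLE_RECORDS = 3_345_366
SAMPLE_BYTES = 71.8e6  # 71.8 MB, assuming decimal megabytes


def estimated_size_bytes(records: int) -> float:
    """Linear extrapolation from the sample list quoted above."""
    return records * SAMPLE_BYTES / SAMPLE_RECORDS


# Average bytes per record in the sample, then an estimate for 100 million records.
print(round(estimated_size_bytes(1), 1))
print(round(estimated_size_bytes(100_000_000) / 1e9, 1), "GB")
```

That works out to about 21.5 bytes per record on average for the sample list.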
Of course, if there are 500 licensed platforms online and they all have to be synchronized, there will be some system overload, but it could be prevented with limits, for example: up to 10 synchronizations at a time and only one connection per IP. To keep the system from restarting synchronization for a particular IP from the very beginning each time, it would be necessary to record the last line it reached and resume from the last synchronized record. This will lower the traffic and the system load.
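Resuming from the last synchronized record can be sketched with a simple cursor. This assumes each platform exposes its blacklist as an ordered, append-only list, which is an assumption of the example rather than something MailWizz provides; `next_batch` is a hypothetical name.

```python
def next_batch(
    records: list[str], last_synced: int, batch_size: int
) -> tuple[list[str], int]:
    """Return the records after the cursor and the new cursor position."""
    batch = records[last_synced:last_synced + batch_size]
    return batch, last_synced + len(batch)


# Each peer stores its own cursor, so a later sync continues where it stopped.
remote = ["h1", "h2", "h3", "h4", "h5"]
batch, cursor = next_batch(remote, 0, 2)
batch, cursor = next_batch(remote, cursor, 2)
```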

twisted1919 · Jan 6, 2020

@MadMan - As always, thanks for your thoughts

Doing this via an extension is not to be excluded, but I do have some concerns. You mentioned torrents: they do need a tracker from which they get the peers list, so the extension idea would still need a central place to get an updated list of active apps that are willing to share their blacklist. The traffic to it would be much, much smaller, though; it would most likely be just a JSON file with some web addresses.

My concerns and possible solutions:
1) If no auth is used between the platforms when they sync data, then anyone could get ahold of the URL that accepts blacklist data and send crafted requests to make the platform accept bogus blacklist data as good data. This could be mitigated by using some sort of auth between the platforms, done via the central platform. If the central platform is not involved, then the auth can be easily faked.
Even if the platform is involved, a valid app can still get a valid auth and after this start doing weird requests. Without a central platform in between the two apps, there's really no way to stop this behavior, the central platform would have to act as a referee.

2) If the whole functionality is extension based, then this means people get full access to see the code, functionality, etc, which means people are more likely to exploit the functionality if desired.

3) When you have to sync between 2 platforms, that's easy peasy, but when you have to sync between 50 platforms, things get complex and a sync will take a lot of time to complete because of the large amount of data to be processed, which means a lot of traffic between platforms, in and out. The main issue here would be duplicate data: if the same 100 email addresses are found on all 50 platforms, that adds up to about 5k duplicate records which you should not process. 5k does not seem like much, but what if it's 50k or 500k? Then things will get interesting.
Avoiding processing duplicate data is impossible in this scenario.

4) We do have customers with blacklists of literally a hundred million records, and I have no clue how much time it would take for 50 other platforms connected to one of these platforms to sync the data. Then there's also the disk space issue; while not that big of an issue, it's still there and has to be taken into consideration.

5) Out-of-sync extensions are a real problem: people with different versions of the extension could see different behavior. Of course, we can always check this when syncing and only sync with platforms that have a proper extension version, but again, it's a problem.

6) What else?

I'm not saying the above are in any way deal breakers; all of them have solutions, but they are worth adding to the discussion so we can look at the whole picture.

MadMan · Jan 6, 2020

@twisted1919 You are always welcome

You are probably right about the extension. Rather, I meant that sync could be turned on/off.

1. There must be a central server or a few central servers! That's why I was referring to the licensed versions only. This information is only available to the platform owner. In this respect, we cannot escape centralization. Rather, my idea is to synchronize the blacklists through torrent.

2. It may be good to avoid various types of DDoS attacks by using a cloud server. In any case, only MailWizz has the information about the IPs, and when the platforms hide behind a cloud server, only MailWizz will know the real IP addresses of the clients' platforms, which will protect against DDoS attacks while synchronization is running.

3. The best way to avoid network congestion is to limit concurrent requests to 1 per server and a maximum of 8 at a time. This way, the first synchronization will take time, but the platforms will not be overloaded. If necessary, the transfer speed can be limited as well.

4. What duplication are you talking about? Synchronization will be on the principle of find and count. Currently I cannot add an address that is already in the blacklist. First, check whether the hash already exists in the database. If it does, it is not added again; instead, +1 is added to its count field. As for the case where several of the eight simultaneous requests check the same hash, the following can be done: each verification request covers a different starting hash letter. For example, the 1st starts with "a", the second with "b", etc. This will avoid duplication across the 8 concurrent requests because each request will handle different email addresses.

5. The other way to avoid hash duplication is to queue the requests. Add a serial number to each request and process them one at a time. In this case, the processing time will depend on the server's resources.

6. As for the space such a database would occupy, I do not think it is a problem. As I said, 3.5 million records are 72 MB, so 3,500 million records will be 72 GB. Losing IP reputation and recovering it is much more expensive than buying a larger disk once.
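Point 4 above (find and count, with each concurrent request taking a different starting hash character) can be sketched as follows. The sketch assumes the hashes are hex digests, so the first character has 16 possible values (0-9 and a-f) and each of 8 workers handles two of them. The function names are hypothetical.

```python
def worker_for(email_hash: str, workers: int = 8) -> int:
    """Pick a worker from the first hex character of the hash."""
    return int(email_hash[0], 16) % workers


def partition(hashes: list[str], workers: int = 8) -> list[list[str]]:
    """Split hashes so that no two workers ever handle the same hash."""
    buckets: list[list[str]] = [[] for _ in range(workers)]
    for h in hashes:
        buckets[worker_for(h, workers)].append(h)
    return buckets


def find_and_count(counts: dict[str, int], incoming: list[str]) -> None:
    """Add unseen hashes with count 1; add +1 to the count of known ones."""
    for h in incoming:
        counts[h] = counts.get(h, 0) + 1
```

Because a given hash always maps to the same worker, two concurrent requests cannot update the same count at the same time.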

Also, inbox and open rates will be much higher!

eggerda · Jan 6, 2020

@twisted1919 this is a great idea. You would definitely have to offer this as an additional service that MailWizz users can choose to use or not. I can say, I run opt-in lists for my business and my complaint rate is .0025 (2.5 times what Aweber allows) - and I run a real business with real opt-ins from real people.

I would be interested in using this, not just to eliminate bounces before emailing them, but also to NOT mail people who mark emails as spam a lot. I think there should be a threshold: if an email address has marked X number of emails as spam X number of times - above a mean - it should appear on this global blacklist.
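One way such a threshold could be expressed is sketched below. It is only an illustration: `frequent_complainers` is a hypothetical name, `complaints` is assumed to map an email hash to the number of spam complaints reported for it, and combining a fixed minimum with "above the mean" is one reading of the idea above.

```python
from statistics import mean


def frequent_complainers(complaints: dict[str, int], min_complaints: int) -> set[str]:
    """Hashes with at least `min_complaints` complaints and more than the average."""
    if not complaints:
        return set()
    average = mean(complaints.values())
    return {
        h for h, n in complaints.items()
        if n >= min_complaints and n > average
    }
```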

This alone could drop everyone's complaint rate a substantial amount.

I'm sure there's a variety of businesses running MailWizz for a variety of purposes, so we definitely need options to enable/disable parts of the block list.

Wah · Jan 10, 2020

I would gladly love this feature. I currently manage my own blacklist; it is around 60 million and growing. I'm no tech, but yes, this would work for me.

nadworks · Feb 29, 2020

I agree with regard to the sabotage concerns, plus simply irregular and odd blacklist behaviour. I am constantly fishing genuine and healthy email addresses out of the blacklist with no idea how they got there, simply because subscribers are alerting me to the fact that they had not received a message for a while.

twisted1919 · Mar 3, 2020

@nadworks - that's one of our concerns as well.

mercuryyy · May 23, 2020

@twisted1919 you can rank each blocked email by bounce counts
so if email xyz got 20 bounces from 100 users you can give it a score
and likewise if email zzz got 50 bounces from 50 users

You can also add them to the list only after at least 25 users have attempted to send to each email, or you can let us choose all those scores
and then we can choose which scores we want to block or allow.
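A small sketch of that scoring idea, under stated assumptions: the score is taken to be bounces divided by the number of users who attempted delivery, the 25-user minimum comes from the post above, and `block_at` is a hypothetical per-install setting.

```python
from typing import Optional


def bounce_score(bounces: int, users: int, min_users: int = 25) -> Optional[float]:
    """Bounce ratio for an email, or None until enough users have tried it."""
    if users < max(min_users, 1):
        return None
    return bounces / users


def should_block(score: Optional[float], block_at: float) -> bool:
    """Each install picks its own cut-off; unscored emails are never blocked."""
    return score is not None and score >= block_at


print(bounce_score(20, 100))  # email xyz: 0.2
print(bounce_score(50, 50))   # email zzz: 1.0
```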

Something like this will put MailWizz in a separate arena from everybody else, not that it isn't already

twisted1919 · May 25, 2020

@mercuryyy - I get your point, and this is something that each MailWizz application can follow, but then my problem is: what happens when someone makes these requests from outside of MailWizz?

Then all these pseudo-security checks will be bypassed.

171mails · May 26, 2020

It's a good idea. However, it would be better if MailWizz itself had a feature to connect to the global suppression list of a master MailWizz install, which would be easier to handle for large mailers / ESPs who have multiple installed MailWizz instances.
