Statistics for campaigns are not accurate

Does that library exclude Amazon, Google and Apple IP addresses used for privacy opt outs?
I am not sure TBH, looking at https://github.com/JayBizzle/Crawler-Detect/blob/master/src/Fixtures/Crawlers.php it seems they do some work in that direction.

Moving forward I'll be adding a 'hidden' url in all my templates, and discounting any address that clicks that link from the reports i give to my clients.
I am not sure how efficient this is. How exactly would you decide to count the clicks then? You might have legit clicks, but you might also get clicks triggered by some analysis tool from Gmail, etc. I am genuinely asking, because to me it seems almost impossible to tell them apart.
But I'd be interested to know if I can add a filter somewhere to do that within MailWizz.
We do have various filters you could potentially access programmatically, but let's see the previous point first; maybe we can come up with something in the app core so you don't have to do it outside via filters.
 
Thanks for replying @twisted1919

Moving forward I'll be adding a 'hidden' url in all my templates, and discounting any address that clicks that link from the reports i give to my clients.
I am not sure how efficient this is. How exactly would you decide to count the clicks then? You might have legit clicks, but you might also get clicks triggered by some analysis tool from Gmail, etc. I am genuinely asking, because to me it seems almost impossible to tell them apart.

In terms of the hidden URL, I took advice from another post I saw: hide the URL so that, logically, no human would click the link, e.g.
Code:
<div style="height:0px;line-height: 0px;display: none;" class="mso-hide" id="bot-track"><a href=”http://www.IamABot.com” style="color:#cecece; text-decoration: none;"></a></div>

If you have any email addresses in your campaign stats registering that link click, discount them immediately, particularly if that link click came within the first minute of receiving the email. If, after a set period of time, say 15 minutes, that same email address reports more interaction with the email, then count it again.

It's never going to be 100% accurate, as a bot may open first and then a genuine user might engage with the email. But you can certainly get to some degree of accuracy by discounting behaviour that is 100% known to be from bots.

Currently I'm just exporting the CSVs, removing anything that's clicked 10+ times or that opened and clicked EVERYTHING in the first minute, and providing the customers with my 'best guess' results.
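For anyone doing the same thing by hand, below is a rough sketch of that CSV filtering. The column names (email, url, sent_at, clicked_at, click_count) are just my assumptions about an export layout, not the actual MailWizz CSV headers, so adjust as needed.
Code:
<?php
// Hypothetical filter over an exported clicks CSV: drop addresses that hit the
// hidden link, clicked 10+ times, or clicked within 60 seconds of the send.
// Column names are assumptions, adjust them to your actual export.

$hiddenUrl = 'http://www.IamABot.com';
$suspects  = []; // email => reason it was discounted
$rows      = [];

if (($fh = fopen('campaign-clicks.csv', 'r')) !== false) {
    $header = fgetcsv($fh);
    while (($line = fgetcsv($fh)) !== false) {
        $row    = array_combine($header, $line);
        $rows[] = $row;

        $delay = strtotime($row['clicked_at']) - strtotime($row['sent_at']);

        if ($row['url'] === $hiddenUrl) {
            $suspects[$row['email']] = 'clicked the hidden link';
        } elseif ((int)$row['click_count'] >= 10) {
            $suspects[$row['email']] = '10+ clicks';
        } elseif ($delay >= 0 && $delay < 60) {
            $suspects[$row['email']] = 'clicked within the first minute';
        }
    }
    fclose($fh);
}

// Keep only the rows for addresses that were never flagged.
$clean = array_filter($rows, fn ($row) => !isset($suspects[$row['email']]));
printf("Kept %d of %d click rows\n", count($clean), count($rows));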

You know what it's like: you've got customers that use you for email marketing, and it's no use telling them that "stats aren't that reliable anymore".
 
@Jof - Thanks for the explanation, that does make sense to me and it is something I explored briefly in the past.
If you have any email addresses in your campaign stats registering that link click, discount them immediately.
My problem with this approach is that we can lose valid clicks.

Particularly if that link click came within the first minute of receiving the email
I am just throwing ideas here, but what if we give you an option to disable click tracking and open tracking for the first x minutes after the campaign has been sent, where x would be a number you'd set per campaign, defaulting to 0? Would that make any difference to you? Or is it a stupid idea that won't help?
Alternatively, or as a complement if you will, we could also do the hidden link approach, then get a fingerprint of the device that executed the click, which could be a combination of user agent and IP address, and then delete all the clicks registered by that fingerprint. Would this be a good solution?
Maybe both at the same time? Let's discuss ;)
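To make both ideas a bit more concrete, here is a minimal sketch of what the check could look like at tracking time; the variable names and the blacklist storage are hypothetical, not MailWizz internals.
Code:
<?php
// Sketch only: a per-campaign "quiet period" plus a UA + IP fingerprint blacklist.
// All names and data shapes are placeholders for discussion.

function trackingFingerprint(string $userAgent, string $ip): string
{
    return sha1(strtolower($userAgent) . '|' . $ip);
}

function shouldRecordClick(array $campaign, array $click, array $blacklistedFingerprints): bool
{
    // 1) Ignore opens/clicks during the first X minutes after the send,
    //    where X is a per-campaign setting defaulting to 0.
    $quietSeconds = $campaign['quiet_minutes'] * 60;
    if ($click['timestamp'] - $campaign['sent_at'] < $quietSeconds) {
        return false;
    }

    // 2) Ignore anything coming from a fingerprint that hit the hidden link.
    $fingerprint = trackingFingerprint($click['user_agent'], $click['ip']);

    return !in_array($fingerprint, $blacklistedFingerprints, true);
}
Deleting the clicks already registered by a blacklisted fingerprint would then just mean matching the same hash against the stored tracking rows.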

@ghimes - care to chime in?
 
I am just throwing ideas here, but what if we give you an option to disable click tracking and open tracking for the first x minutes after the campaign has been sent
Thinking about this... What are we going to consider as the sent time? The time when we record the OK status in the campaign_delivery_log table? Please also consider that this timestamp is the moment we handed the email to the SMTP server. If the SMTP server has a full queue, the actual email delivery might happen later, so this is not 100% bulletproof either. Would this threshold be in milliseconds, maybe?
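If it helps the discussion, one way to express that uncertainty would be to anchor the quiet period to the delivery-log timestamp plus a safety margin for the SMTP hand-off; a sketch, where every name and value is a placeholder:
Code:
<?php
// Sketch: treat the "sent" time as the delivery-log timestamp (when the email
// was handed to the SMTP server) plus a safety margin for queueing delays.
// The margin and the parameter names are placeholders for discussion only.

function isInsideQuietPeriod(int $deliveryLogTimestamp, int $eventTimestamp, int $quietSeconds): bool
{
    $smtpQueueMargin = 120; // assumed worst-case hand-off delay, in seconds

    $effectiveSentAt = $deliveryLogTimestamp + $smtpQueueMargin;

    return $eventTimestamp < $effectiveSentAt + $quietSeconds;
}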
 
Hi @ghimes @twisted1919

I have been suffering from a similar issue, if not the same one. I am trying to do B2B marketing, and a lot of the recipients' emails sit behind a firewall such as Barracuda or Mimecast; there are so many out there, and they end up scanning all the links to protect the user (not a bad thing). I analysed some of the domains where I was getting an immediate interaction with the emails and then ran an MX record check to identify the firewalls/security scanners/bots.
I have been reading up, and the following two links should hopefully help to come up with a solution:

Hope this is useful. I am currently at work so it may take me some time to respond, but I am definitely looking for a solution to this issue.
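As a rough illustration of the MX record check I mentioned (the substrings below are only examples of patterns one might look for, not a vetted list):
Code:
<?php
// Sketch: flag email domains whose MX records point at known mail security
// gateways. The pattern list is illustrative, not exhaustive or authoritative.

function domainUsesSecurityGateway(string $emailDomain): bool
{
    $patterns = ['barracuda', 'mimecast', 'pphosted', 'messagelabs'];

    $mxHosts = [];
    if (!getmxrr($emailDomain, $mxHosts)) {
        return false; // no MX records found
    }

    foreach ($mxHosts as $mx) {
        foreach ($patterns as $pattern) {
            if (stripos($mx, $pattern) !== false) {
                return true;
            }
        }
    }

    return false;
}

// Example usage: var_dump(domainUsesSecurityGateway('example.com'));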
 
@ghimes @twisted1919 I've got lots of thoughts on this, and some good data to share, but I'm not sure of the best way forward given GDPR: I want to show you IP addresses, user agents, email addresses and open times that, when viewed in an Excel spreadsheet, can be nothing but bots. If we can programmatically remove/filter them, that would be great. E.g. in the attached, this is an open report of a campaign: all unique email addresses, mostly sharing the same IP, all clicks happening within the first minute of the campaign, with a few 'random' IPs thrown in that I would assume to be genuine opens.

[Attached screenshot: 1704881572867.png]

And of 5421 opens, duplicate IP address frequency slows down at around row 3000. Duplicates still exist, but they are less common.

See Video at this Dropbox link: https://www.dropbox.com/scl/fi/c7s3...4-54.mp4?rlkey=29pu1tpigytyw4smo30vtc5k9&dl=0 Pink IPs are duplicated.
 
Hi @ghimes @twisted1919,

The following would really help me. It could be used to pull data into a separate campaign stats screen, or to apply a filter to the existing stats screen, thereby not interfering with the existing application; that said, number 1 on the list would need to be configured at campaign level, for example by invoking it in the email template using one of the placeholder tags.

It is based on a list of rules which we can apply against the events in order to create a dynamic ignore list.

The settings should be configurable/updatable.

I understand that there may be false positives, but because this is all statistical and filter based, it shouldn't have any functional implications for the emails, as it isn't blocking any existing actions which the app performs.

No solution to the issue I am facing would be perfect; there is potential to implement ML models, but let's not get too technical.

In addition to the Crawler-Detect library which you have already implemented, I believe the following will be useful for my issues.

1. **Click Trap Rule**:
- Detect bots with hidden links or 1x1 pixel images.
- Example: A click on a hidden link indicates bot activity.
- Update Ignore List: Add entries for email domain-user-agent and IP-user-agent.

2. **IP, Email Domain, and User-Agent Analysis Rule**:
- Identify bots by unusual activity from an IP or email domain combined with specific user-agent patterns.
- Example: More than 50 actions in 10 minutes from an IP/email domain with a specific user-agent.
- Update Ignore List: Include entries for email domain-user-agent and IP-user-agent.

3. **Interval Analysis Rule**:
- Flag short intervals between email opens and clicks as bot behavior.
- Example: Email opened and link clicked within 5 seconds.
- Update Ignore List: Record entries for email domain-user-agent and IP-user-agent.

For the third rule you could also add a count before an entry becomes "active" in the ignore list, for example the behaviour must happen "x" number of times before the entry is activated, as some people genuinely do click the links in an email as soon as it is opened.
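For example, the third rule with that activation count could look roughly like the sketch below; the thresholds, field names and in-memory storage are placeholders rather than a proposed implementation.
Code:
<?php
// Sketch of the interval analysis rule: flag an email-domain + user-agent pair
// (and an IP + user-agent pair) once it shows "open then click within N seconds"
// at least a minimum number of times. All names and values are placeholders.

const MAX_INTERVAL_SECONDS   = 5;
const MIN_HITS_BEFORE_ACTIVE = 3;

function evaluateIntervalRule(array $event, array &$pendingHits, array &$ignoreList): void
{
    $interval = $event['clicked_at'] - $event['opened_at'];
    if ($interval < 0 || $interval > MAX_INTERVAL_SECONDS) {
        return; // looks like normal human timing
    }

    $emailDomain = substr(strrchr($event['email'], '@'), 1);
    $keys = [
        $emailDomain . '|' . $event['user_agent'],
        $event['ip'] . '|' . $event['user_agent'],
    ];

    foreach ($keys as $key) {
        $pendingHits[$key] = ($pendingHits[$key] ?? 0) + 1;

        // Only activate the ignore entry once the behaviour happened "x" times.
        if ($pendingHits[$key] >= MIN_HITS_BEFORE_ACTIVE) {
            $ignoreList[$key] = ['active' => true, 'rule' => 'interval_analysis'];
        }
    }
}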

This again won't be perfect and will potentially create false positives, but it's a start.

Thanks for reading.

PS: by email domain I mean the domain name the email address belongs to. This, combined with the user agent, in addition to the IP address and user agent combination, should be added to the ignore list for any bot-like activity detected by the rules.

This will typically ensure that if a real user clicks, it won't be filtered by the ignore list, as it is unlikely a real user would match these fingerprints. In case of false positives, the ignore list will need a "deactivate" flag against each entry, just in case we need to ignore the ignore entry. That is better than having to delete it or creating a separate "ignore list for the ignore list", as the same entry could reappear in future campaign runs based on the rules.

In addition, it would also be good to see how many events each of the ignore list entries has filtered. In future iterations, this value could potentially trigger additional actions, but I won't get into that now.

If we want to be really nuanced, we could also add event type (for example click or open) as an additional field in the ignore list, but I don't think this will have a major impact given the rules.
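To make the shape of those entries concrete, here is a sketch of the fields a single ignore-list entry could carry; every field name is an assumption for illustration, not a proposed schema for the app.
Code:
<?php
// Illustrative shape of one dynamic ignore-list entry, per the rules above.
// Field names are assumptions, not an actual or proposed MailWizz schema.

$ignoreEntry = [
    'email_domain' => 'example.com',       // domain part of the subscriber email
    'user_agent'   => 'Mozilla/5.0 (...)', // UA seen on the flagged event
    'ip_address'   => '203.0.113.10',      // or null for domain + UA entries
    'event_type'   => 'click',             // optional: 'click' or 'open'
    'matched_rule' => 'interval_analysis', // which rule created the entry
    'hit_count'    => 0,                   // how many events it has filtered
    'active'       => true,                // the "deactivate" flag for false positives
    'created_at'   => date('Y-m-d H:i:s'),
];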

Time based fields should be specified in seconds.

As part of a stronger implementation, you could also look at MX lookups on the email domain and match these as part of the patterns.

Hope this helps, any questions please feel free to ask.
 
@Jof / @kayboxa - That's a lot of info and you both have good arguments. Please allow us some time to check all this and get back to you with either some info or some more questions, it depends.
 
Hey,
I want to remind everybody here that MW already has the ability to exclude IPs from tracking. See Backend -> Settings -> Campaigns -> Exclude IPs from tracking.

Cosmin
 
Thanks @ghimes. How does that work in practice? Say I ban some of the IPs from my screenshot. Does that mean they will not be tracked at all, so any email address at those IPs will not show in the open or click reports?
 
I want to show you IP addresses, user agents, email addresses and open times that, when viewed in an Excel spreadsheet, can be nothing but bots.
This image is a perfect example of why we can't rely on the user agent: that's a very common user agent, but the IP address does say it's Google.
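One way to confirm that kind of traffic without trusting the user agent is a reverse-then-forward DNS check on the IP, the approach generally recommended for verifying crawlers such as Googlebot. A rough sketch, where the host suffixes are examples only:
Code:
<?php
// Reverse/forward DNS check: does this IP really belong to a known crawler host?
// The suffix list is illustrative only; a real check would also compare against
// all A records of the host, not just the first one.

function ipBelongsToKnownBotHost(string $ip): bool
{
    $suffixes = ['.googlebot.com', '.google.com', '.amazonaws.com'];

    $host = gethostbyaddr($ip); // reverse lookup
    if ($host === false || $host === $ip) {
        return false; // no usable PTR record
    }

    foreach ($suffixes as $suffix) {
        if (substr($host, -strlen($suffix)) === $suffix) {
            // Forward-confirm so a spoofed PTR record can't fool us.
            return gethostbyname($host) === $ip;
        }
    }

    return false;
}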
 
@Jof / @kayboxa - FYI, this has been added with high priority to our list of features and we will most likely start working on it next week. It will take some back and forth on testing, but hopefully we'll be able to implement what is going to become "Smart Tracking" in the near future.
We might contact you privately to ask more questions.
 
@Jof thanks for backing the suggestions/specifications. I am sure they will be of great help to us and others.

@twisted1919 that's great news, I look forward to a positive outcome. Feel free to reach out if required; I'm a consultant/PM/analyst by trade, so if you require a more detailed specification do let me know.

If you are also considering the MX patterns mentioned in my list of suggestions, I will gladly write up the logic to be used for this.

PS I had to renew my support subscription to write this message :)
 
@kayboxa - This week we had a lot of back and forth about how we go about this. This is because it is not only about recording those tracking events, but also about what happens when we record them (because there are dependent actions happening).

For example, you could have an AR or Subscriber Action (change a custom field, copy to a list, etc.) happen when an email is opened or a link is clicked. If you send the AR or change the subscriber in any way based on the tracking action, and then decide the open/click was not triggered by the subscriber itself, you can simply remove the open/click, but it is too late for the other stuff, like the AR, the Subscriber Action, or webhook calls.
So while the initial idea looked very promising, there are issues with it which prevent a proper implementation.

In order to avoid all the above pitfalls, which would leave your app in a very inconsistent state (think of an AR triggering for clicks and you having no idea why, because you don't have the click to justify it; it would be a mess), we decided to change the way we do tracking.
We will introduce a queue system for clicks and opens and all the actions they trigger as dependencies.
In short, we will record all the opens and clicks like we do now, but we will do it in separate database tables. These tables will be subject to the rules you mentioned above, and more. Then, when we're sure a click and/or an open is for real, we will move it to the current opens/clicks tables and trigger the actions that are currently triggered by default.
This way, we avoid having an inconsistent state in the app, and we are also sure the clicks and opens are real.
Of course, there will be false positives, but I believe this system can be improved over time and we can introduce new ways to analyse the data that will make it bulletproof.
The only downside of the upcoming approach is that people using this "Smart Tracking" system will see clicks and/or opens coming in with some delay, which should not matter in 99% of cases; i.e. instead of seeing them in real time as they happen, it will take a few minutes, the time needed for us to do the analysis. Of course, people using this option in their campaigns will get a notification in the campaign overview, in the tracking boxes, telling them their stats will show up with some delay.
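In rough pseudocode, the flow described above could look like the sketch below; every table, column and function name is a placeholder for discussion, not actual MailWizz code.
Code:
<?php
// Rough sketch of the "Smart Tracking" flow: record events into a staging table
// first, run the bot rules asynchronously, then promote the survivors to the
// live tracking table, which is the point where dependent actions (ARs,
// subscriber actions, webhooks) finally fire. All names are placeholders.

function stageTrackingEvent(PDO $db, array $event): void
{
    // 1) Write to a staging table instead of the live tracking tables.
    $db->prepare(
        'INSERT INTO campaign_track_staging
            (campaign_id, subscriber_id, type, ip, user_agent, date_added)
         VALUES (?, ?, ?, ?, ?, NOW())'
    )->execute([
        $event['campaign_id'], $event['subscriber_id'], $event['type'],
        $event['ip'], $event['user_agent'],
    ]);
}

function promoteVerifiedEvents(PDO $db, callable $looksLikeBot, callable $triggerDependentActions): void
{
    // 2) A queue worker later reviews the staged events against the bot rules.
    foreach ($db->query('SELECT * FROM campaign_track_staging', PDO::FETCH_ASSOC) as $event) {
        if ($looksLikeBot($event)) {
            continue; // dropped: never reaches the stats, never fires actions
        }

        // 3) Move the verified event to the live table...
        $db->prepare(
            'INSERT INTO campaign_track_verified
                (campaign_id, subscriber_id, type, ip, user_agent, date_added)
             VALUES (?, ?, ?, ?, ?, ?)'
        )->execute([
            $event['campaign_id'], $event['subscriber_id'], $event['type'],
            $event['ip'], $event['user_agent'], $event['date_added'],
        ]);

        // ...and only now trigger the ARs, subscriber actions, webhooks, etc.
        $triggerDependentActions($event);
    }
}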

Now, as you can imagine, such a task is by no means an easy one; we need to carefully plan it and implement it in a way that does not affect how the app currently works. While I am eager to implement such a feature, I don't have an exact ETA for when it will be ready. The upcoming version, 2.4.2, is already locked for testing, so maybe 2.4.3 in the best case scenario, but again, we will not rush such a change; it is huge and requires proper planning, implementation and testing.

Meanwhile, do let me know if you have further questions ;)
 
@twisted1919 Thanks for the update; you are correct about the dependent actions if you do a full-fledged implementation. That's why I was suggesting a purely statistical implementation which wouldn't impact existing functionality.

I agree with the approach and look forward to the awesome release.

If you require any help or input, please do reach out.
 