Disk Size Manager

Copyright © 2000-2005 Spadix Software.

 

Bulk Verifier - Domain & Email Verifier

Welcome to Bulk Verifier, a powerful email verification program and an efficient way to detect bad email addresses and invalid domains. Load email or domain lists from CSV and TXT files; validation starts with a syntax check, continues with domain validation, and finishes with several dynamic, real-time tests of the given account. These tests drastically reduce the number of bad email/domain returns while increasing the accuracy of a mailing list, and the address validation algorithms help minimize bounced emails. Bulk Verifier can perform several checks against an email address, including syntax, DNS MX lookup, top-level domain name validation, and even mail server validation.

Bulk Verifier offers two processing modes: fast and deep.

In fast mode, Bulk Verifier can process mailing lists containing tens of millions of e-mail addresses at a speed of several thousand addresses per second. This mode does not give the highest checking accuracy, but it is economical in time and traffic and gives adequate results. We recommend the fast mode as a high-speed tool for sifting obvious rubbish out of large mailing lists containing millions of e-mail addresses.

In deep mode (the default), Bulk Verifier works significantly slower but gives much more precise results. The optimal data volume for this mode is 70,000 to 100,000 e-mail addresses. We recommend the deep mode as a slow but high-quality tool for checking mailing lists that are not very large.

 


 

General Bulk Verifier features

Incoming file formats
After you load a mailing list or paste some email addresses for validation, Bulk Verifier begins to verify each e-mail address. It can process both plain lists of e-mail addresses or domains, where each line contains one item, and files with a more complex structure such as CSV files, where each line is a multi-field record of the same structure (i.e. containing the same fields separated by the same delimiter). For example, you can export a worksheet from an MS Excel file to check the availability of the e-mail addresses or domains listed there. Each line of an incoming file is assumed to contain one e-mail address and/or one domain. To specify the format of the incoming file, use the Options dialog.
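As an illustration of the two input shapes, the sketch below pulls one field out of every record with Python's standard csv module. The function name, the delimiter and the column index are assumptions made for this example; they are not Bulk Verifier settings.

```python
import csv
import io

def extract_column(text, delimiter=",", column=0):
    """Return the chosen field from every non-empty record.

    A plain list (one item per line) is simply the one-column case.
    """
    items = []
    for row in csv.reader(io.StringIO(text), delimiter=delimiter):
        # Skip blank lines and records that are too short.
        if len(row) > column and row[column].strip():
            items.append(row[column].strip())
    return items

# Hypothetical export of a worksheet: name;e-mail;department
sample = "John Smith;john@example.com;Sales\nAnn Lee;ann@example.org;Support\n"
addresses = extract_column(sample, delimiter=";", column=1)
print(addresses)
```

For a plain list the defaults are enough: extract_column(text) returns one item per non-empty line.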

Bulk Verifier internal cache
Bulk Verifier stores domain check results in an internal cache. If another e-mail address from the same domain is found in the same mailing list, Bulk Verifier does not query the DNS server again but uses the result from the cache. Cache size is limited only by the memory size of your computer. Storing the result of one domain check takes 40 bytes of memory, so storing the results for one million different domains takes 40 MBytes. The time needed to find a previous check result in the cache is practically independent of the cache size.
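The caching idea can be sketched with a hash table, whose average lookup time does not grow with the number of entries. This is only an illustration, not the application's actual data structure: the class name and the lookup callable are hypothetical, and a Python dictionary uses considerably more than 40 bytes per entry, so the size estimate simply restates the figure given above.

```python
class DomainCache:
    """Remember the result of each domain check so it is requested only once."""

    def __init__(self, lookup):
        self._lookup = lookup      # hypothetical callable: domain -> result
        self._results = {}
        self.misses = 0            # number of real lookups performed

    def check(self, domain):
        key = domain.lower()       # domain names are case-insensitive
        if key not in self._results:
            self.misses += 1
            self._results[key] = self._lookup(key)
        return self._results[key]

def estimated_cache_bytes(domain_count, bytes_per_entry=40):
    """Memory estimate using the 40 bytes per domain quoted in the text."""
    return domain_count * bytes_per_entry

cache = DomainCache(lambda domain: domain.endswith(".com"))
cache.check("Example.com")
cache.check("example.com")         # served from the cache
print(cache.misses, estimated_cache_bytes(1_000_000))
```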

Timeouts
The quality of the DNS server list used by Bulk Verifier (Options\DNS) also strongly affects the application’s performance. If Bulk Verifier does not receive a response from a DNS server within the specified period of time (Options\Timeout, in seconds), it makes new attempts, using another DNS server from the list each time. If all these attempts fail, the e-mail address is listed as not checked due to the connection timeout. The longer the list of DNS servers available to Bulk Verifier, the lower the probability that a couple of DNS servers with operating problems will affect the application’s performance.
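The retry behaviour described above can be sketched as follows. The query callable, the number of attempts and the simple round-robin order are assumptions made for this example; the text does not say how many attempts Bulk Verifier makes or how it picks the next server.

```python
def query_with_failover(domain, servers, query, max_attempts=3):
    """Ask successive DNS servers until one answers or attempts run out.

    `query(domain, server)` is a hypothetical callable that returns the
    answer or raises TimeoutError when the server does not respond in time.
    Returns None when every attempt timed out ("not checked").
    """
    if not servers:
        raise ValueError("at least one DNS server is required")
    for attempt in range(max_attempts):
        server = servers[attempt % len(servers)]   # a different server each time
        try:
            return query(domain, server)
        except TimeoutError:
            continue
    return None
```

With a longer server list, a single slow or failing server costs one timeout at most before the next server is tried.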

Multithread processing
Bulk Verifier is a multithreaded application. You can define up to 600 threads to be used simultaneously; each thread checks one e-mail address or domain from the mailing list and determines whether it is still valid.

Please note that using the maximum number of threads is not always the best choice. For example, if you use 600 threads, the application checks 600 domains at the same time and sends up to 15,000 requests to DNS servers per minute. The resulting traffic may amount to 700 kbps. A DNS server’s software may regard this as a hacker attack and block you.

It is also possible that a DNS server processes only a limited number of requests per second from the same address and ignores the rest, to ensure that other users have enough resources to work with the server. In this case the application’s throughput decreases significantly, since some addresses are checked repeatedly because previous attempts to check them failed due to timeouts.

Thus, if your network connection can sustain more than 50 threads, configure about one DNS server (Options\DNS) for every ten threads. In this case you can be sure that the servers will not fail because of overload.

Multithreaded applications behave differently on different operating systems of the Windows family. Windows XP copes well with 600 processing threads, with only an insignificant increase in processor load. Older operating systems (e.g. Windows 98, Windows NT4) are more sensitive to large numbers of threads, and even a hundred threads may cause considerable processor load. We recommend running Bulk Verifier on computers with Windows XP to reach the application’s maximum performance.
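The figures above can be checked with a little arithmetic. The sketch below assumes the packet sizes quoted for the DNS stage in the section on check technologies (about 60 bytes per request and up to about 300 bytes per response); the function names are made up for this example.

```python
import math

def dns_load(requests_per_minute, request_bytes=60, response_bytes=300):
    """Return (requests per second, traffic in kbps) for a given request rate."""
    per_second = requests_per_minute / 60
    kbps = per_second * (request_bytes + response_bytes) * 8 / 1000
    return per_second, kbps

def recommended_dns_servers(threads, threads_per_server=10):
    """About one DNS server for every ten threads, rounded up."""
    return math.ceil(threads / threads_per_server)

per_second, kbps = dns_load(15_000)
servers_needed = recommended_dns_servers(600)
print(per_second, kbps, servers_needed)   # 250.0 720.0 60
```

Under these assumptions, 15,000 requests per minute is 250 requests per second and roughly 720 kbps, which agrees with the 700 kbps mentioned above, and 600 threads call for about 60 DNS servers.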



Fast mode of Bulk Verifier

After you load a mailing list or paste some email addresses for validation, Bulk Verifier begins to verify each e-mail address. In this mode it can process mailing lists containing tens of millions of e-mail addresses at a speed of several thousand addresses per second. To switch to this mode, uncheck the option Advanced e-mail check using SMTP in the Bulk Verifier Options dialog (see also the section Bulk Verifier interface and options).

Working in the fast mode, Bulk Verifier detects about 25-30% of the unavailable email addresses in a given mailing list. These figures may seem weak, since theoretically up to 70% of unavailable e-mails in a list can be detected by software, but in practice these 30% can amount to 10% of the whole mailing list, which is quite significant.

A more precise check, which detects another 40% of unavailable e-mails, is available in the deep list-check mode. Keep in mind, however, that the deep check may take 10 times more time and 5 times more network traffic, which often makes it impractical for large e-mail lists.

In the fast mode, Bulk Verifier uses only the DNS request stage to check e-mail address availability. During an availability check the following actions are executed:

1. Bulk Verifier parses the address syntactically and extracts its mail domain.

2. The top-level domain is extracted from the mail domain (e.g. .com for the mail domain mail.com).

3. Bulk Verifier compares the top-level domain with the list of basic top-level domains stored in the application’s main folder (the file Bulk Verifier.tld). If the e-mail address is syntactically incorrect or its top-level domain is not found in the file Bulk Verifier.tld, the address is regarded as invalid and is not processed any further.

4. Bulk Verifier asks the DNS server for the mail server address of the mail domain. If the DNS server returns one or more addresses of mail servers that accept mail for the domain, the e-mail address is considered available and valid. If the DNS server does not find the domain at all, or there are no mail servers that accept mail for the domain, the e-mail address is considered invalid. If the DNS server cannot return a response because the DNS servers serving the mail domain are unavailable, the address is also considered invalid.
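The four steps above can be sketched in a few lines. This is a simplified illustration rather than the application's code: the lookup_mx callable stands in for a real DNS MX query, and raising LookupError for unreachable DNS servers is a convention chosen for this example.

```python
def fast_check(address, known_tlds, lookup_mx):
    """Classify an address as "valid" or "invalid" using DNS data only.

    `known_tlds` plays the role of the TLD list file; `lookup_mx(domain)` is a
    hypothetical callable returning a list of mail servers (possibly empty)
    or raising LookupError when the domain's DNS servers are unavailable.
    """
    # Step 1: syntactic parse, exactly one "@" with text on both sides.
    if address.count("@") != 1:
        return "invalid"
    local_part, domain = address.split("@")
    if not local_part or not domain:
        return "invalid"
    domain = domain.lower()

    # Step 2: extract the top-level domain, e.g. "com" from "mail.com".
    tld = domain.rsplit(".", 1)[-1]

    # Step 3: the top-level domain must be in the known list.
    if tld not in known_tlds:
        return "invalid"

    # Step 4: ask DNS for mail servers that accept mail for the domain.
    try:
        mail_servers = lookup_mx(domain)
    except LookupError:
        return "invalid"
    return "valid" if mail_servers else "invalid"

# Usage with a stand-in for the DNS query.
mx_table = {"mail.com": ["mx.mail.com"]}
result = fast_check("user@mail.com", {"com", "org"}, lambda d: mx_table.get(d, []))
print(result)
```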



Deep (slow) mode of Bulk Verifier

In the deep (slow) mode Bulk Verifier performs a complete two-stage check of e-mail address availability. To switch to this mode, check the option Advanced e-mail check using SMTP in the Bulk Verifier Options dialog (see also the section Bulk Verifier interface and options).

The first stage of the check is exactly the same as the one used in the fast mode: the application obtains from DNS the address of the mail server for the e-mail address’s domain.

If the mail server address is obtained successfully, the second processing stage starts. Bulk Verifier attempts to connect to this mail server and emulates the dispatch of a message. No message is actually sent during the check. Bulk Verifier establishes a connection with the mail server, sends a hello message, transmits the sender’s address (Options\Sender) as if a message were about to follow, and then transmits the addressee’s mailbox address (the e-mail address being checked). As soon as the receiving server confirms or denies that the requested mailbox is available, Bulk Verifier disconnects.
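The second stage can be approximated with Python's standard smtplib module. The sketch below is an assumption about how such a probe might look, not Bulk Verifier's implementation: the function name, the helo_name default and the connect parameter (there only so that a stand-in can replace smtplib.SMTP during testing) are all invented for this example.

```python
import smtplib

def probe_mailbox(mail_server, sender, recipient,
                  helo_name="verifier.example", timeout=30,
                  connect=smtplib.SMTP):
    """Return the SMTP reply code to RCPT TO without sending any message."""
    smtp = connect(mail_server, 25, timeout=timeout)   # TCP connection
    try:
        smtp.helo(helo_name)                 # hello message
        smtp.mail(sender)                    # MAIL FROM:<sender>
        code, _reply = smtp.rcpt(recipient)  # RCPT TO:<address being checked>
        smtp.rset()                          # abandon the transaction
        return code
    finally:
        try:
            smtp.quit()                      # close the session politely
        except smtplib.SMTPException:
            pass                             # the server already hung up
```

A reply code of 250 means the server accepted the recipient and a code such as 550 means it refused. A fuller version would also check the replies to HELO and MAIL FROM. Because some mail services accept every recipient, a 250 reply does not prove that the mailbox exists.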


E-mail addresses check technologies

There are 2 stages in e-mail message delivery to the addressee:

1. The sender’s mail server determines the addressee’s mail server using DNS service;
2. The sender’s mail server connects to the addressee’s mail server via the SMTP protocol and transmits the message.

To check an e-mail address’s availability, these stages have to be emulated. The problem is that some mail services do not check whether the addressee’s mailbox actually exists in their domain when they accept incoming mail. They accept every message and, if the address turns out not to exist, send the original sender a delivery failure notice. About 30% of all e-mail addresses belong to such mail services, and their availability cannot be checked by software. Thus, only about 70% of unavailable e-mail addresses can be detected by domain or email validation tools.

In turn, of the unavailable addresses that software tools can detect, about 30% are discovered at the first checking stage (the DNS request); discovering the other 70% requires the second stage (SMTP connection emulation). The second stage usually takes 10 times more time and 5 times more network traffic than the first. In fact, a complete two-stage check of an e-mail address takes the same time and traffic as sending a short message to that address.

Let’s look at the check stages in more detail.

Stage 1. The email verifying software parses the e-mail address syntactically, extracts the mail domain and sends a request to the DNS server to get the mail server of this domain. The exchange with the DNS server uses the UDP protocol, which is faster than TCP because it does not involve establishing a connection between the servers. A DNS request usually takes 1-2 seconds. This includes sending a request packet (about 60 bytes including the packet header) and receiving a response packet (usually 200-300 bytes but not more than 512). This stage filters out all syntactically incorrect e-mails as well as e-mails in non-existent domains.
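To make the request size concrete, the sketch below builds a standard DNS query for the MX record of a domain and reports its size. It is an illustration only: the function name and query ID are arbitrary, and the 8-byte UDP and 20-byte IPv4 header sizes are added to approximate what travels over the network.

```python
import struct

def build_mx_query(domain, query_id=0x1234):
    """Build a DNS query message asking for the MX records of `domain`."""
    # Header: ID, flags (recursion desired), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question name: each label prefixed by its length, ended by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in domain.strip(".").split(".")
    ) + b"\x00"
    # Query type MX (15), class IN (1).
    return header + qname + struct.pack("!HH", 15, 1)

packet = build_mx_query("mail.com")
UDP_HEADER, IPV4_HEADER = 8, 20
print(len(packet), len(packet) + UDP_HEADER + IPV4_HEADER)   # 26 54
```

For mail.com the query itself is 26 bytes and about 54 bytes with UDP and IPv4 headers, which is in line with the figure of about 60 bytes given above; longer domain names produce proportionally larger requests.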

Note: The syntactic check performed by Bulk Verifier is a very simple one: an e-mail address must include exactly one @ sign and must end with one of the basic top-level domains (TLDs). The TLD list is stored in the file Bulk Verifier.tld in the application’s main folder. A more precise syntactic check is not considered worthwhile, since it would slow down the processing.

Stage 2. The email verifying software establishes a connection to the mail server via the SMTP protocol (based on TCP). TCP is connection-oriented, so the servers first exchange service packets to establish the connection. After the connection is established, the servers exchange hello messages (the first lines in the log below). Then the sender’s address is transmitted and the receiving server confirms that it will accept a message from this address. Then the addressee’s address is transmitted.


Here is a log example:

< 220-ns.watson.ibm.com ESMTP Sendmail AIX4.3/8.9.3/8.9.0
< 220 Thu, 22 Aug 2002 20:44:07 +0500
> HELO cisco.my.net
< 250-ns.watson.ibm.com Hello cisco.my.net [112.44.72.94],
< 250 pleased to meet you
> MAIL FROM:<verify@testmail.com>
< 250 <verify@testmail.com>... Sender is valid.
> RCPT TO:<noshuchaddress@ibm.com>
< 550 <noshuchaddress@ibm.com>... User unknown
> RSET
< 250 Resetting the state.
> QUIT



As you can see, the receiving server responded that the user with the address noshuchaddress@ibm.com is unknown and refused to receive a message for this user. Then the servers exchanged commands to close the connection.

During the address check the servers exchanged 10 messages with a total size of about 500 bytes. In fact, it took more than 20 packets to deliver these messages, which brought the total traffic to about 2 KBytes. Most of the time was spent waiting for responses from the other server.

Bulk Verifier can perform both a complete (but slow) two-stage check of e-mail address availability and a high-speed check that involves only the first stage (the DNS server request). It checks the validity of e-mail addresses or domains in bulk email lists, databases, spreadsheets or Excel files.