Cleaner
ListMotor Cleaner is designed
to extract e-mail addresses from raw data files and process them into
standard ListMotor lists (that is, sorted, one item per line, no
duplicates).
The ListMotor Cleaner can process several input files simultaneously. To
select the files, press the “Input files” button. How to make a selection
is described in the section “General notes”, subsection “Input files
selection”.
In short, to turn raw text files that contain e-mail addresses mixed with
redundant text into sorted, de-duplicated e-mail lists, do the following:
- Open the Cleaner’s tab in the ListMotor application;
- Choose processing modes, indicate necessary settings and specify output
files;
- Click the “Go” button and wait for the processing to complete;
- Check the log and enjoy the results.
Processing modes and options
The ListMotor Cleaner offers several processing modes, which can be turned
on in the section “Output files and Options” of the Cleaner’s window. All
processing modes can be activated at the same time; each will generate a
separate output file.
Processing mode “E-mail addresses”
If this mode is chosen, the Cleaner will extract all syntactically valid
e-mail addresses and will also attempt to 'correct' any e-mail addresses
which contain illegal characters.
The output file is alphabetically sorted, one item per line, without
duplicates.
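To make the idea of this mode concrete, here is a minimal Python sketch of "extract, sort, de-duplicate". It is an illustration only, not the Cleaner's actual code: the pattern `EMAIL_RE`, the function name `extract_emails` and the lower-casing of addresses are all assumptions, and the 'correction' of illegal characters is not modelled.

```python
import re

# Simplified address pattern (an assumption for illustration only;
# the Cleaner's real validity rules are not described in this section).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def extract_emails(text):
    """Return the unique addresses found in text, sorted alphabetically."""
    # Lower-casing before de-duplication is an assumption of this sketch.
    found = {match.group(0).lower() for match in EMAIL_RE.finditer(text)}
    return sorted(found)


raw = "Contact Mary <mary@company.com> or ALEX@magazine.com; cc: mary@company.com."
for address in extract_emails(raw):
    print(address)
```

The set removes duplicates and `sorted` gives the alphabetical order, so the output has one item per line, like a standard ListMotor list.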
The “E-mail addresses” mode has some additional options:
Only strip out addresses preceded by:
check this option and enter one or several words so that the output list
contains only the e-mail addresses that follow this word or words in the
input file.
By default, the Cleaner places into the output file all syntactically valid
e-mail addresses it finds. But in some cases you may need only those
addresses which follow a certain word (or words) in the input file.
The typical application of this feature is the creation of unsubscribe
lists (in other words, remove lists). Usually in this case you have some
e-mail messages from people asking you not to mail them any more. Just
export these messages into one file and make the Cleaner process this file
with the option “Only strip out addresses preceded by” set and the word
“From:” written in the adjacent field. You’ll have your remove list
as the result.
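The remove-list scenario above can be sketched in Python as follows. This is a hypothetical illustration: the function name `emails_preceded_by` is invented, and the reading of "preceded by" as "the word, optional whitespace, an optional “<”, then the address" is an assumption about behaviour that this section does not spell out.

```python
import re

# Same simplified address pattern as an assumption; not the Cleaner's real rules.
EMAIL = r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+"


def emails_preceded_by(text, words):
    """Return sorted unique addresses that directly follow one of `words`."""
    if not words:
        return []
    prefix = "|".join(re.escape(word) for word in words)
    # Assumption: only whitespace and an optional "<" may sit between
    # the word and the address.
    pattern = re.compile(r"(?:" + prefix + r")\s*<?(" + EMAIL + r")")
    return sorted({m.group(1).lower() for m in pattern.finditer(text)})


messages = """From: jane@market.com
To: news@mylist.example
Please remove me.
From: <nicky@yahoo.com>
To: news@mylist.example
"""
print(emails_preceded_by(messages, ["From:"]))
```

Only the senders are collected; the address after “To:” is ignored because it does not follow the specified word.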
Reject any addresses longer than:
check this option and enter a maximum length (up to 80 characters); the
Cleaner will reject e-mail addresses longer than this value. The default
maximum length is 45 characters.
No duplicate domains:
check this option if you need only one e-mail address from each domain
present in the input file. For example, the input list is:
mary@company.com
alex@magazine.com
snail@yahoo.com
smith@market.com
jane@market.com
twiggy@yahoo.com
info@company.com
job@magazine.com
nicky@yahoo.com
If the option “No duplicate domains” is not checked, the Cleaner will
provide the following result:
alex@magazine.com
info@company.com
jane@market.com
job@magazine.com
mary@company.com
nicky@yahoo.com
smith@market.com
snail@yahoo.com
twiggy@yahoo.com
If the option “No duplicate domains” is checked (and the option “Except
mail services” unchecked, see below), the result will be as follows:
alex@magazine.com
mary@company.com
smith@market.com
snail@yahoo.com
The option “No duplicate domains” is really useful when a list contains
e-mails in corporate domains. Using this option you avoid mailing the same
message to the same company repeatedly. The drawback is that you also keep
only one e-mail address per web-based mail service (domains such as yahoo,
hotmail, msn, etc.), although each address in such domains belongs to a
different person.
To solve this problem, the option “No duplicate domains” has a sub-option,
“Except mail services”. Check “Except mail services” together with “No
duplicate domains” to keep in your list all e-mail addresses which belong
to the domains listed in the “Mail Services” box on the “Options” page of
ListMotor. For any other domain, only one e-mail address will be kept.
For example, the input list is the same:
mary@company.com
alex@magazine.com
snail@yahoo.com
smith@market.com
jane@market.com
twiggy@yahoo.com
info@company.com
job@magazine.com
nicky@yahoo.com
If the options “No duplicate domains” and “Except mail services” are
both checked (and the yahoo domain is listed in
“Options”\“Mail Services”), the result will be as follows:
alex@magazine.com
mary@company.com
nicky@yahoo.com
smith@market.com
snail@yahoo.com
twiggy@yahoo.com
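The two examples above can be reproduced with a short Python sketch. It is not the Cleaner's implementation: the function name `one_per_domain` is hypothetical, the rule "keep the first address seen for a domain" is inferred from the sample results, and the mail service is assumed to be entered as the full domain `yahoo.com`.

```python
def one_per_domain(addresses, mail_services=()):
    """Keep one address per domain, but keep every address whose domain
    is listed in mail_services. Returns an alphabetically sorted list."""
    services = {domain.lower() for domain in mail_services}
    seen_domains = set()
    kept = set()
    for address in addresses:
        domain = address.rsplit("@", 1)[1].lower()
        if domain in services:
            kept.add(address)            # "Except mail services": keep all
        elif domain not in seen_domains:
            seen_domains.add(domain)     # assumption: the first one seen wins
            kept.add(address)
    return sorted(kept)


input_list = [
    "mary@company.com", "alex@magazine.com", "snail@yahoo.com",
    "smith@market.com", "jane@market.com", "twiggy@yahoo.com",
    "info@company.com", "job@magazine.com", "nicky@yahoo.com",
]
print(one_per_domain(input_list))                 # "No duplicate domains" only
print(one_per_domain(input_list, ["yahoo.com"]))  # plus "Except mail services"
```

The first call yields the four-address list and the second the six-address list shown in the examples.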
Allow embedded spaces in AOL usernames:
check this option to turn on extended processing of AOL e-mail addresses
which include spaces, such as “john smith@aol.com” or “write me@aol.com”.
If the option is turned off, the Cleaner will not interpret a space as a
valid e-mail address character and thus will read these e-mails as
“smith@aol.com” and “me@aol.com”.
If the option is turned on, the Cleaner will read these e-mails in full,
but will remove the spaces before placing them into the output file. The
result will be “johnsmith@aol.com” and “writeme@aol.com”. These e-mails
remain valid according to AOL rules.
Save rejected e-mails into:
check this option and specify a file name to get the list of the addresses
which are rejected (i.e. not placed into the output file) by the Cleaner
for reasons such as:
- the length exceeds the maximum allowable value;
- there is only one character before the “@” symbol;
- the option “Only strip out addresses preceded by” is turned on and the
e-mail is not preceded by the specified word;
- etc.
Processing mode “IP addresses”
If this mode is chosen, the Cleaner will place into the output file all
syntactically valid IP addresses it finds in the input file. A valid IP
address consists of 4 numbers separated by dots, each number not greater
than 255 (e.g. 230.121.1.0).
The output file is sorted in numeric order, one item per line, without
duplicates.
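The validity rule and the numeric sort order can be illustrated with a small Python sketch. The names `IP_RE` and `extract_ips` are hypothetical, and the handling of leading zeros and of surrounding digits is an assumption, not documented Cleaner behaviour.

```python
import re

# Four groups of 1-3 digits separated by dots, not glued to other digits.
IP_RE = re.compile(
    r"(?<!\d)(?<!\d\.)(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})(?!\d)(?!\.\d)"
)


def extract_ips(text):
    """Return unique valid IP addresses from text, sorted numerically."""
    found = set()
    for match in IP_RE.finditer(text):
        octets = tuple(int(part) for part in match.groups())
        if all(octet <= 255 for octet in octets):   # each number <= 255
            found.add(octets)
    # Sorting tuples of integers gives numeric order, so 9.x.x.x comes
    # before 10.x.x.x (a plain text sort would put "10" first).
    return [".".join(str(octet) for octet in octets) for octets in sorted(found)]


sample = "hosts: 230.121.1.0, 10.0.0.7 and 9.255.3.1; bad: 300.1.1.1; again 10.0.0.7."
print(extract_ips(sample))
```

Note that converting each number to an integer also drops leading zeros (an address written as 010.0.0.7 would be listed as 10.0.0.7 in this sketch).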
Processing mode “IP addresses in ()”
If this mode is chosen, the Cleaner will place into the output file only
those syntactically valid IP addresses which are enclosed in brackets or
parentheses in the input file.
Valid bracket/parenthesis types are: () {} [] <>.
The output file is sorted in numeric order, one item per line, without
duplicates.
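A possible Python sketch of the bracketed variant is shown below. It assumes that the opening and closing characters must form a matching pair and that the address touches the brackets directly; neither detail is stated in this section, and the names `PAIRS` and `extract_bracketed_ips` are hypothetical.

```python
import re

PAIRS = {"(": ")", "{": "}", "[": "]", "<": ">"}

# An opening bracket, four dotted numbers, a closing bracket.
BRACKETED_IP_RE = re.compile(
    r"([(\[{<])(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})([)\]}>])"
)


def extract_bracketed_ips(text):
    """Return unique valid IPs enclosed in a matching bracket pair, sorted numerically."""
    found = set()
    for match in BRACKETED_IP_RE.finditer(text):
        opener, *parts, closer = match.groups()
        octets = tuple(int(part) for part in parts)
        # Assumption: "(1.2.3.4]" with mismatched brackets is not accepted.
        if PAIRS[opener] == closer and all(octet <= 255 for octet in octets):
            found.add(octets)
    return [".".join(map(str, octets)) for octets in sorted(found)]


header = (
    "Received: from mail (192.168.0.5) by relay [10.1.2.3]; "
    "plain 172.16.0.1 ignored, mismatched (8.8.8.8] ignored"
)
print(extract_bracketed_ips(header))
```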
Processing mode “Proxies”
If this mode is chosen, the Cleaner will extract from the input file all
syntactically valid proxies it can find. A valid proxy consists of 4
numbers separated by dots (each number not greater than 255) immediately
followed by a colon and a port number from 10 to 65535 (e.g.
230.121.1.0:80).
The output file is sorted in numeric order, one item per line, without
duplicates.
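The proxy rule (valid IP, colon, port from 10 to 65535) can be sketched in Python like this. The names `PROXY_RE` and `extract_proxies` are hypothetical, and ordering proxies with the same IP by port number is an assumption about what “numeric order” means here.

```python
import re

# Dotted quad, a colon, then a port of 2 to 5 digits.
PROXY_RE = re.compile(
    r"(?<!\d)(?<!\d\.)(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3}):(\d{2,5})(?!\d)"
)


def extract_proxies(text):
    """Return unique valid ip:port items from text, sorted numerically."""
    found = set()
    for match in PROXY_RE.finditer(text):
        *octets, port = (int(group) for group in match.groups())
        if all(octet <= 255 for octet in octets) and 10 <= port <= 65535:
            found.add((tuple(octets), port))
    # Sort by the four numbers, then by port (assumed tie-breaker).
    return [".".join(map(str, octets)) + f":{port}" for octets, port in sorted(found)]


listing = (
    "230.121.1.0:80 10.0.0.1:8080 10.0.0.1:3128 "
    "1.2.3.4:5 999.1.1.1:80 10.0.0.1:8080 5.6.7.8:70000"
)
print(extract_proxies(listing))
```

In the sample, `1.2.3.4:5` is dropped because the port is below 10, `999.1.1.1:80` because a number exceeds 255, and `5.6.7.8:70000` because the port exceeds 65535.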
Processing mode “Proxies in ()”
If this mode is chosen, the Cleaner will place into the output file only
those syntactically valid proxies which are enclosed in brackets or
parentheses in the input file.
Valid bracket/parenthesis types are: () {} [] <>.
The output file is sorted in numeric order, one item per line, without
duplicates.
Processing mode “Phone numbers”
If this mode is chosen, the Cleaner will extract from the input file phone
numbers in North American format: area code (3 digits), exchange (3
digits), local number (4 digits).
The allowable separators between the parts of the number are space,
parentheses or hyphen. Examples: 305 120 5067; (903) 701-3018.
The output file is sorted in numeric order, one item per line, without
duplicates.
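As an illustration of the format described above, the following Python sketch recognises both example numbers. The pattern, the function name `extract_phones` and the normalised output form `AAA-EEE-NNNN` are assumptions; this section does not say how the Cleaner writes phone numbers to the output file or whether numbers without separators are accepted.

```python
import re

# Area code (optionally in parentheses), exchange, local number,
# with an optional space or hyphen between the parts.
PHONE_RE = re.compile(r"(?<!\d)\(?(\d{3})\)?[ -]?(\d{3})[ -]?(\d{4})(?!\d)")


def extract_phones(text):
    """Return unique phone numbers from text, sorted in numeric order."""
    found = {match.groups() for match in PHONE_RE.finditer(text)}
    ordered = sorted(found, key=lambda parts: int("".join(parts)))
    # Output form is an assumption of this sketch.
    return ["-".join(parts) for parts in ordered]


text = "Fax: (903) 701-3018, office 305 120 5067, again 305-120-5067"
print(extract_phones(text))
```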
The option “Only strip out addresses preceded by” is available for this
processing mode. It performs the same function as in the “E-mail
addresses” mode (see above).
The typical application of this feature is to extract fax or mobile phone
numbers from e-mail messages. Export the messages into one file and make
the Cleaner process this file with the option “Only strip out addresses
preceded by” set and the word “Fax” or “Mobile” written in the adjacent
field. As a result you’ll have the list of fax or mobile numbers.
Processing features
The Cleaner provides some additional processing features.
- AOL addresses processing.
Additional checks are performed during AOL e-mail address processing to
meet AOL standards:
- the address must begin with a letter;
- the part of the address to the left of the “@” sign must be 3 to 16
characters long.
- “Agglutinate” addresses processing.
The Cleaner is able to single out an e-mail address which is joined with
a repeated domain name (like “mary@hotmail.comhotmail.com”). Such a
combination is quite common in large e-mail bases.
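Both features above can be illustrated with a short Python sketch. The function names `passes_aol_checks` and `split_agglutinated` are hypothetical, and the sketch covers only the exact cases named in the text (the two AOL rules, and a domain written twice in a row); how the Cleaner handles other variants is not described here.

```python
def passes_aol_checks(address):
    """Apply the two AOL rules from the text to an aol.com address."""
    local, _, domain = address.partition("@")
    if domain.lower() != "aol.com":
        return True  # the extra checks apply to AOL addresses only
    first = local[:1]
    begins_with_letter = first.isascii() and first.isalpha()
    return begins_with_letter and 3 <= len(local) <= 16


def split_agglutinated(address):
    """Return the address with a doubled domain name reduced to one copy."""
    local, _, domain = address.partition("@")
    half, remainder = divmod(len(domain), 2)
    doubled = remainder == 0 and domain[:half] == domain[half:]
    if doubled and "." in domain[:half]:
        return f"{local}@{domain[:half]}"
    return address  # nothing to fix


print(passes_aol_checks("johnsmith@aol.com"))             # True
print(passes_aol_checks("jo@aol.com"))                    # False: too short
print(split_agglutinated("mary@hotmail.comhotmail.com"))  # mary@hotmail.com
```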
A small tip: you can use the Cleaner to easily single out the invalid
e-mail addresses for which you received “undeliverable” messages. Export
these messages into a single text file and process this file with the
Cleaner. As a result you will get a sorted and de-duplicated list of
invalid addresses, which can then be processed with the ListMotor Remover
to get rid of the useless addresses.
Processing results
You should specify a separate output file for each Cleaner processing mode
you are going to use. There is a field for the output file name next to
each processing mode’s option. How to indicate an output file is described
in the section “General notes”, subsection “Output files selection”.
After the ListMotor Cleaner has processed your raw file and extracted the
items you need (e-mails, IPs, phones, etc.), the results will be placed
into the specified files in the form of sorted lists (in alphabetical or
numeric order, according to the item type), one item per line, no
duplicates.
When indicating the output file names, please note that if there is no
file with the specified name in the specified path, it will be created. If
a file with that name already exists in the specified folder, it will be
overwritten. In that case a backup copy of the file may be created,
according to the application options (the same applies to any other output
file).
The original input file will remain unchanged.
Important note
Always have all your raw files processed by the Cleaner before you process
them with any other ListMotor task. This ensures that your lists of e-mail
addresses, IP addresses and phone numbers are not an unpredictable mess of
text but are in normalized form: one item per line, sorted, without
duplicates. This is the only form suitable for processing by the Merger,
Remover, Keeper and others.
Skipping the Cleaner may lead to error messages and unwanted or useless
results.