Dave wrote:
> Hi,
> Thanks for your reply. The following is the IP and abbreviated message:
> (reason: 554 5.7.1 Service unavailable; Client host [65.24.5.137]
> blocked using dnsbl-1.uceprotect.net;
> On my Squid issue: if aufs is less intensive and more efficient, I'll
> definitely switch over to it. As for your suggestion about splitting
> into multiple files, I believe the version I have can do this; it has
> multiple acl statements for the safe_ports definition. My issue,
> though, is that there are 15,000+ lines in this file, and on
> investigation some 500 are duplicates. I'd rather not go through this
> manually to do the split; is there a way I can split based on the
> dst, dstdomain, or url_regex you referenced?
I just used the following commands; they pulled off most of the job in
a few minutes. The remainder left over as regex patterns was small.
There are some entries that duplicate the domain-only list, but those
can be dealt with later.
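The target is three files, one per ACL type. In squid.conf they would
then be referenced along these lines (a sketch only; the acl names and
file paths here are made up, adjust to suit):

# hypothetical squid.conf fragment; acl names and paths are assumptions
acl porn_ips dst "/etc/squid/porn.ipa"
acl porn_domains dstdomain "/etc/squid/porn.domains"
acl porn_regex url_regex -i "/etc/squid/porn.regex"
http_access deny porn_ips
http_access deny porn_domains
http_access deny porn_regex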
# Pull out the IPs
grep -v -E "[a-z]+" porn | sort -u >porn.ipa
# copy everything else into a temp file
grep -E "[a-z]+" porn | sort -u >temp.1
# pull out lines with only domain name
grep -E "^([0-9a-z-]+\.)+[a-z]+$" temp.1 | sort -u >temp.d
# pull out everything without a domain name into another temp
grep -v -E "^([0-9a-z-]+\.)+[a-z]+$" temp.1 | sort -u >temp.2
rm temp.1
# pull out lines that are domain/ or domain<space> and drop the end
grep -E "^([0-9a-z-]+\.)+[a-z]+[/ ]$" temp.2 | sed -e 's/\/$//' -e 's/ $//' | sort -u >>temp.d
# leave the rest as regex patterns
grep -v -E "^([0-9a-z-]+\.)+[a-z]+[/ ]$" temp.2 | sort -u >porn.regex
rm temp.2
# sort the just-domains file and make sure there are no duplicates.
sort -u temp.d > porn.domains
rm temp.d
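As a quick sanity check (assuming the source list is still in the file
named porn), compare line counts; the three output files should sum
roughly to the original, minus whatever sort -u dropped as duplicates:

wc -l porn porn.ipa porn.domains porn.regex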
Amos