Automatically Validate HTTP Proxies

By: Igor Oz

Let’s say you have downloaded a long list of Web proxy servers. Now you are stuck with the task of weeding out the proxies that are dead, slow, fake, or otherwise unusable. Some applications claim to validate proxy servers, but they share a common problem: they are excruciatingly slow. They also tend to hang once in a while and, if your list of proxies is too long, may crash altogether because of memory leaks and other such examples of fine programming.

I would like to bring your attention to the following, hopefully useful, script, which goes through even very long proxy lists in just a minute or two and gets rid of the trash. A few words about how it works are in order. I created a simple HTML page on my Web server (see the $pvcurl variable below). This page contains a unique text string (the $pvcstring variable).
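
For readers who want to host their own check page, here is a minimal sketch of one way to create it. The function name make_check_page, the output file name, and the use of /dev/urandom to build the marker are assumptions for illustration; the article does not say how the original pvc.html was produced.

```shell
#!/bin/sh
# Hypothetical helper: writes a tiny HTML page containing only a marker string.
# $1 = output file, $2 = marker string
make_check_page() {
    printf '<html><body>%s</body></html>\n' "$2" > "$1"
}

# Build a marker from 12 random bytes printed as decimal digits
# (assumes /dev/urandom and od are available).
marker=$(od -An -N12 -tu1 /dev/urandom | tr -d ' \n')

make_check_page "pvc.html" "$marker"
echo "Marker to use as pvcstring: $marker"
```

Upload the resulting pvc.html to your own Web server, then set $pvcurl to its address and $pvcstring to the printed marker.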

The first step is to ping the proxy and see whether it responds within a reasonable period of time (five seconds in the script). The ping commands are launched in the background to speed up the process. If the proxy does respond, the next step is to use wget to download $pvcurl through the proxy and look for $pvcstring in the result. If everything checks out, the proxy is added to the final list of good proxies. Just like the ping commands, the wget processes are started in the background, with a 30-second timeout.
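
The two stages described above can be sketched for a single proxy as follows. This is an illustration under stated assumptions, not the script itself: the helper names split_proxy, has_marker, and check_proxy are hypothetical, and ping -W (reply timeout in seconds) assumes the Linux iputils version of ping.

```shell
#!/bin/sh
# Hypothetical helpers; the full script performs these steps inline.

# Split "host:port" into its two parts using parameter expansion.
split_proxy() {
    proxy_host=${1%%:*}
    proxy_port=${1##*:}
}

# Succeeds when the text read from stdin contains the marker string $1.
has_marker() {
    grep -qF -- "$1"
}

# Two-stage check: $1 = host:port, $2 = check page URL, $3 = marker string.
check_proxy() {
    split_proxy "$1"
    # Stage 1: a single ping with a 5-second reply timeout.
    ping -q -c 1 -W 5 "$proxy_host" >/dev/null 2>&1 || return 1
    # Stage 2: fetch the check page through the proxy and look for the marker.
    http_proxy="http://$1" wget -q --timeout=30 --tries=1 -O - "$2" | has_marker "$3"
}

# Example call (the address is a placeholder):
#   check_proxy "192.0.2.10:8080" "$pvcurl" "$pvcstring" && echo "good proxy"
```

Running one check at a time like this is slow; the script gets its speed from launching many of these checks in the background at once.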

#!/bin/ksh

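configure() {
    # Check page on the author's server and the unique string it contains
    pvcurl="http://www.krazyworks.com/pvc.html"
    pvcstring="191628769290432845414226"
    wget_timeout=30

    # Input list: one proxy:port entry per line
    proxyin="/tmp/proxylist.in"

    if [ ! -f "$proxyin" ]
    then
        echo "Proxy list $proxyin not found. Exiting..."
        exit 1
    fi

    # Output list of verified proxies; start from a clean file
    proxyout="/root/proxylist.out"

    if [ -f "$proxyout" ]
    then
        rm "$proxyout"
    fi
}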

cleanup() {
    # Note: this kills every wget process on the system, not just ours
    killall wget 2>/dev/null
    # Remove temporary files left over from a previous run
    for i in 1 2 3 4 5
    do
        rm -f "/tmp/proxy_verify.tmp$i"
    done
}

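wgetrun() {
    # Fetch the check page through the proxy (wget reads $http_proxy) and
    # keep the proxy only if the page contains the expected string
    if wget -q --timeout="$wget_timeout" --tries=1 -O - "$pvcurl" | grep -q "$pvcstring"
    then
        echo "${proxy}:${port}" >> "$proxyout"
    fi
}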

pingrun() {
    # One ping with a 5-second reply timeout; only responsive hosts go on to wget
    if ping -q -c 1 -W 5 "$proxy" >/dev/null 2>&1
    then
        wgetrun &
    fi
}

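verify() {
    # Remove duplicate entries from the input list
    sort -u "$proxyin" > "/tmp/proxy_verify.tmp1"
    mv "/tmp/proxy_verify.tmp1" "$proxyin"
    proxy_total=$(wc -l "$proxyin" | awk '{print $1}')

    i=1
    j=1
    while read -r line
    do
        echo "Processing proxy $i of $proxy_total"
        proxy=$(echo "$line" | awk -F':' '{print $1}')
        port=$(echo "$line" | awk -F':' '{print $2}')
        export http_proxy="${proxy}:${port}"
        (( i = i + 1 ))

        # Each check runs in its own background subshell, which keeps
        # the proxy, port, and http_proxy values set above
        pingrun &

        # Once 100 proxies have been launched, check for a wget backlog on
        # each pass; if more than 100 are running, give them one timeout
        # period, kill the stragglers, and restart the count
        if [ $j -eq 100 ]
        then
            if [ "$(ps -ef | grep -c '[w]get')" -gt 100 ]
            then
                sleep $wget_timeout
                killall wget
                j=1
            fi
        else
            (( j = j + 1 ))
        fi
    done < "$proxyin"

    echo "Waiting for background jobs to finish ($wget_timeout seconds)..."
    while [ "$(ps -ef | grep -E -c '[w]get|[p]ing')" -gt 0 ]
    do
        sleep 5
    done
}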

# RUNTIME

configure
cleanup
verify
cleanup
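
The script reads one host:port entry per line from /tmp/proxylist.in. Downloaded lists often contain blank lines or stray text, so a pre-filter along the following lines may help. The function name clean_proxy_list, the accepted pattern, and the example file name are assumptions, not part of the original script.

```shell
#!/bin/sh
# Hypothetical pre-filter: strip carriage returns from Windows-style files,
# keep only lines that look like host:port, and drop duplicates.
clean_proxy_list() {
    tr -d '\r' | grep -E '^[A-Za-z0-9.-]+:[0-9]{1,5}$' | sort -u
}

# Example usage (downloaded.txt is an illustrative file name):
#   clean_proxy_list < downloaded.txt > /tmp/proxylist.in
```

The verified proxies end up in /root/proxylist.out, so the script as written needs to run with permission to write there.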

Article Source: http://www.articlegalore.net

www.krazyworks.com


