Web Indexing
Harvest-NG
posted by sxw in Web Indexing
Harvest-NG is a collection of Perl modules and scripts that provide a powerful web crawling and summarising agent. The code aims to provide an open source, standards-compliant tool for fetching content from a wide variety of information sources, summarising it into a set of resource descriptions, and storing these in an easily accessible database from which search services can be built and statistical information compiled.
Reviews: 0
Price: Free
Views: 5107
Internet Spy (i-spy)
posted by igorl in Web Indexing
I-Spy is a Perl script that identifies new files on remote FTP and web sites. It fetches the contents of FTP directories and web pages and compares them against earlier contents. It then compiles a report and either sends it by e-mail or saves it as a web page; you may also request both.
For e-mail reports, you may request plain text or HTML. I-Spy logs its activity as it runs. You may specify the log directory, or I-Spy will try to find one automatically. For web page reports, I-Spy will attempt to store the log where the report can reference it and the web server can serve it.
Reviews: 0
Price: Free
Views: 3559
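The compare-and-report step that the I-Spy description mentions can be illustrated with a short Python sketch. This is not I-Spy's code (I-Spy is a Perl script); the function names `new_entries` and `plain_text_report` are hypothetical, and the sketch assumes each listing has already been fetched into a list of file names.

```python
def new_entries(previous, current):
    """Return the names in `current` that were absent from `previous`.

    Both arguments are assumed to be lists of file names from two
    successive fetches of the same remote directory or page.
    """
    seen = set(previous)
    return [name for name in current if name not in seen]


def plain_text_report(site, added):
    """Format the new names as a plain-text report (hypothetical layout)."""
    lines = [f"New files on {site}:"]
    lines += [f"  {name}" for name in added] or ["  (none)"]
    return "\n".join(lines)


if __name__ == "__main__":
    old = ["README", "tool-1.0.tar.gz"]
    new = ["README", "tool-1.0.tar.gz", "tool-1.1.tar.gz"]
    print(plain_text_report("ftp.example.test", new_entries(old, new)))
```

Keeping the previous listing as a set makes each membership check cheap, while the list comprehension preserves the order in which the remote site listed the files.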
WebAwk
posted by jbe28 in Web Indexing
This is a proof of concept of a tool to automate web browsing and data collection. It works like AWK, except that instead of operating on files and lines it operates on HTML pages and hyperlinks. It is meant to be run as a command-line script and provides the following variables: base_url (the URL the script was initially invoked on), base_path (the root of the saved data tree), url (the current URL being processed), linked_from (the parent of the current URL), and content (the actual data corresponding to the current URL).
Reviews: 0
Price: Free
Views: 2991
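To make the AWK analogy in the WebAwk description concrete, here is a minimal Python sketch of a pattern/action loop over pages and hyperlinks. It is not WebAwk's implementation: the `crawl` function, the `rules` format, the `fetch` callback, and the `max_pages` limit are all assumptions made for illustration. Only the five field names (base_url, base_path, url, linked_from, content) come from the description.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def crawl(base_url, base_path, fetch, rules, max_pages=50):
    """Visit pages breadth-first, running each matching rule on each page.

    `fetch` is a caller-supplied function mapping a URL to its HTML text.
    `rules` is a list of (regex, action) pairs; an action receives a dict
    holding the five fields named in the WebAwk description.
    """
    queue = [(base_url, None)]
    seen = set()
    while queue and len(seen) < max_pages:
        url, linked_from = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        content = fetch(url)
        record = {
            "base_url": base_url,
            "base_path": base_path,
            "url": url,
            "linked_from": linked_from,
            "content": content,
        }
        # AWK-style dispatch: the pattern is tested against the current URL.
        for pattern, action in rules:
            if re.search(pattern, url):
                action(record)
        parser = LinkParser()
        parser.feed(content)
        for href in parser.links:
            queue.append((urljoin(url, href), url))
    return seen
```

Passing `fetch` in as a parameter keeps the sketch free of network code; a real run could supply a function built on `urllib.request.urlopen`, while a dry run can use a dictionary of canned pages.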
Web Secretary
posted by websec in Web Indexing
Web Secretary is web page monitoring software that goes beyond the functionality such tools normally offer. It detects changes by analysing page content rather than relying on a date/time stamp or a simple textual comparison, and it e-mails the changed page to you with the new content highlighted. Web Secretary is written in Perl and should run on any Unix system that has the Perl interpreter and the LWP module installed.
Reviews: 0
Price: Free
Views: 3118
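The Web Secretary listing does not say how its content analysis works, so the following Python sketch shows only one plausible approach: compare the visible words of two page versions (ignoring markup) and mark the words that are new. The names `page_words` and `highlight_changes` and the `**` marker are hypothetical, and the word-level diff from Python's `difflib` is a stand-in, not Web Secretary's algorithm.

```python
import difflib
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the visible text of a page as a list of words."""

    def __init__(self):
        super().__init__()
        self.words = []

    def handle_data(self, data):
        self.words.extend(data.split())


def page_words(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return parser.words


def highlight_changes(old_html, new_html, mark="**"):
    """Return the new page's text with added or changed words wrapped in `mark`."""
    old, new = page_words(old_html), page_words(new_html)
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    out = []
    for op, _i1, _i2, j1, j2 in matcher.get_opcodes():
        chunk = " ".join(new[j1:j2])
        if not chunk:
            continue  # pure deletions leave nothing to show in the new page
        if op in ("insert", "replace"):
            out.append(f"{mark}{chunk}{mark}")
        else:
            out.append(chunk)
    return " ".join(out)


if __name__ == "__main__":
    before = "<p>Price is 10 dollars</p>"
    after = "<p class='offer'>Price is 12 dollars</p>"
    print(highlight_changes(before, after))
```

Because the comparison runs on extracted words, a change that touches only markup (such as the added `class` attribute in the example) produces no highlight, which is the kind of behaviour a content-based monitor would want over a raw textual comparison.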