Someone wanted to know what are "Spiders", here it is:

By Loralee Reach, January 20, 2011 in New to this? Things you should know...

Loralee Reach 245

Posted January 20, 2011

Before a search engine can tell you where a file or document is, that file or document must be found. To find information on the hundreds of millions of Web pages that exist, a search engine employs special software robots, called spiders, to build lists of the words found on Web sites. When a spider is building its lists, the process is called Web crawling. (There are some disadvantages to calling part of the Internet the World Wide Web -- a large set of arachnid-centric names for tools is one of them.) In order to build and maintain a useful list of words, a search engine's spiders have to look at a lot of pages.

How does any spider start its travels over the Web? The usual starting points are lists of heavily used servers and very popular pages. The spider will begin with a popular site, indexing the words on its pages and following every link found within the site. In this way, the spidering system quickly begins to travel, spreading out across the most widely used portions of the Web.
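The traversal described above can be sketched as a breadth-first walk over links. This is only a minimal sketch, not any search engine's actual crawler: the `LINKS` dictionary is a made-up in-memory stand-in for real pages (nothing is fetched over the network), and `crawl` is a hypothetical helper name.

```python
from collections import deque

# Hypothetical in-memory "web": each page maps to the links found on it.
LINKS = {
    "popular.example/": ["popular.example/news", "other.example/"],
    "popular.example/news": ["popular.example/"],
    "other.example/": ["other.example/about"],
    "other.example/about": [],
}

def crawl(seeds, links, max_pages=100):
    """Visit pages breadth-first from the seeds, following each link once."""
    seen = set()
    queue = deque()
    for seed in seeds:
        if seed not in seen:
            seen.add(seed)
            queue.append(seed)
    order = []
    while queue and len(order) < max_pages:
        page = queue.popleft()
        order.append(page)  # a real spider would index the page's words here
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return order

visited = crawl(["popular.example/"], LINKS)
```

Starting from one popular page, the walk reaches the second site through a single outbound link, which is the "spreading out" the paragraph describes.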

"Spiders" take a Web page's content and create key search words that enable online users to find pages they're looking for.

Google began as an academic search engine. In the paper that describes how the system was built, Sergey Brin and Lawrence Page give an example of how quickly their spiders can work. They built their initial system to use multiple spiders, usually three at one time. Each spider could keep about 300 connections to Web pages open at a time. At its peak performance, using four spiders, their system could crawl over 100 pages per second, generating around 600 kilobytes of data each second.
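A quick back-of-the-envelope check on those figures, using only the numbers quoted above. The per-day figure is an assumption for illustration: it pretends the peak rate could be held for a full 24 hours, which the text does not claim.

```python
# Figures quoted in the text (both are approximate: "over 100", "around 600").
pages_per_second = 100
kilobytes_per_second = 600
spiders_at_peak = 4
connections_per_spider = 300

# Average amount of data fetched per page at peak.
kb_per_page = kilobytes_per_second / pages_per_second

# Connections open at once when all four spiders are running.
open_connections = spiders_at_peak * connections_per_spider

# Assumption: peak rate sustained for a whole day (illustrative only).
pages_per_day = pages_per_second * 60 * 60 * 24
```

So the quoted rates work out to roughly 6 kilobytes per page and about 1,200 connections open at once.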

Keeping everything running quickly meant building a system to feed necessary information to the spiders. The early Google system had a server dedicated to providing URLs to the spiders. Rather than depending on an Internet service provider for the domain name server (DNS) that translates a server's name into an address, Google had its own DNS, in order to keep delays to a minimum.
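The idea behind running your own DNS is to avoid waiting on a slow outside lookup every time. The sketch below shows that idea with a small in-process cache; it is not how Google's DNS server worked. `CachingResolver` and `fake_lookup` are hypothetical names, and the fake lookup replaces a real network query so the example runs offline.

```python
import socket

class CachingResolver:
    """Remember name-to-address answers so repeat lookups skip the slow path."""

    def __init__(self, lookup=socket.gethostbyname):
        self._lookup = lookup   # the slow lookup, by default a real DNS query
        self._cache = {}
        self.misses = 0         # how many times the slow lookup was needed

    def resolve(self, hostname):
        if hostname not in self._cache:
            self.misses += 1
            self._cache[hostname] = self._lookup(hostname)
        return self._cache[hostname]

def fake_lookup(hostname):
    # Stand-in for a real DNS query; 192.0.2.10 is a documentation-only address.
    return {"popular.example": "192.0.2.10"}[hostname]

resolver = CachingResolver(lookup=fake_lookup)
first = resolver.resolve("popular.example")
second = resolver.resolve("popular.example")  # answered from the cache
```

A spider that revisits the same servers thousands of times pays for the lookup once per name instead of once per page.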

When the Google spider looked at an HTML page, it took note of two things:

* The words within the page

* Where the words were found

Words occurring in the title, subtitles, meta tags and other positions of relative importance were noted for special consideration during a subsequent user search. The Google spider was built to index every significant word on a page, leaving out the articles "a," "an" and "the." Other spiders take different approaches.
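Recording both the word and where it was found might look something like the sketch below. It is an illustration under simple assumptions, not Google's indexer: a page is reduced to a title and a body, `index_page` is a hypothetical helper, and the only words skipped are the three articles named above.

```python
import re

ARTICLES = {"a", "an", "the"}

def index_page(title, body):
    """Map each word to the (field, position) pairs where it occurs."""
    index = {}
    for field, text in (("title", title), ("body", body)):
        words = re.findall(r"[a-z0-9']+", text.lower())
        for position, word in enumerate(words):
            if word in ARTICLES:
                continue  # articles are left out of the index
            index.setdefault(word, []).append((field, position))
    return index

entry = index_page("The Spider Guide", "A spider crawls the Web")
```

Because each entry carries its field, a later search can give a title hit more weight than a body hit.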

These different approaches usually attempt to make the spider operate faster, allow users to search more efficiently, or both. For example, some spiders will keep track of the words in the title, sub-headings and links, along with the 100 most frequently used words on the page and each word in the first 20 lines of text. Lycos is said to use this approach to spidering the Web.
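The selective approach can be sketched as follows. This follows the description in the paragraph above, not Lycos's real code: `selective_terms` and its parameters are hypothetical, and the page is assumed to arrive already split into title, headings, link texts and body.

```python
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z0-9']+", text.lower())

def selective_terms(title, headings, link_texts, body, top_n=100, first_lines=20):
    """Keep title, heading and link words, the top_n most frequent body
    words, and every word in the first first_lines lines of the body."""
    terms = set(tokenize(title))
    for text in headings + link_texts:
        terms.update(tokenize(text))
    counts = Counter(tokenize(body))
    terms.update(word for word, _ in counts.most_common(top_n))
    for line in body.splitlines()[:first_lines]:
        terms.update(tokenize(line))
    return terms

# Tiny limits so the effect is visible on a three-line page.
body = "spiders crawl pages\nspiders index words\nrare footer"
terms = selective_terms("Web Spiders", ["Crawling"], ["home"], body,
                        top_n=1, first_lines=1)
```

With the limits turned down, words that appear only late in the page ("index", "footer") never make it into the list, which is the trade-off this approach accepts for speed.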

Other systems, such as AltaVista, go in the other direction, indexing every single word on a page, including "a," "an," "the" and other "insignificant" words. The push to completeness in this approach is matched by other systems in the attention given to the unseen portion of the Web page, the meta tags.
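Meta tags sit in the page's HTML but are not displayed to the reader, so a spider has to pull them out of the markup. Here is a minimal sketch using Python's standard `html.parser` module; the `MetaTagCollector` class and the sample page are made up for illustration.

```python
from html.parser import HTMLParser

class MetaTagCollector(HTMLParser):
    """Collect <meta name="..." content="..."> pairs from a page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content:
            self.meta[name.lower()] = content

page = (
    '<html><head><meta name="keywords" content="spiders, crawling">'
    "<title>Demo</title></head><body>Visible text</body></html>"
)
collector = MetaTagCollector()
collector.feed(page)
```

After feeding the page, `collector.meta` holds the keywords even though none of that text is visible in a browser.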

mod 135639

Posted January 20, 2011

Thanks Loralee, I moved this to the TECH corner.

That info is a little outdated (the specifics of some of the search engines, anyway), but it gives the basics of what a Spider/Bot/Robot/Crawler/Scraper/etc. does (they have MANY names for these programs).

Loralee Reach 245

Posted January 20, 2011

You work so hard for this site!!!!

You are most welcome.

Loralee

Lexy Grace 103696

Posted January 21, 2011

I wanted to know what a spider was, so thank you for letting me and everyone else know. I love learning new things.

Big Hugs,

Lexy

Guest jake_cdn

Posted January 25, 2011

I was reviewing the "Active Users" area and noticed that there was something called "Spiders" listed with about 175 users.

Does anyone have any idea what this refers to?

(Tried to post this thread about a week ago but it did not show up)

Guest W***ledi*Time

Posted January 25, 2011

From the tech corner:

http://www.cerb.ca/vbulletin/showthread.php?t=43070

mod 135639

Posted January 26, 2011

thanks again wit! beat me to it!!

Guest jake_cdn

Posted January 26, 2011

Thanks for the quick response. I appreciate it and feel a little more educated.

Thanks!

BarrhavenWoody 10776

Posted February 3, 2011

I was wondering this myself. Thanks for asking the question, Jake. And thanks to WIT and Loralee for providing the answer. :)

Thanks! Spiders aren't so creepy after all. :)
