Source code: webcheck

crawler::Link Class Reference



Detailed Description

This is a basic class representing a url.

Some basic information about a url is stored in instances of this
class:

  url        - the url this link represents
  scheme     - the scheme part of the url
  netloc     - the netloc part of the url
  path       - the path part of the url
  query      - the query part of the url
  parents    - list of parent links (all the Links that link to this
               page)
  children   - list of child links (the Links that this page links to)
  pagechildren - list of child pages, including children of embedded
                 elements
  embedded   - list of links to embedded content
  anchors    - list of anchors defined on the page
  reqanchors - anchors requested for this page, as a mapping from each
               anchor to the list of links requesting it (anchor->link*)
  depth      - the number of clicks needed to reach this page from the
               base urls
  isinternal - whether the link is considered to be internal
  isyanked   - whether the link should be checked at all
  isfetched  - whether the link has already been fetched
  ispage     - whether the link represents a page
  mtime      - modification time (in seconds since the Epoch)
  size       - the size of this document
  mimetype   - the content-type of the document
  encoding   - the character set used in the document
  title      - the title of this document (unicode)
  author     - the author of this document (unicode)
  status     - the result of retrieving the document
  linkproblems - list of problems with retrieving the link
  pageproblems - list of problems in the parsed page
  redirectdepth - the depth of this link in a redirect chain (0 if it is
                  not a redirect)
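The attribute list above can be illustrated with a minimal Python sketch. It is not webcheck's implementation: the class name LinkSketch is hypothetical, and the sketch assumes that the scheme, netloc, path and query parts come from the standard library's urllib.parse.urlsplit, and that reqanchors is a dict from anchor name to the list of requesting links (the anchor->link* notation).

```python
from urllib.parse import urlsplit


class LinkSketch:
    """Hypothetical, simplified stand-in for crawler.Link (not webcheck's code)."""

    def __init__(self, url):
        self.url = url
        # assumption: the url parts are obtained with urlsplit()
        parts = urlsplit(url)
        self.scheme = parts.scheme
        self.netloc = parts.netloc
        self.path = parts.path
        self.query = parts.query
        # relations to other links
        self.parents = []
        self.children = []
        self.embedded = []
        # anchors defined on this page, and anchors other pages request:
        # reqanchors maps an anchor name to the list of links requesting it
        self.anchors = []
        self.reqanchors = {}
        # crawl state
        self.depth = None
        self.isinternal = False
        self.isyanked = False
        self.isfetched = False
        self.ispage = False
        self.redirectdepth = 0


link = LinkSketch("http://example.com/docs/index.html?lang=en#intro")
print(link.scheme, link.netloc, link.path, link.query)
# http example.com /docs/index.html lang=en
```

Note that the fragment ("intro") is not one of the listed parts; urlsplit keeps it separately in its fragment field.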

Instances of this class should not be created directly; they should be
created through a site instance by adding internal urls and calling crawl().
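The creation path described above (a site owns the links, and crawl() discovers them) can be sketched as follows. MiniSite, add_internal, get_link and the pages dictionary are hypothetical stand-ins and are not guaranteed to match webcheck's own site class. The sketch assumes that depth is computed by a breadth-first search from the base urls, which yields the minimum number of clicks, that "internal" means "same netloc as a base url", and that external links are recorded but not followed.

```python
from collections import deque
from urllib.parse import urlsplit


class Link:
    """Hypothetical minimal link holding only what this sketch needs."""

    def __init__(self, site, url):
        self.site = site
        self.url = url
        self.parents = []
        self.children = []
        self.depth = None            # set by MiniSite.crawl()
        self.isinternal = site.is_internal(url)


class MiniSite:
    """Hypothetical stand-in for webcheck's site class (assumed API).

    `pages` maps a url to the urls it links to; a real crawler would
    fetch and parse each page instead.
    """

    def __init__(self, pages):
        self.pages = pages
        self.bases = []
        self.links = {}              # url -> Link: one Link per url

    def add_internal(self, url):
        self.bases.append(url)

    def is_internal(self, url):
        # assumption: internal means "same netloc as some base url"
        netloc = urlsplit(url).netloc
        return any(urlsplit(base).netloc == netloc for base in self.bases)

    def get_link(self, url):
        # links are only ever created here, never directly by callers
        if url not in self.links:
            self.links[url] = Link(self, url)
        return self.links[url]

    def crawl(self):
        # breadth-first search, so depth is the minimum number of clicks
        queue = deque()
        for url in self.bases:
            link = self.get_link(url)
            link.depth = 0
            queue.append(link)
        while queue:
            link = queue.popleft()
            if not link.isinternal:
                continue             # external links are recorded, not followed
            for child_url in self.pages.get(link.url, []):
                child = self.get_link(child_url)
                if child not in link.children:
                    link.children.append(child)
                    child.parents.append(link)
                if child.depth is None:
                    child.depth = link.depth + 1
                    queue.append(child)


site = MiniSite({
    "http://example.com/": ["http://example.com/a", "http://other.org/"],
    "http://example.com/a": ["http://example.com/b"],
})
site.add_internal("http://example.com/")
site.crawl()
print(site.links["http://example.com/b"].depth)  # 2
```

Keeping a single url-to-Link map in the site is what makes parents and children refer to the same objects no matter how many pages link to a url.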
   

Definition at line 282 of file crawler.py.


Public Member Functions

def __init__
def add_anchor
def add_child
def add_embed
def add_linkproblem
def add_pageproblem
def add_reqanchor
def fetch
def follow_link
def redirect
def set_encoding
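A hedged sketch of how add_child, add_embed, add_anchor and add_reqanchor might keep the relations from the attribute list consistent, recording each forward relation together with its reverse. The method bodies are assumptions, not webcheck's code; in particular, whether an embedding page counts as a parent is assumed here, and missing_anchors is a hypothetical helper added only to show why requested anchors are recorded.

```python
class Link:
    """Hypothetical sketch of the relation-keeping methods; not webcheck's code."""

    def __init__(self, url):
        self.url = url
        self.parents = []
        self.children = []
        self.embedded = []
        self.anchors = []
        self.reqanchors = {}   # anchor name -> list of links that request it

    def add_child(self, child):
        # record the forward (child) and reverse (parent) relation once
        if child not in self.children:
            self.children.append(child)
        if self not in child.parents:
            child.parents.append(self)

    def add_embed(self, link):
        # embedded content (images, stylesheets, ...); treating the
        # embedding page as a parent is an assumption of this sketch
        if link not in self.embedded:
            self.embedded.append(link)
        if self not in link.parents:
            link.parents.append(self)

    def add_anchor(self, anchor):
        # an anchor defined on this page
        if anchor not in self.anchors:
            self.anchors.append(anchor)

    def add_reqanchor(self, parent, anchor):
        # `parent` links to this page with "#anchor"; remember who asked
        requesters = self.reqanchors.setdefault(anchor, [])
        if parent not in requesters:
            requesters.append(parent)

    def missing_anchors(self):
        # hypothetical helper: requested anchors never defined on the page
        return sorted(a for a in self.reqanchors if a not in self.anchors)


index = Link("http://example.com/")
faq = Link("http://example.com/faq")
index.add_child(faq)
faq.add_reqanchor(index, "install")
faq.add_anchor("usage")
print(faq.missing_anchors())  # ['install']
```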

Public Attributes

 anchors
 author
 children
 depth
 embedded
 encoding
 isfetched
 isinternal
 ispage
 isyanked
 linkproblems
 mimetype
 mtime
 pagechildren
 pageproblems
 parents
 redirectdepth
 redirectlist
 reqanchors
 site
 size
 status
 title
 url

Private Member Functions

def __checkurl
def __tolink
def _pagechildren

Private Attributes

 _ischanged

The documentation for this class was generated from the following file: crawler.py

Generated by Doxygen 1.6.0