License: Python Software Foundation License (Python 2.x)

Get a List of URLs from a Web Page

In Brief: Use the SGMLParser to retrieve a list of URLs from a web page.
"""Extract list of URLs in a web page

This program is part of "Dive Into Python", a free Python book for
experienced programmers.  Visit http://diveintopython.org/ for the
latest version.
"""

__author__ = "Mark Pilgrim (mark@diveintopython.org)"
__version__ = "$Revision: 1.2 $"
__date__ = "$Date: 2004/05/05 21:57:19 $"
__copyright__ = "Copyright (c) 2001 Mark Pilgrim"
__license__ = "Python"

from sgmllib import SGMLParser

class URLLister(SGMLParser):
    def reset(self):
        SGMLParser.reset(self)
        self.urls = []

    def start_a(self, attrs):
        # collect the href attribute of every <a> tag
        href = [v for k, v in attrs if k == 'href']
        if href:
            self.urls.extend(href)

if __name__ == "__main__":
    import urllib
    usock = urllib.urlopen("http://diveintopython.org/")
    parser = URLLister()
    parser.feed(usock.read())
    parser.close()
    usock.close()
    for url in parser.urls: print url
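Note that sgmllib was removed in Python 3. For readers on a modern interpreter, the same idea can be sketched with the standard library's html.parser module; this is an assumed port of the class above, not part of the original book code:

```python
# Hedged sketch: a Python 3 equivalent of URLLister using html.parser,
# since sgmllib no longer exists in Python 3.
from html.parser import HTMLParser

class URLLister(HTMLParser):
    def reset(self):
        HTMLParser.reset(self)
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # collect the href attribute of every <a> tag
        if tag == 'a':
            self.urls.extend(v for k, v in attrs if k == 'href')

parser = URLLister()
parser.feed('<a href="a.html">one</a> <a href="b.html">two</a>')
parser.close()
print(parser.urls)  # -> ['a.html', 'b.html']
```

To fetch a live page in Python 3, urllib.urlopen would become urllib.request.urlopen, and the bytes returned by read() would need decoding before being passed to feed().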



over 5 years ago (07 Mar 2009 at 02:50 AM) by sadani

I have tried this code for getting URLs from a web page, and it works fine.

Now I want to make some changes: I want to get a list of only "image" URLs, e.g. jpeg, gif, etc.

Can anyone help me or give me a hint how to do it?

over 5 years ago (07 Mar 2009 at 03:14 AM) by David Isaacson
Do you mean you want to get the images on the page, or you want to get the <a> links that point to images?