License Python Software Foundation License (Python 2.x)

Get a List of URLs from a Web Page

In Brief: Use SGMLParser (from Python 2's sgmllib module) to retrieve a list of URLs from a web page.
"""Extract list of URLs in a web page

This program is part of "Dive Into Python", a free Python book for
experienced programmers. Visit for the
latest version.
"""

__author__ = "Mark Pilgrim"
__version__ = "$Revision: 1.2 $"
__date__ = "$Date: 2004/05/05 21:57:19 $"
__copyright__ = "Copyright (c) 2001 Mark Pilgrim"
__license__ = "Python"

from sgmllib import SGMLParser


class URLLister(SGMLParser):
    def reset(self):
        # SGMLParser.__init__ calls reset(), so self.urls exists before parsing starts
        SGMLParser.reset(self)
        self.urls = []

    def start_a(self, attrs):
        # attrs is a list of (name, value) pairs for each <a> tag
        href = [v for k, v in attrs if k == 'href']
        if href:
            self.urls.extend(href)


if __name__ == "__main__":
    import urllib
    usock = urllib.urlopen("")  # pass the URL of the page to scan
    parser = URLLister()
    parser.feed(usock.read())
    parser.close()
    usock.close()
    for url in parser.urls:
        print url
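
The sgmllib module exists only in Python 2 and was removed in Python 3. As a rough sketch of the same idea on Python 3, the standard html.parser.HTMLParser class can stand in for SGMLParser. This is a swapped-in technique, not the snippet author's code: the list_urls helper and the sample markup are made up for illustration, and the parser is fed a literal string instead of a downloaded page.

```python
from html.parser import HTMLParser


class URLLister(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs, much like in SGMLParser.start_a.
        # HTMLParser lowercases tag and attribute names before calling this method.
        if tag == "a":
            self.urls.extend(v for k, v in attrs if k == "href" and v)


def list_urls(html):
    """Hypothetical helper: return the link targets found in an HTML string."""
    parser = URLLister()
    parser.feed(html)
    parser.close()
    return parser.urls


# Illustrative markup only; a real page would be fetched with urllib.request.
sample = (
    '<p><a href="/about">About</a> <a name="top">x</a> '
    '<a href="http://example.com/">Example</a></p>'
)
print(list_urls(sample))
```

Unlike SGMLParser, HTMLParser has no per-tag start_a hook, so the tag name is checked inside handle_starttag instead.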



over 5 years ago (07 Mar 2009 at 02:50 AM) by sadani

I have tried this code for getting URLs from a web page.
It works fine.

Now I want to make some changes:
I want to get a list of only "image" URLs,
e.g. JPEG, GIF, etc.

Can anyone help me or give me a hint on how to do it?

over 5 years ago (07 Mar 2009 at 03:14 AM) by David Isaacson
Do you mean you want to get the images on the page, or you want to get the <a> links that point to images?
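
Either reading needs only a small change to the parser. Here is a minimal sketch that covers both, assuming Python 3's html.parser in place of sgmllib. The class name, the list_image_urls helper and the extension list are illustrative choices, not part of the original snippet. Embedded images are taken from the src attribute of <img> tags, and links to images are <a> tags whose href path ends in a known image extension.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Assumed set of extensions; extend it to match the images you care about.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".gif", ".png")


class ImageURLLister(HTMLParser):
    """Collect image URLs two ways: <img src=...> and <a href=...> to image files."""

    def __init__(self):
        super().__init__()
        self.embedded = []  # src values of <img> tags
        self.linked = []    # href values of <a> tags that point at image files

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and attrs.get("src"):
            self.embedded.append(attrs["src"])
        elif tag == "a" and attrs.get("href"):
            # Compare only the path, so a query string does not hide the extension.
            path = urlparse(attrs["href"]).path.lower()
            if path.endswith(IMAGE_EXTENSIONS):
                self.linked.append(attrs["href"])


def list_image_urls(html):
    """Hypothetical helper: return (embedded, linked) image URLs from an HTML string."""
    parser = ImageURLLister()
    parser.feed(html)
    parser.close()
    return parser.embedded, parser.linked


# Illustrative markup only.
page = (
    '<a href="pics/cat.GIF?size=big">cat</a>'
    '<a href="page.html">page</a>'
    '<img src="logo.png"><img alt="no source">'
)
embedded, linked = list_image_urls(page)
print(embedded)
print(linked)
```

Matching on the file extension is only a heuristic: a link can serve an image without an image-like extension, and the reverse is also possible.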