Skip to content

urllib

import urllib
response = urllib.urlopen('http://stackoverflow.com/documentation/')

Using urllib.urlopen() will return a response object, which can be handled similar to a file.

print response.code
# Prints: 200

The response.code represents the http return value. 200 is OK, 404 is NotFound, etc.

print response.read()
'<!DOCTYPE html>\r\n<html>\r\n<head>\r\n\r\n<title>Documentation - Stack. etc'

response.read() and response.readlines() can be used to read the actual html file returned from the request. These methods operate similarly to file.read*

import urllib.request
print(urllib.request.urlopen("http://stackoverflow.com/documentation/"))
# Prints: <http.client.HTTPResponse at 0x7f37a97e3b00>
response = urllib.request.urlopen("http://stackoverflow.com/documentation/")
print(response.code)
# Prints: 200
print(response.read())
# Prints: b'<!DOCTYPE html>\r\n<html>\r\n<head>\r\n\r\n<title>Documentation - Stack Overflow</title>

The module has been updated for Python 3.x, but use cases remain basically the same. urllib.request.urlopen will return a similar file-like object.

To POST data pass the encoded query arguments as data to urlopen()

import urllib
query_parms = {'username':'stackoverflow', 'password':'me.me'}
encoded_parms = urllib.urlencode(query_parms)
response = urllib.urlopen("https://stackoverflow.com/users/login", encoded_parms)
response.code
# Output: 200
response.read()
# Output: '<!DOCTYPE html>\r\n<html>\r\n<head>\r\n\r\n<title>Log In - Stack Overflow'
import urllib
query_parms = {'username':'stackoverflow', 'password':'me.me'}
encoded_parms = urllib.parse.urlencode(query_parms).encode('utf-8')
response = urllib.request.urlopen("https://stackoverflow.com/users/login", encoded_parms)
response.code
# Output: 200
response.read()
# Output: b'<!DOCTYPE html>\r\n<html>....etc'

Decode received bytes according to content type encoding

Section titled “Decode received bytes according to content type encoding”

The received bytes have to be decoded with the correct character encoding to be interpreted as text:

import urllib.request
response = urllib.request.urlopen("http://stackoverflow.com/")
data = response.read()
encoding = response.info().get_content_charset()
html = data.decode(encoding)
import urllib2
response = urllib2.urlopen("http://stackoverflow.com/")
data = response.read()
encoding = response.info().getencoding()
html = data.decode(encoding)