cURL - basic script part 2
Posted by Jo
In the first part of this basic cURL script tutorial it was demonstrated how to navigate to a web page using pycURL and then download it to a text file on your PC. See part 1 here.
This part of the tutorial will demonstrate navigating through to a secure page within a website to see protected content behind the login. The protected page will usually be a form for posting content.
Prerequisites:
- Python - http://www.python.org
- pycURL - http://pycurl.sourceforge.net
The steps
- Call cuRL
- Navigate to login page
- Login to site
- Post data to the protected page
- Clean up
Firstly we need to call cURL
import pycurl, StringIO, os
In the first part we simply had a download URL, now we need to introduce a new URL which is the login page. We also need to know what the username and password are for this login. Please note that in this tutorial the login details are stored within the script as plain text and as such are insecure, if you wish to keep this secure in its present format then please restrict access to your user area where you have the script stored.
The login form will have usually two form text areas and a submit button. Each form element has a ‘name’ value. We will be using this to identify where we want the text to appear. There can also be a form name which is included in this example. In order to find out what your form elements are called you should navigate to the login page and view source.
# Constants USER_AGENT = 'Mozilla/4.0 (compatible; MSIE 6.0)' LOGIN_URL = 'http://www.somesite.com/login' LOGIN_POST_DATA = 'name=admin&pass=password123&form_id=user_login' SUBMISSION_URL = 'http://www.somesite.com//add-content' headers = ['Cache-control: max-age=0', 'Pragma: no-cache', 'Connection: Keep-Alive']
When the cURL script is executed the login details are submited to the login page and then we will need to store the session within the cookiejar file. This effectively keeps us logged into the site. Then we can navigate to the submission form and enter some data to submit to the site. As you can see this is done in a similar way to the login page.
# Request post page
slurpp = pycurl.Curl()
slurpp.setopt(pycurl.COOKIEFILE, 'cookiejar.txt')
slurpp.setopt(pycurl.URL, SUBMISSION_URL)
NEW_POST_DATA = 'submission_title=The title of the new post&data=This is some data that will make up our cURL post!&submission_button=Submit&form_id=submission_form
DATA = NEW_POST_DATA.encode('utf-8')
slurpp.setopt(pycurl.POSTFIELDS, DATA)
slurpp.setopt(pycurl.POST, 1)
slurpp.setopt(pycurl.VERBOSE, 1)
slurpp.perform()
The final code:
import pycurl, StringIO, os
# Constants
USER_AGENT = 'Mozilla/4.0 (compatible; MSIE 6.0)'
LOGIN_URL = 'http://www.somesite.com/login'
LOGIN_POST_DATA = 'name=admin&pass=password123&form_id=user_login'
SUBMISSION_URL = 'http://www.somesite.com//add-content'
headers = ['Cache-control: max-age=0', 'Pragma: no-cache', 'Connection: Keep-Alive']
# Request post page
slurpp = pycurl.Curl()
slurpp.setopt(pycurl.COOKIEFILE, 'cookiejar.txt')
slurpp.setopt(pycurl.URL, SUBMISSION_URL)
NEW_POST_DATA = 'submission_title=The title of the new post&data=This is some data that will make up our cURL post!&submission_button=Submit&form_id=submission_form
slurpp.setopt(pycurl.POSTFIELDS, NEW_POST_DATA)
slurpp.setopt(pycurl.POST, 1)
slurpp.setopt(pycurl.VERBOSE, 1)
slurpp.perform()
# Clean up and close out
slurpp.close()
os.remove('cookiejar.txt')
The theory is simple, obviously to use this in the real world there may need to be some tweaking to this script, you may wish for example to encode your data in UTF-8.
I use the bones of this script to post data to a Drupal site. Certain form values are filled in with data that is generated dynamically from other Python scripts, making it almost impossible to post manually.
|
|
|
|
|
![]() |
cURL - basic script part 1
Posted by Jo
cURL is a really useful tool for the automatic scripting of websites. Basically anything which you can do on a website can be scripted through cURL. The uses are almost endless when you think about it. I’m currently using it to automate content generation for a Drupal based site and to generate user website user accounts via the Filemaker CRM system.
cURL comes in many flavours to be used on different platforms. The principles are the same though. Python is my current favourite language so this post will cover pycURL. pycURL is the Python interface to libcURL.
libcurl is a free and easy-to-use client-side URL transfer library, supporting FTP, FTPS, HTTP, HTTPS, GOPHER, TELNET, DICT, FILE and LDAP. libcurl supports HTTPS certificates, HTTP POST, HTTP PUT, FTP uploading, kerberos, HTTP form based upload, proxies, cookies, user+password authentication, file transfer resume, http proxy tunneling and more
Firstly let’s create a plausible scenario that will require us to use pycURL. The simplest is that we wish to download a webpage for offline processing/storage.
Prerequisites:
- Python - http://www.python.org
- pycURL - http://pycurl.sourceforge.net
The steps
- Call cuRL
- Navigate to webpage
- Download webpage
- Save locally
The code:
import pycurl, StringIO
# Constants
USER_AGENT = 'Mozilla/4.0 (compatible; MSIE 6.0)'
DOWNLOAD_URL = 'http://pycurl.sourceforge.net/doc/pycurl.html'
headers = ['Cache-control: max-age=0', 'Pragma: no-cache', 'Connection: Keep-Alive']
# Set up objects
dev_null = StringIO.StringIO()
slurpp = pycurl.Curl()
slurpp.setopt(pycurl.URL, DOWNLOAD_URL)
slurpp.setopt(pycurl.HEADER, 1)
slurpp.setopt(pycurl.USERAGENT, USER_AGENT)
slurpp.setopt(pycurl.FOLLOWLOCATION, 1)
slurpp.setopt(pycurl.CONNECTTIMEOUT, 40)
slurpp.setopt(pycurl.TIMEOUT, 300)
slurpp.setopt(pycurl.COOKIEJAR, 'cookiejar.txt')
slurpp.setopt(pycurl.COOKIEFILE, 'cookiejar.txt')
slurpp.perform()
# Request post page
slurpp = pycurl.Curl()
slurpp.setopt(pycurl.COOKIEFILE, 'cookiejar.txt')
slurpp.setopt(pycurl.URL, DOWNLOAD_URL)
file = open("downloaded-file.txt", "w")
slurpp.setopt(pycurl.FILE, file);
slurpp.perform()
file.close()
The output will be a text file of the HTML source of the page. The next step will be to login to a secure webpage.
|
|
|
|
|
![]() |
