mca #59

Open: wants to merge 30 commits into base: multi-submit

ba70b58
Add tests for default values on multi-option fields.
jmcarp Jul 27, 2014
e53ae57
Bump version.
jmcarp Jul 27, 2014
3c37c92
Clean up API; document changes; bump version.
jmcarp Jul 28, 2014
1ebc91c
Handle unicode in Form::__repr__.
jmcarp Sep 15, 2014
eed1e00
Do not install tests
voyageur Dec 4, 2014
c15b5ff
Switched requirements from == to >=.
StuntsPT Jan 13, 2015
70da61c
Added a test case where the parsing fails for empty select box
pratyushmittal Apr 16, 2015
56eddb9
Added a fix for paring empty select boxes
pratyushmittal Apr 16, 2015
2ee55f9
Fixed PIP Session error as suggested in other pull requests
pratyushmittal Apr 16, 2015
eee0cd8
The tests used to fail when the lxml library was installed: Error rai…
pratyushmittal Apr 16, 2015
4878ed3
List requirements in setup file.
jmcarp Apr 17, 2015
567d3ae
Handle iteration error in pypy tests.
jmcarp Apr 17, 2015
1a9aa8e
Merge pull request #31 from voyageur/do-not-install-tests
jmcarp Apr 18, 2015
bc9deb2
Use cached_property for parsed HTML.
jmcarp Apr 18, 2015
6587134
Merge remote-tracking branch 'pratyushmittal/master'
jmcarp Apr 18, 2015
3b60ed0
Lint with flake8.
jmcarp Apr 18, 2015
f842291
Release version 0.5.2.
jmcarp Apr 18, 2015
47397be
Add error message for selecting non-existent option in multi-option f…
rcutmore May 4, 2015
c249037
Don't serialize disabled inputs.
jmcarp May 25, 2015
fcf2446
Can't use `assert_not_in` in py2.6.
jmcarp May 25, 2015
d382f0c
Allow method override on `Browser#open`.
jmcarp May 26, 2015
e2480c4
Merge remote-tracking branch 'rcutmore/add-multi-option-field-error-m…
jmcarp May 27, 2015
4bf18ee
Add better error messages.
jmcarp May 27, 2015
a3ba87a
Fix default value for selects.
jmcarp Jun 1, 2015
a9ae952
Simplify value properties.
jmcarp Jun 1, 2015
06c3ae4
Update docs for change to Rap Genius website
rcutmore Jun 7, 2015
58b47b4
Merge pull request #45 from rcutmore/update-docs-for-rap-genius-change
jmcarp Jun 7, 2015
38d646a
Document custom session options.
jmcarp Jun 7, 2015
a243352
Offload retry logic to requests and urllib3.
jmcarp Jun 7, 2015
4284c11
Bump version and update changelog.
jmcarp Jun 7, 2015
12 changes: 6 additions & 6 deletions .travis.yml
@@ -1,6 +1,5 @@
# Config file for automatic testing at travis-ci.org

language: python
sudo: false

python:
- "3.4"
@@ -9,12 +8,13 @@ python:
- "2.6"
- "pypy"

# command to install dependencies, e.g. pip install -r requirements.txt --use-mirrors
install:
- pip install -r requirements.txt
install:
- python setup.py install
- pip install -U -r dev-requirements.txt
- pip install coverage coveralls nose responses

# command to run tests, e.g. python setup.py test
before_script: flake8 robobrowser

script: nosetests --with-coverage --cover-package=robobrowser

after_success: coveralls
29 changes: 29 additions & 0 deletions HISTORY.rst
@@ -3,6 +3,35 @@
History
-------

0.5.3
++++++++++++++++++
* Improve documentation. Thanks tpugsley and rcutmore for improvements!
* Improve messages in error handling. Thanks again rcutmore!
* Fix default values for <select> fields. Thanks pohmelie for reporting!
* Simplify value property of field elements; delete `FieldMeta`.
* Move retry logic to requests and urllib3.
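The `<select>` default-value entry above refers to standard HTML behavior: a single-choice select with no option marked `selected` defaults to its first option, and an empty select has no value. The following is a minimal standard-library sketch of that rule, not RoboBrowser's implementation; `SelectDefaults` and `default_select_value` are hypothetical names introduced only for illustration.

```python
from html.parser import HTMLParser


class SelectDefaults(HTMLParser):
    """Collect <option> values from the fed markup (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.options = []   # every option value, in document order
        self.selected = []  # values of options carrying a `selected` attribute

    def handle_starttag(self, tag, attrs):
        if tag != 'option':
            return
        attrs = dict(attrs)
        # Simplification: options without a "value" attribute yield None here;
        # a real parser would fall back to the option's text.
        value = attrs.get('value')
        self.options.append(value)
        if 'selected' in attrs:
            self.selected.append(value)


def default_select_value(html):
    """Return the selected option, else the first option, else None."""
    parser = SelectDefaults()
    parser.feed(html)
    if parser.selected:
        return parser.selected[0]
    if parser.options:
        return parser.options[0]
    return None
```

Under these assumptions, a select with no `selected` option resolves to its first option, and an empty select resolves to `None` rather than raising.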

0.5.2
++++++++++++++++++
* Remove requirements parsing from `setup.py`.
* Don't pin to exact requirements versions. Thanks StuntsPT!
* Don't install tests along with package. Thanks voyageur!
* Handle empty select fields. Thanks pratyushmittal!
* Parse partial document correctly when lxml is installed. Thanks again pratyushmittal!
* Lint code with flake8.

0.5.0
++++++++++++++++++
* Add optional `session` argument to `RoboBrowser::__init__`
* Add optional `timeout` and `allow_redirects` options to `RoboBrowser::__init__`
* Allow `RoboBrowser::open`, `RoboBrowser::follow_link`, and `RoboBrowser::submit_form` to accept optional keyword arguments to requests (`timeout`, `verify`, etc.)
* *Backwards-incompatible*: Remove `auth`, `headers`, and `verify` arguments from `RoboBrowser::__init__`; session configuration should instead be passed in `session`
* *Backwards-incompatible*: Restrict `RoboBrowser::follow_link` to `link` argument; text strings and BeautifulSoup arguments no longer accepted

0.4.1
++++++++++++++++++
* Handle multi-option fields without "value" attributes

0.4.0
++++++++++++++++++
* Fix modeling of form fields to handle non-unique field names.
63 changes: 39 additions & 24 deletions README.rst
@@ -3,7 +3,7 @@ RoboBrowser: Your friendly neighborhood web scraper

.. image:: https://badge.fury.io/py/robobrowser.png
:target: http://badge.fury.io/py/robobrowser

.. image:: https://travis-ci.org/jmcarp/robobrowser.png?branch=master
:target: https://travis-ci.org/jmcarp/robobrowser

@@ -17,40 +17,41 @@ can fetch a page, click on links and buttons, and fill out and submit forms. If
that don't have APIs, RoboBrowser can help.

.. code-block:: python

import re
from robobrowser import RoboBrowser
# Browse to Rap Genius

# Browse to Genius
browser = RoboBrowser(history=True)
browser.open('http://rapgenius.com/')
# Search for Queen
browser.open('http://genius.com/')

# Search for Porcupine Tree
form = browser.get_form(action='/search')
form # <RoboForm q=>
form['q'].value = 'queen'
form['q'].value = 'porcupine tree'
browser.submit_form(form)

# Look up the first song
songs = browser.select('.song_name')
songs = browser.select('.song_link')
browser.follow_link(songs[0])
lyrics = browser.select('.lyrics')
lyrics[0].text # \n[Intro]\nIs this the real life...
lyrics[0].text # \nHear the sound of music ...

# Back to results page
browser.back()

# Look up my favorite song
browser.follow_link('death on two legs')
song_link = browser.get_link('trains')
browser.follow_link(song_link)

# Can also search HTML using regex patterns
lyrics = browser.find(class_=re.compile(r'\blyrics\b'))
lyrics.text # \n[Verse 1]\nYou suck my blood like a leech...
lyrics.text # \nTrain set and match spied under the blind...

RoboBrowser combines the best of two excellent Python libraries:
`Requests <http://docs.python-requests.org/en/latest/>`_ and
`BeautifulSoup <http://www.crummy.com/software/BeautifulSoup/>`_.
RoboBrowser represents browser sessions using Requests and HTML responses
using BeautifulSoup, transparently exposing methods of both libraries:

.. code-block:: python
@@ -75,11 +76,23 @@ using BeautifulSoup, transparently exposing methods of both libraries:
# <span class="mega-octicon octicon-checklist"></span>
# ...

You can also pass a custom `Session` instance for lower-level configuration:

.. code-block:: python

from requests import Session
from robobrowser import RoboBrowser

session = Session()
session.verify = False # Skip SSL verification
session.proxies = {'http': 'http://custom.proxy.com/'} # Set default proxies
browser = RoboBrowser(session=session)

RoboBrowser also includes tools for working with forms, inspired by
`WebTest <https://github.com/Pylons/webtest>`_ and `Mechanize <http://wwwsearch.sourceforge.net/mechanize/>`_.

.. code-block:: python

from robobrowser import RoboBrowser

browser = RoboBrowser()
@@ -102,9 +115,9 @@ RoboBrowser also includes tools for working with forms, inspired by
Checkboxes:

.. code-block:: python

from robobrowser import RoboBrowser

# Browse to a page with checkbox inputs
browser = RoboBrowser()
browser.open('http://www.w3schools.com/html/html_forms.asp')
@@ -113,26 +126,26 @@ Checkboxes:
form = browser.get_forms()[3]
form # <RoboForm vehicle=[]>
form['vehicle'] # <robobrowser.forms.fields.Checkbox...>

# Checked values can be read and set like lists
form['vehicle'].options # [u'Bike', u'Car']
form['vehicle'].value # []
form['vehicle'].value = ['Bike']
form['vehicle'].value = ['Bike', 'Car']

# Values can also be set using input labels
form['vehicle'].labels # [u'I have a bike', u'I have a car \r\n']
form['vehicle'].value = ['I have a bike']
form['vehicle'].value # [u'Bike']

# Only values that correspond to checkbox values or labels can be set;
# this will raise a `ValueError`
form['vehicle'].value = ['Hot Dogs']
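
The checkbox semantics shown above (set by value or by label, `ValueError` otherwise) can be sketched in plain Python. This is only an illustration of the behavior the README describes, not the library's field class; `MultiOptionSketch` is a hypothetical name, and exact label matching (including whitespace handling) is an assumption.

```python
class MultiOptionSketch(object):
    """Hypothetical stand-in for a checkbox group; not RoboBrowser's class."""

    def __init__(self, options, labels):
        self.options = list(options)  # submitted values, e.g. 'Bike'
        self.labels = list(labels)    # human-readable labels, same order
        self._value = []

    @property
    def value(self):
        return list(self._value)

    @value.setter
    def value(self, selected):
        resolved = []
        for item in selected:
            if item in self.options:
                resolved.append(item)
            elif item in self.labels:
                # Map a label back to the option value at the same position
                resolved.append(self.options[self.labels.index(item)])
            else:
                raise ValueError('Option {0!r} not found'.format(item))
        # Assign only after every item resolved, so a failed set changes nothing
        self._value = resolved


field = MultiOptionSketch(['Bike', 'Car'], ['I have a bike', 'I have a car'])
field.value = ['I have a bike']
print(field.value)  # ['Bike']
```

Resolving all items before assigning keeps the previous selection intact when an unknown option is rejected.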

Uploading files:

.. code-block:: python

from robobrowser import RoboBrowser

# Browse to a page with an upload form
@@ -150,6 +163,8 @@ Uploading files:
# Submit
browser.submit(upload_form)

By default, creating a browser instantiates a new requests `Session`.
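
Because retry logic now lives in requests and urllib3 (see the 0.5.3 history entry), retries can presumably be configured on the session itself before it is handed to the browser. The following is a minimal sketch using the standard requests and urllib3 adapter API; the retry count, backoff factor, and status codes are illustrative assumptions, not values taken from this pull request.

```python
from requests import Session
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Illustrative policy (assumed values): up to 3 retries with backoff
# on common transient gateway errors.
retry_policy = Retry(total=3, backoff_factor=0.5,
                     status_forcelist=[502, 503, 504])
adapter = HTTPAdapter(max_retries=retry_policy)

session = Session()
session.mount('http://', adapter)
session.mount('https://', adapter)

# As in the custom-session example in this README, the configured
# session would then be passed to the browser:
# browser = RoboBrowser(session=session)
```

Mounting the adapter on both URL prefixes applies the same policy to every request made through the session.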

Requirements
------------

4 changes: 2 additions & 2 deletions dev-requirements.txt
@@ -1,7 +1,7 @@
-r requirements.txt

coverage
coveralls
docutils
flake8
mock
nose
sphinx
3 changes: 0 additions & 3 deletions requirements.txt

This file was deleted.

3 changes: 2 additions & 1 deletion robobrowser/__init__.py
@@ -1,4 +1,5 @@
-__version__ = '0.3.3'
+__version__ = '0.5.3'

from .browser import RoboBrowser

__all__ = ['RoboBrowser']