Fixed 5 failing tests due to list(set(... operations in ParseResults.__init__ #9
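
The title does not spell out why `list(set(...))` made tests fail. Two effects are plausible: a `set` drops repeated entities, and its iteration order is not guaranteed to match the order in which entities appear in the tweet. The sketch below shows both effects alongside an order-preserving alternative. The helper `dedupe_keep_order` and the sample data are hypothetical and are not part of this PR, which simply removed the de-duplication.

```python
def dedupe_keep_order(items):
    """Drop repeats while keeping first-seen order (hypothetical helper)."""
    seen = set()
    unique = []
    for item in items:
        if item not in seen:
            seen.add(item)
            unique.append(item)
    return unique


# Sample mentions in the order they might appear in a tweet.
users = ["zoe", "adam", "zoe", "mike"]

# list(set(...)) removes the repeated "zoe"; the resulting order is arbitrary.
via_set = list(set(users))

# The helper also removes the repeat but keeps tweet order.
ordered = dedupe_keep_order(users)
```

A test that compares against an exact list such as `['zoe', 'adam', 'zoe', 'mike']` would fail with either form of de-duplication, which is consistent with the PR removing the operation outright.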

Open
wants to merge 39 commits into
base: master
82cf864
Removed list(set(...)) de-duplicate operations in ParseResults.__init__
ianozsvald Sep 7, 2012
71b793a
Applied schwa's span addition
ianozsvald Sep 7, 2012
ff5a0c0
added span tests as a separate class
ianozsvald Sep 7, 2012
a202185
not sure what happened, unittest.main() does the job now
ianozsvald Sep 7, 2012
90fbc84
added test for hash and comma in URL
ianozsvald Sep 9, 2012
b25880a
uncovered two name-shielded tests and renamed, now also using non-htm…
ianozsvald Sep 10, 2012
536ba80
removed off-by-one offset for URL and hashtag matcher if a pre charac…
ianozsvald Sep 10, 2012
a8c77dc
added reference to the original project
ianozsvald Sep 13, 2012
f309568
changed URL
ianozsvald Sep 13, 2012
be4d2e3
first
ianozsvald Sep 13, 2012
489ca04
preparing for V1.0.0 release
ianozsvald Feb 11, 2013
e2c57a5
weird formatting bug
ianozsvald Feb 11, 2013
2ae04ff
weird formatting bug
ianozsvald Feb 11, 2013
c8e40cd
weird formatting bug
ianozsvald Feb 11, 2013
77ff625
weird formatting bug
ianozsvald Feb 11, 2013
4297316
weird formatting bug
ianozsvald Feb 11, 2013
22c73a9
weird formatting bug
ianozsvald Feb 11, 2013
e2e3615
weird formatting bug
ianozsvald Feb 11, 2013
4b8121c
weird formatting bug
ianozsvald Feb 11, 2013
c024c58
weird formatting bug
ianozsvald Feb 11, 2013
9b86dc1
weird formatting bug
ianozsvald Feb 11, 2013
400758b
minor
ianozsvald Feb 11, 2013
bdf7316
version bump after fixing up setup.py to use a subdirectory
ianozsvald Feb 11, 2013
52c6101
Fix t.co urls followed by a comma
lsemel Mar 25, 2013
a9973f9
added some notes for TODO
ianozsvald Mar 26, 2013
19e2368
bump of version nbr for this new working version, added a shortlink f…
ianozsvald Mar 28, 2013
1bab751
added requirements
ianozsvald Mar 28, 2013
79df69f
Merge branch 'master' of github.com:muckrack/twitter-text-python into…
ianozsvald Apr 4, 2013
dd4e932
adding some , parsing
ianozsvald Apr 4, 2013
4b2d7a0
extra note on how to run tests
ianozsvald Apr 4, 2013
f80d89c
used autopep8 to clean up the src
ianozsvald Jun 1, 2013
93f6985
minor
ianozsvald Jun 1, 2013
e00cad8
notes on pypi release and git tagging
ianozsvald Jun 1, 2013
0724099
note on pushing tags
ianozsvald Jun 1, 2013
033a5ab
cleanup
ianozsvald Jun 1, 2013
66c209b
cleanup
ianozsvald Jun 1, 2013
aa6bf1a
cleanup
ianozsvald Jun 1, 2013
756f947
point to Ed for his support
ianozsvald Jul 28, 2014
13f4990
point to Ed for his support
ianozsvald Jul 28, 2014
140 changes: 115 additions & 25 deletions README.rst
@@ -1,34 +1,94 @@
twitter-text-python
===================

**twitter-text-python** is a Tweet parser and formatter for Python.
**twitter-text-python** is a Tweet parser and formatter for Python. Extract users, hashtags, URLs and format as HTML for display.

It is based on twitter-text-java_ and passes all the unittests of
twitter-text-conformance_ plus some additional ones.
----
**UPDATE**: this project is now maintained by Ed Burnett; please see the active version at https://github.com/edburnett/twitter-text-python
----

It is based on twitter-text-java_ and originally passed all the unit tests of
twitter-text-conformance_ plus some additional ones. Note that the conformance tests are now out of date (an easy PR for someone to work on: https://github.com/ianozsvald/twitter-text-python/issues/5 ).

.. _twitter-text-java: http://github.com/mzsanford/twitter-text-java
.. _twitter-text-conformance: http://github.com/mzsanford/twitter-text-conformance

This version was forked by Ian Ozsvald in January 2013 and released to PyPI, with some bugs fixed and a few minor functional changes added:
https://github.com/ianozsvald/twitter-text-python

PyPI release:
http://pypi.python.org/pypi/twitter-text-python/

The original ttp comes from Ivo Wetzel (Ivo's version is no longer supported):
https://github.com/BonsaiDen/twitter-text-python

Usage::

>>> import ttp
>>> from ttp import ttp
>>> p = ttp.Parser()
>>> result = p.parse("@BonsaiDen Hey that's a great Tweet parser! #twp")
>>> result = p.parse("@ianozsvald, you now support #IvoWertzel's tweet parser! https://github.com/ianozsvald/")
>>> result.reply
'BonsaiDen'
'ianozsvald'
>>> result.users
['BonsaiDen']
['ianozsvald']
>>> result.tags
['twp']
['IvoWertzel']
>>> result.urls
[]
['https://github.com/ianozsvald/']
>>> result.html
u'<a href="http://twitter.com/BonsaiDen">@BonsaiDen</a> Hey that\'s a great Tweet Parser!
<a href="http://search.twitter.com/search?q=%23twp">#twp</a>'

u'<a href="http://twitter.com/ianozsvald">@ianozsvald</a>, you now support <a href="http://search.twitter.com/search?q=%23IvoWertzel">#IvoWertzel</a>\'s tweet parser! <a href="https://github.com/ianozsvald/">https://github.com/ianozsvald/</a>'

If you need different HTML output just subclass and override the ``format_*`` methods.
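
As a rough illustration of that subclass-and-override pattern, here is a self-contained sketch. `SketchParser` is a toy stand-in and is NOT the ttp API; the hook name `format_username` and its `(at_char, user)` arguments are assumptions made for the example, so check the real ``format_*`` signatures in ``ttp/ttp.py`` before relying on them.

```python
class SketchParser(object):
    """Toy stand-in for a parser with an overridable format_* hook."""

    def format_username(self, at_char, user):
        # Default hook: plain link to the user's profile.
        return '<a href="http://twitter.com/%s">%s%s</a>' % (user, at_char, user)

    def html(self, text):
        # Toy tokenizer: any space-separated word starting with "@" is a mention.
        words = []
        for word in text.split(" "):
            if word.startswith("@") and len(word) > 1:
                words.append(self.format_username("@", word[1:]))
            else:
                words.append(word)
        return " ".join(words)


class CssClassParser(SketchParser):
    def format_username(self, at_char, user):
        # Override: same link, plus a CSS class for styling.
        return '<a class="mention" href="http://twitter.com/%s">%s%s</a>' % (
            user, at_char, user)


default_html = SketchParser().html("@ianozsvald hello")
custom_html = CssClassParser().html("@ianozsvald hello")
```

Only the hook changes in the subclass; the parsing loop is inherited untouched, which is the point of exposing the formatting as separate methods.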

You can also ask for the span tags to be returned for each entity::

>>> p = ttp.Parser(include_spans=True)
>>> result = p.parse("@ianozsvald, you now support #IvoWertzel's tweet parser! https://github.com/ianozsvald/")
>>> result.urls
[('https://github.com/ianozsvald/', (57, 87))]
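
The span appears to be a ``(start, end)`` pair of character offsets into the original tweet, with the end offset exclusive. The sketch below checks that reading against the output shown above using plain string slicing; it does not import ttp, and the variable names are illustrative.

```python
tweet = ("@ianozsvald, you now support #IvoWertzel's tweet parser! "
         "https://github.com/ianozsvald/")

# Entity and span copied from the result.urls output above.
url, (start, end) = ('https://github.com/ianozsvald/', (57, 87))

# Slicing the tweet with the span should give back the entity text.
recovered = tweet[start:end]
```

This makes spans convenient for highlighting or replacing entities in place without searching the text a second time.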


To use the shortlink follower:

>>> from ttp import utils
>>> # assume that result.urls == ['http://t.co/8o0z9BbEMu', u'http://bbc.in/16dClPF']
>>> print utils.follow_shortlinks(result.urls) # pass in list of shortlink URLs
{'http://t.co/8o0z9BbEMu': [u'http://t.co/8o0z9BbEMu', u'http://bbc.in/16dClPF', u'http://www.bbc.co.uk/sport/0/21711199#TWEET650562'], u'http://bbc.in/16dClPF': [u'http://bbc.in/16dClPF', u'http://www.bbc.co.uk/sport/0/21711199#TWEET650562']}
>>> # note that bad shortlink URLs map to an empty list (lost/forgotten shortlink URLs don't generate any error)
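
Going by the output above, each key maps to the chain of URLs visited, ending at the final destination. A minimal sketch of consuming that dictionary follows; `final_destinations` is a hypothetical helper (not part of ``ttp.utils``), and the ``example.invalid`` entry is made up to show the empty-list case.

```python
def final_destinations(followed):
    """Map each shortlink to the last URL in its chain, or None if the chain is empty."""
    finals = {}
    for shortlink, chain in followed.items():
        finals[shortlink] = chain[-1] if chain else None
    return finals


# Shaped like the follow_shortlinks output above, plus one made-up dead link.
followed = {
    "http://t.co/8o0z9BbEMu": [
        "http://t.co/8o0z9BbEMu",
        "http://bbc.in/16dClPF",
        "http://www.bbc.co.uk/sport/0/21711199#TWEET650562",
    ],
    "http://bbc.in/16dClPF": [
        "http://bbc.in/16dClPF",
        "http://www.bbc.co.uk/sport/0/21711199#TWEET650562",
    ],
    "http://example.invalid/dead": [],  # hypothetical bad shortlink
}

finals = final_destinations(followed)
```

Because bad shortlinks yield an empty list rather than an error, callers need an explicit empty check like the one above before indexing into the chain.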


Installation
------------

**NOTE**: this version (Ian's) is no longer maintained; see Ed's active version instead: https://github.com/edburnett/twitter-text-python

pip and easy_install will do the job::

# via: http://pypi.python.org/pypi/twitter-text-python
$ pip install twitter-text-python
$ python
>>> from ttp import ttp
>>> ttp.__version__
'1.0.0.2'

Changelog
---------

* 2013/2/11 1.0.0.2 released to PyPI
* 2013/6/1 1.0.1 new working version, adding comma parse fix (thanks https://github.com/muckrack), used autopep8 to clean the src, added a shortlink expander


Tests
-----

Check out the code from GitHub at https://github.com/ianozsvald/twitter-text-python and run the tests locally::

$ python ttp/tests.py
....................................................................................................
----------------------------------------------------------------------
Ran 100 tests in 0.009s
OK


Contributing
------------
@@ -37,23 +97,53 @@ The source is available on GitHub_, to
contribute to the project, fork it on GitHub and send a pull request.
Everyone is welcome to make improvements to **twp**!

.. _GitHub: http://github.com/BonsaiDen/twitter-text-python
.. _GitHub: https://github.com/ianozsvald/twitter-text-python


Todo
----

* Consider adding capitalised phrase identification
* Consider adding a repeated-char remover (e.g. grrrrrrr->grr)
* Make it 1 line to parse and get a results dict via __init__.py
* Tag the next release

Doing a release
---------------

In parent directory on Ian's machine see USE_THIS_FOR_PYPI_RELEASE.txt. The short form::

$ # edit setup.py to bump the version number
$ git tag -a v1.0.1 -m 'v1.0.1 release'
$ git push origin --tags
$ ianozsvald-twitter-text-python $ python setup.py sdist register upload -r http://pypi.python.org/pypi
$ # this uses ~/.pypirc with cached login details


License
=======
-------

*MIT*

Copyright (c) 2012 Ivo Wetzel.

Copyright (c) 2010 Ivo Wetzel
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

**twitter-text-python** is free software: you can redistribute it and/or
modify it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

**twitter-text-python** is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.

You should have received a copy of the GNU General Public License along with
**twitter-text-python**. If not, see <http://www.gnu.org/licenses/>.
Copyright (c) 2010-2013 Ivo Wetzel

1 change: 1 addition & 0 deletions requirements.txt
@@ -0,0 +1 @@
requests==1.1.0
23 changes: 12 additions & 11 deletions setup.py
@@ -2,24 +2,25 @@

setup(
name='twitter-text-python',
version='1.0',
description='Tweet parser and formatter',
long_description=open('README.rst').read(),
author='Ivo Wetzel',
author_email='',
url='http://github.com/BonsaiDen/twitter-text-python',
license='GPL',
py_modules=['ttp'],
version='1.0.1',
description='Twitter Tweet parser and formatter',
long_description="Extract @users, #hashtags and URLs (and unwind shortened links) from tweets including entity locations, also generate HTML for output. Visit https://github.com/ianozsvald/twitter-text-python for examples.",
#open('README.rst').read(),
author='Maintained by Ian Ozsvald (originally by Ivo Wetzel)',
author_email='[email protected]',
url='https://github.com/ianozsvald/twitter-text-python',
license='MIT',
packages=['ttp'],
include_package_data=True,
zip_safe=False,
install_requires=[],
classifiers=[
'Environment :: Web Environment',
# I don't know what exactly this means, but why not?
'Environment :: Console',
'Intended Audience :: Developers',
'License :: OSI Approved :: BSD License',
'License :: OSI Approved :: MIT License',
'Operating System :: OS Independent',
'Programming Language :: Python',
'Topic :: Software Development :: Libraries :: Python Modules',
'Topic :: Text Processing :: Linguistic',
]
)
Empty file added ttp/__init__.py
Empty file.