add support for subscript and superscript numbers
marc authored and marc committed May 16, 2020
1 parent db4e3f3 commit 11c0b8f
Showing 2 changed files with 6 additions and 1 deletion.
2 changes: 1 addition & 1 deletion dateparser/languages/locale.py
@@ -151,7 +151,7 @@ def translate(self, date_string, keep_formatting=False, settings=None):
     def _translate_numerals(self, date_string):
         date_string_tokens = NUMERAL_PATTERN.split(date_string)
         for i, token in enumerate(date_string_tokens):
-            if token.isdigit():
+            if token.isdecimal():

@thomasleveil

thomasleveil Aug 21, 2020

FYI this breaks compatibility with Python 2.7

You might want to either revert this change or modify the classifiers in setup.py

@noviluni

noviluni Aug 25, 2020

Collaborator

Hi @thomasleveil, thank you for commenting!

Why do you say it breaks the compatibility? Tests were passing for Python 2.7 in that version.

I debugged it a little. That branch doesn't behave as intended on Python 2.7, but the result is the same: even if this part "fails", the string is later normalized and transformed into the correct string.

Example: u'²⁹/⁰⁵/²⁰¹⁵' is normalized to u'29/05/2015', so it works as expected.

I also tried with search_dates(u'²⁹/⁰⁵/²⁰¹⁵') and it seems to work properly.

Could you give us any other clue about how you detected the incompatibility?

BTW, that will be the last version supporting Python 2.7. If we detect an error and there is any interest from the community we could fix it, but we don't expect to support it anymore.
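
The normalization step mentioned above can be sketched with the standard library. This is a minimal illustration, assuming a Unicode compatibility form such as NFKD is what turns superscript and subscript digits into ASCII digits; `normalize_digits` is a hypothetical name and not part of the dateparser API.

```python
import unicodedata


def normalize_digits(text):
    """Hypothetical helper: apply compatibility decomposition (NFKD).

    Assumption: a compatibility form is what maps superscript and
    subscript digits to their plain ASCII counterparts.
    """
    return unicodedata.normalize("NFKD", text)


print(normalize_digits("²⁹/⁰⁵/²⁰¹⁵"))  # superscript date
print(normalize_digits("₁₅/₀₂/₂₀₂₀"))  # subscript date
```

Both calls print plain ASCII dates (29/05/2015 and 15/02/2020), which matches the behaviour described in the comment.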

@thomasleveil

thomasleveil Aug 25, 2020

Here's the proof:

Python 2.7.17 (default, Jul 20 2020, 15:37:01) 
[GCC 7.5.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> "123".isdigit()
True
>>> "123".isdecimal()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'str' object has no attribute 'isdecimal'
>>> 
Python 3.6.9 (default, Jul 17 2020, 12:50:27) 
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> "123".isdigit()
True
>>> "123".isdecimal()
True
>>> 

I noticed this when looking at the test suite of the Maya project:

tests/test_maya.py:195: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/opt/hostedtoolcache/Python/2.7.18/x64/lib/python2.7/site-packages/maya/core.py:371: in slang_date
    return _translate(dt, locale)
/opt/hostedtoolcache/Python/2.7.18/x64/lib/python2.7/site-packages/maya/core.py:797: in _translate
    base = en.translate(naturaldate, settings=dateparser.conf.settings)
/opt/hostedtoolcache/Python/2.7.18/x64/lib/python2.7/site-packages/dateparser/languages/locale.py:128: in translate
    date_string = self._translate_numerals(date_string)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <dateparser.languages.locale.Locale object at 0x7fb939952950>
date_string = 'tomorrow'

    def _translate_numerals(self, date_string):
        date_string_tokens = NUMERAL_PATTERN.split(date_string)
        for i, token in enumerate(date_string_tokens):
>           if token.isdecimal():
E           AttributeError: 'str' object has no attribute 'isdecimal'

/opt/hostedtoolcache/Python/2.7.18/x64/lib/python2.7/site-packages/dateparser/languages/locale.py:154: AttributeError
____________________________ test_slang_date_locale ____________________________

Note that with a unicode string, .isdecimal() works on Python 2.7:

Python 2.7.17 (default, Jul 20 2020, 15:37:01) 
[GCC 7.5.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> u"123".isdecimal()
True
>>> unicode("123").isdecimal()
True

@thomasleveil

thomasleveil Aug 25, 2020

The test file test_date_parser.py uses from __future__ import unicode_literals, which explains why the test suite does not detect this bug.

@Gallaecio

Gallaecio Aug 26, 2020

Member

Can you reproduce this issue with the dateparser API? We’ve tested a few inputs for parse() and search_dates() and it does not seem to break.

Mind that u"123".isdecimal() will work in Python 2. The method is only missing for str, not for unicode. We believe that the string reaching that code is always a Unicode string, which would mean that there is no bug in dateparser.

@noviluni

noviluni Aug 26, 2020

Collaborator

Hi @thomasleveil! Thank you for mentioning the Maya tests. Now I'm able to reproduce it:

>>> import dateparser
>>> from dateparser.languages.loader import default_loader 
>>> locale = default_loader.get_locale("en")  
>>> locale.translate("any-non-utf-string", settings=dateparser.conf.settings) 
AttributeError: 'str' object has no attribute 'isdecimal'

We can say that it's not an incompatibility with Python 2.7, but something not documented: "now the translate() method only accepts Unicode strings".

I added a comment in your Maya issue explaining how to fix it.

Thanks again for pointing this out.
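
One way a caller could satisfy the "Unicode strings only" requirement is to decode byte strings before they reach translate(). The sketch below is an illustration under stated assumptions: `ensure_text` is a hypothetical helper, not part of dateparser or Maya, and UTF-8 is an assumed encoding. It is written for Python 3, where the bytes type likewise has no isdecimal() method.

```python
def ensure_text(value, encoding="utf-8"):
    """Hypothetical helper: return a text string for bytes or text input.

    Assumption: byte strings are UTF-8 encoded unless stated otherwise.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# bytes offers isdigit() but not isdecimal(), so calling isdecimal()
# on an undecoded byte string raises AttributeError.
print(hasattr(b"123", "isdigit"), hasattr(b"123", "isdecimal"))

# After decoding, the method is available.
print(ensure_text(b"123").isdecimal())
```

A caller would apply the helper once at the boundary, for example to the value passed as date_string, rather than inside every function that inspects tokens.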

                 date_string_tokens[i] = str(int(token)).zfill(len(token))
                 if isinstance(date_string_tokens[i], bytes):
                     date_string_tokens[i] = date_string_tokens[i].decode('utf-8')
5 changes: 5 additions & 0 deletions tests/test_date_parser.py
@@ -599,6 +599,11 @@ def test_parse_timestamp(self, date_string, expected):
         param('16:10', expected=datetime(2015, 2, 15, 16, 10), period='day'),
         param('2014', expected=datetime(2014, 2, 15), period='year'),
         param('2008', expected=datetime(2008, 2, 15), period='year'),
+        # subscript and superscript dates
+        param('²⁰¹⁵', expected=datetime(2015, 2, 15), period='year'),
+        param('²⁹/⁰⁵/²⁰¹⁵', expected=datetime(2015, 5, 29), period='day'),
+        param('₁₅/₀₂/₂₀₂₀', expected=datetime(2020, 2, 15), period='day'),
+        param('₃₁ December', expected=datetime(2015, 12, 31), period='day'),
     ])
     def test_extracted_period(self, date_string, expected=None, period=None):
         self.given_local_tz_offset(0)

0 comments on commit 11c0b8f
