title: Data processing with Python
output: 2-data.html
style: style.css
theme: /home/as/wrk/seminar/cleaver-ribbon
github-theme: shoorick/cleaver-ribbon
author:
  name: "Alexander Sapozhnikov, Tatyana Vasilieva"
  was-github: shoorick
  company: "South Ural State University"
  was-twitter: "@shoorick77"
  email: as@susu.ru
  url: "https://as.susu.ru"

-- title clear

Data processing with Python

Python

© Mike Wesemann -- clear

Part 2

-- ## Part 2
  • Variables
  • Objects
  • Data types --

Documentation

  • help() --

Navigate through help page

  • Enter ↲ or ↓ — next line
  • ↑ — previous line
  • space — next page
  • g — scroll to top
  • G — scroll to bottom
  • q — quit from help --

Documentation

Interactive programming environments

repl.it

-- black clear

repl.it

-- ## Interactive mode
>>>
⮬ prompt
-- ## Interactive mode
>>> 42 + 24
our input ⮭
-- ## Interactive mode
>>> 42 + 24
66 ⬅ result
-- ## Oops
>>> 42 + 'e'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'int' and 'str'
⮬ error messages
-- ## Syntax peculiarities

No semicolon ; is needed after a single statement

counter = 42

--

Syntax peculiarities

Colon and indent instead of curly braces for blocks

for fruit in basket:
    # four spaces are recommended
    print(fruit)

--

Python is case sensitive

-- ## Python is case sensitive
A ≠ a

--

Variables

-- ## Variables

Assignment

name = 'value'

name → value

Assign a new value

name = 'value'
name = 42

name → value

--

Multiple assignment

>>> mice = cats = dogs = 3
>>> cats
3
![name → value](images/var-multiple-3.dot.svg)

--

Multiple assignment

>>> mice = cats = dogs = 3
>>> cats
3
>>> cats = 15
>>> dogs
3
![name → value](images/var-multiple.dot.svg)
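
The diagram can also be checked in code. This is a minimal sketch that reuses the names from the slide: after a chained assignment all three names refer to one object, and rebinding one name leaves the others untouched. The helper names `same_before` and `same_after` are only illustrative.

```python
# Chained assignment binds all three names to the same object.
mice = cats = dogs = 3
same_before = cats is dogs   # True: two names, one object

# Rebinding cats creates a new binding; mice and dogs are not affected.
cats = 15
same_after = cats is dogs    # False: cats now refers to another object

print(same_before, same_after, mice, cats, dogs)
```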

--

Variable naming

There are only two hard things in Computer Science:
cache invalidation and naming things.

Phil Karlton

--

Variable naming

Available characters are:

  • Small and capital Latin letters a to z and A to Z
  • Digits (not in first position)
  • Underscore _
>>> theSun_and_8_planets = 'solar'

--

Variable naming

`b` (single lowercase letter)
`B` (single uppercase letter)
`CapitalizedWords`
or `CamelCase` 🐪

`lowercase` `lower_case_with_underscores` `UPPERCASE` `UPPER_CASE_WITH_UNDERSCORES`

See [PEP8](https://www.python.org/dev/peps/pep-0008/#naming-conventions)

-- ## Variable naming

The convention is to use lower_case_with_underscores (🐍 snake case) for variables and functions

  • theSun_and_8_planets (mixed style, avoid)
  • the_sun_and_8_planets (snake case, preferred)

--

You cannot use a keyword as a variable name

>>> global = 'World'
  File "<stdin>", line 1
    global = 'World'
           ^
SyntaxError: invalid syntax

--

You cannot use a keyword as a variable name

>>> help("keywords")

Here is a list of the Python keywords.  Enter any keyword to get more help.

False       class       from        or
None        continue    global      pass
True        def         if          raise

--

Keywords

False     None      True      and       as
assert    async     await     break     class
continue  def       del       elif      else
except    finally   for       from      global
if        import    in        is        lambda
nonlocal  not       or        pass      raise
return    try       while     with      yield
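
A name can be checked before it is used. The sketch below relies on the standard `keyword` module and on `str.isidentifier()`; the helper `is_usable_name` and the sample names are hypothetical and exist only for this illustration.

```python
import keyword

candidates = ['the_sun_and_8_planets', '8_planets', 'global', 'Global']

def is_usable_name(name):
    """Return True if name is a valid identifier and not a keyword."""
    return name.isidentifier() and not keyword.iskeyword(name)

usable = {name: is_usable_name(name) for name in candidates}
print(usable)
```

`'global'` passes `isidentifier()` but is rejected as a keyword, while `'Global'` is accepted because Python is case sensitive.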
--

Variable naming

Python 3 allows some non-ASCII letters in names, but using them is a bad idea

>>> Öl = 'Barrel.'
>>> print(Öl * 3)
Barrel.Barrel.Barrel.
>>> Ø = 0
>>> Ж = 8
>>> Зима = 'Winter'

--

Don’t do that

>>> ქ = 'khar'
>>> ձ = 'ja'
>>> ж = 'zhe'
>>> ξ = 'xi'
>>> ש = 'shin'
>>> ش = 'sheen'

--

Don’t do that

>>> o = 'Latin'
>>> ο = 'Greek'
>>> о = 'Cyrillic'
>>> օ = 'Armenian'
>>>  = 'Georgian'
>>> print(o, ο, о, օ, )
Latin Greek Cyrillic Armenian Georgian
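
Such lookalike letters can be told apart with the standard `unicodedata` module. This is a small sketch covering four of the letters above; the characters are written as escapes so that the code does not depend on the editor font.

```python
import unicodedata

# Latin o, Greek omicron, Cyrillic o, Armenian oh
lookalikes = ['\u006f', '\u03bf', '\u043e', '\u0585']

names = [unicodedata.name(ch) for ch in lookalikes]
for ch, name in zip(lookalikes, names):
    print(f'U+{ord(ch):04X} {name}')
```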

--

Non-letters are forbidden

>>> × = 'multiply'
  File "<stdin>", line 1
    × = 'multiply'
    ^

--

Non-letters are forbidden

>>> ⼤ = 'big'
  File "<stdin>", line 1
    ⼤ = 'big'
    ^
SyntaxError: invalid character in identifier

--

Process and output variable

>>> some = 'thing'
>>> len(some)
5
>>> print(some)
thing

--

Process and output variable

>>> print(some)
thing
>>> print('Any' + some)
Anything

--

Objects and classes

-- ## Let's imagine some animal

Properties

  • Size
  • Name
  • Genus
  • Color

Actions

  • Eat
  • Go
  • Run
  • Sleep

`cat` is an Object

height = cat.size
result = cat.run(42)
is_relaxed = cat.sleep()

--

Let's imagine some animal

Properties

  • Size

 

 

Methods

  • Run
  • Sleep

`cat` is an Object

height = cat.size
result = cat.run(42)
is_relaxed = cat.sleep()

-- ## Object has properties

Properties

  • Size

 

 

Methods

  • Run
  • Sleep

`cat` is an Object

height = cat.size
result = cat.run(42)
is_relaxed = cat.sleep()

--

Object has methods

Properties

  • Size

 

 

Methods

  • Run
  • Sleep

`cat` is an Object

height = cat.size
result = cat.run(42)
is_relaxed = cat.sleep()

-- ## Use dot . to access properties and call methods

Properties

  • Size

 

 

Methods

  • Run
  • Sleep

`cat` is an Object

height = cat.size
result = cat.run(42)
is_relaxed = cat.sleep()

-- ## Class vs Object

**Class →**

class Animal:
    def __init__(self, name):
        self.name = name

**Objects →**

cat = Animal('Tom')
mouse = Animal('Jerry')
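
A slightly fuller sketch of the same idea, with one property and one method added. The `size` attribute, its values and the text returned by `run` are assumptions made for this example, not part of the slide.

```python
class Animal:
    def __init__(self, name, size):
        self.name = name   # property
        self.size = size   # property (hypothetical unit: centimetres)

    def run(self, distance):
        """Method: describe the action instead of really running."""
        return f'{self.name} runs {distance} m'

# Two separate objects created from one class
cat = Animal('Tom', 25)
mouse = Animal('Jerry', 7)

height = cat.size        # the dot reads a property
result = cat.run(42)     # the dot plus parentheses calls a method
print(height, result, isinstance(mouse, Animal))
```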

-- ## Why does it matter?

Any data type is a Class

  • int, float, str, list, tuple, dict, range...

Any data is an Object

  • 42, 3.14159265358, 'something', [2, 12, 85, 0.6]... --

Data types

-- ## Built-in data types
  • bool
  • int, float, complex
  • list, tuple, dict, set, range
  • str

Built-in data types

  • **bool**ean — logical
  • **int**eger, float, complex — numeric
  • list, tuple, **dict**ionary, set, range — sequences and structures
  • **str**ing — chain of characters

Detect type of data

>>> type(42)
<class 'int'>

--

Detect type of data

>>> type(42)
<class 'int'>
>>> type(3.14)
<class 'float'>

--

Detect type of data

>>> type(42)
<class 'int'>
>>> type(3.14)
<class 'float'>
>>> type('3.14')
<class 'str'>

--

Detect type of variables and expressions

>>> type(some)    # variable
<class 'str'>
>>> type(5 + 0.5) # expression
<class 'float'>

--

List methods with dir

>>> some = 'Thing'
>>> dir(some)     # or
>>> dir('Thing')  # or
>>> dir(str)
['__add__', '__class__', '__contains__', '__delattr__', '__dir__',
... 'swapcase', 'title', 'translate', 'upper', 'zfill']

--

List methods with dir()

>>> some = 'Thing'
>>> dir(some)
['__add__', '__class__', '__contains__', '__delattr__', '__dir__',
... 'swapcase', 'title', 'translate', 'upper', 'zfill']
>>> some.swapcase()
'tHING'
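
The output of `dir()` is an ordinary list of strings, so it can be filtered. A minimal sketch: names starting with an underscore are skipped, and one method is then looked up by name with the built-in `getattr`. The variable names `public` and `swapped` are only illustrative.

```python
some = 'Thing'

# Keep only the public names, without the __dunder__ ones
public = [name for name in dir(some) if not name.startswith('_')]

# A method can be fetched by its name and then called
swapped = getattr(some, 'swapcase')()

print(len(public), swapped)
```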

--

bool

Boolean type is used for logical data

  • True
  • False

Note the capitalization

--

Logic

Comparison results have boolean type

>>> type(5 < 2)
<class 'bool'>
>>> type(apple == 'fruit')
<class 'bool'>

--

Convert to boolean

>>> bool(7)
True
>>> bool('non empty')
True
>>> bool([2020, 11, 22])
True

--

Convert to boolean

>>> bool(0)
False
>>> bool('')
False
>>> bool([])
False

--

Implicit conversion to boolean

>>> if list_name:               # an empty list counts as False
...     print(list_name[0])     # runs only when the list has items

--

False value testing

  • constants defined to be false: None and False.
  • zero of any numeric type: 0, 0.0, 0j, Decimal(0), Fraction(0, 1)
  • empty sequences and collections: '', (), [], {}, set(), range(0)
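
The rules above can be verified directly. In this sketch the `falsy` list repeats the values from the slide, while the `truthy` samples and the helper `first_or_default` are hypothetical additions for illustration.

```python
from decimal import Decimal
from fractions import Fraction

falsy = [None, False, 0, 0.0, 0j, Decimal(0), Fraction(0, 1),
         '', (), [], {}, set(), range(0)]

# Non-empty containers and strings are true, even ' ', '0' and [0]
truthy = [7, 'non empty', [2020, 11, 22], ' ', '0', [0]]

def first_or_default(items, default='nothing'):
    # An empty list is false, so no explicit len() check is needed.
    if items:
        return items[0]
    return default

print([bool(value) for value in falsy])
print(first_or_default([]), first_or_default(['a', 'b']))
```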

--

int

integer number

>>> type(42)
<class 'int'>

--

Underscores for long numbers

>>> 4_294_967_296     # 2**32
4294967296
>>> +7_800_775_00_00  # even phone numbers
78007750000

--

int can use various bases

>>> 0xC0FFEE  # hexadecimal
12648430
>>> 0o777     # octal
511
>>> 0b1111    # binary
15
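
Conversion works in the other direction too. A short sketch using the built-ins `hex`, `oct`, `bin` and the two-argument form of `int`; the variable names are only illustrative.

```python
number = 0xC0FFEE

as_hex = hex(number)          # integer to hexadecimal text
as_oct = oct(0o777)           # integer to octal text
as_bin = bin(0b1111)          # integer to binary text
parsed = int('c0ffee', 16)    # text in base 16 back to an integer
formatted = f'{number:X}'     # upper-case hexadecimal without prefix

print(as_hex, as_oct, as_bin, parsed, formatted)
```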

--

float

Floating point number

>>> 3.1415926
3.1415926
>>> 9.
9.0
>>> 3e8
300000000.0

--

float

3e8 = 3 × 10⁸ = 300000000.0 # approximate speed of light, meters per second

125e-3 = 125 × 10⁻³ = 0.125

>>> 6.022e23 # Avogadro constant, mol⁻¹
6.022e+23

--

Change type

Use the name of a type as a function to convert data

>>> int(3.1415926)
3
>>> float(42)
42.0
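
Note that `int()` drops the fractional part instead of rounding. The sketch below compares it with `round()` and `math.floor()` and also converts text to numbers; all variable names are illustrative.

```python
import math

truncated_pos = int(3.99)       # the fraction is dropped
truncated_neg = int(-3.99)      # truncation goes toward zero
floored = math.floor(-3.99)     # floor goes toward minus infinity
rounded = round(3.99)           # rounds to the nearest integer

# Strings holding numbers can be converted as well
from_text = int('42') + float('0.5')

print(truncated_pos, truncated_neg, floored, rounded, from_text)
```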

--

Implicit type changing

>>> 3. + 2
5.0
>>> 3 + 2.
5.0

--

str

A string is a sequence of characters

string

Individual characters are accessible

string with highlighted i

>>> 'string'[3]
'i'

--

as well as the whole string

highlighted string

>>> 'string'.upper()
'STRING'

--

String parts are strings too

string with highlighted i

>>> 'string'[3].upper()
'I'

--

Methods of str

  • **capitalize**() → string
  • **center**(width[, fillchar]) → string
  • **count**(sub[, start[, end]]) → int
  • **encode**([encoding[, errors]]) → bytes
  • **endswith**(suffix[, start[, end]]) → bool
  • **expandtabs**([tabsize]) → string
  • **find**(sub [,start [,end]]) → int
  • **format**(\*args, \**kwargs) → string
  • **index**(sub [,start [,end]]) → int
-- ## Methods of `str`
  • **isalnum**() → bool
  • **isalpha**() → bool
  • **isdigit**() → bool
  • **islower**() → bool
  • **isspace**() → bool
  • **istitle**() → bool
  • **isupper**() → bool
  • **join**(iterable) → string
  • **ljust**(width[, fillchar]) → string
  • **lower**() → string
  • **lstrip**([chars]) → string
  • **partition**(sep) → (head, sep, tail)
  • **replace**(old, new[, count]) → string
  • **rfind**(sub [,start [,end]]) → int
-- ## Methods of `str`
  • **rindex**(sub [,start [,end]]) → int
  • **rjust**(width[, fillchar]) → string
  • **rpartition**(sep) → (head, sep, tail)
  • **rsplit**([sep [,maxsplit]]) → list of strings
  • **rstrip**([chars]) → string
  • **split**([sep [,maxsplit]]) → list of strings
  • **splitlines**(keepends=False) → list of strings
  • **startswith**(prefix[, start[, end]]) → bool
  • **strip**([chars]) → string
-- ## Methods of `str`
  • **swapcase**() → string
  • **title**() → string
  • **translate**(table) → string
  • **upper**() → string
  • **zfill**(width) → string

More than 30 methods!

-- ## String subtypes
>>> 'Generic' or "common"
'Generic'

--

Special characters

>>> 'B letter \x42'
'B letter B'
>>> "\x53ame behavior with double quotation marks"
'Same behavior with double quotation marks'
>>> 'Unicode: питон — это змея 🐍 蛇'
'Unicode: питон — это змея 🐍 蛇'

--

Special characters

C-like notation

  • \n — new line (LF — line feed)
  • \r — carriage return (CR)
  • \xNN — character with the two-digit hexadecimal code NN

Use """triple delimiters""" to make multiple lines

>>> """
... Multiline strings
... often used as comments
... """
'\nMultiline strings\noften used as comments\n'

'''Triple delimiters'''

>>> '''
... These strings can contain
... 'single' or "double" quotation marks
... '''
'\nThese strings can contain\n\'single\' or "double" quotation marks\n'

--

Special characters

C-like notation

  • \n — new line (LF — line feed)
  • \r — carriage return (CR)
  • \xNN — character with the two-digit hexadecimal code NN
>>> '\xAB'
'«'

--

Special characters

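What about the small Cyrillic letter “ef” (ф)? Its hexadecimal code is 444.

>>> '\x444'
'D4'

\xNN reads exactly two hexadecimal digits, so this is '\x44' (the letter D) followed by the character 4.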

--

Unicode characters

>>> '\u444'
SyntaxError: (unicode error) 'unicodeescape' codec
can't decode bytes in position 0-4: truncated \uXXXX escape
>>> '\u0444'
'ф'
>>> '\u1f41b'
'ὁb'

--

Unicode characters above U+FFFF

>>> '\U1f41b'
  File "<stdin>", line 1
SyntaxError: (unicode error) 'unicodeescape' codec
can't decode bytes in position 0-6: truncated \UXXXXXXXX escape
>>> '\U0001f41b'
'🐛'
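
Code points and characters can be converted both ways with the built-ins `ord` and `chr`, and a character can also be written by its Unicode name. A minimal sketch with illustrative variable names:

```python
ef = '\u0444'           # four hex digits after \u
bug = '\U0001f41b'      # eight hex digits after \U

code_of_ef = hex(ord(ef))                  # character to code point
same_bug = chr(0x1F41B)                    # code point to character
by_name = '\N{CYRILLIC SMALL LETTER EF}'   # character by Unicode name

print(code_of_ef, same_bug, by_name, len(bug))
```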

--

Unicode strings — u''

  • Same as generic string — Python 3
  • Different subtype — Python 2
>>> u'日'
'日'

--

Byte strings — b''

  • ASCII only — Python 3
  • Same as generic string, which is treated as a byte sequence — Python 2
>>> b'Byte'
b'Byte'
>>> b'Жи-ши'
  File "<stdin>", line 1
SyntaxError: bytes can only contain ASCII literal characters.
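
Non-ASCII text becomes bytes through `str.encode()` and returns to text through `bytes.decode()`. This sketch assumes the UTF-8 encoding, in which each Cyrillic letter takes two bytes.

```python
text = 'Жи-ши'

data = text.encode('utf-8')       # str to bytes
restored = data.decode('utf-8')   # bytes back to str

# 5 characters, but 9 bytes: four Cyrillic letters and one hyphen
print(len(text), len(data), restored == text)
```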

--

Escape backslash

Just double it

>>> print('\\back')
\back

--

Raw strings — r''

Backslashes stay as they are: no escape sequences are processed

>>> r'\back\slash'
'\\back\\slash'

--

Raw strings

Backslashes stay as they are: no escape sequences are processed

>>> r'\back\slash'
'\\back\\slash'
>>> r'^\S+ome\regular\expr\e\s\Sio\n{7}'
'^\\S+ome\\regular\\expr\\e\\s\\Sio\\n{7}'

See also re — Regular expression operations
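
Raw strings are the usual way to write patterns for the `re` module. A small sketch with a hypothetical pattern and sample sentence:

```python
import re

# \b and \w reach the re module untouched because the string is raw
pattern = r'\b\w+ing\b'
words = re.findall(pattern, 'String processing is interesting')

# A raw '\n' is two characters, an ordinary '\n' is one
raw_length = len(r'\n')
plain_length = len('\n')

print(words, raw_length, plain_length)
```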

--

Raw strings

Backslashes stay as they are: no escape sequences are processed

>>> r'\back\slash'
'\\back\\slash'
>>> r'^\S+ome\regular\expr\e\s\Sio\n{7}'
'^\\S+ome\\regular\\expr\\e\\s\\Sio\\n{7}'
>>> r'C:\Windows\system32\drivers\hosts.txt'
'C:\\Windows\\system32\\drivers\\hosts.txt'

--

Raw strings with triple delimiters

>>> r'''
... TenorI = \context Voice = TenorI {
...     \global
...     \dynamicUp \stemUp \slurUp \tieUp
...     \tempo Moderato
... '''
'\nTenorI = \\context Voice = TenorI {\n    \\global\n    \\dynamicUp \\stemUp \\slurUp \\tieUp\n    \\tempo Moderato\n'

-- clear

Frescobaldi

--

Format strings — f''

>>> pi = 3.14159265358
>>> f'π is {pi}'
'π is 3.14159265358'

Available since Python 3.6 (released in December 2016). See also realpython.com/python-f-strings
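
An f-string can also hold a format specification after a colon. A brief sketch of a few common ones; the variable names are illustrative.

```python
pi = 3.14159265358

short = f'π is about {pi:.2f}'   # two digits after the decimal point
padded = f'[{42:>5}]'            # right-aligned in a field of width 5
hexed = f'{255:#x}'              # hexadecimal with the 0x prefix
grouped = f'{2 ** 32:_}'         # any expression, digits grouped by _

print(short, padded, hexed, grouped)
```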

Concatenate strings with +

>>> 'head ' + 'and' + ' tail'
'head and tail'

--

Concatenate strings with +

>>> 3 + ' is three'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'int' and 'str'
>>> str(3) + ' is three'
'3 is three'
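
When there are many parts, `str.join()` is often more convenient than repeated `+`. A minimal sketch; note that `join` accepts only strings, so numbers are converted with `str()` first.

```python
parts = ['head', 'and', 'tail']
sentence = ' '.join(parts)

# Numbers must be converted to strings before joining
numbered = ', '.join(str(n) for n in [3, 5, 7])

print(sentence)
print(numbered)
```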

--

list

A list is a sequence of values

list

--

list

List items can have various types

list

>>> [1, 2, 3, 5, 7, 11, 'numbers']

--

Empty list

>>> empty = []

--

List items numbered from 0

Index of item is an offset from left edge of list

list

>>> prime_numbers = [1, 2, 3, 5, 7, 11]

--

List items numbered from 0

list

>>> prime_numbers = [1, 2, 3, 5, 7, 11]
>>> prime_numbers[3]
5

--

Use negative indices to access last items

list

>>> prime_numbers[-1]
11

--

Use negative indices to access last items

list

>>> prime_numbers[-2]
7

--

Slice — first:last

list

>>> prime_numbers[1:4]
[2, 3, 5]

--

Slice — first: without right bound

list

>>> prime_numbers[1:]
[2, 3, 5, 7, 11]

--

Slice — :last without left bound

list

>>> prime_numbers[:3]
[1, 2, 3]

--

Slice — ::step after second colon

list

>>> prime_numbers[1:6:2]
[2, 5, 11]

--

Without bounds but with ::step

list

>>> prime_numbers[::2]
[1, 3, 7]
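
A few more slices of the same list, as a sketch. The right bound is never included, negative values count from the end, and bounds outside the list are silently clamped.

```python
prime_numbers = [1, 2, 3, 5, 7, 11]   # the list used on the slides

reversed_copy = prime_numbers[::-1]   # negative step walks backwards
last_three = prime_numbers[-3:]       # negative bound counts from the end
shallow_copy = prime_numbers[:]       # no bounds: a copy of the whole list
clamped = prime_numbers[4:100]        # a bound past the end is not an error

print(reversed_copy, last_three, clamped)
```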

--

Assign new value to certain item

>>> prime_numbers[3] = 'R'
>>> prime_numbers
[1, 2, 3, 'R', 7, 11]

--

String parts are accessible the same way

list

>>> line = 'abcdefghi'
>>> line[3]
'd'

--

Get substring

list

>>> line = 'abcdefghi'
>>> line[:3]
'abc'

--

String parts are accessible the same way

list

>>> line = 'abcdefghi'
>>> line[::2]
'acegi'

--

Access to single character

array

>>> names = ['Alice', 'Bob', 'Charlie']

--

Access to single character

access

>>> names[2][0]
'C'

--

Methods of list

>>> help(list)
Help on class list in module builtins:
...
 |  append(self, object, /)
 |      Append object to the end of the list.
 |
 |  clear(self, /)

--

Methods of list

  • **append**(object)
  • **count**(value) → integer
  • **extend**(iterable)
  • **index**(value, [start, [stop]]) → integer
  • **insert**(index, object)
  • **pop**([index]) → item
  • **remove**(value)
  • **reverse**()
  • **sort**(key=None, reverse=False)
-- ## Methods of `list`

List methods

Add items to list

>>> abc = ['a', 'b', 'c']
>>> abc.append('e')
>>> abc.extend(['f', 'g'])
>>> abc.insert(3, 'd')
>>> abc
['a', 'b', 'c', 'd', 'e', 'f', 'g']

--

Remove items from list

>>> abc = ['a', 'b', 'c', 'd', 'e', 'f', 'g']
>>> abc.pop(3) # returns the deleted item
'd'
>>> del abc[1] # returns nothing
>>> abc.remove('e')
>>> abc
['a', 'c', 'f', 'g']
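
Sorting deserves a separate look. In this sketch `list.sort()` changes the list in place, while the built-in `sorted()` returns a new list; both accept `key` and `reverse`. The sample data is hypothetical.

```python
letters = ['d', 'a', 'c', 'b']

ordered_copy = sorted(letters)   # new list, the original is unchanged
letters.sort(reverse=True)       # sorts in place and returns None

# key is a function applied to every item before comparison
by_length = sorted(['Charlie', 'Bob', 'Alice'], key=len)

print(ordered_copy, letters, by_length)
```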

tuple

A tuple is a read-only list

>>> (1, 2, 3, 5, 7, 11)
(1, 2, 3, 5, 7, 11)
>>> (1, 2, 3, 5, 7, 11)[3]
5

--

A tuple is a read-only list

>>> wheels = (2, 3, 4, 6, 8)
>>> wheels[2] = 7
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment

--

Make tuple

Use parentheses to make a tuple; a one-item tuple needs a trailing comma

>>> (2)
2        # oops! It's an integer
>>> (2,)
(2,)     # tuple has one item
>>> ()
()       # tuple is empty
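
It is the comma, not the parentheses, that builds a non-empty tuple. A short sketch of packing and unpacking, with illustrative names:

```python
single = 2,          # one-item tuple, no parentheses needed
point = 3, 4         # packing: two values become one tuple

x, y = point         # unpacking: one tuple becomes two names
x, y = y, x          # swap values through a temporary tuple

print(single, point, x, y)
```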

--

Convert tuple to list to make it writable

>>> list((1, 2, 3, 5, 7, 11))
[1, 2, 3, 5, 7, 11]
>>> tuple([1, 2, 3, 5, 7, 11])
(1, 2, 3, 5, 7, 11)

--

dict

A dictionary is a collection of key: value pairs

>>> apple = {'color': 'red', 'weight': 7, 'shape': 'ball'}
>>> apple['color']
'red'
>>> apple['shape']
'ball'

--

Change values and add new ones

>>> apple['color'] = 'yellow'
>>> apple['origin'] = 'Normandy'
>>> f"{apple['color']} apple came from {apple['origin']}"
'yellow apple came from Normandy'

--

Get nonexistent value

>>> apple['nonexistent']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'nonexistent'

--

Get nonexistent value

>>> apple['nonexistent']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'nonexistent'

>>> apple.get('nonexistent', 'none')
'none'

--

Get any value safely

>>> apple.get('nonexistent', 'none')
'none'

>>> apple.get('color', 'none')
'yellow'
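
Besides `get()`, the `in` operator checks whether a key exists, and `items()` walks through all pairs. A minimal sketch that starts from the `apple` dictionary in the state it has after the changes above:

```python
apple = {'color': 'yellow', 'weight': 7, 'shape': 'ball',
         'origin': 'Normandy'}

has_color = 'color' in apple    # membership is tested against keys
has_taste = 'taste' in apple

# items() yields (key, value) pairs in insertion order
lines = [f'{key}: {value}' for key, value in apple.items()]

print(has_color, has_taste)
print(lines)
```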

--

Make an empty dict

>>> empty = dict()  # possible but ugly
>>> empty
{}

>>> hollow = {}     # better

--

set

>>> even = {0, 2, 4, 6, 2, 0, 0}
>>> even
{0, 2, 4, 6}

--

set

>>> 5 in even
False
>>> 2 in even
True

--

Methods of set

>>> dir(set)
['__and__', '__class__', '__contains__', '__delattr__', '__dir__',
'add', 'clear', 'copy', 'difference', 'difference_update',
'discard', 'intersection', 'intersection_update', 'isdisjoint',
'issubset', 'issuperset', 'pop', 'remove', 'symmetric_difference',
'symmetric_difference_update', 'union', 'update']

Methods of set

>>> help(set)
 |  add(...)
 |      Add an element to a set.
 |      This has no effect if the element is already present.
 |
 |  clear(...)
 |      Remove all elements from this set.

Call some methods of set type

>>> threes = {3, 6, 9, 12, 15, 18}
>>> fives  = {5, 10, 15, 20}
>>> threes.union(fives)
{3, 5, 6, 9, 10, 12, 15, 18, 20}
>>> threes.difference(fives)
{3, 6, 9, 12, 18}

--

Convert range to set

>>> threes = set(range(3, 21, 3))
>>> fives  = set(range(5, 25, 5))
>>> threes.union(fives)
{3, 5, 6, 9, 10, 12, 15, 18, 20}
>>> threes.difference(fives)
{3, 6, 9, 12, 18}
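
The same operations are available as operators. A sketch with the two sets from the slide; the result names are illustrative.

```python
threes = set(range(3, 21, 3))
fives = set(range(5, 25, 5))

both = threes & fives          # intersection
either = threes | fives        # union
only_threes = threes - fives   # difference
exactly_one = threes ^ fives   # symmetric difference

print(both, only_threes, exactly_one)
```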

--

Convert set to list

>>> threes[2]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'set' object is not subscriptable
>>> list(threes)[2]
9
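
A set has no guaranteed order, so the position of an item in `list(threes)` should not be relied upon. When order matters, one possible approach is to sort first, as in this sketch:

```python
threes = {18, 3, 12, 9, 6, 15}

# sorted() accepts any iterable and always returns an ordered list
ordered = sorted(threes)
third = ordered[2]

print(ordered, third)
```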

--

range

A range is an arithmetic progression of integers: every item differs from the previous one by the same step

  • 4, 5, 6, 7, 8, ...
  • 10, 9, 8, 7, 6, 5, 4, 3, 2, 1
  • 5, 15, 25, 35

--

Range

>>> teen = range(13, 20)

Mathematically, t = [13, 20)

Range

>>> teen = range(13, 20)

Mathematically, t = [13, 20)

13 is included, 20 is excluded

Print range

>>> teen = range(13, 20)
>>> teen
range(13, 20)

--

Print range

>>> teen = range(13, 20)
>>> teen
range(13, 20)
>>> for age in teen: print(age, end=', ')
...
13, 14, 15, 16, 17, 18, 19,

--

Get arbitrary item from range by its index

>>> for age in teen: print(age, end=', ')
...
13, 14, 15, 16, 17, 18, 19,
>>> teen[3]
16

--

Get part of range

>>> for age in teen: print(age, end=', ')
...
13, 14, 15, 16, 17, 18, 19,
>>> teen[3:]
range(16, 20)
>>> teen[:3]
range(13, 16)

--

Get part of range with third parameter

>>> for age in teen: print(age, end=', ')
...
13, 14, 15, 16, 17, 18, 19,
>>> teen[3::2]
range(16, 20, 2)
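
A range stores only its start, stop and step, so indexing and membership tests do not need the whole sequence in memory. A minimal sketch; the `huge` range is a hypothetical example of a size that would be impractical as a list.

```python
teen = range(13, 20)

size = len(teen)          # number of items
is_teen = 16 in teen      # membership test without a loop
evens = teen[1::2]        # a slice of a range is a range again

# Items are computed on demand, even for a very long range
huge = range(10 ** 12)
last = huge[-1]

print(size, is_teen, list(evens), last)
```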

--

Range with step

>>> for item in range(0, 100, 9):
...     print(item, end=', ')
...
0, 9, 18, 27, 36, 45, 54, 63, 72, 81, 90, 99,

--

Range with negative step

>>> for item in range(3, 0, -1):
...     print(item, end=', ')
...
3, 2, 1,

--

Range with negative step

>>> for item in range(3, 0, -1):
...     print(item, end=', ')
...
3, 2, 1,

No zero here

--

Convert range to list

>>> before_ten = list(range(0, 10))
>>> before_ten
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

--

Conclusion

  • Data is stored in variables.
  • Use proper names for variables.
  • All values are objects, all data types are classes.
  • Python has a lot of built-in and external data types.
  • Data types can be changed implicitly and explicitly.
-- ## Next part