9.2. Comprehension List

9.2.1. Syntax

Short syntax:

>>> [x for x in range(0,5)]
[0, 1, 2, 3, 4]

Long syntax (a generator expression passed to `list()`):

>>> list(x for x in range(0,5))
[0, 1, 2, 3, 4]
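Both forms produce the same list as an explicit `for` loop that appends to an empty list. A minimal sketch comparing the three; the variable names are illustrative only:

```python
# Explicit loop: start with an empty list and append each element
loop_result = []
for x in range(0, 5):
    loop_result.append(x)

# List comprehension: the same loop written as a single expression
comprehension_result = [x for x in range(0, 5)]

# Generator expression passed to list(): builds the same list
generator_result = list(x for x in range(0, 5))

print(loop_result == comprehension_result == generator_result)  # True
```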

9.2.2. Microbenchmark

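Measured in IPython/Jupyter with the `%%timeit` cell magic (it is not available in the plain Python REPL); absolute times depend on the machine and Python version:

In [1]: %%timeit -r 1000 -n 1000
   ...: result = []
   ...: for x in range(0,5):
   ...:     result.append(x)
   ...:
457 ns ± 69.4 ns per loop (mean ± std. dev. of 1000 runs, 1000 loops each)

In [2]: %%timeit -r 1000 -n 1000
   ...: result = [x for x in range(0,5)]
   ...:
411 ns ± 76.6 ns per loop (mean ± std. dev. of 1000 runs, 1000 loops each)

In this run the comprehension was slightly faster (411 ns vs 457 ns), although the difference is smaller than the standard deviation of either measurement.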
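Outside IPython, the standard-library `timeit` module can run a similar comparison. A minimal sketch; the `NUMBER` and `REPEAT` values are arbitrary choices, and the printed times will vary from machine to machine and run to run:

```python
import timeit

NUMBER = 10_000  # executions per repetition (arbitrary choice)
REPEAT = 5       # number of repetitions (arbitrary choice)

# Statement 1: build the list with an explicit loop and append()
loop_stmt = (
    "result = []\n"
    "for x in range(0, 5):\n"
    "    result.append(x)\n"
)

# Statement 2: build the same list with a comprehension
comprehension_stmt = "result = [x for x in range(0, 5)]"

# timeit.repeat() returns one total time (in seconds) per repetition
loop_times = timeit.repeat(loop_stmt, repeat=REPEAT, number=NUMBER)
comprehension_times = timeit.repeat(comprehension_stmt, repeat=REPEAT, number=NUMBER)

# Report the best repetition as nanoseconds per single execution
print(f"loop:          {min(loop_times) / NUMBER * 1e9:.0f} ns per loop")
print(f"comprehension: {min(comprehension_times) / NUMBER * 1e9:.0f} ns per loop")
```

Taking the minimum of the repetitions is a common convention, since higher values usually reflect interference from other processes rather than the code itself.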

9.2.3. Manipulate Numbers

>>> [x+1 for x in range(0,5)]
[1, 2, 3, 4, 5]
>>>
>>> [x+10 for x in range(0,5)]
[10, 11, 12, 13, 14]
>>>
>>> [x*x for x in range(1,5)]
[1, 4, 9, 16]
>>>
>>> [x*(x+1) for x in range(1,5)]
[2, 6, 12, 20]
>>>
>>> [x**2 for x in range(0,5)]
[0, 1, 4, 9, 16]
>>>
>>> [x**3 for x in range(0,5)]
[0, 1, 8, 27, 64]
>>>
>>> [2**x for x in range(0,5)]
[1, 2, 4, 8, 16]
>>>
>>> [3**x for x in range(0,5)]
[1, 3, 9, 27, 81]
>>>
>>> [1/x for x in range(0,5)]
Traceback (most recent call last):
ZeroDivisionError: division by zero
>>>
>>> [1/x for x in range(1,5)]
[1.0, 0.5, 0.3333333333333333, 0.25]
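Instead of changing the range, one way to avoid the `ZeroDivisionError` is an `if` clause at the end of the comprehension, which skips elements before the expression is evaluated. A minimal sketch:

```python
# The `if` clause filters the input: x == 0 is skipped, so 1/x never divides by zero
reciprocals = [1/x for x in range(0, 5) if x != 0]
print(reciprocals)  # [1.0, 0.5, 0.3333333333333333, 0.25]
```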

9.2.4. Manipulate Strings

>>> DATA = ['a', 'b', 'c']
>>>
>>> ','.join(DATA)
'a,b,c'
>>>
>>> ','.join(x for x in DATA)
'a,b,c'
>>>
>>> ','.join(x.upper() for x in DATA)
'A,B,C'
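The last example passes a generator expression to `str.join()`. A minimal sketch showing that a list comprehension gives the same string and also lets you inspect the intermediate list:

```python
DATA = ['a', 'b', 'c']

# List comprehension: builds the list of upper-case letters first
upper_list = [x.upper() for x in DATA]
print(upper_list)            # ['A', 'B', 'C']
print(','.join(upper_list))  # A,B,C

# Generator expression: join() consumes the values directly
print(','.join(x.upper() for x in DATA))  # A,B,C
```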

9.2.5. Slice Sequences

>>> DATA = [
...     ('Sepal length', 'Sepal width', 'Petal length', 'Petal width', 'Species'),
...     (5.8, 2.7, 5.1, 1.9, 'virginica'),
...     (5.1, 3.5, 1.4, 0.2, 'setosa'),
...     (5.7, 2.8, 4.1, 1.3, 'versicolor'),
...     (6.3, 2.9, 5.6, 1.8, 'virginica'),
...     (6.4, 3.2, 4.5, 1.5, 'versicolor'),
...     (4.7, 3.2, 1.3, 0.2, 'setosa')]
>>>
>>>
>>> [row for row in DATA]  # doctest: +NORMALIZE_WHITESPACE
[('Sepal length', 'Sepal width', 'Petal length', 'Petal width', 'Species'),
 (5.8, 2.7, 5.1, 1.9, 'virginica'),
 (5.1, 3.5, 1.4, 0.2, 'setosa'),
 (5.7, 2.8, 4.1, 1.3, 'versicolor'),
 (6.3, 2.9, 5.6, 1.8, 'virginica'),
 (6.4, 3.2, 4.5, 1.5, 'versicolor'),
 (4.7, 3.2, 1.3, 0.2, 'setosa')]
>>>
>>> [row for row in DATA[1:]]  # doctest: +NORMALIZE_WHITESPACE
[(5.8, 2.7, 5.1, 1.9, 'virginica'),
 (5.1, 3.5, 1.4, 0.2, 'setosa'),
 (5.7, 2.8, 4.1, 1.3, 'versicolor'),
 (6.3, 2.9, 5.6, 1.8, 'virginica'),
 (6.4, 3.2, 4.5, 1.5, 'versicolor'),
 (4.7, 3.2, 1.3, 0.2, 'setosa')]
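A comprehension that returns `row` unchanged is equivalent to an explicit loop over the slice, and the result equals `list(DATA[1:])`. A minimal sketch on a shortened copy of `DATA` (header plus two rows):

```python
# Shortened copy of DATA for illustration
DATA = [
    ('Sepal length', 'Sepal width', 'Petal length', 'Petal width', 'Species'),
    (5.8, 2.7, 5.1, 1.9, 'virginica'),
    (5.1, 3.5, 1.4, 0.2, 'setosa'),
]

# Comprehension over a slice: DATA[1:] skips the header row
rows = [row for row in DATA[1:]]

# Equivalent explicit loop
rows_loop = []
for row in DATA[1:]:
    rows_loop.append(row)

# Without any transformation, the comprehension just copies the slice
print(rows == rows_loop == list(DATA[1:]))  # True
```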

9.2.6. Slice Data in Sequences

>>> DATA = [
...     ('Sepal length', 'Sepal width', 'Petal length', 'Petal width', 'Species'),
...     (5.8, 2.7, 5.1, 1.9, 'virginica'),
...     (5.1, 3.5, 1.4, 0.2, 'setosa'),
...     (5.7, 2.8, 4.1, 1.3, 'versicolor'),
...     (6.3, 2.9, 5.6, 1.8, 'virginica'),
...     (6.4, 3.2, 4.5, 1.5, 'versicolor'),
...     (4.7, 3.2, 1.3, 0.2, 'setosa')]
>>>
>>>
>>> [row[-1] for row in DATA[1:]]
['virginica', 'setosa', 'versicolor', 'virginica', 'versicolor', 'setosa']
>>>
>>> [row[0:4] for row in DATA[1:]]  # doctest: +NORMALIZE_WHITESPACE
[(5.8, 2.7, 5.1, 1.9),
 (5.1, 3.5, 1.4, 0.2),
 (5.7, 2.8, 4.1, 1.3),
 (6.3, 2.9, 5.6, 1.8),
 (6.4, 3.2, 4.5, 1.5),
 (4.7, 3.2, 1.3, 0.2)]
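Instead of hard-coding an index such as `-1`, the column position can be looked up from the header row with `tuple.index()`. A minimal sketch on a shortened copy of `DATA`; the names `rows`, `species` and `sepal_length` are illustrative:

```python
# Shortened copy of DATA (header plus three rows) for illustration
DATA = [
    ('Sepal length', 'Sepal width', 'Petal length', 'Petal width', 'Species'),
    (5.8, 2.7, 5.1, 1.9, 'virginica'),
    (5.1, 3.5, 1.4, 0.2, 'setosa'),
    (5.7, 2.8, 4.1, 1.3, 'versicolor'),
]

# Separate the header row from the data rows
header, *rows = DATA

# tuple.index() returns the position of the first matching column name
species = header.index('Species')            # 4
sepal_length = header.index('Sepal length')  # 0

labels = [row[species] for row in rows]
lengths = [row[sepal_length] for row in rows]

print(labels)   # ['virginica', 'setosa', 'versicolor']
print(lengths)  # [5.8, 5.1, 5.7]
```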

9.2.7. Unpack Sequences

>>> DATA = [
...     ('Sepal length', 'Sepal width', 'Petal length', 'Petal width', 'Species'),
...     (5.8, 2.7, 5.1, 1.9, 'virginica'),
...     (5.1, 3.5, 1.4, 0.2, 'setosa'),
...     (5.7, 2.8, 4.1, 1.3, 'versicolor'),
...     (6.3, 2.9, 5.6, 1.8, 'virginica'),
...     (6.4, 3.2, 4.5, 1.5, 'versicolor'),
...     (4.7, 3.2, 1.3, 0.2, 'setosa')]
>>>
>>>
>>> [features for *features, label in DATA[1:]]  # doctest: +NORMALIZE_WHITESPACE
[[5.8, 2.7, 5.1, 1.9],
 [5.1, 3.5, 1.4, 0.2],
 [5.7, 2.8, 4.1, 1.3],
 [6.3, 2.9, 5.6, 1.8],
 [6.4, 3.2, 4.5, 1.5],
 [4.7, 3.2, 1.3, 0.2]]
>>>
>>> [X for *X, y in DATA[1:]]  # doctest: +NORMALIZE_WHITESPACE
[[5.8, 2.7, 5.1, 1.9],
 [5.1, 3.5, 1.4, 0.2],
 [5.7, 2.8, 4.1, 1.3],
 [6.3, 2.9, 5.6, 1.8],
 [6.4, 3.2, 4.5, 1.5],
 [4.7, 3.2, 1.3, 0.2]]
>>>
>>> [label for *features, label in DATA[1:]]
['virginica', 'setosa', 'versicolor', 'virginica', 'versicolor', 'setosa']
>>>
>>> [y for *X, y in DATA[1:]]
['virginica', 'setosa', 'versicolor', 'virginica', 'versicolor', 'setosa']
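Note that the features come out as lists even though each row is a tuple: a starred target always collects the remaining items into a list. A minimal sketch on a shortened copy of `DATA`; `features_as_tuples` and `pairs` are illustrative names:

```python
# Shortened copy of DATA for illustration
DATA = [
    ('Sepal length', 'Sepal width', 'Petal length', 'Petal width', 'Species'),
    (5.8, 2.7, 5.1, 1.9, 'virginica'),
    (5.1, 3.5, 1.4, 0.2, 'setosa'),
]

# The starred target *X collects the leftover items into a list
features = [X for *X, y in DATA[1:]]
print(features)  # [[5.8, 2.7, 5.1, 1.9], [5.1, 3.5, 1.4, 0.2]]

# Convert explicitly if tuples are required
features_as_tuples = [tuple(X) for *X, y in DATA[1:]]
print(features_as_tuples)  # [(5.8, 2.7, 5.1, 1.9), (5.1, 3.5, 1.4, 0.2)]

# Both parts can also be kept together as (features, label) pairs
pairs = [(X, y) for *X, y in DATA[1:]]
print(pairs[0])  # ([5.8, 2.7, 5.1, 1.9], 'virginica')
```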

9.2.8. Use Case - 0x01

  • Increment

>>> [x+1 for x in range(0,5)]
[1, 2, 3, 4, 5]

9.2.9. Use Case - 0x02

  • Decrement

>>> [x-1 for x in range(0,5)]
[-1, 0, 1, 2, 3]

9.2.10. Use Case - 0x03

  • Sum

>>> sum(x for x in range(0,5))
10
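The expression inside `sum()` can transform each element before it is added. A minimal sketch computing a sum of squares, once with a generator expression and once with a list comprehension:

```python
# Sum of squares: 0 + 1 + 4 + 9 + 16
total = sum(x**2 for x in range(0, 5))
print(total)  # 30

# A list comprehension gives the same result,
# but the intermediate list is built in memory first
total_from_list = sum([x**2 for x in range(0, 5)])
print(total_from_list)  # 30
```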

9.2.11. Use Case - 0x04

  • Even or Odd

>>> [x for x in range(0,5)]
[0, 1, 2, 3, 4]
>>> [x%2==0 for x in range(0,5)]
[True, False, True, False, True]
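The Boolean results can be turned into readable labels with a conditional expression, or counted directly, since `True` is treated as `1` and `False` as `0` in arithmetic. A minimal sketch:

```python
# A conditional expression (a if condition else b) picks a value per element
labels = ['even' if x % 2 == 0 else 'odd' for x in range(0, 5)]
print(labels)  # ['even', 'odd', 'even', 'odd', 'even']

# True counts as 1 and False as 0, so sum() counts the even numbers
even_count = sum(x % 2 == 0 for x in range(0, 5))
print(even_count)  # 3
```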

9.2.12. Assignments

Code 9.2. Solution
"""
* Assignment: Comprehension List Translate
* Required: yes
* Complexity: easy
* Lines of code: 1 line
* Time: 3 min

English:
    1. Use list comprehension to iterate over `DATA`
    2. If a letter is in `PL`, use its converted value instead of the letter
    3. Join the letters into a string and assign it to `result`
    4. Run doctests - all must succeed

Hints:
    * `str.join()`
    * `dict.get()`

Tests:
    >>> import sys; sys.tracebacklimit = 0

    >>> assert type(result) is str

    >>> result
    'zazolc gesla jazn'
"""

PL = {'ą': 'a', 'ć': 'c', 'ę': 'e',
      'ł': 'l', 'ń': 'n', 'ó': 'o',
      'ś': 's', 'ż': 'z', 'ź': 'z'}

DATA = 'zażółć gęślą jaźń'

# str: DATA with Polish diacritic characters replaced by ASCII letters
result = ...
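The two hints combine into a common pattern: `dict.get(key, default)` returns the mapped value when the key exists and the default otherwise, and `str.join()` glues the resulting letters back into a string. A minimal sketch on a hypothetical `LEET` mapping, used here only for illustration:

```python
# Hypothetical mapping, for illustration only
LEET = {'a': '4', 'e': '3', 'o': '0'}

text = 'hello world'

# dict.get(letter, letter) returns the mapped value if the letter is a key,
# otherwise it falls back to the letter itself
letters = [LEET.get(letter, letter) for letter in text]

# str.join() concatenates the single-character strings into one str
converted = ''.join(letters)
print(converted)  # h3ll0 w0rld
```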

Code 9.3. Solution
"""
* Assignment: Comprehension List Split
* Required: no
* Complexity: medium
* Lines of code: 4 lines
* Time: 8 min

English:
    1. Using list comprehension, split `DATA` into:
        a. `features_train: list[list]` - features from the first 60% of rows
        b. `features_test: list[list]` - features from the last 40% of rows
        c. `labels_train: list[str]` - labels from the first 60% of rows
        d. `labels_test: list[str]` - labels from the last 40% of rows
    2. In order to do so, calculate the pivot point:
        a. number of data rows (without the header) times the given ratio
           (60% = 0.6)
        b. remember that slice indices must be `int`, not `float`
        c. for example: if the dataset has 10 rows, then 6 rows will be for
           training and 4 rows for testing
    3. Run doctests - all must succeed

Hints:
    * `sequence[:split]`
    * `sequence[split:]`

Tests:
    >>> import sys; sys.tracebacklimit = 0

    >>> assert type(features_train) is list, \
    ... 'make sure features_train is a list'

    >>> assert type(features_test) is list, \
    ... 'make sure features_test is a list'

    >>> assert type(labels_train) is list, \
    ... 'make sure labels_train is a list'

    >>> assert type(labels_test) is list, \
    ... 'make sure labels_test is a list'

    >>> assert all(type(x) is list for x in features_train), \
    ... 'all elements in features_train should be list'

    >>> assert all(type(x) is list for x in features_test), \
    ... 'all elements in features_test should be list'

    >>> assert all(type(x) is str for x in labels_train), \
    ... 'all elements in labels_train should be str'

    >>> assert all(type(x) is str for x in labels_test), \
    ... 'all elements in labels_test should be str'

    >>> features_train  # doctest: +NORMALIZE_WHITESPACE
    [[5.8, 2.7, 5.1, 1.9],
     [5.1, 3.5, 1.4, 0.2],
     [5.7, 2.8, 4.1, 1.3],
     [6.3, 2.9, 5.6, 1.8],
     [6.4, 3.2, 4.5, 1.5],
     [4.7, 3.2, 1.3, 0.2]]

    >>> features_test  # doctest: +NORMALIZE_WHITESPACE
    [[7.0, 3.2, 4.7, 1.4],
     [7.6, 3.0, 6.6, 2.1],
     [4.9, 3.0, 1.4, 0.2],
     [4.9, 2.5, 4.5, 1.7]]

    >>> labels_train
    ['virginica', 'setosa', 'versicolor', 'virginica', 'versicolor', 'setosa']

    >>> labels_test
    ['versicolor', 'virginica', 'setosa', 'virginica']
"""

DATA = [
    ('Sepal length', 'Sepal width', 'Petal length', 'Petal width', 'Species'),
    (5.8, 2.7, 5.1, 1.9, 'virginica'),
    (5.1, 3.5, 1.4, 0.2, 'setosa'),
    (5.7, 2.8, 4.1, 1.3, 'versicolor'),
    (6.3, 2.9, 5.6, 1.8, 'virginica'),
    (6.4, 3.2, 4.5, 1.5, 'versicolor'),
    (4.7, 3.2, 1.3, 0.2, 'setosa'),
    (7.0, 3.2, 4.7, 1.4, 'versicolor'),
    (7.6, 3.0, 6.6, 2.1, 'virginica'),
    (4.9, 3.0, 1.4, 0.2, 'setosa'),
    (4.9, 2.5, 4.5, 1.7, 'virginica')]

ratio = 0.6
header, *data = DATA
split = int(len(data) * ratio)