A gentle introduction to iterators in C++ and Python

Part 2: A look at iterators in Python, using itertools

Ciarán Cooney
Source: Ciaran Cooney (drawn in Powerpoint).

In my previous post (here), I went into a general discussion on the virtues of using iterators in your code and ran through some beginner-level examples in C++. Here, I am going to extend this introduction to iterators by looking at how they are implemented in Python and how you can use them to improve your code.

name = ["Ciaran", "Cooney"]
it = iter(name)
print(next(it))
print(next(it))
#output
Ciaran
Cooney
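
A for loop does essentially the same thing behind the scenes. The following is a minimal sketch of how iter(), next() and the StopIteration exception fit together; the helper name manual_for is hypothetical and only serves as an illustration.

```python
def manual_for(iterable):
    """Collect items the way a for loop would, using iter() and next()."""
    collected = []
    it = iter(iterable)          # ask the container for an iterator
    while True:
        try:
            item = next(it)      # advance to the next element
        except StopIteration:    # raised once the iterator is exhausted
            break
        collected.append(item)
    return collected


print(manual_for(["Ciaran", "Cooney"]))  # ['Ciaran', 'Cooney']
```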

Python comes with several built-in functions, such as zip and map, that facilitate iteration over data containers. These are very useful, time-saving tools once you have developed an intuition for when and how to use them. The zip function effectively works by calling iter() on each of its input arguments and using next() to advance through them in step, returning an iterator that yields tuples of the elements sharing a common index.

a = zip([1,2,3], ['a','b','c'])
print(list(a))
#output
[(1, 'a'), (2, 'b'), (3, 'c')]
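
To make the description of zip more concrete, here is a rough two-argument stand-in written as a generator. It is only a sketch of the idea, not how the built-in is actually implemented, and the name my_zip is hypothetical.

```python
def my_zip(first, second):
    """Simplified two-argument stand-in for zip (hypothetical helper)."""
    it_first, it_second = iter(first), iter(second)
    while True:
        try:
            pair = (next(it_first), next(it_second))
        except StopIteration:
            return  # stop as soon as either input runs out
        yield pair


print(list(my_zip([1, 2, 3], ['a', 'b', 'c'])))  # [(1, 'a'), (2, 'b'), (3, 'c')]
```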

The map function applies a function to each element in an iterable before advancing to the next. Here, iter() is called on the second argument and the input function is applied to each element in turn. next() is then called until the iterator is exhausted.

b = map(len, ['hello', 'world'])
print(list(b))
#output
[5, 5]

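
The behaviour of map can be approximated in a few lines. This is a simplified single-iterable sketch under the assumption that a generator is an acceptable stand-in for the built-in; my_map is a hypothetical name. It also shows that elements are produced lazily, one at a time.

```python
def my_map(func, iterable):
    """Simplified single-iterable stand-in for map (hypothetical helper)."""
    for item in iterable:   # the for loop calls iter() and next() for us
        yield func(item)    # apply the function lazily, one element at a time


lengths = my_map(len, ['hello', 'world'])
print(next(lengths))   # 5
print(list(lengths))   # [5]  (only the remaining element is left)
```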
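It is also possible to write your own iterator by defining a class with __iter__() and __next__() methods:

class MyClass:

    def __init__(self, container):
        self.container = container

    def __iter__(self):
        # reset the position so that each new loop starts from the beginning
        self.count = 0
        return self

    def __next__(self):
        if self.count < len(self.container):
            x = self.container[self.count]
            self.count += 1
            return x
        else:
            # signal that there are no more elements
            raise StopIteration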

myclass = MyClass(["Hello", "my", "name", "is", "Ciaran"])
myiter = iter(myclass)
for x in myiter:
    print(x)
#output
Hello
my
name
is
Ciaran
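
For comparison, the same element-by-element behaviour can be sketched with a generator function, which lets Python handle __iter__(), __next__() and StopIteration for you. The function name iterate_container is hypothetical and this is just one possible way to write it.

```python
def iterate_container(container):
    """Generator version of the class-based iterator (hypothetical helper)."""
    count = 0
    while count < len(container):
        yield container[count]   # hand back one element, then pause here
        count += 1


for word in iterate_container(["Hello", "my", "name", "is", "Ciaran"]):
    print(word)
```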

One function I like from the itertools module is dropwhile(), which makes an iterator that drops elements from an iterable for as long as the predicate is true, after which it returns every remaining element. groupby() is a common iterator algorithm which returns consecutive keys and groups from the iterable. Another useful function in itertools is permutations(). As you might have guessed, this one returns permutations of the elements contained within the input iterable. The length of the permutations can be constrained by a second argument, r (see code below); otherwise, each permutation will be the length of the input iterable. I have coded up some examples of using these functions:

from itertools import dropwhile, groupby, permutations

print(list(dropwhile(lambda x: x<=3, [1,2,3,4,5,6,7,8,9,3])))
#output: [4, 5, 6, 7, 8, 9, 3]
print([[list(g), k] for k, g in groupby([1,2,2,2,2,3,4,4,4,4,5,5,2,1,1,1,1])])
#output: [[[1], 1], [[2, 2, 2, 2], 2], [[3], 3], [[4, 4, 4, 4], 4], [[5, 5], 5], [[2], 2], [[1, 1, 1, 1], 1]]
print(list(permutations([1,2,3])))
#output: [(1, 2, 3), (1, 3, 2), (2, 1, 3), (2, 3, 1), (3, 1, 2), (3, 2, 1)]
print(list(permutations([1,2,3], 2)))
#output: [(1, 2), (1, 3), (2, 1), (2, 3), (3, 1), (3, 2)]
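
Notice in the groupby() output above that the values 2 and 1 each appear in two separate groups. That is because groupby() only groups consecutive equal elements. The sketch below, using a small made-up list, shows the difference sorting the input first can make.

```python
from itertools import groupby

data = [1, 2, 2, 1, 1, 2]

# Unsorted input: a new group starts whenever the value changes.
consecutive = [(key, len(list(group))) for key, group in groupby(data)]
print(consecutive)   # [(1, 1), (2, 2), (1, 2), (2, 1)]

# Sorted input: every equal value now sits in one run.
merged = [(key, len(list(group))) for key, group in groupby(sorted(data))]
print(merged)        # [(1, 3), (2, 3)]
```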

Let’s consider a simple example where we want to take two lists containing positive integers, determine all possible combinations of elements across lists (not within lists) and return the sum of each combination. Below, I have implemented a typical function with a couple of for loops to run over the lists and perform the summing operations.

a = [1,2,3]
b = [4,5,6]
def sum_combinations(a, b):
    combinations, results = [], []
    for i in a:
        for j in b:
            combinations.append((i, j))
            results.append(sum((i, j)))
    return combinations, results
combs, res = sum_combinations(a,b)
print(combs, res, sep="\n")
#output
[(1, 4), (1, 5), (1, 6), (2, 4), (2, 5), (2, 6), (3, 4), (3, 5), (3, 6)]
[5, 6, 7, 6, 7, 8, 7, 8, 9]

This is fine for the 3-element lists I used in this example. But what happens if we expand the inputs to contain 10000 integers each? To test this I imported the time module to see how long the function would run for on my admittedly less than special laptop:

import time
import numpy as np
a = np.random.randint(5, size=10000)
b = np.random.randint(5, size=10000)
start = time.time()
combs, res = sum_combinations(a,b)
stop = time.time()
print(f"time: {stop-start}")
#output:
time: 108.07000184059143

Okay, 108s seems like a fairly long time to have to wait for some basic operations. Fortunately, we have an alternative: iterator algebra!

Here I use the itertools function product() along with the map function mentioned above. product() gives us the Cartesian product of the input iterables, much like a set of nested for loops. We then use map to apply the sum function to each pair as we iterate through the product.

import itertools
start = time.time()
res_1 = list(map(sum, itertools.product(a, b)))
stop = time.time()
print(f"time: {stop-start}")
#output: time: 34.44488835334778
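
As a quick sanity check, the two approaches can be compared on the small lists from earlier. This is only a sketch on toy inputs (the variable names loop_sums and iter_sums are my own), but it suggests that product() with map yields the same sums in the same order as the nested loops.

```python
from itertools import product

a = [1, 2, 3]
b = [4, 5, 6]

# Nested loops, written here as a list comprehension.
loop_sums = [i + j for i in a for j in b]

# product() yields the same pairs in the same order; map() sums each pair.
iter_sums = list(map(sum, product(a, b)))

print(iter_sums)               # [5, 6, 7, 6, 7, 8, 7, 8, 9]
print(loop_sums == iter_sums)  # True
```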

Look at the time difference here! 108 s when we implement a standard looping function and 34 s when using itertools with iterator algebra, roughly a threefold speed-up on my laptop. If you take nothing else from this post, at least notice the potential time gains iterators can offer as the size of the data grows.

I hope this post will prove useful for some of you venturing into the world of iterators for more efficient data processing.

All of the Python examples and any additional functions I have used are available here: https://github.com/cfcooney/medium_posts
