Python provides full-fledged support for implementing your own data structure using classes and custom operators. In this tutorial you will implement a custom pipeline data structure that can perform arbitrary operations on its data. We will use Python 3. Show
The Pipeline Data StructureThe pipeline data structure is interesting because it is very flexible. It consists of a list of arbitrary functions that can be applied to a collection of objects and produce a list of results. I will take advantage of Python's extensibility and use the pipe character ("|") to construct the pipeline. Live ExampleBefore diving into all the details, let's see a very simple pipeline in action: x = range(5) | Pipeline() | double | Ω print(x) [0, 2, 4, 6, 8] What's going on here? Let's break it down step by step. The first element The evaluation consists of taking the input and applying all the functions in the pipeline (in this case just the double function). Finally, we store the result in a variable called x and print it. Python ClassesPython supports
classes and has a very sophisticated object-oriented model including multiple inheritance, mixins, and dynamic overloading. An Here is a simple class that has an class A: def __init__(self, x=5): self.x = x def foo(self): return self.x * 3 Here is how to instantiate it with and without an explicit x argument: >>> a = A(2) >>> print(a.foo()) 6 a = A() print(a.foo()) 15 Custom OperatorsWith Python, you can use custom operators for your classes for nicer syntax. There are special methods known as "dunder" methods. The "dunder" means "double underscore". These methods like "__eq__", "__gt__" and "__or__" allow you to use operators like "==", ">" and "|" with your class instances (objects). Let's see how they work with the A class. If you try to compare two different instances of A to each other, the result will always be False regardless of the value of x: >>> print(A() == A()) False This is because Python compares the memory addresses of objects by default. Let's say we want to compare the value of x. We can add a special "__eq__" operator that takes two arguments, "self" and "other", and compares their x attribute: def __eq__(self, other): return self.x == other.x Let's verify: >>> print(A() == A()) True >>> print(A(4) == A(6)) False Implementing the Pipeline as a Python ClassNow
that we've covered the basics of classes and custom operators in Python, let's use it to implement our pipeline. The The "input" argument is the list of objects that the pipeline will operate on. Each item of the input will be processed by all the pipeline functions. The "terminals" argument is a list of functions, and when one of them is encountered the pipeline evaluates itself and returns the result. The terminals are by default just the print function (in Python 3, "print" is a function). Note that inside the constructor, a mysterious "Ω" is added to the terminals. I'll explain that next. The Pipeline ConstructorHere is the class definition and the class Pipeline: def __init__(self, functions=(), input=(), terminals=(print,)): if hasattr(functions, '__call__'): self.functions = [functions] else: self.functions = list(functions) self.input = input self.terminals = [Ω] + list(terminals) Python 3 fully supports Unicode in
identifier names. This means we can use cool symbols like "Ω" for variable and function names. Here, I declared an identity function called "Ω", which serves as a terminal function: I could have used the traditional syntax too: def Ω(x): return x The "__or__" and "__ror__" OperatorsHere comes the core of the Pipeline class. In order to use the "|" (pipe symbol), we need to override a couple of operators. The "|" symbol is used by Python for bitwise or of integers. In our
case, we want to override it to implement chaining of functions as well as feeding the input at the beginning of the pipeline. Those are two separate operations. The "__ror__" operator is invoked when the second operand is a Pipeline instance as long as the first operand is not. It considers the first operand as the input and stores it in the def __ror__(self, input): self.input = input return self Here is an
example where the The "__or__" operator is invoked when the first operand is a Pipeline (even if the second operand is also a Pipeline). It accepts the operand to be a callable function and it asserts that the "func" operand is indeed callable. Then, it appends the function to the def __or__(self, func): assert(hasattr(func, '__call__')) self.functions.append(func) if func in self.terminals: return self.eval() return self Evaluating the PipelineAs you add more and more non-terminal functions to the pipeline, nothing happens. The actual evaluation is deferred until the The evaluation consists of iterating over all the functions in the pipeline (including the terminal function if there is one) and running them in order on the output of the previous function. The first function in the pipeline receives an input element. def eval(self): result = [] for x in self.input: for f in self.functions: x = f(x) result.append(x) return result Using Pipeline EffectivelyOne of the best ways to use a pipeline is to apply it to multiple sets of input. In the following example, a pipeline with no inputs and no terminal functions is defined. It has two functions: the infamous Then, we provide it
three different inputs. In the inner loop, we add the p = Pipeline() | double | math.floor for input in ((0.5, 1.2, 3.1), (11.5, 21.2, -6.7, 34.7), (5, 8, 10.9)): result = input | p | Ω print(result) [1, 2, 6] [23, 42, -14, 69] [10, 16, 21] You could use the keep_palindromes = lambda x: (p for p in x if p[::-1] == p) keep_longer_than_3 = lambda x: (p for p in x if len(p) > 3) p = Pipeline() | keep_palindromes | keep_longer_than_3 | list (('aba', 'abba', 'abcdef'),) | p | print ['abba'] Future ImprovementsThere are a few improvements that can make the pipeline more useful:
ConclusionPython is a very expressive language and is well equipped for designing your own data structure and custom types. The ability to override standard operators is very powerful when the semantics lend themselves to such notation. For example, the pipe symbol ("|") is very natural for a pipeline. A lot of Python developers enjoy Python's built-in data structures like tuples, lists, and dictionaries. However, designing and implementing your own data structure can make your system simpler and easier to work with by elevating the level of abstraction and hiding internal details from users. Give it a try. Did you find this post useful? Principal Software Architect at Helix Gigi Sayfan is a principal software architect at Helix — a bioinformatics and genomics start-up. Gigi has been developing software professionally for more than 20 years in domains as diverse as instant messaging, morphing, chip fabrication process control, embedded multimedia applications for game consoles, brain-inspired machine learning, custom browser development, web services for 3D distributed game platforms, IoT sensors and virtual reality. He has written production code in many programming languages such as Go, Python, C, C++, C#, Java, Delphi, JavaScript, and even Cobol and PowerBuilder for operating systems such as Windows (3.11 through 7), Linux, Mac OSX, Lynx (embedded), and Sony PlayStation. His technical expertise includes databases, low-level networking, distributed systems, unorthodox user interfaces, and the general software development lifecycle. How do you write a data structure in Python?Just like a List, a Tuple can also contain elements of various types. In Python, tuples are created by placing a sequence of values separated by 'comma' with or without the use of parentheses for grouping of the data sequence. Note: Tuples can also be created with a single element, but it is a bit tricky.
Is Python good for learning data structures?In the facets of speed, accessibility, and syntax, python is a good language for Data Structures.
Can I use Python for Data Structures and Algorithms?In this Data Structures and Algorithms Through Python In Depth course, Python programs are used for implementing various concepts, but you can easily code them in any other programming language like C++, Java or C#.
|