File Input/Output
File Input/Output (IO) requires 3 steps:
1. Open the file for reading, writing, or both.
2. Read or write the data.
3. Close the file to free the resources.
Python provides built-in functions and modules to support these operations.
Opening/Closing a File
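The following is a minimal sketch of opening and closing a file with the built-in open(). It assumes a throwaway file named demo.txt (a hypothetical name) inside a fresh temporary directory. The mode string selects the operation: 'r' reads (the default), 'w' creates or truncates for writing, 'a' appends to the end, and adding 'b' switches to binary mode.

```python
import os
import tempfile

# Hypothetical scratch file inside a fresh temporary directory
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, 'demo.txt')

f = open(path, 'w')        # 'w': create or truncate the file for writing (text mode)
f.write('hello\n')
f.close()                  # flush buffered data and release the OS file handle

f = open(path, 'a')        # 'a': append to the end of the existing file
f.write('world\n')
f.close()

f = open(path)             # default mode is 'r' (read, text)
content = f.read()
f.close()
print(f.closed, repr(content))   # True 'hello\nworld\n'
```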
Reading/Writing Text Files
The fileObj returned after the file is opened maintains a file pointer. It is initially positioned at the beginning of the file (at the end, for append mode 'a') and advances whenever read/write operations are performed.
Reading Line/Lines from a Text File
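To make the file pointer visible, the sketch below reads a small hypothetical file (fruits.txt in a temporary directory) with readline(), readlines() and read(). Each call starts where the previous one stopped, and seek(0) moves the pointer back to the beginning so the whole file can be read again.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'fruits.txt')  # hypothetical scratch file
with open(path, 'w') as f:
    f.write('apple\norange\npear\n')

with open(path) as f:
    first = f.readline()     # reads up to and including '\n'; the pointer moves past it
    rest = f.readlines()     # reads all remaining lines from the current position
    at_eof = f.readline()    # '' signals end-of-file
    f.seek(0)                # move the file pointer back to the beginning
    everything = f.read()    # read the whole file into one string

print(repr(first), rest, repr(at_eof), repr(everything))
```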
Writing Line to a Text File
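A short sketch of the usual ways to write lines, assuming a hypothetical scratch file out.txt. Note that write() and writelines() do not add newlines for you, whereas print(..., file=f) appends '\n' by default.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'out.txt')  # hypothetical scratch file
with open(path, 'w') as f:
    n = f.write('line 1\n')                  # write() adds no newline; returns chars written
    print('line 2', file=f)                  # print() appends '\n' by default
    f.writelines(['line 3\n', 'line 4\n'])   # writelines() also adds no newlines

with open(path) as f:
    lines = f.read().splitlines()            # split into lines without the '\n'

print(n, lines)   # 7 ['line 1', 'line 2', 'line 3', 'line 4']
```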
Examples
>>> f = open('test.txt', 'w')
>>> f.write('apple\n')      # write() returns the number of characters written
6
>>> f.write('orange\n')
7
>>> f.write('pear\n')
5
>>> f.close()
>>> f = open('test.txt', 'r')
>>> f.readline()
'apple\n'
>>> f.readlines()
['orange\n', 'pear\n']
>>> f.readline()            # '' indicates end-of-file
''
>>> f.close()
>>> f = open('test.txt', 'r')
>>> f.read()
'apple\norange\npear\n'
>>> f.close()
>>> f = open('test.txt')
>>> line = f.readline()
>>> while line:
...     line = line.rstrip()
...     print(line)
...     line = f.readline()
...
apple
orange
pear
>>> f.close()
Processing Text File Line-by-Line
We can use a with-statement to open a file, which will be closed automatically upon exit, and a for-loop to read line-by-line as follows:
with open('path/to/file.txt', 'r') as f:
    for line in f:
        line = line.strip()   # strip the leading/trailing whitespace and newline
        # ... process the line ...
The with-statement is roughly equivalent to the following try-finally statement:
f = open('path/to/file.txt')
try:
    for line in f:
        line = line.strip()
        # ... process the line ...
finally:
    f.close()
Note that open() is called before the try, so the finally clause never calls f.close() on a file that failed to open.
Example: Line-by-line File Copy
The following script copies a file into another line-by-line, prepending each line with the line number.
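A minimal sketch of such a script, assuming a hypothetical helper copy_with_line_numbers() and scratch files in.txt and out.txt in a temporary directory. It opens both files in one with-statement and uses enumerate() to count lines from 1.

```python
import os
import tempfile

def copy_with_line_numbers(src, dst):
    """Copy src to dst line by line, prefixing each line with its 1-based number."""
    with open(src) as fin, open(dst, 'w') as fout:
        for number, line in enumerate(fin, start=1):
            # rstrip('\n') removes only the newline, keeping any other trailing characters
            fout.write('{}: {}\n'.format(number, line.rstrip('\n')))

# Demo on scratch files (hypothetical names)
tmpdir = tempfile.mkdtemp()
src = os.path.join(tmpdir, 'in.txt')
dst = os.path.join(tmpdir, 'out.txt')
with open(src, 'w') as f:
    f.write('apple\norange\npear\n')

copy_with_line_numbers(src, dst)
with open(dst) as f:
    print(f.read(), end='')   # prints: 1: apple / 2: orange / 3: pear (one per line)
```

To use it as a command-line script, you could pass sys.argv[1] and sys.argv[2] as src and dst.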
Binary File Operations
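A minimal sketch of binary IO, assuming a hypothetical scratch file data.bin. Adding 'b' to the mode makes read() and write() work with bytes objects instead of str, with no newline translation or text decoding. Reading with read(size) in a loop is a common way to process large binary files in chunks.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'data.bin')  # hypothetical scratch file

with open(path, 'wb') as f:          # 'wb': write binary
    f.write(bytes([0, 1, 2, 255]))   # write bytes, not str

with open(path, 'rb') as f:          # 'rb': read binary
    data = f.read()                  # returns a bytes object

chunks = []
with open(path, 'rb') as f:
    while True:
        chunk = f.read(2)            # read at most 2 bytes at a time
        if not chunk:                # b'' means end-of-file
            break
        chunks.append(chunk)

print(data, list(data), chunks)   # b'\x00\x01\x02\xff' [0, 1, 2, 255] [b'\x00\x01', b'\x02\xff']
```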
Directory and File Management
In Python, directory and file management are supported by modules os, os.path, shutil, ...
Path Operations Using Module os.path
In Python, a path could refer to a file, a directory, or a symbolic link (symlink).
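A path could be absolute (beginning with the root) or relative to the current working directory (CWD). The path separator is platform-dependent (Windows uses '\', while Unix-like systems and macOS use '/'). The os.path module supports platform-independent operations on paths by handling the path separator intelligently.
Checking Path Existence and Type
Use os.path.exists(path), os.path.isfile(path), os.path.isdir(path) and os.path.islink(path) to check whether a path exists and whether it is a file, a directory or a symlink.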
For example,
>>> import os
>>> os.path.exists('/usr/bin')
True
>>> os.path.isfile('/usr/bin')
False
>>> os.path.isdir('/usr/bin')
True
Forming a New Path
For portability, it is important NOT to hardcode the path separator. Use os.path.join(), which inserts the correct separator for the platform (os.path.sep holds the separator itself).
For example (output shown for Unix-like systems; on Windows the separator is '\'),
>>> import os
>>> print(os.path.sep)
/
>>> print(os.path.join(os.path.sep, 'etc', 'apache2', 'httpd.conf'))
/etc/apache2/httpd.conf
>>> print(os.path.join('..', 'apache2', 'httpd.conf'))
../apache2/httpd.conf
Manipulating Directory-name and Filename
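A quick sketch of the main os.path functions for taking a path apart, using the hypothetical path /test/in.txt. These are pure string operations, so nothing needs to exist on disk.

```python
import os.path

p = '/test/in.txt'   # hypothetical path; nothing needs to exist on disk
print(os.path.dirname(p))    # /test                 (directory part)
print(os.path.basename(p))   # in.txt                (final component)
print(os.path.split(p))      # ('/test', 'in.txt')   (dirname, basename)
print(os.path.splitext(p))   # ('/test/in', '.txt')  (root, extension)
```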
For example, to form an absolute path of a file called out.txt in the same directory as in.txt, you may extract the absolute directory name of in.txt, then join it with out.txt, as follows:
# Absolute path
os.path.join(os.path.dirname(os.path.abspath('in.txt')), 'out.txt')
# Relative path
os.path.join(os.path.dirname('in.txt'), 'out.txt')
For example,
import os
print('__file__:', __file__)
print('dirname():', os.path.dirname(__file__))
print('abspath():', os.path.abspath(__file__))
print('dirname(abspath()):', os.path.dirname(os.path.abspath(__file__)))
When a module is loaded in Python, __file__ is set to the path of the file it was loaded from. For the main script, this is the path as typed on the command line (possibly relative) in Python 3.8 and earlier; since Python 3.9 it is an absolute path. Save the script as test_ospath.py, run it with various path references and study the output:
$ python3 ./test_ospath.py
$ python3 test_ospath.py
$ python3 ../parent_dir/test_ospath.py
$ python3 /path/to/test_ospath.py
Handling Symlink (Unixes/Mac OS)
For example,
import os
print('__file__:', __file__)
print('abspath():', os.path.abspath(__file__))
print('realpath():', os.path.realpath(__file__))

$ python3 test_realpath.py
# Same output for abspath() and realpath() because there is no symlink
$ ln -s test_realpath.py test_realpath_link.py
$ python3 test_realpath_link.py
# abspath(): /path/to/test_realpath_link.py
# realpath(): /path/to/test_realpath.py (symlink resolved)
Directory & File Management Using Modules os and shutil
The modules os and shutil provide an interface to the operating system and the system shell. However,
Directory Management
File Management
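The sketch below walks through the common directory and file management calls in a scratch directory created by tempfile, so it is safe to run. The directory names a/b/c and the file names old.txt and new.txt are hypothetical.

```python
import os
import shutil
import tempfile

base = tempfile.mkdtemp()                        # scratch directory for the demo
nested = os.path.join(base, 'a', 'b', 'c')
os.makedirs(nested)                              # creates the intermediate directories too
os.makedirs(nested, exist_ok=True)               # no error if it already exists

old = os.path.join(nested, 'old.txt')
new = os.path.join(nested, 'new.txt')
with open(old, 'w') as f:
    f.write('data\n')

os.rename(old, new)                              # rename the file
names_after_rename = os.listdir(nested)          # ['new.txt']
os.remove(new)                                   # delete a file
os.rmdir(nested)                                 # delete an EMPTY directory
shutil.rmtree(base)                              # delete a whole directory tree
print(names_after_rename, os.path.exists(base))  # ['new.txt'] False
```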
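For example,
>>> import os
>>> dir(os)                          # list all attributes of module os
......
>>> help(os)
......
>>> help(os.getcwd)
......
>>> os.getcwd()                      # get the current working directory
... current working directory ...
>>> os.listdir()                     # list the contents of the current directory
... contents of current directory ...
>>> os.chdir('test-python')          # change the current working directory
>>> exec(open('hello.py').read())    # run a Python script file
>>> os.system('ls -l')               # run a shell command; returns its exit status
>>> os.name                          # 'nt' on Windows
'posix'
>>> os.mkdir('sub_dir')              # create a directory (the parent must exist)
>>> os.makedirs('/path/to/sub_dir')  # create a directory, including any missing parents
>>> os.remove('filename')            # delete a file
>>> os.rename('oldFile', 'newFile')  # rename a file
List a Directory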
For example,
>>> import os
>>> help(os.listdir)
......
>>> os.listdir()                     # the current directory
[..., ..., ...]
>>> for f in sorted(os.listdir('/usr')):
...     print(f)
...
......
>>> for f in sorted(os.listdir('/usr')):
...     print(os.path.join('/usr', f))   # listdir() returns bare names; join them with the directory
...
......
List a Directory Recursively via os.walk()
For example,
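A minimal sketch of os.walk(), assuming a small hypothetical tree (a.txt and sub/b.txt) built in a temporary directory. os.walk() yields a (dirpath, dirnames, filenames) tuple for each directory, top-down by default. Sorting dirnames in place controls the order in which subdirectories are visited.

```python
import os
import tempfile

# Build a small hypothetical tree: root/a.txt and root/sub/b.txt
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, 'sub'))
for rel in ('a.txt', os.path.join('sub', 'b.txt')):
    with open(os.path.join(root, rel), 'w') as f:
        f.write('x\n')

found = []
for dirpath, dirnames, filenames in os.walk(root):
    dirnames.sort()                  # in-place sort makes the traversal order deterministic
    for name in sorted(filenames):
        full = os.path.join(dirpath, name)
        found.append(os.path.relpath(full, root))   # path relative to the starting directory
        print(full)
```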
List a Directory Recursively via Module glob (Python 3.5)
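A sketch of recursive globbing, assuming a hypothetical tree of .py and .txt files in a temporary directory. With recursive=True, the pattern '**' matches zero or more directory levels, so *.py files at the top level are found too. glob.escape() protects the directory name in case it contains wildcard characters.

```python
import glob
import os
import tempfile

root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, 'sub', 'deeper'))
for rel in ('a.py', os.path.join('sub', 'b.py'),
            os.path.join('sub', 'deeper', 'c.py'), os.path.join('sub', 'notes.txt')):
    with open(os.path.join(root, rel), 'w') as f:
        f.write('')

# '**' with recursive=True matches zero or more directory levels
pattern = os.path.join(glob.escape(root), '**', '*.py')
matches = sorted(os.path.relpath(p, root) for p in glob.glob(pattern, recursive=True))
print(matches)   # e.g. ['a.py', 'sub/b.py', 'sub/deeper/c.py'] on Unix-like systems
```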
Copying File
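A minimal sketch of copying files with the shutil module, using hypothetical file names in a temporary directory. shutil.copyfile() copies only the contents, shutil.copy() also copies the permission bits, and shutil.copy2() additionally tries to preserve metadata such as the modification time. All three return the destination path.

```python
import os
import shutil
import tempfile

tmpdir = tempfile.mkdtemp()
src = os.path.join(tmpdir, 'in.txt')          # hypothetical file names
with open(src, 'w') as f:
    f.write('hello\n')

dst1 = shutil.copyfile(src, os.path.join(tmpdir, 'copy1.txt'))  # contents only
dst2 = shutil.copy(src, os.path.join(tmpdir, 'copy2.txt'))      # contents + permission bits
dst3 = shutil.copy2(src, os.path.join(tmpdir, 'copy3.txt'))     # also tries to keep metadata (e.g., mtime)
print(dst1, dst2, dst3)
```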
Shell Command
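One common way to run a command and capture its output is the standard subprocess module; os.system() only returns the exit status. The sketch below runs a child Python process (via sys.executable, so it is portable) instead of a real shell command, which is an assumption made purely for the demo. Passing the command as a list avoids going through the shell.

```python
import subprocess
import sys

result = subprocess.run(
    [sys.executable, '-c', 'print("hello from a child process")'],
    capture_output=True,   # collect stdout/stderr (Python 3.7+)
    text=True,             # decode the output bytes to str (Python 3.7+)
    check=True,            # raise CalledProcessError on a non-zero exit status
)
print(result.returncode, result.stdout.strip())   # 0 hello from a child process
```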
Environment Variables
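A minimal sketch of reading and setting environment variables through os.environ (a mapping) and os.getenv(). The variable names MY_APP_MODE and SURELY_NOT_SET_12345 are hypothetical. Changes to os.environ affect the current process and any child processes started afterwards.

```python
import os

home = os.environ.get('HOME')                   # None if unset (HOME is often unset on Windows)
os.environ['MY_APP_MODE'] = 'debug'             # hypothetical variable; values must be str
mode = os.getenv('MY_APP_MODE', 'release')      # 'debug'
missing = os.getenv('SURELY_NOT_SET_12345', 'default')   # falls back to the default
print(home, mode, missing)
```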
fileinput Module
The fileinput module provides support for processing lines of input from one or more files given in the command-line arguments (sys.argv); if no files are given, it reads from standard input. For example, create the following script called "test_fileinput.py":
import fileinput

def main():
    lineNumber = 0
    for line in fileinput.input():   # iterate over the lines of all files named in sys.argv[1:]
        line = line.rstrip()
        lineNumber += 1
        print('{}: {}'.format(lineNumber, line))

if __name__ == '__main__':
    main()
Text Processing
For simple text string operations such as string search and replacement, you can use the built-in string functions (e.g., str.replace(old, new)). For complex pattern search and replacement, you need to master regular expressions (regex).
String Operations
The built-in class str provides many member functions for text string manipulation. Suppose that s is a str object.
Strip whitespaces (blank, tab and newline)
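A quick sketch of the three strip methods on a sample string. With no argument they remove whitespace; an optional argument gives the set of characters to remove instead.

```python
s = '  \t hello world \n'          # sample string with leading/trailing whitespace
print(repr(s.strip()))    # 'hello world'        strips both ends
print(repr(s.lstrip()))   # 'hello world \n'     strips leading whitespace only
print(repr(s.rstrip()))   # '  \t hello world'   strips trailing whitespace only
# The optional argument is a SET of characters to strip, not a prefix/suffix
print(repr('xxhixx'.strip('x')))   # 'hi'
```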
s.rstrip() is the most commonly used, to strip the trailing spaces and newline. Leading whitespace is usually significant (e.g., indentation), so it is often kept.
Uppercase/Lowercase
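A short sketch of the case-conversion and case-testing methods of str. casefold() is a more aggressive form of lower() intended for case-insensitive comparison; for example, it maps the German 'ß' to 'ss'.

```python
s = 'Hello World'
print(s.upper())                   # HELLO WORLD
print(s.lower())                   # hello world
print(s.swapcase())                # hELLO wORLD
print('hello world'.title())       # Hello World
print('hello world'.capitalize())  # Hello world
print(s.isupper(), s.lower().islower())   # False True
# casefold() for case-insensitive comparison
print('Straße'.casefold() == 'STRASSE'.casefold())   # True
```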
Find
For example,
>>> s = '/test/in.txt'
>>> s.find('in')
6
>>> s[0 : s.find('in')] + 'out.txt'
'/test/out.txt'
Find and Replace
str.replace() is ideal for simple text string replacement, without the need for pattern matching. For example,
>>> s = 'hello hello hello, world'
>>> help(s.replace)
......
>>> s.replace('ll', '**')
'he**o he**o he**o, world'
>>> s.replace('ll', '**', 2)      # replace at most 2 occurrences
'he**o he**o hello, world'
Split into Tokens and Join
For example,
>>> 'apple, orange, pear'.split()     # split on runs of whitespace
['apple,', 'orange,', 'pear']
>>> 'apple, orange, pear'.split(', ')
['apple', 'orange', 'pear']
>>> 'apple, orange, pear'.split(', ', maxsplit=1)
['apple', 'orange, pear']
>>> ', '.join(['apple', 'orange, pear'])
'apple, orange, pear'
Regular Expression in Module re
This section assumes that you are already familiar with regex.
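The re module provides support for regular expressions (regex).
>>> import re
>>> dir(re)
......
>>> help(re)
......
Backslash (\), Python Raw String r'...' vs Regular String
Regex's syntax uses backslash (\) for metacharacters such as \d (digit), \s (whitespace) and \w (word character), and to escape characters that are special in regex, e.g., \. for a literal dot and \+ for a literal plus. To match a literal backslash, the regex needs \\.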
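A quick check of how Python string literals treat backslashes, which is why regex patterns clash with regular strings. The sample inputs 'a1b22c333' and 'C:\\temp\\x' are arbitrary.

```python
import re

# In a regular string, '\n' is ONE character (a newline)...
assert len('\n') == 1
# ...but in a raw string, r'\n' is TWO characters: a backslash and 'n'
assert len(r'\n') == 2 and r'\n' == '\\' + 'n'
# The regex \d+ is written '\\d+' as a regular string, or r'\d+' as a raw string
assert '\\d+' == r'\d+'

digits = re.findall(r'\d+', 'a1b22c333')
print(digits)     # ['1', '22', '333']

# To match ONE literal backslash, the regex must be \\ ; as a raw string that is r'\\'
slashes = re.findall(r'\\', 'C:\\temp\\x')   # the string C:\temp\x contains 2 backslashes
print(slashes)    # ['\\', '\\']
```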
On the other hand, Python's regular strings also use backslash for escape sequences, e.g., \n for newline, \t for tab. Again, you need to write \\ for \. To write the regex pattern \d+ (one or more digits) in a Python regular string, you need to write '\\d+'. This is cumbersome and error-prone.
Python's solution is the raw string, written with a prefix r in the form r'...'. It disables the interpretation of Python's string escape sequences. For example, r'\n' is a backslash followed by n (two characters) instead of a newline (one character). Using a raw string, you can write r'\d+' for the regex pattern \d+ (instead of the regular string '\\d+').
Furthermore, Python denotes parenthesized back-references (or capturing groups) as \1, \2, \3, ..., which can be written as raw strings r'\1', r'\2' instead of regular strings '\\1' and '\\2'. Take note that some languages use $1, $2, ... for back-references.
I suggest that you use raw strings for regex pattern strings and replacement strings.
Compiling (Creating) a Regex Pattern Object
Use re.compile(pattern, flags=0) to compile a regex string into a Pattern object, optionally with flags such as re.IGNORECASE.
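For example,
>>> import re
>>> p1 = re.compile(r'[1-9][0-9]*|0')       # a non-negative integer without leading zeros
>>> type(p1)                                # Python 3.7+; older versions show <class '_sre.SRE_Pattern'>
<class 're.Pattern'>
>>> p2 = re.compile(r'^\w{6,10}$')          # 6 to 10 word characters
>>> p3 = re.compile(r'xy*', re.IGNORECASE)  # case-insensitive
Invoking Regex Operations
You can invoke most of the regex functions in two ways: as methods of a compiled Pattern object, e.g., p1.findall(inStr); or as module-level functions of re that take the pattern as the first argument, e.g., re.findall(regexStr, inStr).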
Find using findall()
For example,
>>> p1 = re.compile(r'[1-9][0-9]*|0')
>>> p1.findall('123 456')
['123', '456']
>>> p1.findall('abc')
[]
>>> p1.findall('abc123xyz456_7_00')
['123', '456', '7', '0', '0']
>>> re.findall(r'[1-9][0-9]*|0', '123 456')
['123', '456']
>>> re.findall(r'[1-9][0-9]*|0', 'abc')
[]
>>> re.findall(r'[1-9][0-9]*|0', 'abc123xyz456_7_00')
['123', '456', '7', '0', '0']
Replace using sub() and subn()
For example,
>>> p1 = re.compile(r'[1-9][0-9]*|0')
>>> p1.sub(r'**', 'abc123xyz456_7_00')
'abc**xyz**_**_****'
>>> p1.subn(r'**', 'abc123xyz456_7_00')     # also returns the number of replacements
('abc**xyz**_**_****', 5)
>>> p1.sub(r'**', 'abc123xyz456_7_00', count=3)
'abc**xyz**_**_00'
>>> re.sub(r'[1-9][0-9]*|0', r'**', 'abc123xyz456_7_00')
'abc**xyz**_**_****'
>>> re.sub(p1, r'**', 'abc123xyz456_7_00')   # a compiled Pattern is also accepted
'abc**xyz**_**_****'
>>> re.subn(p1, r'**', 'abc123xyz456_7_00', count=3)
('abc**xyz**_**_00', 3)
>>> re.subn(p1, r'**', 'abc123xyz456_7_00', count=10)
('abc**xyz**_**_****', 5)
Note: For simple string replacement, use str.replace(old, new[, count]) -> str, which is more efficient. See the section above.
Using Parenthesized Back-References \1, \2, ... in Substitution and Pattern
In Python, regex parenthesized back-references (capturing groups) are denoted as \1, \2, .... You could use a raw string (e.g., r'\1') to avoid escaping the backslash in a regular string (e.g., '\\1'). For example,
>>> re.sub(r'(\w+) (\w+)', r'\2 \1', 'aaa bbb ccc')      # swap each pair of words
'bbb aaa ccc'
>>> re.sub(r'(\w+) (\w+)', r'\2 \1', 'aaa bbb ccc ddd')
'bbb aaa ddd ccc'
>>> re.subn(r'(\w+) (\w+)', r'\2 \1', 'aaa bbb ccc ddd eee')
('bbb aaa ddd ccc eee', 2)
>>> re.subn(r'(\w+) \1', r'\1', 'hello hello world again again')   # collapse a repeated word
('hello world again', 2)
Find using search() and Match Object
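The search() returns a special Match object encapsulating the first match (or None if there is no match). You can then use the following methods to process the resultant Match object: group() returns the matched substring, span() returns its (start, end) positions, and start() and end() return each position separately.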
For example,
>>> p1 = re.compile(r'[1-9][0-9]*|0')
>>> inStr = 'abc123xyz456_7_00'
>>> m = p1.search(inStr)
>>> m
<re.Match object; span=(3, 6), match='123'>
>>> m.group()
'123'
>>> m.span()
(3, 6)
>>> m.start()
3
>>> m.end()
6
>>> m = p1.search(inStr, m.end())   # continue searching from the end of the previous match
>>> m
<re.Match object; span=(9, 12), match='456'>
>>> m = p1.search(inStr)
>>> while m:
...     print(m, m.group())
...     m = p1.search(inStr, m.end())
...
<re.Match object; span=(3, 6), match='123'> 123
<re.Match object; span=(9, 12), match='456'> 456
<re.Match object; span=(13, 14), match='7'> 7
<re.Match object; span=(15, 16), match='0'> 0
<re.Match object; span=(16, 17), match='0'> 0
(Python 3.7+ displays re.Match; older versions display _sre.SRE_Match.)
To retrieve the back-references (or capturing groups) inside the Match object, use group(n) for the n-th group (group(0) is the whole match) and groups() for a tuple of all the groups.
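A small sketch of retrieving capturing groups from a Match object. The e-mail-like pattern and input string are hypothetical. The first group is also named with the (?P<name>...) syntax, so it can be retrieved by name or through groupdict().

```python
import re

# Hypothetical pattern with two capturing groups; the first one is also named 'user'
p = re.compile(r'(?P<user>\w+)@(\w+)\.com')
m = p.search('contact: alice@example.com, bob@test.com')   # first match only

print(m.group())         # alice@example.com  (group 0 is the whole match)
print(m.group(1))        # alice
print(m.group(2))        # example
print(m.groups())        # ('alice', 'example')
print(m.group('user'))   # alice
print(m.groupdict())     # {'user': 'alice'}  (named groups only)
```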
Find using match() and fullmatch()
The search() matches anywhere in the given inStr[begin:end]. On the other hand, match() matches only at the start of inStr[begin:end] (similar to the regex pattern ^...), while fullmatch() must match the entire inStr[begin:end] (similar to the regex pattern ^...$). For example,
>>> p1 = re.compile(r'[1-9][0-9]*|0')
>>> m = p1.match('aaa123zzz456')
>>> m                  # None (no match at the start), so nothing is displayed
>>> m = p1.match('123zzz456')
>>> m
<re.Match object; span=(0, 3), match='123'>
>>> m = p1.fullmatch('123456')
>>> m
<re.Match object; span=(0, 6), match='123456'>
>>> m = p1.fullmatch('123456abc')
>>> m                  # None
Find using finditer()
The finditer() is similar to findall(). The findall() returns a list of matched substrings, whereas finditer() returns an iterator of Match objects. For example,
>>> p1 = re.compile(r'[1-9][0-9]*|0')
>>> inStr = 'abc123xyz456_7_00'
>>> p1.findall(inStr)
['123', '456', '7', '0', '0']
>>> for s in p1.findall(inStr):
...     print(s, end=' ')
...
123 456 7 0 0
>>> for m in p1.finditer(inStr):
...     print(m)
...
<re.Match object; span=(3, 6), match='123'>
<re.Match object; span=(9, 12), match='456'>
<re.Match object; span=(13, 14), match='7'>
<re.Match object; span=(15, 16), match='0'>
<re.Match object; span=(16, 17), match='0'>
>>> for m in p1.finditer(inStr):
...     print(m.group(), end=' ')
...
123 456 7 0 0
Splitting String into Tokens
The split() splits the given inStr into a list, using the regex's Pattern as the delimiter (separator). For example,
>>> p1 = re.compile(r'[1-9][0-9]*|0')
>>> p1.split('aaa123bbb456ccc')
['aaa', 'bbb', 'ccc']
>>> re.split(r'[1-9][0-9]*|0', 'aaa123bbb456ccc')
['aaa', 'bbb', 'ccc']
Note: For a simple delimiter, use str.split([sep]), which is more efficient. See the section above.
Web Scraping
Web Scraping (or web harvesting or web data extraction) refers to reading the raw HTML page to retrieve desired data. You need a working knowledge of HTML, CSS and JavaScript. Python supports web scraping via the third-party packages requests and BeautifulSoup (bs4).
Install Packages
You could install the relevant packages using pip as follows:
$ pip install requests
$ pip install beautifulsoup4
Step 0: Inspect the Target Webpage
Step 1: Send an HTTP GET request to the target URL to retrieve the raw HTML page using module requests
>>> import requests
>>> url = "http://your_target_webpage"
>>> response = requests.get(url)
>>> type(response)
<class 'requests.models.Response'>
>>> response
<Response [200]>
>>> help(response)
......
>>> print(response.text)       # the body decoded as text (str)
......
>>> print(response.content)    # the raw body (bytes)
......
Step 2: Parse the HTML Text into a Tree Structure using BeautifulSoup and Search the Desired Data
>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup(response.text, "html.parser")
>>> type(soup)
<class 'bs4.BeautifulSoup'>
>>> help(soup)
......
>>> img_tag = soup.find('img')         # the first <img> tag, or None
>>> img_tag
<img ...... />
>>> img_tags = soup.find_all('img')    # a list of all <img> tags
>>> img_tags
[<img ... />, <img ... />, <img ... />, ...]
>>> soup.find('div', attrs={'id': 'test'})
>>> soup.find_all('div', attrs={'class': 'error'})
You could write out the selected data to a file:
with open(filename, 'w') as fp:
    for row in rows:
        fp.write(row + '\n')
You could also use the csv module to write out rows of data with a header:
>>> import csv
>>> with open(filename, 'w', newline='') as fp:
...     writer = csv.DictWriter(fp, ['colHeader1', 'colHeader2', 'colHeader3'])
...     writer.writeheader()
...     for row in rows:      # each row is a dict keyed by the column headers
...         writer.writerow(row)
...
Step 3: Download Selected Document Using urllib.request
You may want to download documents such as text files or images.
>>> import urllib.request
>>> download_url = '.....'
>>> file = '......'
>>> urllib.request.urlretrieve(download_url, file)
Step 4: Delay
To avoid spamming a website with download requests (and being flagged as a spammer), pause your code for a while between requests.
>>> import time
>>> time.sleep(1)     # pause for 1 second