notes.txt
-Chapter I - Python Basics-
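Math operators in Python (from highest to lowest precedence): ** first; then *, /, // and % (equal precedence, evaluated left to right); then + and - (equal precedence, evaluated left to right)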
Expressions are just values combined with operators, and they always evaluate down to a single value.
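A quick check of the precedence rules, as a minimal sketch (the numbers and variable names are arbitrary, chosen only for illustration):

```python
# ** binds tighter than *, which binds tighter than +
a = 2 + 3 * 4 ** 2      # 4 ** 2 = 16, then 3 * 16 = 48, then 2 + 48
# *, /, // and % share one level and are evaluated left to right
b = 20 // 3 * 2         # (20 // 3) * 2
c = 20 % 3 * 2          # (20 % 3) * 2
# parentheses override the default order
d = (2 + 3) * 4
print(a, b, c, d)       # 50 12 4 20
```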
Data type: category for values, and every value belongs to exactly one data type. Most common: integers (int) like -2, 0, 5; floating-point numbers (floats) like -1.25, -1.0, 0.0, 1.25; strings (str) like 'a', 'Hello!', '11 cats'.
Overwriting the variable: when a variable is assigned a new value, the old value is forgotten.
yourFirstProgram.py
Argument: a value that is passed to a function call.
print()
input()
len()
str(), int(), float()
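A small sketch of how these functions work together. The variable names are made up, and age_text stands in for what input() would return (input() always returns a string):

```python
age_text = '29'                 # stand-in for a value typed by the user
age = int(age_text)             # string '29' -> integer 29
message = 'You will be ' + str(age + 1) + ' in a year.'
print(message)
print(len(message))             # number of characters in the string
print(float('3.14'), int(7.9))  # int() drops the fractional part
```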
Practice Questions
1:
* operator
'hello' value
-88.8 value
- operator
/ operator
+ operator
5 value
2:
spam variable
'spam' string
3:
integers, floats and strings
4:
values and operators
they evaluate to a single value
5:
the difference is in how the computer treats the code - an expression is evaluated when the program runs and reduces to a single value, while a statement gives the computer an instruction (an assignment, for instance) and does not evaluate to a value.
6:
20
7:
'spamspamspam'
'spamspamspam'
8:
a variable name can't begin with a number
9:
int(), float(), str()
10:
an integer can't be concatenated to a string, and a string can't be added to a number
correction: 'I have eaten ' + str(99) + ' burritos.'
Extra credit:
len():
Return the length (the number of items) of an object. The argument may be a sequence (such as a string, bytes, tuple, list, or range) or a collection (such as a dictionary, set, or frozen set).
round(number[, ndigits]):
Return number rounded to ndigits precision after the decimal point. If ndigits is omitted or is None, it returns the nearest integer to its input.
Floating Point Arithmetic: Issues and Limitations.
(https://docs.python.org/3/tutorial/floatingpoint.html#tut-fp-issues)
"On most machines today, floats are approximated using a binary fraction with the numerator using the first 53 bits starting with the most significant bit and with the denominator as a power of two. In the case of 1/10, the binary fraction is 3602879701896397 / 2 ** 55 which is close to but not exactly equal to the true value of 1/10."
Representation error -> explains the “0.1” example in detail.
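The representation error can be observed directly. A minimal sketch using only the standard library (math.isclose() is one common way to compare floats; it is not mentioned in the notes above):

```python
import math

# 0.1 is stored as the nearest binary fraction, not as exactly 1/10
numerator, denominator = (0.1).as_integer_ratio()
print(numerator, denominator == 2 ** 55)

total = 0.1 + 0.2
print(total == 0.3)               # False: the operands carry tiny errors
print(math.isclose(total, 0.3))   # True: compare with a tolerance instead
print(round(total, 2))            # rounding hides the error for display
```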
-Chapter II - Flow Control-
Boolean values: True / False
Comparison operators: == != < > <= >=
Boolean operators: and / or / not
After any math and comparison operators evaluate, Python evaluates the not operators first, then the and operators, and then the or operators.
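The evaluation order can be checked with a few expressions. A minimal sketch (the expressions are arbitrary examples):

```python
# comparisons evaluate first, then not, then and, then or
a = True or False and False     # and first: True or (False and False)
b = not False and False         # not first: (not False) and False
c = 2 + 2 == 4 and not 3 > 5    # (4 == 4) and (not (3 > 5))
print(a, b, c)                  # True False True
```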
Elements of Flow Control
Flow Control Statements
if statements
else statements
elif statements
while loop statements
break statements
continue statements
for loops and the range() function
An Equivalent while loop
The Starting, Stopping, and Stepping Arguments to range()
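A short sketch of the three forms of range() and of a for loop next to its equivalent while loop (variable names are illustrative only):

```python
# range(stop), range(start, stop) and range(start, stop, step)
print(list(range(5)))            # [0, 1, 2, 3, 4]
print(list(range(2, 6)))         # [2, 3, 4, 5]
print(list(range(0, 10, 2)))     # [0, 2, 4, 6, 8]
print(list(range(5, -1, -1)))    # [5, 4, 3, 2, 1, 0]

# a for loop over range() ...
total_for = 0
for i in range(1, 4):
    total_for += i

# ... and the equivalent while loop
total_while = 0
i = 1
while i < 4:
    total_while += i
    i += 1
print(total_for, total_while)    # 6 6
```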
Importing modules
from import Statements
Ending a Program Early with sys.exit()
Practice Questions
1:
True and False
2:
and or not
3:
True and True = True
True and False = False
False and True = False
False and False = False
True or True = True
True or False = True
False or True = True
False or False = False
not True = False
not False = True
4:
False
False
True
False
False
True
5:
< <= > >= == !=
6:
== equal (comparison operator)
= assignment operator (stores a value to a variable)
7:
An expression that evaluates to True or False; depending on the result, the block of code that follows is either executed or skipped.
Answer from the book: A condition is an expression used in a flow control statement that evaluates to a Boolean value.
8:
corrected - The three blocks are everything inside the if statement, plus the lines print('bacon') and print('ham').
print('eggs')
if spam > 5:
    print('bacon')
else:
    print('ham')
print('spam')
9:
spam = int(input())  # input() returns a string, so convert it before comparing with 1 and 2
if spam == 1:
    print('Hello')
elif spam == 2:
    print('Howdy')
else:
    print('Greetings!')
10:
ctrl + c
11:
a break statement makes the code leave the loop immediately, while a continue statement jumps back to the start of the loop and re-evaluates its condition
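The difference can be seen in one loop. A minimal sketch (the list name seen and the numbers are made up for illustration):

```python
# break leaves the loop at once; continue skips to the next iteration
seen = []
for n in range(10):
    if n % 2 == 0:
        continue        # even number: skip the append, go on with the next n
    if n > 7:
        break           # stop the loop entirely once an odd n exceeds 7
    seen.append(n)
print(seen)             # [1, 3, 5, 7]
```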
12:
there's no difference among those options. in the first case, only the stopping point of the range is given, so it starts from 0 by default. in the second range, 0 is given explicitly as the starting point. the third range also gives 1 as the step from one number to the next, and 1 is the default step as well (so there's no need to pass it).
13:
# for loop
for i in range(11):
    print(i)
# while loop
i = 0
while i <= 10:
    print(i)
    i += 1
14:
spam.bacon().
Extra credit:
round(number[, ndigits]):
Return number rounded to ndigits precision after the decimal point. If ndigits is omitted or is None, it returns the nearest integer to its input.
abs():
Return the absolute value of a number. The argument may be an integer or a floating point number. If the argument is a complex number, its magnitude is returned.
-Chapter III - Functions -
def Statements with Parameters
parameter is a variable that an argument is stored in when a function is called
Return Values and return Statements
return value of the function is the value that a function call evaluates to
with the return statement one can specify what the return value is
The None Value
the value called None (NoneType data type) represents the absence of value
behind the scenes, Python adds return None to the end of any function definition with no return statement
if you use return statement without a value, then None is returned
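Both cases can be checked with two tiny functions. A sketch with hypothetical function names:

```python
def greet(name):
    print('Hello, ' + name)     # no return statement in this function

def early_exit(number):
    if number < 0:
        return                  # bare return: the call evaluates to None
    return number * 2

result = greet('Alice')
print(result is None)                   # True
print(early_exit(-1), early_exit(4))    # None 8
```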
Keyword Arguments and print()
print() keyword arguments: end='' and sep=''
Local and Global Scope
Local Variables Cannot Be Used in the Global Scope
Local Scopes Cannot Use Variables in Other Local Scopes
Global Variables Can Be Read from a Local Scope
Local and Global Variables with the Same Name
should be avoided
The global Statement
it modifies a global variable from within a function
in a function, a variable will either always be global or always be local
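A minimal sketch of the scope rules above (the function names are invented for illustration; eggs is just a sample variable):

```python
eggs = 'global'

def read_only():
    return eggs             # no assignment, so this reads the global variable

def shadow():
    eggs = 'local'          # assignment creates a separate local variable
    return eggs

def overwrite():
    global eggs             # now the assignment targets the global variable
    eggs = 'changed'

before = read_only()
shadowed = shadow()
still_global = eggs         # shadow() did not touch the global variable
overwrite()
after = eggs
print(before, shadowed, still_global, after)
```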
Functions as 'Black Boxes'
you treat functions as black boxes when you're not interested in their code, but only in their inputs (the parameters) and output value. that is possible only because there's no need to worry about variables in a function's local scope interacting with the global scope and other local scopes
Exception Handling
try and except clauses
once an error is detected in the try clause, execution jumps to the except clause and then continues after it - it does not go back into the try clause
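A small sketch of a try and except pair. safe_divide is a hypothetical helper written for this example, not something from the book's code:

```python
def safe_divide(dividend, divisor):
    try:
        return dividend / divisor
    except ZeroDivisionError:
        # runs only if the try clause raised this error
        print('Error: cannot divide by zero.')
        return None

# the program keeps going after the failed call
results = [safe_divide(42, d) for d in (2, 0, 7)]
print(results)      # [21.0, None, 6.0]
```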
A Short Program: Guess the Number
Practice Questions
1:
Functions are a good way to organize code. Each one operates in a local scope that doesn't directly interact with the local scopes of other functions, but it can still read (or even modify, with the global statement) variables in the global scope.
Answer from book: Functions reduce the need for duplicate code. This makes programs
shorter, easier to read, and easier to update.
2:
The code in a function is executed when the function is called.
3:
The def statement.
def nameOfTheFunction(parameters):
4:
A function consists of the def statement and the code in its def clause. It tells the computer what to do when the function is called, with arguments if the function has parameters.
A function call consists of the name of the function followed by (), with arguments between the parentheses if the function defines parameters. It makes the code defined in the function execute and evaluates to the function's return value.
5:
Only one global scope. A local scope is created each time a function is called, so the number of local scopes depends on the function calls made in the program.
6:
The local scope is destroyed and all the assignments are undone, therefore the variables will be forgotten.
7:
A return value is the value that a function call evaluates to. It can be used as part of an expression
8:
The return value is None.
9:
You need to use the global statement.
10:
NoneType data type.
11:
It imports the module areallyourpetsnamederic, so you'll be able to use the functions from it.
12:
spam.bacon()
13:
You can use try and except clauses.
Answer from the book: Place the line of code that might cause an error in a try clause.
14:
In the try clause goes the code that might cause an error (function calls, for instance), and in the except clause goes the code that will be executed when an error is encountered.
Answer from the book: The code that could potentially cause an error goes in the try clause.
The code that executes if an error happens goes in the except clause.
Practice Projects
(in the folder chapter3)
- Chapter IV - Lists -
The List Data Type
a list value refers to the list itself (it can be stored in a variable or passed to a function like any other value), not to the values inside the list value
values inside the list are also called items, which are separated by commas
[] - empty list (just like '' is the empty string value)
Getting Individual Values in a List with Indexes
lists can also contain other list values, that can be accessed using multiple indexes (list[1][2] - accesses item list at index 1 and its item at index 2)
Negative Indexes
-1 is the last index in a list, -2 is the second-to-last, and so on
Getting Sublists with Slices [1:3]
a slice goes up to, but will not include, the value at the second index
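Indexes, negative indexes and slices side by side, as a minimal sketch (the sample lists are arbitrary):

```python
spam = ['cat', 'bat', 'rat', 'elephant']
nested = [['cat', 'bat'], [10, 20, 30]]

print(spam[0], spam[-1])     # first and last item
print(spam[1:3])             # items at index 1 and 2; index 3 is excluded
print(spam[:2], spam[2:])    # omitted bounds default to the start / the end
print(nested[1][2])          # index 1 picks [10, 20, 30], then index 2 picks 30
```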
Getting a List's Length with len()
Changing Values in a List with Indexes
List Concatenation and List Replication
Removing Values from Lists with del Statements
example: del list[1]
it also can be used on a simple variable to delete it, but it's mostly used to delete values from lists
Working with Lists
list concatenation - list + [element]
Using for Loops with Lists
common technique - for i in range(len(list)) - iterates over all the indexes of the list, so each item can be accessed as list[i]
The in and not in Operators
they're used in expressions to connect two values: one to look for in a list and the list where it may be found - the expressions will evaluate to a Boolean value
The Multiple Assignment Trick
example:
cat = ['fat', 'black', 'loud']
size, color, disposition = cat
the number of variables and the length of the list must be exactly equal
Augmented Assignment Operators
+= -= *= /= %=
+= can be used for string and list concatenation, and the *= operator can do string and list replication
Methods
the same as a function but it's called on a value: listvalue.method(argument)
each data type has its own set of methods
Finding a Value in a List with the index() Method
listvalue.index('string to be found')
when there are duplicates of the value in the list, the index of its first appearance is returned
Adding Values to Lists with the append() and insert() Methods
append(): adds the argument to the end of the list
insert(<index for the new value>, <new value to be inserted>): inserts a value at any index in the list
Removing Values from Lists with remove()
listvalue.remove(<value to be removed>)
if the value appears multiple times in the list, only the first instance of it will be removed
the del statement (del list[index]) is good to use when you know the index of the value you want to remove; the remove() is good when you know the value you want to remove from the list
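The list methods above in one run, as a minimal sketch (the sample values are arbitrary):

```python
spam = ['cat', 'dog', 'cat', 'bat']

first_cat = spam.index('cat')    # index of the first 'cat'
spam.append('moose')             # goes to the end
spam.insert(1, 'chicken')        # goes to index 1, shifting the rest right
# spam is now ['cat', 'chicken', 'dog', 'cat', 'bat', 'moose']

spam.remove('cat')               # removes only the first 'cat'
del spam[0]                      # removes whatever sits at index 0 ('chicken')
print(first_cat, spam)           # 0 ['dog', 'cat', 'bat', 'moose']
```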
Sorting the Values in a List with the sort() Method
lists of number values or strings can be sorted, but you can't sort lists that have both number values and string values in them
“ASCIIbetical order” - uppercase letters come before lowercase letters
sorting in reverse order: list.sort(reverse=True)
sorting in alphabetical order: list.sort(key=str.lower)
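A sketch of the three orderings. It uses the built-in sorted(), which is not covered in the notes above; it takes the same key and reverse arguments as sort() but returns a new list instead of changing the original:

```python
letters = ['b', 'Z', 'a', 'A', 'z']

ascii_order = sorted(letters)                  # uppercase before lowercase
alpha_order = sorted(letters, key=str.lower)   # case ignored while comparing
letters.sort(reverse=True)                     # sort() changes the list in place

print(ascii_order)    # ['A', 'Z', 'a', 'b', 'z']
print(alpha_order)    # ['a', 'A', 'b', 'Z', 'z']
print(letters)        # ['z', 'b', 'a', 'Z', 'A']
```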
Example Program: Magic 8 Ball with a List
Exceptions to Indentation Rules in Python
the multiline [] and the \ line continuation
List-like Types: Strings and Tuples
many of the things you can do with lists can also be done with strings: indexing; slicing; and using them with for loops, with len(), and with the in and not in operations
Mutable and Immutable Data Types
lists are mutable, strings are immutable
a string can't be mutated - the way to 'change' one is to slice it and build a new string from the pieces
The Tuple Data Type
almost identical to the list data type, except they're typed with parentheses () and they're immutable
if there's only one value in the tuple, placing a trailing comma after the value inside the parentheses is necessary for Python to interpret it as a tuple, otherwise it'd just interpret it as a value inside parentheses
tuples are used when you don't intend for some sequence of values to change
as they're immutable, code using tuples is slightly faster than code using lists
Converting Types with the list() and tuple() Functions
converting a tuple to a list is handy if you need a mutable version of a tuple value
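The trailing comma and the conversion functions, as a minimal sketch (variable names are illustrative):

```python
single = ('hello',)       # trailing comma -> a tuple with one item
not_a_tuple = ('hello')   # just a string inside parentheses
print(type(single).__name__, type(not_a_tuple).__name__)   # tuple str

eggs = ('cat', 'dog', 5)
as_list = list(eggs)      # mutable version of the tuple
as_list[1] = 'bat'
back = tuple(as_list)     # immutable again
print(back)               # ('cat', 'bat', 5)
```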
References
when you assign a list to a variable, you are actually assigning a list reference to the variable - that happens to mutable data types like dictionary and list
a reference is a value that points to some bit of data
Passing References
when you alter the list referenced in a variable, you're altering the list in place - even if the change is made inside a function call, it is visible through every variable that refers to that list
The copy Module's copy() and deepcopy() Functions
list2 = copy.copy(<list1>)
if the list you need to copy contains lists:
list2 = copy.deepcopy(<list1>)
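References, copy.copy() and copy.deepcopy() compared in one sketch (add_item is a hypothetical function written for this example):

```python
import copy

spam = [1, 2, [3, 4]]
alias = spam                   # same list, second reference
shallow = copy.copy(spam)      # new outer list, inner list still shared
deep = copy.deepcopy(spam)     # new outer list and new inner list

def add_item(some_list):
    some_list.append('added')  # modifies the caller's list in place

add_item(spam)
spam[2].append(5)

print(alias)      # [1, 2, [3, 4, 5], 'added'] - follows spam completely
print(shallow)    # [1, 2, [3, 4, 5]] - misses 'added', shares the inner list
print(deep)       # [1, 2, [3, 4]] - unaffected by both changes
```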
Practice Questions
1:
it's the empty list value.
2:
spam[2] = 'hello'
3:
'd'
4:
'd'
5:
['a', 'b']
6:
1
7:
[3.14, 'cat', 11, 'cat', True, 99]
8:
[3.14, 11, 'cat', True]
9:
the same as string concatenation and replication operators (+ and *)
10:
append() takes a value as its argument and adds it to the end of the list it was called on
insert(<index for the new value>, <new value to be inserted>): inserts a value at any index in the list it was called on
11:
you can use a del statement on list[index]; or
you can use the remove() method, passing the value of the item you want to be removed
12:
they both are ordered sequences (lists contain items and strings contain single characters) and can be operated using their elements indexes and slices, they can be concatenated, replicated, used in for loops and in the len() function
13:
tuples are immutable and use ()
lists are mutable and use []
14:
(42,)
15:
to get the tuple form of a list value you need to use the tuple() function and the list value as the argument. the other way around: list(), using the tuple value as the argument.
16:
they contain references to list values
17:
both functions make a copy of the list passed as the argument - copy.copy() makes a shallow copy (lists inside the list are still shared with the original), while copy.deepcopy() also copies the lists contained in the list
Practice Projects
- Chapter V - Dictionaries and Structuring Data -
The Dictionary Data Type
indexes for dictionaries are called keys, and a key with its associated value is called a key-value pair
Dictionaries vs. Lists
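items in dictionaries have no positional index (since Python 3.7 a dictionary remembers insertion order, but the order doesn't matter when two dictionaries are compared) - so they can't be sliced like lists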
The keys(), values(), and items() Methods
these methods will return list-like values of the dictionary (keys, values and both)
the values returned are not true lists as they can't be modified and do not have an append() method
but these data types (dict_keys, dict_values, and dict_items) can be used in for loops
if you want a true list from one of these methods: list()
when using items() in a for loop, you can use the multiple assignment trick
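A minimal sketch of the three methods and the in operator (the dictionary contents are arbitrary; the key order shown assumes Python 3.7 or later):

```python
spam = {'color': 'red', 'age': 42}

keys = list(spam.keys())        # list() turns the view into a true list
values = list(spam.values())

pairs = []
for key, value in spam.items():     # multiple assignment unpacks each pair
    pairs.append(key + '=' + str(value))

print(keys, values, pairs)
# in on the dictionary itself checks keys only
print('color' in spam, 'red' in spam, 'red' in spam.values())   # True False True
```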
Checking Whether a Key or Value Exists in a Dictionary
'in' and 'not in'
in <dictname> == in <dictname>.keys() - using the dictionary itself only checks the keys
if you want to check the values or the key-value pairs, you have to use the values() or items() methods
The get() Method
two args: the key of the value to retrieve and a fallback value to return if that key does not exist
The setdefault() Method
sets a value in a dictionary for a certain key only if that key does not already have a value
two args: a key to check for and the value to set at that key if the key does not exist. Either way, the method returns the key's value (the existing one, or the one that was just set)
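get() and setdefault() in use, as a minimal sketch (the dictionary contents and the message are sample data):

```python
picnic_items = {'apples': 5, 'cups': 2}
cups = picnic_items.get('cups', 0)      # key exists -> its value
eggs = picnic_items.get('eggs', 0)      # key missing -> the fallback 0

# counting characters: setdefault() creates the entry only on first sight
message = 'hello world'
count = {}
for character in message:
    count.setdefault(character, 0)
    count[character] += 1
print(cups, eggs, count['l'], count['o'])   # 2 0 3 2
```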
Pretty Printing
import pprint
pprint.pprint(someDictionaryValue) -> pretty prints the dictionary
pprint.pformat(someDictionaryValue) -> gets the prettified text as a string value instead of displaying it on the screen
Using Data Structures to Model Real-World Things
A Tic-Tac-Toe Board
Nested Dictionaries and Lists
Summary
Practice Questions
1:
{}
2:
{'foo': 42}
3:
Lists are ordered sequences whose items are indexed by integers, while the items of a dictionary are key-value pairs that are looked up by key, not by position.
4:
A KeyError is raised, since there's no 'foo' key in the spam dictionary.
5:
There's no practical difference as 'cat' in spam will look for keys named 'cat' in spam dictionary.
6:
The expression 'cat' in spam checks whether there is a key named 'cat' in spam dictionary, while 'cat' in spam.values() checks if there's a value 'cat' for one of the keys.
7:
spam.setdefault('color', 'black')
8:
pprint.pprint()
Practice Projects
fantasyGameInventory.py
- Chapter VI - Manipulating Strings -
Working with Strings
String Literals
Double Quotes
Escape Characters
\ followed by the character
Raw Strings
r before the beginning quotation
it completely ignores all escape characters and prints any backslash that appears in the string
they are useful if you're typing string values that contain many backslashes, such as the ones used for regular expressions
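A quick comparison of an escaped string and a raw string, as a minimal sketch (the path is a made-up example):

```python
plain = 'C:\\new\\table'     # each backslash escaped by hand
raw = r'C:\new\table'        # raw string: backslashes are taken literally
print(plain == raw)          # True: both hold the same 12 characters
print(raw)
```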
Multiline Strings with Triple Quotes
any quotes, tabs, or newlines in between the 'triple quotes' are considered part of the string
Multiline Comments
a multiline string is often used for comments that span multiple lines
the hash character (#) marks the beginning of a comment for the rest of the line only
Indexing and Slicing Strings
same way lists do
The in and not in Operators with Strings
same way as lists
Useful String Methods
The upper(), lower(), isupper(), and islower() String Methods
isupper() and islower() will return a Boolean True value if the string has at least one letter and all the letters are uppercase or lowercase, respectively
The isX String Methods
isalpha() - returns True if the string consists only of letters and is not blank
isalnum() - returns True if the string consists only of letters and/or numbers and is not blank
isdecimal() - returns True if the string consists only of numeric characters and is not blank
isspace() - returns True if the string consists only of spaces, tabs, and newlines and is not blank
istitle() - returns True if the string consists only of words that begin with an uppercase letter followed by only lowercase letters
these methods are helpful when you need to validate user input
The startswith() and endswith() String Methods
they return Boolean values
The join() and split() String Methods
join() - joins a list of strings together into a single string value. It's called on a string, gets passed a list of strings, and returns a string
split() - does the opposite of join(). By default the string is split wherever whitespace characters such as the space, tab, or newline characters are found. You can pass a delimiter string to the method to specify a different string to split upon.
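join() and split() side by side, as a minimal sketch (the sample strings are arbitrary):

```python
words = 'My name is Simon'.split()              # split on whitespace by default
joined = ', '.join(['cats', 'rats', 'bats'])    # called on the separator string
parts = 'MyABCnameABCisABCSimon'.split('ABC')   # custom delimiter
lines = 'first line\nsecond line'.split('\n')   # one item per line
print(words, joined, parts, lines)
```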
Justifying Text with rjust(), ljust(), and center()
rjust() and ljust() - return a padded version of the string they are called on, with spaces inserted to justify the text. First arg: an integer length for the justified string. Second arg (optional): specifies a fill character (other than the default space character).
center() - works as the previous methods but centers the text rather than justifying it to the left or right
these methods are especially useful when you need to print tabular data that has the correct spacing
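A sketch of a small table built with these methods. print_picnic is a hypothetical helper and the column widths are arbitrary:

```python
def print_picnic(items, left_width, right_width):
    """Print a two-column table and return its lines."""
    lines = ['PICNIC ITEMS'.center(left_width + right_width, '-')]
    for name, amount in items.items():
        # item name padded with dots on the right, amount right-justified
        lines.append(name.ljust(left_width, '.') + str(amount).rjust(right_width))
    print('\n'.join(lines))
    return lines

table = print_picnic({'sandwiches': 4, 'apples': 12}, 12, 5)
```

Every line comes out with the same total width, so the amounts line up in a column.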
Removing Whitespace with strip(), rstrip(), and lstrip()
strip() - method returns a new string without any whitespace characters at the beginning or end
lstrip() and rstrip() - will remove whitespace characters from the left and right ends, respectively
optional arg: a string could specify which characters on the ends should be stripped (the order of the characters in the string is irrelevant)
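The three strip methods and the optional argument, as a minimal sketch (sample strings only):

```python
spam = '   Hello World   '
print(repr(spam.strip()), repr(spam.lstrip()), repr(spam.rstrip()))

noisy = 'SpamSpamBaconSpamEggsSpamSpam'
# the argument is a set of characters, so 'ampS', 'Spam' and 'mapS' behave the same
cleaned = noisy.strip('ampS')
print(cleaned)    # BaconSpamEggs
```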
Copying and Pasting Strings with the pyperclip Module
copy() and paste() functions that can send text to and receive text from computer's clipboard
Running Python Scripts Outside of Idle (Appendix B)
Shebang Line
first line of the program, tells your computer that you want Python to execute this program:
windows - #! python3
linux - #! /usr/bin/python3
the line is needed to run Python scripts from the command line instead of the IDLE
Running Python Programs on Linux
1) save your .py file to your home folder; 2) change the .py file's permissions to make it executable by running chmod +x <file>.py; 3) then it will be possible to run the program from the Terminal by entering ./<file>.py
Running Python Programs with Assertions Disabled
you can disable the assert statements in your programs for a slight performance improvement:
when running Python from the terminal, include the -O switch (capital letter O, not zero) after python or python3 and before the name of the .py file
assertion checks will be skipped
Project: Password Locker
Step 1: Program Design and Data Structures
Step 2: Handle Command Line Arguments
Step 3: Copy the Right Password
Project: Adding Bullets to Wiki Markup
Step 1: Copy and Paste from the Clipboard
Step 2: Separate the Lines of Text and Add the Star
Step 3: Join the Modified Lines
Even if you don't need to automate this specific task, you might want to automate some other kind of text manipulation, such as removing trailing spaces from the end of lines or converting text to uppercase or lowercase. Whatever your needs, you can use the clipboard for input and output.
Practice Questions
1:
Escape characters represent characters in string values that would otherwise be difficult or impossible to type into code.
2:
\n - newline | \t - tab
3:
\\
4:
as this specific string is marked by double quotes (""), python interprets a single quote (') in it as part of its text, so it doesn't need to be escaped.
5:
Multiline strings (triple quotes ''' at the beginning and the end of the string) allow you to use newlines in strings without the \n escape character.
6:
-'e'
-'Hello'
-'Hello'
-'lo world!'
7:
-'HELLO'
-True
-'hello'
8:
-['Remember,', 'remember,', 'the', 'fifth', 'of', 'November.']
-'There-can-be-only-one.'
9:
rjust(), ljust(), center()
10:
lstrip(), rstrip(), strip()
Practice Project
printTable.py
PART II - Automating Tasks
- Chapter VII - Pattern Matching with Regular Expressions -
Finding Patterns of Text Without Regular Expressions
Finding Patterns of Text with Regular Expressions
regexes - regular expressions for short
\d stands for a digit character (single numeral 0 to 9)
\d{3} - matches the pattern 3 times
Creating Regex Objects
re module - contains all the regex functions in Python
Passing Raw Strings to re.compile()
a raw string does not escape characters
since regular expressions frequently use backslashes in them, it is convenient to pass raw strings to the re.compile() instead of typing extra backslashes
Matching Regex Objects
search() - method that will return None if the regex pattern is not found in the string. If the pattern is found, the method returns a Match object
Match objects have a group() method that will return the actual matched text from the searched string
Review of Regular Expression Matching
1) import the regex module with import re
2) create a regex object with the re.compile() function (remember to use a raw string)
3) pass the string you want to search into the Regex object's search() method. This returns a Match object
4) call the Match object's group() method to return a string of the actual matched text
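The four steps above can be sketched as follows (the phone number pattern and the sample text are illustrative):

```python
import re                                                 # step 1

phone_regex = re.compile(r'\d{3}-\d{3}-\d{4}')            # step 2: raw string
match = phone_regex.search('My number is 415-555-4242.')  # step 3
if match is not None:
    print('Phone number found: ' + match.group())         # step 4

missing = phone_regex.search('No digits here')            # no match -> None
print(missing)
```

Checking for None before calling group() avoids an AttributeError when the pattern is not found.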
More Pattern Matching with Regular Expressions
Grouping with Parentheses
adding parentheses will create groups in the regex, by passing an integer to the group() match object method, you can grab different parts of the matched text. Passing 0 or nothing to the group() method will return the entire matched text
parentheses have a special meaning in regular expressions. To match a parenthesis in the text you need to escape it with a backslash \ (the escape is needed even in a raw string)
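Groups and escaped parentheses in one pattern, as a minimal sketch (the pattern and text are sample data):

```python
import re

# \( and \) match literal parentheses; the unescaped ones create groups
phone_regex = re.compile(r'(\(\d{3}\)) (\d{3}-\d{4})')
match = phone_regex.search('My phone number is (415) 555-4242.')

print(match.group(0))     # the entire match
print(match.group(1))     # first group: the area code with its parentheses
print(match.group(2))     # second group: the main number
area_code, main_number = match.groups()   # all groups at once
```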
Matching Multiple Groups with the Pipe |
the | character is called a pipe
it can be used anywhere you want to match one of many expressions
you can find all matching occurrences with the findall() method
if you need to match an actual pipe character, escape it with a backslash \
Optional Matching with the Question Mark ?
match zero or one of the () group preceding the question mark ?
if you need to match an actual question mark character, escape it with \?
Matching Zero or More with the Star *
the group that precedes the star can occur any number of times in the text
if you need to match an actual star character, prefix the star in the regular expression with a backslash
Matching One or More with the Plus +
the group preceding a plus must appear at least once
if you need to match an actual plus sign character, prefix it with a \
Matching Specific Repetitions with Curly Brackets {}
(Ha){3} - will match 'HaHaHa'
(Ha){3,5} - will match 'HaHaHa', 'HaHaHaHa', and 'HaHaHaHaHa'
(Ha){,5} - will match zero to five instances
(Ha){3,} - will match three or more instances
Greedy and Nongreedy Matching
in ambiguous situations Python's regular expressions will match the longest string possible - they are greedy by default.
the nongreedy version of the curly brackets, which matches the shortest string possible, has the closing curly bracket followed by a question mark
the question mark has two meanings in regular expressions: declaring a nongreedy match or flagging an optional group - these meanings are entirely unrelated
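Both meanings of the question mark, as a minimal sketch (the patterns are illustrative):

```python
import re

# ? after a repetition: nongreedy match
greedy = re.compile(r'(Ha){3,5}').search('HaHaHaHaHa').group()
nongreedy = re.compile(r'(Ha){3,5}?').search('HaHaHaHaHa').group()
print(greedy, nongreedy)      # HaHaHaHaHa HaHaHa

# ? after a group: the group is optional
bat_regex = re.compile(r'Bat(wo)?man')
print(bat_regex.search('Batman').group(), bat_regex.search('Batwoman').group())
```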
The findall() Method
will return the strings of every match in the searched string - in a list of strings, as long as there are no groups in the regular expression
if there are groups in the regular expression, then findall() will return a list of tuples, and the items of each tuple will represent the groups of a match
Summarize findall() method:
1) when called on a regex with no groups - returns a list of string matches
2) when called on a regex that has groups - returns a list of tuples of strings (one string for each group); with exactly one group the result is a plain list of that group's strings
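The return shapes of findall() can be compared on the same text. A minimal sketch with sample phone numbers:

```python
import re

text = 'Cell: 415-555-9999 Work: 212-555-0000'

no_groups = re.compile(r'\d{3}-\d{3}-\d{4}').findall(text)
one_group = re.compile(r'(\d{3})-\d{3}-\d{4}').findall(text)
many_groups = re.compile(r'(\d{3})-(\d{3})-(\d{4})').findall(text)

print(no_groups)      # list of full matches
print(one_group)      # list of the single group's strings
print(many_groups)    # list of tuples, one string per group
```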
Character Classes
\d - any numeric digit from 0 to 9
\D - any character that is not a numeric digit from 0 to 9
\w - any letter, numeric digit, or the underscore character ('word' characters)
\W - any character that is not a letter, numeric digit, or the underscore character
\s - any space, tab, or newline character ('space' characters)
\S - any character that is not a space, tab, or a newline
[0-5] - numbers from 0 to 5; instead of using (0|1|2|3|4|5)
Making Your Own Character Classes
you can define your own character class using square brackets
inside the square brackets the normal regular expression symbols are not interpreted as such - there's no need to escape characters with a \
negative character class - placing a caret character (^) just after the character class's opening bracket
The Caret and Dollar Sign Characters
^ - the caret symbol can be used at the start of a regex to indicate that a match must occur at the beginning of the searched text
$ - the dollar sign can be put at the end of the regex to indicate the string must end with this pattern
you can use ^ and $ together to indicate that the entire string must match the regex - mnemonic for the order: 'carrots cost dollars' (the caret comes first, the dollar sign last)
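The anchors in use, as a minimal sketch (patterns and strings are made up for illustration):

```python
import re

begins_with_hello = re.compile(r'^Hello')
ends_with_digit = re.compile(r'\d$')
whole_string_digits = re.compile(r'^\d+$')

print(begins_with_hello.search('Hello world!') is not None)   # True
print(begins_with_hello.search('He said Hello') is not None)  # False
print(ends_with_digit.search('Your number is 42').group())    # 2
print(whole_string_digits.search('12345') is not None)        # True
print(whole_string_digits.search('12 345') is not None)       # False
```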
The Wildcard Character .
matches any character except for a newline
to match an actual dot, escape it with a \
Matching Everything with Dot-Star .*
dot character means 'any single character except the newline' and the star character means 'zero or more of the preceding character'
the dot-star uses greedy mode
to use it in a nongreedy fashion: .*?
Matching Newlines with the Dot Character
by passing re.DOTALL as the second argument to re.compile(), you can make the dot character match all characters, including the newline character
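Dot-star with and without re.DOTALL, and in greedy and nongreedy mode. A minimal sketch with sample text:

```python
import re

text = 'Serve the public trust.\nProtect the innocent.'
no_newline = re.compile(r'.*').search(text).group()               # stops at \n
with_newline = re.compile(r'.*', re.DOTALL).search(text).group()  # spans \n

tagged = '<To serve man> for dinner.>'
greedy = re.compile(r'<.*>').search(tagged).group()       # up to the last >
nongreedy = re.compile(r'<.*?>').search(tagged).group()   # up to the first >
print(no_newline)
print(greedy, '|', nongreedy)
```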
Review of Regex Symbols
? - matches zero or one of the preceding group
* - matches zero or more of the preceding group
+ - matches one or more of the preceding group
{n} - matches exactly n of the preceding group
{n,} - matches n or more of the preceding group
{,m} - matches 0 to m of the preceding group
{n,m} - matches at least n and at most m of the preceding group
{n,m}? or *? or +? performs a nongreedy match of the preceding group
^spam - the string must begin with spam
spam$ - the string must end with spam
. - matches any character, except newline characters
\d, \w, and \s - match a digit, word, or space character, respectively
\D, \W, and \S - match anything except a digit, word, or space character, respectively
[abc] - matches any character between the brackets (such as a, b, or c)
[^abc] - matches any character that isn't between the brackets
Case-Insensitive Matching
to make your regex case-insensitive, you can pass re.IGNORECASE or re.I as a second argument to re.compile()
Substituting Strings with sub() Method
first arg: the string that replaces any matches
second arg: the string to be searched with the regular expression
the method returns a string with the substitutions applied
sometimes you may need to use the matched text itself as part of the substitution - in the first argument to sub(), you can type \1, \2, \3, and so on, to mean 'enter the text of group 1, 2, 3, and so on, in the substitution'
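sub() with a plain replacement and with a group reference, as a minimal sketch (patterns and sentences are sample data):

```python
import re

names_regex = re.compile(r'Agent \w+')
censored = names_regex.sub('CENSORED', 'Agent Alice gave the documents to Agent Bob.')
print(censored)

# \1 inserts the text of group 1 (the first letter of the name)
agent_regex = re.compile(r'Agent (\w)\w*')
initials = agent_regex.sub(r'\1****', 'Agent Alice told Agent Carol.')
print(initials)
```

The replacement is written as a raw string so that \1 reaches sub() as a backslash followed by 1.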
Managing Complex Regexes
you can spread the regular expression over multiple lines with comments like this:
phoneRegex = re.compile(r'''(
    (\d{3}|\(\d{3}\))?            # area code
    (\s|-|\.)?                    # separator (optional)
    \d{3}                         # first 3 digits
    (\s|-|\.)                     # separator
    \d{4}                         # last 4 digits
    (\s*(ext|x|ext.)\s*\d{2,5})?  # extension
    )''', re.VERBOSE)
Combining re.IGNORECASE, re.DOTALL, and re.VERBOSE
re.compile() takes only a single value as its second argument, but you can get around this limitation using the pipe character |, which in this context is known as the bitwise or operator
someRegexValue = re.compile('foo', re.IGNORECASE | re.DOTALL | re.VERBOSE)
Project: Phone Number and Email Address Extractor
Practice Questions
1:
re.compile() - the re module has to be imported first for the function to work
2:
since regexes often use backslashes in them, it is convenient to pass raw strings to the re.compile() function so the backslashes won't need to be escaped.
3:
it returns a Match object for the first match in the text passed to it, or None if there is no match
4:
by calling the Match object's group() method (or groups() to get all the groups at once)
5:
group 0 covers the entire match, group 1 covers the first group inside the parentheses and group 2 covers the second one
6:
they must be escaped with a backslash: \. and \( \)
7:
it will return a list of strings if the regex contains at most one group, and each element of the list will represent a match. otherwise it will return a list of tuples, one tuple per match, and the elements (strings) of each tuple will correspond to the regex groups of that match
8:
the pipe character | allows you to match one of many expressions. for instance: (ext|x|ext.) -> this group will match ext OR x OR ext. - it works as an either/or between the alternatives.
9:
the question mark ? flags a preceding group as one that can be matched zero or one time (it's optional then). it can also follow {}, * and + to declare them as a nongreedy match.
10:
+ means the preceding group must be matched at least one time but it can be more
* means the preceding group may not be matched at all or can be matched any number of times
11:
{3} flags the preceding group as to be matched 3 times
{3,5} flags the preceding group as to be matched 3, 4 or 5 times