
=======================
Lecture Notes on Python
=======================

:Authors:
   蓝珲 (lanhui AT zjnu.edu.cn)

:Version: 0.1.2 of 2019-04-14

	  
.. contents:: Table of Contents


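Preface
--------------------------------------------------------------------

This book is not written by a pedant, and it contains no empty prose.

Python has a concise syntax and a comprehensive, powerful library. Programs are quick to write, and they do not run slowly either.

At university, experts often teach beginners. Experts were once beginners
too, but they tend to forget this. A beginner may have written fewer than 10
lines of code, while an expert has written at least 10,000. Their brains are
wired differently. Learning is hard, so education or training should slow the
expert's workflow down by a factor of 100! What beginners should do is bring
the amount of code they have written up to 1,000 lines as quickly as possible.
This includes countless rounds of debugging along the way, and each round of
debugging is a small lesson. The only way to feel comfortable is to accumulate
your own experience; there is no other shortcut.

Faced with something new, beginners often ask the following questions (ask the student's name):

- What does this thing mean?

- Where does it come from?

- What is it good for?

- How is it used?

- When is it used?

- Why does the program need it?

A dynamic way of teaching works better. The expert and the beginner sit together
at a computer and answer the questions above step by step, until the beginner says "I get it".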


Correcting the Pronunciation of Python
----------------------------------------

Chinese speakers commonly pronounce the *th* as *s*. That is not quite correct.

\ ˈpī-ˌthän , -thən\  pronunciation_

.. _pronunciation: https://cn.bing.com/search?q=define%20python&tf=U2VydmljZT1EaWN0aW9uYXJ5QW5zd2VyVjIgU2NlbmFyaW89RGVmaW5pdGlvblNjZW5hcmlvIFBvc2l0aW9uPU5PUCBSYW5raW5nRGF0YT1UcnVlIEZvcmNlUGxhY2U9RmFsc2UgUGFpcnM9RGljdGlvbmFyeVdvcmQ6cHl0aG9uO3NjbjpEZWZpbml0aW9uU2NlbmFyaW87cDpRQVM7IHw%3d&hs=hyRBF0mYq9hrfQUq66DIZnFVta1ZGRfBiBks25oUguk%3d


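Origins of Python
------------------------------

Guido van Rossum, the father of Python, is Dutch and was born in 1956. He
received a master's degree in mathematics and computer science from the
University of Amsterdam in 1982. He had worked on the ABC language, and he
designed the Python language in 1989.

Python has a concise syntax and a large, comprehensive, and useful standard library.

Natural languages. Characteristics: ambiguity and redundancy. "The penny dropped." "不要。" ("No." or "Don't.")

Formal languages. Characteristic: only the literal meaning counts.

Overview of computer organization: CPU, bus, memory, hard disk.

Conversion between bit, byte, KB, MB, GB, and TB.

Naming variables. For example, to name a tiled strategy (层叠策略), should we use the pinyin abbreviation CDCL or TiledStrategy?

The academic affairs management system, http://10.1.70.164/jwglxt?

Xi language (习语言), Yi language (易语言), and other languages that are currently not mainstream.

The simplest class definition::


    class A:
        pass


Using the class above as a blueprint, create an instance: a = A(). This a cannot do anything, though.

Running a Python file from the command line: python a.py.

The three elements of a function header are def, the function name, and the parameter list: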


    | def add_number(a, b):
    |     return a + b
    
    | def add_lst(a, b):
    |     if len(a) != len(b):
    |         return 'ERROR: a and b not in equal length.'
    |     n = len(a)
    |     result = []
    |     for i in range(n):
    |         result.append(a[i] + b[i])
    |     return result
    |
    | print( add_lst([1,2,3],[-1,-2,-3]) )


Python Keywords
--------------------------------


| def pass
| from import
| False True
| in
| None
| class 
| return
| while for
| continue break 
| and or not
| if else elif
| try except finally raise
| lambda nonlocal 
| del global with
| yield assert   
| as is
| async await


Keywords are reserved by the language and cannot be used as variable names.
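
The list above can be checked against the interpreter itself. The following is a minimal sketch using the standard ``keyword`` module; the names in ``candidates`` are made up for illustration.

```python
import keyword

# keyword.kwlist is the interpreter's own list of reserved words.
print(len(keyword.kwlist))
print(keyword.kwlist)

# keyword.iskeyword() tells whether a string is a reserved word.
candidates = ['class', 'klass', 'lambda', 'total']  # hypothetical names
for name in candidates:
    print(name, keyword.iskeyword(name))

# Only the reserved words are rejected as variable names.
reserved = [name for name in candidates if keyword.iskeyword(name)]
print(reserved)
```

The exact number of keywords depends on the Python version, so the sketch prints it rather than assuming it.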


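Types of Values
-------------------------

Every value is an object. For example, after ``a = 5``, try ``help(a)`` and ``a.bit_length()``.

Numbers: ``1``, ``1.``, ``1.1``, ``.1``, ``1e1``, ``1e-1``, ``1E1``, ``1E-1``.

Strings: ``'hello'``, ``100 * 'hello'``, ``'hello' * 100``, ``'Weight is %4.2f kg' % (70.2)``. The content of a text file can also be read into a string:

| f = open('a.html')
| s = f.read()
| f.close()

Lists:

| ['a', 'b', 'c', 'd']
| ['bob', 170, 'john', '180']
| [1, 2, 3, 4]

``range(10)`` returns a range object, which the ``list`` function can turn into a list. It is equivalent to ``range(0, 10, 1)``: start at 0, step by 1, and stop before 10. A list can also hold other lists, tuples, or arbitrary objects.

Tuples (tuple) and dictionaries (dict).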


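Variables
------------------------------------

A variable is a name, and that name refers to a value.

The value is stored at some address in memory.

Choose short, meaningful names where possible. For example, use n for a count and i, j, k for indices.

Keywords cannot be used as variable names.


A value lives somewhere in memory, and it keeps count of how many variables refer to it.

To save space, when several variables have the same value, they sometimes all refer to one shared copy of that value (instead of each variable getting its own memory to store the value separately).

This technique is called interning. It is not always applied, however.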


| a = 10
| b = 10
| c = 10
| id(a), id(b), id(c)
| (8791229060416, 8791229060416, 8791229060416)


The value 10 is stored at address 8791229060416, and all three variables a, b, and c point to this address.


| x = 257
| y = 257
| id(x), id(y)
| (46487024, 46487952)


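Although x and y above have the same value, the two values are stored at different memory addresses. (CPython keeps one shared copy of each small integer from -5 to 256; 257 falls outside that range, so two statements entered separately at the interactive prompt create two objects.)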
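
Because sharing an address is an implementation detail, it helps to keep equality (``==``) and identity (``is``) apart. The following is a small sketch; building the integers with ``int('257')`` is only a device chosen here so that the two values are created at run time.

```python
x = int('257')   # built at run time, so the two values are created separately
y = int('257')

print(x == y)          # equality of values: True
print(x is y)          # identity: implementation-dependent, do not rely on it
print(id(x) == id(y))  # always gives the same answer as `x is y`

values_equal = (x == y)
identity_matches_id = ((x is y) == (id(x) == id(y)))
```

In short, use ``==`` to compare values and reserve ``is`` for questions about object identity, such as ``x is None``.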


| s1 = 'hello'
| s2 = 'hello'
| id(s1), id(s2)
| s1 == s2
| s1 is s2

| s1 = 'h' * 100
| s2 = 'h' * 100
| id(s1), id(s2)

| s3 = 'hello, world!'
| s4 = 'hello, world!'
| id(s3), id(s4)
| (46703536, 46705136)


| class A:
|    pass

| a = A()
| b = A()
| a
| <__main__.A object at 0x0000000002CD92E8>
| b
| <__main__.A object at 0x0000000002CD9240>


| x = [1,2,3]
| id(x)
| 46869512
| y = x
| id(y)
| 46869512
| x.append(4)
| x
| [1, 2, 3, 4]
| y
| [1, 2, 3, 4]

| x = []
| id(x)
| 46869640


| x = [1,2,3,4]
| y = [1,2,3,4]
| id(x)
| 46869768
| id(y)
| 46868808


A value that no name refers to any more will be removed from memory.

References:

- http://foobarnbaz.com/2012/07/08/understanding-python-variables/
- https://stackoverflow.com/questions/19721002/is-a-variable-the-name-the-value-or-the-memory-location


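Mutable and Immutable Types
----------------------------------------------------------

Strings are an immutable type: a string cannot be changed in place at its original memory address.

Given ``a = 'hello'``, the in-place modification ``a[0] = 'H'`` is not allowed. To change the value of a, assign to it again: ``a = 'Hello'``.

Lists are a mutable type: a list can be changed in place at its original memory address.

Given ``a = [1, 2]``, the in-place modification ``a[0] = 2`` is allowed.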
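
The difference can be observed directly. This is a minimal sketch that uses ``id`` to show that the list stays the same object while it changes.

```python
a = 'hello'
try:
    a[0] = 'H'            # strings do not support item assignment
except TypeError as e:
    print('TypeError:', e)

a = 'H' + a[1:]           # builds a new string and rebinds the name a
print(a)                  # Hello

b = [1, 2]
list_id = id(b)
b[0] = 2                  # changes the list in place
print(b)                  # [2, 2]
print(id(b) == list_id)   # True: still the same list object
```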

References:

- https://stackoverflow.com/questions/8056130/immutable-vs-mutable-types


Expression: a combination of values, variables, and operators.

    | 17
    | n + 2

Statement: a piece of code that has an effect, such as creating a variable or displaying information.

    | n = 17
    | print(n)


Numbers and Formatted Output
------------------------------

    | x = 3.1415926
    
    | print('%4.0f' % (x))
    | print('%4.1f' % (x))
    | print('%4.2f' % (x))
    | print('%4.3f' % (x))
    | print('%4.4f' % (x))
    
    
    | print('%6.0f' % (x))
    | print('%6.1f' % (x))
    | print('%6.2f' % (x))
    | print('%6.3f' % (x))
    | print('%6.4f' % (x))
    
    
    | print('%.0f' % (x))
    | print('%.1f' % (x))
    | print('%.2f' % (x))
    | print('%.3f' % (x))
    | print('%.4f' % (x))
    | print('%.5f' % (x))
    | print('%.6f' % (x))
    | print('%.7f' % (x))
    | print('%.8f' % (x))
    | print('%.9f' % (x))
    | print('%.15f' % (x))
    | print('%.16f' % (x))
    | print('%.17f' % (x))
    | print('%.18f' % (x))
    
    | print('%4.f' % (x))
    | print('%5.f' % (x))
    | print('%6.f' % (x))
    | print('%7.f' % (x))
    | print('%8.f' % (x))
    
    | print('%f' % (x))
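
To make the pattern in the format strings above easier to see, here is a short sketch with a few of the results written out. In ``%W.Pf``, W is the minimum total width and P is the number of digits after the decimal point. The f-string lines are an equivalent alternative added here for comparison.

```python
x = 3.1415926

s1 = '%4.2f' % x      # '3.14'     (already 4 characters wide)
s2 = '%6.2f' % x      # '  3.14'   (padded on the left to width 6)
s3 = '%.0f' % x       # '3'
s4 = '%f' % x         # '3.141593' (6 decimals by default)

# The same formats written as f-strings (Python 3.6+).
t1 = f'{x:4.2f}'
t2 = f'{x:6.2f}'

for s in (s1, s2, s3, s4, t1, t2):
    print(repr(s))
```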
    

Strings
------------------------------------------

A string is made up of characters.

| fruit = 'banana!'
| first_letter = fruit[0]
| second_letter = fruit[1]

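Indices start at 0, so 1 refers to the second character. Only integers can be used as indices.

A negative integer counts from the end of the string. For example, fruit[-1] is the last character of fruit.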

| i = 1
| fruit[i]
| fruit[i+1]

The len() function returns the number of characters in a string: len(fruit).

| L = len(fruit)
| fruit[L-1]  # the last character; equivalent to fruit[-1]


Traversing a string:

    | fruit = 'banana'
    | for c in fruit:
    |     print(c)
    

Reverse traversal:

    | fruit = 'banana'
    | for i in range(len(fruit)-1,-1,-1):
    |     print(fruit[i])
    
    | fruit = 'banana'
    | for c in fruit[::-1]:  # [start:stop:step]
    |     print(c)
    
    
    | fruit = 'banana'
    | for c in ''.join(reversed(fruit)):
    |     print(c)
    

The ``# [start:stop:step]`` above is a comment; a comment starts with ``#``.
    

String Concatenation
-------------------------------------------------------

Print Jack, Kack, Lack, Mack, Nack, Ouack, Pack, and Quack:

| prefixes = 'JKLMNOPQ'
| suffix = 'ack'
| for c in prefixes:
|     if c == 'O' or c == 'Q':
|         print(c + 'u' + suffix)
|     else:
|         print(c + suffix)


Slices
-------------------------------------------------------

s[n:m], where n or m may be omitted.
The slice includes the character at index n and excludes the character at index m (indices start at 0).

| s = 'Monty Python'
| s[0:5]
| s[6:12]
| s[:5]
| s[6:]
| s[:]

n is usually less than m. If n is greater than or equal to m, the result is an empty string.

The length of an empty string is 0.
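
The slicing rules can be checked with the same string. A brief sketch follows; the variable names are chosen here for illustration.

```python
s = 'Monty Python'

first = s[0:5]     # 'Monty'  (indices 0..4)
second = s[6:12]   # 'Python' (indices 6..11)
whole = s[:]       # the whole string
empty = s[5:2]     # start >= stop gives ''

print(first, second, whole, repr(empty), len(empty))
```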

Strings are immutable: an existing string cannot be changed.

| greeting = 'Hello, world!'
| greeting[0] = 'J'  # TypeError: 'str' object does not support item assignment

| greeting = 'Hello, world!'
| new_greeting = 'J' + greeting[1:] 


Searching a String
-----------------------------

| def find(word, c):
|     i = 0
|     while i < len(word):
|         if word[i] == c:
|             return i
|         i = i + 1
|     return -1

| print(find('banana', 'a'))

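Exercise 1: add a third parameter that sets the index at which the search starts.

Exercise 2: add a third parameter that sets the direction of the search.

String objects have a built-in method called find.

Counting the occurrences of a character in a string.

Exercise: do this with the three-parameter find above.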


String Methods
------------------------------------------

| upper()
| lower()

Calling a method is known as a method invocation (or call).

| word.find('na')
| word.find('na', 3)
| name.find('b', 1, 2)
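
The calls above leave ``word`` and ``name`` undefined. The sketch below fills them in with assumed values (``'banana'`` and ``'bob'``) to show what the built-in ``find`` returns.

```python
word = 'banana'   # assumed value
name = 'bob'      # assumed value

print(word.upper())          # BANANA
print(word.find('na'))       # 2: index where the first 'na' starts
print(word.find('na', 3))    # 4: search starts at index 3
print(word.find('x'))        # -1: not found
print(name.find('b', 1, 2))  # -1: only name[1:2], which is 'o', is searched
print(word.count('a'))       # 3
```

The second and third arguments of ``find`` are the start index and the (exclusive) end index of the search.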


The in Operator
------------------------------------------

| 'a' in 'banana'
| 'seed' in 'banana'

Exercise: write the function in_both so that
in_both('apples', 'oranges') returns 'aes'.


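String Comparison
-------------------------------------------

Strings are compared in lexicographic (alphabetical) order. Uppercase letters come before lowercase letters.


The following comparison operators work on strings: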

| ==
| >, >=
| <, <=
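
A short sketch of what "uppercase before lowercase" means in practice. The word list is invented for illustration, and ``key=str.lower`` is one common way to get a case-insensitive order.

```python
print('Zebra' < 'apple')     # True: 'Z' (code 90) comes before 'a' (code 97)
print('apple' < 'banana')    # True
print(ord('Z'), ord('a'))    # 90 97

words = ['pear', 'Zebra', 'apple']   # hypothetical data
plain = sorted(words)                      # ['Zebra', 'apple', 'pear']
ignore_case = sorted(words, key=str.lower) # ['apple', 'pear', 'Zebra']
print(plain)
print(ignore_case)
```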


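Exercise: write a function is_reverse so that is_reverse('god', 'dog') returns True.

    
Two implementations of the find_from function. One bonus point if you can spot the error.

Strings are objects.

The essential meaning of an object: a data construct.

Computational complexity.

Define a function on the spot that makes a password of length at least 4.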


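Lists
--------------------

A list is a built-in type of the language. Note the analogy with strings: indices also start at 0, and the in operator, taking the length, slicing, and traversal all work in a similar way.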


    | [ ]
    | [10, 20, 30, 40]
    | ['crunchy frog', 'ram bladder', 'lark vomit']
    

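The elements of a list need not be of the same type: ``['spam', 2.0, 5, [10, 20]]``

The list [10, 20] sits inside another list; this is called a nested list.

What is the length of ``['spam', 1, ['Brie', 'Roquefort', 'Pol le Veq'], [1, 2, 3]]``?


Lists are a mutable type: their values can change in place (note the difference from strings).

Accessing an index that does not exist raises IndexError.

Traversal:

    | for cheese in cheeses:
    |     print(cheese)

    | for i in range(len(numbers)):
    |     numbers[i] = numbers[i] * 2

    | for x in []:
    |     print('This never happens.')

    
.. Discuss the data entry problem in software engineering accreditation.


The ``+`` operator concatenates lists, and the ``*`` operator repeats them.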

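List methods:

    append
    
    extend
    
    sort
    
    | t = ['d', 'c', 'e', 'b', 'a']
    | t.sort()  # Question: what value does t.sort() return?
    | t
    
sum is a reduce operation: it turns several values into one value.

map turns several values into several other values.

    | def f(x):
    |     return 2*x

    | list(map(f, [1, 2]))


filter selects, from several values, those that satisfy a condition.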


    | def f(x):
    |     if x % 2 == 0:
    |         return True
    |     return False

    | list(filter(f, [1,2,3,4]))
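
The three operations can be put side by side. The sketch below also shows list comprehensions, a common alternative to ``map`` and ``filter`` that is added here for comparison.

```python
numbers = [1, 2, 3, 4]

total = sum(numbers)                                  # reduce: many values -> one value
doubled = list(map(lambda x: 2 * x, numbers))         # map: each value -> another value
evens = list(filter(lambda x: x % 2 == 0, numbers))   # filter: keep values that pass the test

# The same results written as list comprehensions.
doubled2 = [2 * x for x in numbers]
evens2 = [x for x in numbers if x % 2 == 0]

print(total, doubled, evens)
```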


pop

    | t = ['a', 'b', 'c']
    | x = t.pop(1) # pop can be called without an argument; which value does it return then?
    

del

    | t = ['a', 'b', 'c']
    | del t[1]
    
    | t = ['a', 'b', 'c', 'd', 'e', 'f']
    | del t[1:5]
    

remove

    | t = ['a', 'b', 'c']
    | t.remove('b')
    

split

    | list_of_characters = list('spam')
    | list_of_words = 'spam should be filtered'.split()
    | list_of_words = 'spam-should-be-filtered'.split('-')
    

join

    | ','.join(['1','2','3'])
    
    
    | a = 'banana'
    | b = 'banana'
    | a is b # do a and b refer to the same value?
    | a == b
    
    
    | a = [1, 2, 3]
    | b = [1, 2, 3]
    | a is b # not identical, a and b are not the same object 
    | a == b # equivalent     though they have the same values
    

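Aliasing

    | a = [1, 2, 3]
    | b = a
    | b is a

The association of a variable name with an object is called a reference.
a and b are two references to [1, 2, 3].
Because [1, 2, 3] is mutable, a change made to it through a also affects the value seen through b.
This is error-prone.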
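
When an independent list is wanted, make a copy instead of an alias. A minimal sketch follows; note that the slice ``a[:]`` makes a shallow copy, so nested lists inside it would still be shared.

```python
a = [1, 2, 3]
b = a          # alias: a second name for the same list
c = a[:]       # shallow copy: a new list with the same elements (list(a) also works)

a.append(4)

print(b)       # [1, 2, 3, 4]  the alias sees the change
print(c)       # [1, 2, 3]     the copy does not
print(b is a, c is a)
```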


Lists as Arguments
---------------------------------------------

    | def delete_head(t):
    |     del t[0]
    
    | letters = ['a', 'b', 'c']
    | delete_head(letters) # letters and t point to the same list object.
    | letters
    

Note the Difference between ``append`` and the ``+`` Operator
----------------------------------------------------------------------

    | t1 = [1, 2]
    | t2 = t1.append(3)
    | t1
    | [1, 2, 3]
    | t2
    
    
    | t3 = t1 + [4]
    | t3
    | [1, 2, 3, 4]
    | t1
    | [1, 2, 3]
    

Compare the following two functions::

    def bad_delete_head(t):
        t = t[1:] # WRONG!
    
    def tail(t):
        return t[1:]
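
Running the two functions shows the difference. In this sketch the list ``letters`` is reused from the earlier example.

```python
def bad_delete_head(t):
    t = t[1:]      # rebinds the local name t; the caller's list is untouched

def tail(t):
    return t[1:]   # returns a new list without the first element

letters = ['a', 'b', 'c']

bad_delete_head(letters)
after_bad = list(letters)   # snapshot: still ['a', 'b', 'c']

rest = tail(letters)        # ['b', 'c']; letters itself is unchanged
print(after_bad, rest, letters)
```

The slice ``t[1:]`` creates a new list, so assigning it to the parameter only changes what the local name refers to.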
    

TDD - Test-driven Development
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

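Test-driven development is my favourite way of working. It is exciting and challenging, and it helps both to clarify requirements and to write code.

pytest is recommended. To install it, run ``pip install pytest``.

Write test cases for the function below in ``test_cases.py``, then run them from the command line: ``python -m pytest test_cases.py``.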

.. code:: python

	  # Copyright (c) Hui Lan 2019

          import random
          import string
          
          def make_password(n):
              '''
              Return a string of length n consisting of a combination of
              letters, digits and special characters.  Note that each password
              must have at least one lower case letter, one upper case letter,
              one digit and one special character.  Return an empty string if n
              is less than 4.
              '''
          
              if n < 4:
                  return ''
              
              password = random.choice(string.ascii_lowercase) + \
                  random.choice(string.ascii_uppercase) + \
                  random.choice(string.digits) + \
                  random.choice(string.punctuation) + \
                  ''.join([random.choice(string.ascii_letters + string.digits + string.punctuation) for i in range(n-4)])
          
              return ''.join(random.sample(password, n)) # shuffle password then return
          
          
          if __name__ == '__main__':
              for n in range(0,20):
                  pwd = make_password(n)
                  print(pwd)
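
What a ``test_cases.py`` for this function might look like is sketched below. The three test functions are illustrative, derived from the docstring rather than taken from the course material, and ``make_password`` is repeated in condensed form only so that the file runs on its own.

```python
import random
import string


def make_password(n):
    '''Condensed copy of the function above, repeated so this file is self-contained.'''
    if n < 4:
        return ''
    pool = string.ascii_letters + string.digits + string.punctuation
    chars = [random.choice(string.ascii_lowercase),
             random.choice(string.ascii_uppercase),
             random.choice(string.digits),
             random.choice(string.punctuation)]
    chars += [random.choice(pool) for i in range(n - 4)]
    random.shuffle(chars)
    return ''.join(chars)


def test_too_short_gives_empty_string():
    for n in range(4):
        assert make_password(n) == ''


def test_length():
    for n in range(4, 20):
        assert len(make_password(n)) == n


def test_contains_every_character_class():
    for n in range(4, 20):
        pwd = make_password(n)
        assert any(c in string.ascii_lowercase for c in pwd)
        assert any(c in string.ascii_uppercase for c in pwd)
        assert any(c in string.digits for c in pwd)
        assert any(c in string.punctuation for c in pwd)
```

pytest collects every function whose name starts with ``test_`` and reports each failing ``assert``.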
            

Computational Complexity
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Complexity is expressed in Big O notation: O(n), O(n^2), O(n^3).
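
One way to get a feel for these classes is to count steps. The two functions below are hypothetical illustrations: one loop gives O(n) steps, and two nested loops give O(n^2) steps.

```python
def count_linear(n):
    '''One loop over n items: the number of steps grows like n.'''
    steps = 0
    for i in range(n):
        steps += 1
    return steps


def count_quadratic(n):
    '''Two nested loops over n items: the number of steps grows like n * n.'''
    steps = 0
    for i in range(n):
        for j in range(n):
            steps += 1
    return steps


for n in [10, 100, 1000]:
    print(n, count_linear(n), count_quadratic(n))
```

Multiplying n by 10 multiplies the linear count by 10 and the quadratic count by 100.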


Review of the password lab.


Dictionaries
---------------------------------

A dictionary is a mutable data type.

Dictionaries are extremely useful in real-world development.

    | d = {}  # or: d = dict()
    
    | d = {'hot':'热', 'cool':'凉', 'cold':'冷'}
    | d['warm'] = '温'
    | d['warm']
    | d['freezing'] # KeyError
    | len(d)
    
    | 'warm' in d
    | '温' in d.values()
    
key

value

key-value pair (item)

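In older versions of Python the order of items was unpredictable and did not follow the order of creation. Since Python 3.7, a dictionary keeps its items in insertion order.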


Incremental Development
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Complete a little at a time, moving from easy to hard.


Exercise: given a string, count how often each letter occurs.

.. code:: python
	  
     def histogram(s):
         ''' Cannot pass any test cases. '''
         pass
    
     def histogram(s):
         ''' Can pass the test case in which s is an empty string. '''
         d = {}
         return d
    
     def histogram(s):
         ''' Can pass the test cases in which all characters in s are unique. '''
         d = {}
         for c in s:
             d[c] = 1
         return d
    
     def histogram(s):
         ''' Can pass all test cases. '''
         d = {}
         for c in s:
             if c not in d:
                 d[c] = 1
             else:
                 d[c] += 1
         return d
    
    
     h = histogram('good')
     print(h)
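
For comparison, the standard library already offers a ready-made counter. The sketch below uses ``collections.Counter`` as an alternative to the hand-written ``histogram``; it is not a replacement for doing the exercise.

```python
from collections import Counter

h = Counter('good')

print(h['o'])             # 2
print(h['x'])             # 0: a missing key counts as zero instead of raising KeyError
print(h.most_common(1))   # [('o', 2)]

as_dict = dict(h)         # an ordinary dictionary with the same counts
print(as_dict)
```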
    

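Exercise: given a string, count how often each word occurs.

Exercise: given a news text, count how often each word occurs. Consider the following: (1) count only words that are in the dictionary; (2) if a word is surrounded by punctuation, remove the punctuation first.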

.. code:: python
	  
          # Copyright (C) 2019 Hui Lan
          # coding=utf8
          # The line above fixes SyntaxError: Non-UTF-8 code starting with ...
          # (an encoding declaration must be on the first or second line of the file).
          
          def file2lst(fname):
              ''' Return a list where each element is a word from fname. '''
              L = []
              f = open(fname)
              for line in f:
                  line = line.strip()
                  lst = line.split()
                  for x in lst:
                      L.append(x)
              f.close()
              return L
          
          
          def lst2dict(lst):
              ''' Return a dictionary given list lst.  Each key is an element in the lst.
              The value is always 1.'''
              d = {}
              for w in lst:
                  d[w] = 1 
              return d
          
          
          import string
          def remove_punctuation(s):
              p = ',.:’“”' + string.punctuation  
              t = ''
              for c in s:
                  if not c in p:
                      t += c
                  elif c == '’': # handle the case such as May’s
                      return t
              return t
          
          def word_frequency(fname, english_dictionary):
              ''' Return a dictionary where each key is a word both in the file fname and in 
              the dictionary english_dictionary, and the corresponding value is the frequency
              of that word.'''
              d = {}
              L = file2lst(fname)
              for x in L:
                  x = remove_punctuation(x.lower())
                  if x in english_dictionary:
                      if not x in d:
                          d[x] = 1
                      else:
                          d[x] += 1
              return d
          
          
          def sort_by_value(d):
              ''' Return a sorted list of tuples, each tuple containing a key and a value.
                  Note that the tuples are ordered in descending order of the value.'''
              import operator
              lst = sorted(d.items(), key=operator.itemgetter(1), reverse=True)    
              return lst
          
          
          if __name__ == '__main__':    
              ed = lst2dict(file2lst('words.txt')) # from http://greenteapress.com/thinkpython2/code/words.txt
              d = word_frequency('brexit-news.txt', ed)
              lst = sort_by_value(d)
              for x in lst:
                  print('%s (%d)' % (x[0], x[1]))
          

Exercise: rewrite the function ``word_frequency`` so that it accepts a third parameter, ``black_lst``, a list of words to exclude. For example, ``black_lst`` could be ``['the', 'and', 'of', 'to']``.


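Swapping Keys and Values
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Note that in the original dictionary one value may correspond to several keys. For example, in ``d = {'a':1, 'b':2, 'c':2}``, the value 2 corresponds to two keys, 'b' and 'c'.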


.. code:: python
	  
          def inverse_dictionary(d):
              d2 = {}
              for k in d:
                  v = d[k]
                  if not v in d2:
                      d2[v] = [k]
                  else:
                      d2[v].append(k)
              return d2
          
          
          d = {'a':1, 'b':2, 'c':2}
          d2 = inverse_dictionary(d)
          print(d2)
          

Exercise: use ``inverse_dictionary`` to convert the ``d`` produced above by ``d = word_frequency('brexit-news.txt', ed)``. Then display all the words in descending order of frequency, with all the words of one frequency on each line.


.. code:: python
	  
          d2 = inverse_dictionary(d)
          for k in sorted(d2.keys(), reverse=True):
              print('%d %s' % (k, ' '.join(d2[k])))
              

Exercise: use the ``setdefault`` method to simplify ``inverse_dictionary`` above (fewer lines).


.. code:: python


          def inverse_dictionary(d):
              d2 = {}
              for k in d:
                  v = d[k]
                  d2.setdefault(v, []).append(k)
          
              return d2
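
Another common way to avoid the explicit membership test is ``collections.defaultdict``. This is an alternative sketch, not the ``setdefault`` solution the exercise asks for.

```python
from collections import defaultdict


def inverse_dictionary(d):
    d2 = defaultdict(list)       # a missing key starts out as an empty list
    for k, v in d.items():
        d2[v].append(k)
    return dict(d2)              # convert back to an ordinary dictionary


d = {'a': 1, 'b': 2, 'c': 2}
inverted = inverse_dictionary(d)
print(inverted)                  # {1: ['a'], 2: ['b', 'c']}
```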
          

Dictionaries Can Contain Dictionaries
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code:: python

	  d = { 'john':{'dob':'1990-10-23', 'height':'6 feet 5 inches'} }


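Functions
--------------------

When we find ourselves copying and pasting code again and again, it is time to turn that code into a function.

Which runs faster, ``unique_words`` or ``unique_words2``?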

.. code:: python

          def unique_words(lst):
              d = {}
              for x in lst:
                  d[x] = 1
              return sorted(d.keys())
          
          def unique_words2(lst):
              return sorted(list(set(lst)))
          
          
          N = 10000000
          print(unique_words(['hello', 'world', 'am', 'he'] * N))
          print(unique_words2(['hello', 'world', 'am', 'he'] * N))
          

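Local Variables
~~~~~~~~~~~~~~~~

Local variables live inside a function. When the function finishes executing, its local variables disappear.


Global Variables
~~~~~~~~~~~~~~~~

Global variables live outside functions, inside a module. A global variable is visible (readable) to all functions in the module. To assign a new value to a global variable inside a function, first declare it with ``global``.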


.. code:: python
	  
          verbose = True
          
          def example1():
              if verbose:
                  print('Running example1')
                  
          def example2():
              verbose = False  # a NEW local variable verbose
              if verbose:
                  print('Running example2')
                  
          def example3():
              global verbose # I am actually going to use the global variable verbose; don't create a local one.
              verbose = False
              if verbose:
                  print('Running example3')
          
          
          print(verbose)     
          example1()
          
          print(verbose) 
          example2()
          example1()
          
          print(verbose) 
          example3()
          example1()
          
          print(verbose) 
          

To change only the contents of a global list or dictionary, rather than assign a new value to the name, no ``global`` declaration is needed.


.. code:: python
	  
          record = {'s1':65, 's2':60}
          
          def add_score(student, score):
              record[student] = score
              
              
          print(record)
          add_score('s3', 75)
          print(record)


Exercise: define a function ``empty_dict`` that empties the dictionary ``record``. Requirement: do not use a ``return`` statement. Hint: use the ``pop`` method, or assign ``{}`` to ``record`` directly.


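Calling Functions and Passing Arguments
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Before using a function, make sure it has been defined.

Distinguish an ``argument`` from a ``parameter``. What is passed in is an ``argument``; what appears in the parameter list of the function header is a ``parameter``. The value of the ``argument`` is assigned to the ``parameter``, and the ``parameter`` is a local variable of the function.

An ``argument`` and a ``parameter`` may have the same name or different names.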


.. code:: python
	  
          def reverse_string(s):
              t = ''
              for i in range(len(s)-1,-1,-1):
                  t += s[i]
              return t
          

          s = 'put'
          t = reverse_string(s)
          print(t)

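Above, one s is a global variable and the other s is a local variable.

Likewise, one t is a global variable and the other t is a local variable.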


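Flow of Execution
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Defining a function does not run its body; the body runs only when the function is called.

Statements run in order. When a function call is reached, execution jumps to the function and runs it. After the function returns, execution continues with the statement that follows the call.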


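Files
------------------------------------------------

Information is mostly stored in files, so reading and writing files are among the most common operations. This section focuses on plain text files. Files with the following extensions are usually plain text: txt, csv, html, rst, md.

Lab: read PopularBabyNames_, the statistics file of newborn names in New York.
Write a command-line program lookupname.py that, given a gender and an ethnicity, prints the most popular names.
Example: ``python lookupname.py girl white top5`` prints the 5 most popular names for white girls.
The first argument can be ``girl/boy`` and the second can be ``asian/white/black/hispanic``. The third argument starts with ``top``, and the number after it defaults to 1.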

.. _PopularBabyNames: https://data.cityofnewyork.us/api/views/25th-nujf/rows.csv?accessType=DOWNLOAD

.. code:: python

         # Copyright (C) 2019 Hui Lan
         # lanhui AT zjnu.edu.cn
         # Purpose: 1. Introduce command line argument parsing. 2. Introduce nested dictionaries. 
         # Usage:
         #   python lookupname.py asian boy top10
         #   python lookupname.py white girl top5
         #   python lookupname.py girl white top 
         
         
         def map(x):
             d = {'FEMALE':'girl', 'MALE':'boy', 'ASIAN AND PACIFIC ISLANDER':'asian', 'ASIAN AND PACI':'asian',
                  'BLACK NON HISPANIC':'black', 'BLACK NON HISP':'black', 'HISPANIC':'hispanic', 'WHITE NON HISPANIC':'white', 'WHITE NON HISP':'white'}
             return d[x]
         
         
         def file2dict(fname):
             d = {} # will be a nested dictionary: e.g., d[gender] = {'asian':{'name':count}, 'black':[], 'white':[], 'hispanic':[]}
             f = open(fname)
             lines = f.readlines()
             for line in lines[1:]:
                 line = line.strip()
                 lst = line.split(',')
                 gender = map(lst[1])
                 ethnicity = map(lst[2])
                 firstname = lst[3].title()
                 count = int(lst[4])
                 if not gender in d:
                     d[gender] = {ethnicity: {firstname:count}}
                 else:
                     if not ethnicity in d[gender]:
                         d[gender][ethnicity] = {firstname:count}
                     else:
                         if not firstname in d[gender][ethnicity]:
                             d[gender][ethnicity][firstname] = count
                         else:
                             d[gender][ethnicity][firstname] += count
             f.close()
             return d
         
         
         def get_commandline_parameter(lst):
             d = {'gender':'', 'ethnicity':'', 'top':1}
             for x in lst:
                 o = x.lower()
                 if o in ['asian', 'black', 'white', 'hispanic']:
                     d['ethnicity'] = o
                 elif o in ['girl', 'boy']:
                     d['gender'] = o
                 elif o == 'top':
                     pass # use default value 1
                 elif 'top' in o:
                     d['top'] = int(o[3:])
                 else:
                     raise Exception('Not recognised option %s' % (x))
             return d
         
         
         def sort_by_value(d):
             ''' Return a sorted list of tuples, each tuple containing a key and a value.
                 Note that the tuples are ordered in descending order of the value.'''
             import operator
             lst = sorted(d.items(), key=operator.itemgetter(1), reverse=True)    
             return lst
         
         
         import sys
         if __name__ == '__main__':
             d = file2dict('Popular_Baby_Names.csv')
             args = get_commandline_parameter(sys.argv[1:])
             gender = args['gender']
             ethnicity = args['ethnicity']
             top = args['top']
             d2 = d[gender][ethnicity]
             lst = sort_by_value(d2)
             for name, count in lst[:top]:  # slicing avoids an IndexError when top exceeds the number of names
                 print(name)
         

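Modules
-----------------------------------------------

Every py file is a module. Every module has an implicit variable, ``__name__``, that holds the module name.

When the py file runs as the main module, the value of ``__name__`` is ``__main__``. When the py file is imported as a module, its ``__name__`` is the module name (the file name without the ``.py`` extension).

Put test code after ``if __name__ == '__main__':`` in each py file. When the file runs as the main module, the test code is executed. When the file is imported, its test code is not executed, which is what we want.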
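
A minimal sketch of such a file follows. The file name ``greet.py`` and the function ``greet`` are invented for illustration.

```python
# greet.py  (hypothetical file name)

def greet(name):
    return 'Hello, ' + name + '!'


# Shows '__main__' for `python greet.py`; shows the module name when imported.
print('__name__ is', __name__)

if __name__ == '__main__':
    # Test code: runs for `python greet.py`, but not for `import greet`.
    assert greet('world') == 'Hello, world!'
    print('all tests passed')
```

Running ``python greet.py`` executes the test code, while ``import greet`` from another file only defines ``greet`` and prints the module name.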


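Sorting
------------------------------------------------

Sorting is a common and important operation: sorting by score, by file name, by file size, or by time.

Python's built-in ``sorted`` meets most sorting needs well.


Sorting a List of Numbers or Strings
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To sort from largest to smallest, add ``reverse=True``.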

.. code:: python

         # Sort numbers
         import random
         a = [random.randint(0,100) for i in range(5)] # a list of 5 random numbers between 0 and 100
         print(a)
         
         sa_incr = sorted(a)
         print(sa_incr)
         
         sa_decr = sorted(a, reverse=True)
         print(sa_decr)
         
         # Sort a list of strings
         s = 'D3.js is a JavaScript library for manipulating documents based on data. D3 helps you bring data to life using HTML, SVG, and CSS. D3’s emphasis on web standards gives you the full capabilities of modern browsers without tying yourself to a proprietary framework, combining powerful visualization components and a data-driven approach to DOM manipulation.  https://d3js.org/'
         lst = list(set(s.split()))
         
         sa_incr = sorted(lst)
         print(sa_incr)
         
         sa_decr = sorted(lst, reverse=True)
         print(sa_decr)
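
Sorting by score or by size, as mentioned at the start of this section, needs one more ingredient: the ``key`` parameter of ``sorted``. A short sketch with made-up data:

```python
scores = [('bob', 90), ('alice', 75), ('carol', 60)]   # hypothetical (name, score) pairs

# key is a function that extracts the value to sort by from each element.
by_score = sorted(scores, key=lambda pair: pair[1], reverse=True)
by_name = sorted(scores)            # tuples compare by their first element first

words = ['pear', 'fig', 'banana']   # hypothetical data
by_length = sorted(words, key=len)

print(by_score)
print(by_name)
print(by_length)
```

``sorted`` always returns a new list and leaves the original list unchanged.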
         

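Implementing Our Own Sorting Algorithms
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To understand how sorting works, we look at two sorting algorithms.


Selection Sort
```````````````````````````````````````````````````

Traverse the list, each time moving the smallest remaining element to the leftmost unsorted position.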

.. code:: python

         # Copyright (C) 2019 Hui Lan
         # lanhui AT zjnu.edu.cn

         def swap(L, i, j):
             L[j], L[i] = L[i], L[j]
         
         
         def selection_sort(L):
             i = 0
             while i < len(L):
                 min_val = L[i]
                 k = j = i
                 while j < len(L):
                     if L[j] < min_val:
                         min_val = L[j]
                         k = j
                     j += 1
                 swap(L, i, k) # will change L
                 i += 1
             return L
         
         if __name__ == '__main__':
             
             import random
             for n in range(10):
                 a = [random.randint(0,100) for i in range(n)]
                 sa = selection_sort(a)
                 print(sa)
                 assert sa == a
                 assert sa == sorted(a)
         

Merge sort
```````````````````````````````````````````````````

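Split the list in half, sort each half, then merge the two sorted halves so that the merged result is also sorted. Note that the implementation below is recursive.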

.. code:: python

         # Copyright (C) 2019 Hui Lan
         # lanhui AT zjnu.edu.cn

         def _merge(L, R):
             ''' Return a sorted list that combines the sorted list L and sorted list R.'''
             nL = len(L)
             nR = len(R)
             result = []
             i = j = count = 0
             while count < nL + nR:
                 if i >= nL and j < nR:
                     result.append(R[j])
                     j += 1
                 elif j >= nR and i < nL:
                     result.append(L[i])
                     i += 1
                 elif L[i] < R[j]:
                     result.append(L[i])
                     i += 1
                 else:
                     result.append(R[j])
                     j += 1
                 count += 1
             return result
         
         
         def merge_sort(L):
             if len(L) <= 1:
                 return L
             else:
                 i = len(L) // 2 # integer division gives the midpoint
                 l = merge_sort(L[:i])
                 r = merge_sort(L[i:])
                 return _merge(l, r)
         
         if __name__ == '__main__':
                 
             import random
             for n in range(100):
                 a = [random.randint(0,100) for i in range(n)]
                 sa = merge_sort(a)
                 assert sa == sorted(a)
         
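As a cross-check for the merge step, the standard library offers ``heapq.merge``, which lazily merges inputs that are already sorted. The sketch below is not the implementation used above, only one way to verify ``_merge`` on small inputs. The two lists are made-up sample data.

.. code:: python

         import heapq

         left = [1, 4, 9]     # already sorted
         right = [2, 3, 10]   # already sorted

         # heapq.merge returns an iterator, so wrap it in list() to see the result.
         merged = list(heapq.merge(left, right))
         print(merged)   # [1, 2, 3, 4, 9, 10]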

Comparing sorting speed
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

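Sorting is a core operation in Python, so the built-in implementation has been optimized again and again.

Python's built-in sort is the fastest; ``selection_sort`` is the slowest.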


.. code:: python

         from merge_sort import merge_sort
         from selection_sort import selection_sort
         
         import random, time
         L = [random.randint(0,10000) for i in range(10000)]
         
         print('Python sort ...')
         now = time.time()
         result0 = sorted(L)
         print(time.time() - now)
         
         
         print('Merge sort ...')
         now = time.time()
         result1 = merge_sort(L)
         print(time.time() - now)
         
         print('Selection sort ...')
         now = time.time()
         result2 = selection_sort(L)
         print(time.time() - now)
         
         assert result0 == result1
         assert result1 == result2
         

Running the program above from the command line gave the following results (in seconds) on the author's computer.

| Python sort ...
| 0.002
| Merge sort ...
| 0.083
| Selection sort ...
| 11.57

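A single ``time.time()`` measurement can be noisy. The sketch below shows one possible way to make such a comparison a little more robust: it uses ``time.perf_counter()``, repeats each measurement, and sorts a fresh copy every time so that an in-place sort such as ``selection_sort`` never receives already-sorted input. The helper name ``time_sort`` is an assumption introduced here, not part of the original program.

.. code:: python

         import random
         import time


         def time_sort(sort_func, data, repeats=3):
             ''' Return the best time (in seconds) that sort_func needs to sort a copy of data. '''
             best = float('inf')
             for _ in range(repeats):
                 copy = list(data)            # fresh copy for every run
                 start = time.perf_counter()  # high-resolution clock meant for timing
                 sort_func(copy)
                 best = min(best, time.perf_counter() - start)
             return best


         L = [random.randint(0, 10000) for i in range(10000)]
         print('Python sort: %.4f s' % time_sort(sorted, L))

Any function that takes a list, such as ``merge_sort`` or ``selection_sort`` from the sections above, can be passed in place of ``sorted``.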

Sorting a list of tuples
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

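A tuple consists of several elements, and several tuples make up a list of tuples. How do we sort such a list by one particular element?

There are two options: the ``operator`` module or a ``lambda`` function.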

.. code:: python

         def sort_by_nth_element(lst, n):
             ''' Return a sorted list of tuples lst, according to the nth element in each tuple.'''
             import operator
             result = sorted(lst, key=operator.itemgetter(n))    
             return result
         
         
         def sort_by_nth_element2(lst, n):
             ''' Return a sorted list of tuples lst, according to the nth element in each tuple.'''
             result = sorted(lst, key=lambda x: x[n]) # https://stackoverflow.com/questions/8966538/syntax-behind-sortedkey-lambda
             return result
         
         
         if __name__ == '__main__':
             lst = [(1, 'xxx', 2), (2, 'aaa', 1)]
             print(sort_by_nth_element(lst, 0))
             print(sort_by_nth_element(lst, 1))    
             print(sort_by_nth_element(lst, 2))
             
             print(sort_by_nth_element2(lst, 0))
             print(sort_by_nth_element2(lst, 1))    
             print(sort_by_nth_element2(lst, 2))
         
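The same idea extends to sorting by more than one element. The sketch below sorts by grade from high to low and breaks ties by name. The records are made-up sample data, and the tuple-valued key is one possible approach among several.

.. code:: python

         # (name, grade) pairs; hypothetical sample data
         records = [('Alice', 90), ('Bob', 85), ('Carol', 90)]

         # Tuples are compared element by element, so the key (-grade, name)
         # sorts by grade in descending order, then by name in ascending order.
         by_grade = sorted(records, key=lambda r: (-r[1], r[0]))
         print(by_grade)   # [('Alice', 90), ('Carol', 90), ('Bob', 85)]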

Flexible sorting with lambda functions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

How do we sort a list of strings by string length?

.. code:: python

         lst = ['this', 'is', 'a', 'example']
         a = sorted(lst, key=lambda x: len(x))
         b = sorted(lst, key=lambda x: -len(x))
         print('\n'.join(a))
         
         s = '''https://genius.com/William-shakespeare-romeo-and-juliet-act-1-prologue-annotated#note-2756596
         Romeo and Juliet
         PROLOGUE
         Two households, both alike in dignity,
         In fair Verona, where we lay our scene,
         From ancient grudge break to new mutiny,
         Where civil blood makes civil hands unclean.
         From forth the fatal loins of these two foes
         A pair of star-cross'd lovers take their life;
         Whose misadventured piteous overthrows
         Doth with their death bury their parents' strife.
         The fearful passage of their death-mark'd love,
         And the continuance of their parents' rage,
         Which, but their children's end, nought could remove,
         Is now the two hours' traffic of our stage;
         The which if you with patient ears attend,
         What here shall miss, our toil shall strive to mend.'''
         
         lst = s.split('\n')
         c = sorted(lst, key=lambda x: len(x))
         d = sorted(lst, key=lambda x: -len(x))
         print('\n'.join(c))


Running the program above produces the following output.

::

  a
  is
  this
  example
  PROLOGUE
  Romeo and Juliet
  Two households, both alike in dignity,
  Whose misadventured piteous overthrows
  In fair Verona, where we lay our scene,
  From ancient grudge break to new mutiny,
  The which if you with patient ears attend,
  And the continuance of their parents' rage,
  Is now the two hours' traffic of our stage;
  Where civil blood makes civil hands unclean.
  From forth the fatal loins of these two foes
  A pair of star-cross'd lovers take their life;
  The fearful passage of their death-mark'd love,
  Doth with their death bury their parents' strife.
  What here shall miss, our toil shall strive to mend.
  Which, but their children's end, nought could remove,
  https://genius.com/William-shakespeare-romeo-and-juliet-act-1-prologue-annotated#note-2756596

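Because ``lambda x: len(x)`` simply forwards its argument to ``len``, the built-in function can also be passed as the key directly, and ``reverse=True`` can replace the negated length. A minimal sketch, reusing the short list from the program above:

.. code:: python

         lst = ['this', 'is', 'a', 'example']

         shortest_first = sorted(lst, key=len)               # same as key=lambda x: len(x)
         longest_first = sorted(lst, key=len, reverse=True)  # same as key=lambda x: -len(x)

         print(shortest_first)   # ['a', 'is', 'this', 'example']
         print(longest_first)    # ['example', 'this', 'is', 'a']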

Recursion
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Memo
``````````````````````````````````````````````````

.. code:: python

         def fibonacci(n):
             if n == 0:
                 return 0
             elif n == 1:
                 return 1
             else:
                 return fibonacci(n-1) + fibonacci(n-2)
         
         
         known = {0:0, 1:1}
         def fibonacci_memo(n):
             ''' A  'memoized' version of fibonacci. '''
             if n in known:
                 return known[n]
             res = fibonacci_memo(n-1) + fibonacci_memo(n-2) # recurse into the memoized version so cached values are reused
             known[n] = res
             return res
         
         
         n = 35
         import time
         t1 = time.time()
         print(fibonacci(n))
         print('%4.2f' % (time.time() - t1))
         
         t1 = time.time()
         print(fibonacci_memo(n))
         print('%4.2f' % (time.time() - t1))
         
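The standard library can also manage the cache. The sketch below uses ``functools.lru_cache`` as an alternative to the hand-written ``known`` dictionary. The function name ``fibonacci_cached`` is an assumption introduced here for illustration.

.. code:: python

         from functools import lru_cache


         @lru_cache(maxsize=None)   # maxsize=None means the cache may grow without bound
         def fibonacci_cached(n):
             ''' Fibonacci with results cached automatically by lru_cache. '''
             if n < 2:
                 return n
             return fibonacci_cached(n - 1) + fibonacci_cached(n - 2)


         print(fibonacci_cached(35))   # 9227465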

Rewriting ``selection_sort`` recursively
```````````````````````````````````````````````````

.. code:: python
	  
         def selection_sort(L):
             if len(L) <= 1:
                 return L
             
             min_val = L[0]
             k = j = 0
             while j < len(L):
                 if L[j] < min_val:
                     min_val = L[j]
                     k = j
                 j += 1
             L[k], L[0] = L[0], L[k]
             return [min_val] + selection_sort(L[1:]) 
         

Note that the code above exhausts the call stack when L is very long (a length of 1000 works, a length of 10000 does not).

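In Python, running out of stack in this way shows up as a ``RecursionError``, raised once the call depth exceeds the interpreter's recursion limit (``sys.getrecursionlimit()``). The sketch below triggers the error on purpose with a list slightly longer than that limit. The function name ``selection_sort_recursive`` is hypothetical, and its body uses ``min`` and ``index`` as a shorter stand-in for the loop above.

.. code:: python

         import sys


         def selection_sort_recursive(L):
             ''' Recursive selection sort: one recursive call per element. '''
             if len(L) <= 1:
                 return L
             k = L.index(min(L))          # position of the smallest element
             L[k], L[0] = L[0], L[k]      # move it to the front
             return [L[0]] + selection_sort_recursive(L[1:])


         n = sys.getrecursionlimit() + 100   # more elements than allowed call depth
         data = list(range(n, 0, -1))
         try:
             selection_sort_recursive(data)
             overflowed = False
         except RecursionError:
             overflowed = True
         print(overflowed)

The iterative ``selection_sort`` shown earlier does not have this limitation, because it uses loops instead of one call per element.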

Classes and objects
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


Web applications
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Text input
``````````````````````````````````````````````````

school.py


Uploading files
```````````````````````````````````````````````````

omg

	  
References
----------

- Think Python 2e – Green Tea Press.  http://greenteapress.com/thinkpython2/thinkpython2.pdf.  


.. Make a html page from this file.  Issue the following command:
   pip install docutils && rst2html.py LectureNotesOnPython.rst LectureNotesOnPython.html