Bug476-ZhangWeiHao-YuHuangtao #85
Open
mrlan wants to merge 9 commits from Bug476-ZhangWeiHao-YuHuangtao into master
Reference: mrlan/EnglishPal#85
@yuhuangtao
Please fill in the details.
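Work done:
Improve how the difficulty level of the user's words is rated:
This branch modifies the get_difficulty_level function in difficulty.py so that the user's words and the system's built-in words are rated in two separate passes. The process is as follows:
1. Load the system word list into d2 and the user's words into d1, then call get_difficulty_level to rate them;
2. Call get_difficulty_level_for_words_and_tests to rate the words in the word list and return the processed dictionary d2; then call simplify_the_words to obtain a dictionary d2_sim whose keys are the stems of the words in d2 and whose values are the same as in d2;
3. Rate the user's words one at a time, distinguishing two cases:
1) If the word's stem matches a key in d2_sim, the word gets the level that d2_sim holds for that stem;
2) If the word's stem matches none of the keys in d2_sim, the word is rated by its frequency (the original rating method).
Once rated, the user's words are added to d2, and d2 is returned at the end.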
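The three steps above can be sketched in Python as follows. This is a minimal illustration, not the code in difficulty.py: rate_user_words, toy_stem and the fallback argument are hypothetical names, toy_stem is a crude suffix stripper standing in for a real stemmer, and when two words in d2 share a stem the sketch simply keeps the first level it sees (the description does not say how collisions are resolved).

```python
from typing import Callable, Dict


def toy_stem(word: str) -> str:
    """Hypothetical stand-in for a real stemmer: strip one common suffix."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def rate_user_words(
    d1: Dict[str, int],
    d2: Dict[str, float],
    stem: Callable[[str], str],
    fallback: Callable[[str, Dict[str, int]], float],
) -> Dict[str, float]:
    """Sketch of the two-pass rating: d2 maps word -> level, d1 holds the user's words."""
    # Step 2: d2_sim uses the stems of d2's words as keys and copies d2's levels.
    # Assumption: on a stem collision, the first level seen is kept.
    d2_sim: Dict[str, float] = {}
    for word, level in d2.items():
        d2_sim.setdefault(stem(word), level)

    # Step 3: rate each user word and add it to d2 in place (no new dict).
    for word in d1:
        s = stem(word)
        if s in d2_sim:
            d2[word] = d2_sim[s]           # case 1: stem found in the word list
        else:
            d2[word] = fallback(word, d1)  # case 2: fall back to frequency
    return d2


if __name__ == "__main__":
    d2 = {"apple": 4, "abandon": 4}
    d1 = {"apples": 3, "zyzzyva": 1}
    # A constant fallback keeps the example deterministic.
    print(rate_user_words(d1, d2, toy_stem, lambda word, d: 8))
    # {'apple': 4, 'abandon': 4, 'apples': 4, 'zyzzyva': 8}
```

Here "apples" inherits the level of "apple" through the shared stem, while "zyzzyva" has no matching stem and is rated by the fallback.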
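Significance and limitations:
In the original system, apple and apples might be treated as two words with different levels, which makes the estimated level of the user or of an article deviate from reality. The improved rating method greatly reduces this risk. However, because our method judges the level from the stem alone, some words end up with a lower difficulty level than they should: for example, if the plural or participle form of a CET-6 word, once stemmed, coincides with some CET-4 word, that word will be rated level 4.
Possible improvement:
Using the nltk library to reduce words to their base form (lemma) rather than their stem would improve accuracy, but it requires adding data files: either a partial set (roughly a dozen MB) or the full set (over 600 MB).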
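The limitation and the proposed fix can be illustrated with a small, self-contained comparison. Everything here is hypothetical: the two words and their levels are invented, toy_stem is a crude suffix stripper rather than the stemmer the branch uses, and the LEMMAS table merely stands in for a real lemmatizer such as nltk's WordNetLemmatizer (which needs the extra data files mentioned above).

```python
from typing import Dict, Optional

# Hypothetical word list: the levels are invented purely for illustration.
WORD_LEVELS: Dict[str, int] = {"rat": 4, "rate": 6}

# Hypothetical lemma table standing in for a real lemmatizer.
LEMMAS: Dict[str, str] = {"rated": "rate", "rates": "rate", "rats": "rat"}


def toy_stem(word: str) -> str:
    """Crude suffix stripper; NOT a real stemming algorithm."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def level_by_stem(word: str, levels: Dict[str, int] = WORD_LEVELS) -> Optional[int]:
    """Rate via stems, as the branch does: build stem -> level, look up the word's stem."""
    stems = {toy_stem(w): lvl for w, lvl in levels.items()}
    return stems.get(toy_stem(word))


def level_by_lemma(word: str, levels: Dict[str, int] = WORD_LEVELS) -> Optional[int]:
    """Rate via lemmas: map the word to its base form, then look that up directly."""
    return levels.get(LEMMAS.get(word, word))


if __name__ == "__main__":
    print(level_by_stem("rated"))   # 4: "rated" stems to "rat" and borrows its level
    print(level_by_lemma("rated"))  # 6: "rated" lemmatizes to "rate"
```

The stem-based lookup hands the inflected form of the harder word the level of an unrelated easier word, which is exactly the downgrade described above; a lemma-based lookup avoids it as long as the lemmatizer knows the word.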
@ -31,3 +38,3 @@
def get_difficulty_level(d1, d2):
def get_difficulty_level_for_words_and_tests(d_in):
@yuhuangtao
d is better than d_in, because everything in a parameter list is generally an input anyway. Please use d.
@ -51,0 +51,4 @@
d[k] = 4 # CET4 word has level 4
elif 'CET6' in d_in[k]:
d[k] = 6
elif 'IELTS' in d_in[k] or 'GRADUATE' in d_in[k]: # IELTS or graduate English
IELTS vocabulary should be somewhat harder than the graduate-entrance-exam vocabulary.
Graduate entrance exam: 6.
IELTS: 7.
@ -51,0 +53,4 @@
d[k] = 6
elif 'IELTS' in d_in[k] or 'GRADUATE' in d_in[k]: # IELTS or graduate English
d[k] = 8
elif 'EnWords' in d_in[k]: # Most words beyond the basic vocabulary, including some obscure specialist terms; nearly 90,000 of them, most of which I honestly don't know
Would it be easy to rename 'EnWords' to 'OTHER'?
The tag name has been changed.
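Thanks, very nice.
This needs to be merged with the original words_and_tests.p rather than replacing it entirely. Please look into how to remove app/static/words_and_tests.p from the branch. Reason: the code repository should avoid version-controlling binary files.
The merge can be done with the Git panel on the desktop, and the file can be excluded with a command so that it is not merged.
The merge is done but has not been pushed yet.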
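The data-level merge the reviewer asks for can be sketched as below. It assumes, following the comment in the code under review, that both pickle files map each word to a list of test-type tags; merge_word_tags and merge_pickles are hypothetical helpers. Tags already present are kept and new ones appended, so nothing from the original file is lost, and the result is written to a separate output path rather than over either input.

```python
import pickle
from typing import Dict, List


def merge_word_tags(old: Dict[str, List[str]],
                    new: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Union of two {word: [test types]} dicts; existing tags are never dropped."""
    merged = {word: list(tags) for word, tags in old.items()}  # copy, don't mutate old
    for word, tags in new.items():
        bucket = merged.setdefault(word, [])
        for tag in tags:
            if tag not in bucket:  # avoid duplicate tags
                bucket.append(tag)
    return merged


def merge_pickles(old_path: str, new_path: str, out_path: str) -> None:
    """Load both pickles, merge them, and write the merged dict to out_path."""
    with open(old_path, "rb") as f:
        old = pickle.load(f)
    with open(new_path, "rb") as f:
        new = pickle.load(f)
    with open(out_path, "wb") as f:
        pickle.dump(merge_word_tags(old, new), f)
```

Keeping the merge in a script like this, rather than committing the regenerated binary, fits the reviewer's point that the repository should not version-control the pickle itself.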
@ -51,0 +51,4 @@
result[k] = 4 # CET4 word has level 4
elif 'CET6' in d[k] or 'GRADUATE' in d[k]:
result[k] = 6
elif 'IELTS' in d[k]: # IELTS or graduate English
@yuhuangtao
thanks
Please update the comment to drop "graduate English".
@ -51,0 +55,4 @@
result[k] = 7
elif 'BBC' in d[k]:
result[k] = 8
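# elif 'EnWords' in d[k]: # Most words beyond the basic vocabulary, including some obscure specialist terms; nearly 90,000 of them, hard to assign a level to, and most of which I honestly don't know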
If it is EnWords, set the difficulty to 3.
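Putting the review feedback together, the tag-to-level mapping might look like the sketch below. It assumes, as the comment in the code under review says, that each value in d is a collection of test types. The levels follow this thread (CET4 maps to 4, CET6 or GRADUATE to 6, IELTS to 7, BBC to 8, and EnWords, renamed OTHER, to 3); any branches that precede the CET4 check in the real function are not visible in these hunks and are omitted, and words with no recognised tag are simply skipped, which is an assumption.

```python
from typing import Dict, Iterable


def convert_test_type_to_difficulty_level(d: Dict[str, Iterable[str]]) -> Dict[str, int]:
    """Sketch: map each word's test-type tags to a single numeric level.

    Checks run from the easiest exam upwards, so a word tagged with several
    tests gets the lowest matching level; OTHER applies only when no exam tag
    matched.
    """
    result: Dict[str, int] = {}
    for word, tags in d.items():
        tags = set(tags)
        if 'CET4' in tags:
            result[word] = 4  # CET4 word has level 4
        elif 'CET6' in tags or 'GRADUATE' in tags:
            result[word] = 6
        elif 'IELTS' in tags:
            result[word] = 7
        elif 'BBC' in tags:
            result[word] = 8
        elif 'OTHER' in tags:  # formerly 'EnWords'
            result[word] = 3
    return result


if __name__ == "__main__":
    print(convert_test_type_to_difficulty_level(
        {"apple": ["CET4", "BBC"], "thesis": ["GRADUATE"], "zeal": ["OTHER"]}))
    # {'apple': 4, 'thesis': 6, 'zeal': 3}
```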
@ -51,3 +79,3 @@
return d
def get_difficulty_level(d1, d2):
@yuhuangtao
Please rename the function to get_difficulty_level_for_user.
@ -48,3 +40,1 @@
d[k] = min(difficulty_level_from_frequency(k, d1), d[k])
elif k in d1:
d[k] = difficulty_level_from_frequency(k, d1)
def get_difficulty_level_for_words_and_tests(d):
@yuhuangtao
Please rename the function to convert_test_type_to_difficulty_level.
@ -18,39 +19,83 @@ def load_record(pickle_fname):
def difficulty_level_from_frequency(word, d):
@yuhuangtao
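Is this function still useful? It looks like it could be refactored away (deleted).
Yes, it is. Although our word list is large, a word's base form, and even its stem, may still be missing from it. This function handles that case by rating the word according to the user's frequency data.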
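For illustration only, here is one hypothetical frequency-based fallback; it is not taken from difficulty.py. It assumes freq maps each word to an occurrence count and compares the word against a very common reference word (the reference parameter and the choice of "what" are assumptions): the rarer the word relative to the reference, the higher the level, on a log2 scale clamped to the range 1 to 8.

```python
import math
from typing import Dict


def level_from_frequency(word: str, freq: Dict[str, int],
                         reference: str = "what", max_level: float = 8) -> float:
    """Hypothetical fallback: rate a word by how much rarer it is than a common reference word."""
    if word not in freq or reference not in freq:
        return 1.0  # nothing to compare against: assume the easiest level
    ratio = (freq[reference] + 1) / (freq[word] + 1)  # +1 avoids division by zero
    level = math.log2(max(ratio, 1.0))  # roughly one level per halving of frequency
    return min(max(level, 1.0), max_level)


if __name__ == "__main__":
    freq = {"what": 1023, "mid": 63, "rare": 0}
    print(level_from_frequency("mid", freq))   # 4.0
    print(level_from_frequency("rare", freq))  # 8 (capped)
```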
@ -54,0 +65,4 @@
Appends the words to d2 instead of creating a new dictionary.
"""
d2 = convert_test_type_to_difficulty_level(d2) # rate by the tags in d2 {'apple': 4, 'abandon': 4, ...}
d2_simplified = simplify_the_words_dict(d2) # extract the stems of d2 {'appl': 4, 'abandon': 4, ...}
@yuhuangtao
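Do we still need to call simplify_the_words_dict? The d2 on the previous line already contains the stems. Is the stem of every word already there?
It should be; see select_words_and_tests.py in the attachment to your Kanban card "Merge pull request [44] for [EnglishPal]".
When d4 was built, the stems were also stored in the dictionary as keys. d4 was eventually saved as my_words_and_tests.pickle.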
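The reviewer's question (does d2 already hold a key for every word's stem?) can be checked directly. The sketch below is a hypothetical reconstruction of the idea described for d4, not the code of select_words_and_tests.py: add_stem_keys stores each word's stem as an extra key without overwriting real entries, words_missing_stem_keys lists the words whose stem is absent, lookup_level tries the word and then its stem, and toy_stem again stands in for the real stemmer. If the missing list comes back empty, looking up a user word's stem in d2 itself would make a separate simplify_the_words_dict pass redundant.

```python
from typing import Callable, Dict, List, Optional


def toy_stem(word: str) -> str:
    """Crude suffix stripper standing in for the real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def add_stem_keys(d: Dict[str, int], stem: Callable[[str], str]) -> Dict[str, int]:
    """Return a copy of d in which each word's stem is also a key."""
    out = dict(d)
    for word, level in d.items():
        out.setdefault(stem(word), level)  # never overwrite a real word's entry
    return out


def words_missing_stem_keys(d: Dict[str, int], stem: Callable[[str], str]) -> List[str]:
    """Words whose stem is not itself a key of d."""
    return [word for word in d if stem(word) not in d]


def lookup_level(word: str, d: Dict[str, int], stem: Callable[[str], str]) -> Optional[int]:
    """Exact match first, then the stem key; None if neither is present."""
    if word in d:
        return d[word]
    return d.get(stem(word))


if __name__ == "__main__":
    base = {"apples": 4, "abandoned": 6}
    print(words_missing_stem_keys(base, toy_stem))  # ['apples', 'abandoned']
    full = add_stem_keys(base, toy_stem)
    print(words_missing_stem_keys(full, toy_stem))  # []
    print(lookup_level("abandoning", full, toy_stem))  # 6
```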
@ -54,0 +24,4 @@
:return:
"""
result = {}
L = list(d.keys()) # in dic, we have test types (e.g., CET4,CET6,BBC) for each word
Change dic to d.