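Bug 358 - English words containing an apostrophe are split into two separate words
Summary: English words containing an apostrophe are split into two separate words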
Status: RESOLVED FIXED
Alias: None
Product: EnglishPal
Classification: Unclassified
Component: Bug Report
Version: 0.1
Hardware: PC Windows
Importance: --- normal
Assignee: Hui Lan
URL:
Depends on:
Blocks:
 
Reported: 2021-12-30 13:24 CST by 温启涛
Modified: 2024-09-08 11:54 CST

See Also:


Attachments
Attached screenshot (116.44 KB, image/png)
2021-12-30 13:24 CST, 温启涛

Description 温启涛 2021-12-30 13:24:13 CST
Created attachment 159
Attached screenshot

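In some of the articles already stored in this project's database, the apostrophe in certain words is not the half-width (ASCII) apostrophe ' but the full-width single quotation mark ’. As a result, the remove_punctuation function in wordfreqCMD.py converts this quotation mark to a space when processing words, which splits the original word into two parts.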

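For example, as the attachment shows, the selected article contains the three words that's / don't / I've, but after the lookup s / don / t / ve appear as separate words.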
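
One possible way to handle this is sketched below. It is only an illustration: the real remove_punctuation in wordfreqCMD.py is not shown in this report, so the function body here is a hypothetical stand-in, and normalize_apostrophes is a made-up helper name. The idea is to map the full-width quote to the ASCII apostrophe first, then keep an apostrophe only when it sits between two letters.

```python
# Assumption: these are the non-ASCII apostrophe variants worth normalizing.
# U+2019 is the ’ character quoted in this report; U+FF07 is the fullwidth apostrophe.
CURLY_APOSTROPHES = "\u2019\uFF07"


def normalize_apostrophes(text):
    """Map curly and fullwidth apostrophes to the ASCII apostrophe (hypothetical helper)."""
    return text.translate({ord(ch): "'" for ch in CURLY_APOSTROPHES})


def remove_punctuation(text):
    """Hypothetical stand-in for remove_punctuation in wordfreqCMD.py.

    Replaces punctuation with spaces, but keeps an apostrophe that has a
    letter on both sides, so contractions stay in one piece.
    """
    text = normalize_apostrophes(text)
    cleaned = []
    for i, ch in enumerate(text):
        inside_word = (
            ch == "'"
            and 0 < i < len(text) - 1
            and text[i - 1].isalpha()
            and text[i + 1].isalpha()
        )
        if ch.isalnum() or ch.isspace() or inside_word:
            cleaned.append(ch)
        else:
            cleaned.append(" ")
    return "".join(cleaned)


words = remove_punctuation("that\u2019s, don\u2019t; I\u2019ve!").split()
print(words)  # ["that's", "don't", "I've"]
```

Normalizing before stripping punctuation means the rest of the pipeline only has to deal with one apostrophe character, whichever way the article text was typed.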
Comment 1 Hui Lan 2022-01-02 13:15:59 CST
Thanks, 温启涛.

Hui
Comment 2 Hui Lan 2022-01-02 16:08:28 CST
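The three selected words, that's / don't / I've, are all very simple and would not normally be added as new words.

Are there any other, somewhat harder (standalone) words affected?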


-Hui
