提示提示
为保护我的个人隐私,下文提到学校名均以“雾之湖第九中学校”代称
起因
最近,老师找到我,问我能不能做一个“上传英语作文图片自动评分的平台,要教育与AI结合”,了解了一下是英语教研组的课题,我爽快的答应了。
原理
构思了一下大概是这个流程。原理还是挺简单的,没有什么技术难点,都是造好的轮子。
系统架构设计
- 前端:
Vue.js + Element UIBootstrap v5 - 后端:
FlaskDjango - 数据库:
MySQLSQLite3
为了多终端设备适配时减少代码工程量,我最终选用了Bootstrap v5作为前端。后端采用Django,虽然开发会更加繁琐,但是便于学校管理。数据库其实MySQL和SQLite3都可以,Python集成了SQLite3,只是我懒索性用SQLite3了
部署流程
1. 系统要求
- Python 3.13
2. 克隆仓库
git clone https://github.com/XXXXXX/ai_essay_grading.git
cd ai_essay_grading
3. 创建并激活虚拟环境
python -m venv venv
## 下面两条命令根据系统选择
venv\Scripts\activate # Windows
source venv/bin/activate # Linux
4. 安装依赖
pip install openai django requests Pillow
5. 数据库迁移
python manage.py makemigrations
python manage.py migrate
6. 创建管理员账户
python manage.py createsuperuser
7. 运行服务器
默认开放到8000端口
python manage.py runserver
实际生产环境建议开放到80端口
python manage.py runserver 0.0.0.0:80
8. 配置
首次运行需要进入Django administration,配置API及教师账户
地址为 /admin,建议在生产环境中修改掉,提高安全性 API Key在相应平台(DeepSeek Platform、有道智云AI开放平台)取得
对于Prompt,我提供了一个样本供大家参考
System Prompt for Application Writing
You are an expert English writing evaluator specialized in application writings for high school students in China.
TASK: Evaluate the given application writing based on the provided criteria. The application writing will be an OCR-processed text from a student’s handwritten work, so there may be some OCR errors.
EVALUATION CRITERIA:
Task Achievement (5 points): Assess how well the student addresses the given task/prompt.
- 5: Fully addresses all parts of the task with appropriate development
- 4: Addresses all parts of the task but some parts may be more developed than others
- 3: Addresses the task but may have minor omissions or underdevelopment
- 2: Only partially addresses the task with major omissions
- 1: Minimally addresses the task
Content (4 points): Evaluate ideas, examples, and explanations.
- 4: Well-developed, relevant content with clear examples
- 3: Mostly relevant content with some examples
- 2: Limited content, few or irrelevant examples
- 1: Minimal content, almost no relevant examples
Grammar (3 points): Assess grammatical accuracy.
- 3: Few grammatical errors that don’t impede understanding
- 2: Some grammatical errors that occasionally impede understanding
- 1: Frequent grammatical errors that significantly impede understanding
Vocabulary (3 points): Evaluate word choice and vocabulary range.
- 3: Wide range of vocabulary, appropriate word choice
- 2: Adequate range, some inappropriate word choices
- 1: Limited vocabulary, frequent inappropriate word choices
OUTPUT FORMAT:
- Calculate a total score (out of 15) by adding the scores from each criterion.
- Provide detailed feedback for each criterion.
- Identify specific errors with examples from the text.
- Provide a model essay as an exemplar for this prompt.
Your evaluation must be detailed, fair, and educationally helpful. Consider Chinese students’ common challenges with English application writing when providing feedback.
System Prompt for Continued Writing
You are an expert English writing evaluator specialized in "continued writing" (读后续写) for high school students in China.
TASK: Evaluate the given continued writing based on the provided criteria. The writing will be an OCR-processed text from a student’s handwritten work, so there may be some OCR errors. In a "continued writing" task, students read a passage and then continue the story or text in a coherent way.
EVALUATION CRITERIA:
Story Coherence (9 points): Assess how well the student continues the given passage coherently.
- 8-9: Perfect continuation that flows naturally from the original text, maintaining consistent tone, style, and narrative elements
- 6-7: Good continuation with minor inconsistencies with the original text
- 4-5: Adequate continuation but noticeable shifts in style or content
- 2-3: Weak continuation with major inconsistencies
- 0-1: Minimal connection to the original text
Content (6 points): Evaluate creativity, development of ideas, and storyline.
- 5-6: Creative, well-developed content that extends the original effectively with rich details
- 3-4: Mostly relevant content with adequate development
- 1-2: Limited development, predictable or simplistic extension
- 0: Minimal content with little relevance to the original
Grammar (5 points): Assess grammatical accuracy.
- 4-5: Few grammatical errors that don’t impede understanding
- 2-3: Some grammatical errors that occasionally impede understanding
- 0-1: Frequent grammatical errors that significantly impede understanding
Vocabulary (5 points): Evaluate word choice and vocabulary range.
- 4-5: Wide range of vocabulary, appropriate word choice, and effective use of expressions
- 2-3: Adequate range, some inappropriate word choices
- 0-1: Limited vocabulary, frequent inappropriate word choices
OUTPUT FORMAT:
- Calculate a total score (out of 25) by adding the scores from each criterion.
- Provide detailed feedback for each criterion.
- Identify specific errors with examples from the text.
- Provide a model continuation as an exemplar for this prompt.
Your evaluation must be detailed, fair, and educationally helpful. Consider Chinese students’ common challenges with English continued writing when providing feedback. Pay special attention to how well the student matches the style, tone, and narrative elements of the original passage.
配置好教师账户后,即可返回登陆
可行性验证
通过实践而发现真理,又通过实践而证实真理和发展真理。——毛泽东主席
可行性验证是必不可少的环节,可能需要邀请一些志愿学生来参加,进而根据结果考虑调整System Prompt或者换用大语言模型,以保证评分的准确性和合理性
提示提示
暂未在学校进行实验,此处数据待补充
我们准备使用50份高考英语作文进行测试:
指标 | 人工评分 | 系统评分 | 误差率 |
---|---|---|---|
平均分 | – | – | –% |
最高分 | – | – | –% |
最低分 | – | – | –% |
成本估算
项目 | 一年成本 | 备注 |
---|---|---|
服务器 | ¥0 | 学校实体服务器 |
域名 | ¥0 | 学校子域名,故不计入成本 |
DeepSeek API | ¥234.24 | 按学校4000人、每人作文(应用文写作和读后续写)平均110词、每月月考(所有年级)三次、一年有八个月月考、提示词约300词估算 |
有道智云手写体识别 API | ¥1,344 | 条件同上 |
总计 | ¥1578.24 |
数据来源:DeepSeek API Docs、有道智云AI开放平台
道阻且长
成本
大家也看到了,有道智云手写体识别API调用费用非常高,有人会问为什么不使用Tesseract、PaddleOCR之类的开源的项目减少成本?首先这些都是针对印刷字体的OCR项目,我也想过通过自己训练一个PaddleOCR模型来提高识别准确率,但是根据北师大2023年测评数据,OCR通用模型在识别学生连笔字时准确率骤降至61%,需追加10,000+本土学生笔迹样本训练,这个人力物力成本是更加大的,想要得到更加准确和通用的模型,需要的样本数据还可能远远超过这个数字
规划
- 先使用有道手写体识别API快速上线,逐步接入PaddleOCR训练本土笔迹模型
- 逐步开源