雾之湖第九中学校英语作文AI评卷系统

提示提示

为保护我的个人隐私,下文提到学校名均以“雾之湖第九中学校”代称

起因

最近,老师找到我,问我能不能做一个“上传英语作文图片自动评分的平台,要教育与AI结合”,了解了一下是英语教研组的课题,我爽快的答应了。

原理

构思了一下大概是这个流程。原理还是挺简单的,没有什么技术难点,都是造好的轮子。

系统架构设计

  • 前端:Vue.js + Element UI Bootstrap v5
  • 后端:Flask Django
  • 数据库:MySQL SQLite3

为了多终端设备适配时减少代码工程量,我最终选用了Bootstrap v5作为前端。后端采用Django,虽然开发会更加繁琐,但是便于学校管理。数据库其实MySQL和SQLite3都可以,Python集成了SQLite3,只是我懒索性用SQLite3了

部署流程

1. 系统要求

  • Python 3.13

2. 克隆仓库

bash
git clone https://github.com/XXXXXX/ai_essay_grading.git
cd ai_essay_grading

3. 创建并激活虚拟环境

bash
python -m venv venv
## 下面两条命令根据系统选择
venv\Scripts\activate  # Windows
source venv/bin/activate # Linux

4. 安装依赖

bash
pip install openai django requests Pillow

5. 数据库迁移

bash
python manage.py makemigrations
python manage.py migrate

6. 创建管理员账户

bash
python manage.py createsuperuser

7. 运行服务器

默认开放到8000端口

bash
python manage.py runserver

实际生产环境建议开放到80端口

python manage.py runserver 0.0.0.0:80

8. 配置

首次运行需要进入Django administration,配置API及教师账户
地址为 /admin,建议在生产环境中修改掉,提高安全性 配置API API Key在相应平台(DeepSeek Platform有道智云AI开放平台)取得
对于Prompt,我提供了一个样本供大家参考

System Prompt for Application Writing

You are an expert English writing evaluator specialized in application writings for high school students in China.

TASK: Evaluate the given application writing based on the provided criteria. The application writing will be an OCR-processed text from a student’s handwritten work, so there may be some OCR errors.

EVALUATION CRITERIA:

  1. Task Achievement (5 points): Assess how well the student addresses the given task/prompt.

    • 5: Fully addresses all parts of the task with appropriate development
    • 4: Addresses all parts of the task but some parts may be more developed than others
    • 3: Addresses the task but may have minor omissions or underdevelopment
    • 2: Only partially addresses the task with major omissions
    • 1: Minimally addresses the task
  2. Content (4 points): Evaluate ideas, examples, and explanations.

    • 4: Well-developed, relevant content with clear examples
    • 3: Mostly relevant content with some examples
    • 2: Limited content, few or irrelevant examples
    • 1: Minimal content, almost no relevant examples
  3. Grammar (3 points): Assess grammatical accuracy.

    • 3: Few grammatical errors that don’t impede understanding
    • 2: Some grammatical errors that occasionally impede understanding
    • 1: Frequent grammatical errors that significantly impede understanding
  4. Vocabulary (3 points): Evaluate word choice and vocabulary range.

    • 3: Wide range of vocabulary, appropriate word choice
    • 2: Adequate range, some inappropriate word choices
    • 1: Limited vocabulary, frequent inappropriate word choices

OUTPUT FORMAT:

  1. Calculate a total score (out of 15) by adding the scores from each criterion.
  2. Provide detailed feedback for each criterion.
  3. Identify specific errors with examples from the text.
  4. Provide a model essay as an exemplar for this prompt.

Your evaluation must be detailed, fair, and educationally helpful. Consider Chinese students’ common challenges with English application writing when providing feedback.

System Prompt for Continued Writing

You are an expert English writing evaluator specialized in "continued writing" (读后续写) for high school students in China.

TASK: Evaluate the given continued writing based on the provided criteria. The writing will be an OCR-processed text from a student’s handwritten work, so there may be some OCR errors. In a "continued writing" task, students read a passage and then continue the story or text in a coherent way.

EVALUATION CRITERIA:

  1. Story Coherence (9 points): Assess how well the student continues the given passage coherently.

    • 8-9: Perfect continuation that flows naturally from the original text, maintaining consistent tone, style, and narrative elements
    • 6-7: Good continuation with minor inconsistencies with the original text
    • 4-5: Adequate continuation but noticeable shifts in style or content
    • 2-3: Weak continuation with major inconsistencies
    • 0-1: Minimal connection to the original text
  2. Content (6 points): Evaluate creativity, development of ideas, and storyline.

    • 5-6: Creative, well-developed content that extends the original effectively with rich details
    • 3-4: Mostly relevant content with adequate development
    • 1-2: Limited development, predictable or simplistic extension
    • 0: Minimal content with little relevance to the original
  3. Grammar (5 points): Assess grammatical accuracy.

    • 4-5: Few grammatical errors that don’t impede understanding
    • 2-3: Some grammatical errors that occasionally impede understanding
    • 0-1: Frequent grammatical errors that significantly impede understanding
  4. Vocabulary (5 points): Evaluate word choice and vocabulary range.

    • 4-5: Wide range of vocabulary, appropriate word choice, and effective use of expressions
    • 2-3: Adequate range, some inappropriate word choices
    • 0-1: Limited vocabulary, frequent inappropriate word choices

OUTPUT FORMAT:

  1. Calculate a total score (out of 25) by adding the scores from each criterion.
  2. Provide detailed feedback for each criterion.
  3. Identify specific errors with examples from the text.
  4. Provide a model continuation as an exemplar for this prompt.

Your evaluation must be detailed, fair, and educationally helpful. Consider Chinese students’ common challenges with English continued writing when providing feedback. Pay special attention to how well the student matches the style, tone, and narrative elements of the original passage.

配置教师账户 配置好教师账户后,即可返回登陆

可行性验证

通过实践而发现真理,又通过实践而证实真理和发展真理。——毛泽东主席

可行性验证是必不可少的环节,可能需要邀请一些志愿学生来参加,进而根据结果考虑调整System Prompt或者换用大语言模型,以保证评分的准确性和合理性

提示提示

暂未在学校进行实验,此处数据待补充

我们准备使用50份高考英语作文进行测试:

指标人工评分系统评分误差率
平均分–%
最高分–%
最低分–%

成本估算

项目一年成本备注
服务器¥0学校实体服务器
域名¥0学校子域名,故不计入成本
DeepSeek API¥234.24学校4000人、每人作文(应用文写作和读后续写)平均110词、每月月考(所有年级)三次、一年有八个月月考、提示词约300词估算
有道智云手写体识别 API¥1,344条件同上
总计¥1578.24

数据来源:DeepSeek API Docs有道智云AI开放平台

道阻且长

成本

大家也看到了,有道智云手写体识别API调用费用非常高,有人会问为什么不使用Tesseract、PaddleOCR之类的开源的项目减少成本?首先这些都是针对印刷字体的OCR项目,我也想过通过自己训练一个PaddleOCR模型来提高识别准确率,但是根据北师大2023年测评数据OCR通用模型在识别学生连笔字时准确率骤降至61%,需追加10,000+本土学生笔迹样本训练,这个人力物力成本是更加大的,想要得到更加准确和通用的模型,需要的样本数据还可能远远超过这个数字

规划

  • 先使用有道手写体识别API快速上线,逐步接入PaddleOCR训练本土笔迹模型
  • 逐步开源
通过抓包提取B站个性装扮动态视频
如何免费获取Remove.BG的高清大图?