There are two corpora: mostly English (trec06p) and Chinese (trec06c).

trec06p/full/ -- Ideal feedback English corpus
trec06p/full-delay/ -- Delayed feedback English corpus
trec06c/full/ -- Ideal feedback Chinese corpus
trec06c/delay/ -- Delayed fe
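This code implements a naive Bayes classifier (the version that assumes conditional independence between features), which is commonly used for spam filtering. Laplace smoothing is applied.
For the theory behind the naive Bayes algorithm, see the posts in the theory section of this blog.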
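As a rough illustration of the idea, separate from the listing that follows, here is a minimal sketch of a multinomial naive Bayes classifier with Laplace (add-one) smoothing. The function names `train_naive_bayes` and `classify`, the `alpha` parameter, and the toy messages are all assumptions made for this example; they do not come from the original code. The sketch also assumes Python 3 and pre-tokenized input.

```python
from collections import Counter
from math import log


def train_naive_bayes(docs, labels, alpha=1.0):
    """Estimate log priors and smoothed log likelihoods.

    docs   -- list of token lists (hypothetical, pre-tokenized input)
    labels -- parallel list of class labels
    alpha  -- smoothing constant; alpha=1.0 is Laplace smoothing
    """
    vocab = {tok for doc in docs for tok in doc}
    class_counts = Counter(labels)
    token_counts = {c: Counter() for c in class_counts}
    for doc, label in zip(docs, labels):
        token_counts[label].update(doc)

    # P(class): fraction of training documents in each class.
    log_prior = {c: log(n / len(labels)) for c, n in class_counts.items()}

    # P(token | class) with add-alpha smoothing, so a token never seen
    # in a class still gets a small non-zero probability.
    log_likelihood = {}
    for c, counts in token_counts.items():
        total = sum(counts.values()) + alpha * len(vocab)
        log_likelihood[c] = {t: log((counts[t] + alpha) / total) for t in vocab}
    return log_prior, log_likelihood


def classify(tokens, log_prior, log_likelihood):
    """Return the class with the highest posterior log score.

    Conditional independence lets the per-token log likelihoods be
    summed. Tokens outside the training vocabulary are skipped.
    """
    scores = {}
    for c in log_prior:
        scores[c] = log_prior[c] + sum(
            log_likelihood[c][t] for t in tokens if t in log_likelihood[c]
        )
    return max(scores, key=scores.get)


# Toy data, invented for illustration only.
docs = [
    ["cheap", "pills", "buy"],
    ["meeting", "tomorrow", "agenda"],
    ["buy", "cheap", "now"],
    ["agenda", "for", "meeting"],
]
labels = ["spam", "ham", "spam", "ham"]
log_prior, log_likelihood = train_naive_bayes(docs, labels)
print(classify(["cheap", "buy"], log_prior, log_likelihood))
print(classify(["meeting", "agenda"], log_prior, log_likelihood))
```

Working in log space avoids numeric underflow when many small probabilities are multiplied, which is why the scores are sums of logarithms rather than products.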
#!/usr/bin/python
# -*- coding: utf-8 -*-
from math import log
from numpy import *  # star import; note that this rebinds log to numpy.log
import operator
import matplotlib
import matplotlib.pyplot as plt
from os import listdir
def