Hands-On Mini Projects: Handwritten Digit Recognition and Text Classification

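Hands-on projects are a way to apply theory in practice and deepen your understanding of AI techniques. This section walks through two projects built on simple datasets: handwritten digit recognition and simple text classification.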

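Project 1: Handwritten Digit Recognition (MNIST Dataset)

Project goal:

Use a convolutional neural network (CNN) to recognize handwritten digits (0-9).

Dataset:

MNIST dataset: 60,000 training images and 10,000 test images, each a 28x28 grayscale image.

Implementation steps:

Environment setup: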

pip install tensorflow keras numpy matplotlib

Load the data:

from tensorflow.keras.datasets import mnist

# Load the dataset (downloaded on first use)
(X_train, y_train), (X_test, y_test) = mnist.load_data()

# Normalize pixel values from [0, 255] to [0, 1]
X_train = X_train / 255.0
X_test = X_test / 255.0
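
To make the normalization step concrete, the sketch below applies the same scaling to a small synthetic array standing in for MNIST images. The random `fake_images` array is an assumption for illustration, not real data. It also adds the trailing channel axis that the CNN input expects in the later steps.

```python
import numpy as np

# Synthetic stand-in for a batch of MNIST images: 4 grayscale 28x28 images
# stored as unsigned 8-bit integers, the dtype mnist.load_data() returns.
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)

# Scale pixel values from [0, 255] to [0.0, 1.0]; the division promotes to float64.
scaled = fake_images / 255.0

# Add a trailing channel axis so each image has shape (28, 28, 1).
with_channel = scaled.reshape(-1, 28, 28, 1)

print(fake_images.dtype, scaled.dtype)
print(scaled.min() >= 0.0, scaled.max() <= 1.0)
print(with_channel.shape)
```

Keeping inputs in a small, consistent range generally helps gradient-based training converge, which is the usual reason for dividing by 255.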

Build the model:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

# Build the CNN model
model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(128, activation='relu'),
    Dense(10, activation='softmax')
])

# Compile the model (integer labels 0-9, hence the sparse loss)
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
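
It can help to trace by hand how each layer changes the tensor shape and how many trainable parameters it adds. The sketch below does this arithmetic for the architecture above, assuming the Keras defaults of `padding='valid'` and stride 1 for `Conv2D`, and a pooling stride equal to the pool size. The helper functions are hypothetical names written for this illustration and are not part of Keras.

```python
def conv2d(h, w, c_in, filters, k=3):
    """Output shape and parameter count of a 'valid' k x k convolution, stride 1."""
    params = k * k * c_in * filters + filters  # weights plus one bias per filter
    return (h - k + 1, w - k + 1, filters), params

def max_pool(h, w, c, size=2):
    """Output shape of non-overlapping max pooling (no parameters)."""
    return (h // size, w // size, c)

def dense(n_in, units):
    """Parameter count of a fully connected layer."""
    return n_in * units + units

shape, p1 = conv2d(28, 28, 1, 32)      # (26, 26, 32)
shape = max_pool(*shape)               # (13, 13, 32)
shape, p2 = conv2d(*shape, 64)         # (11, 11, 64)
shape = max_pool(*shape)               # (5, 5, 64)
flat = shape[0] * shape[1] * shape[2]  # 1600 values after Flatten
p3 = dense(flat, 128)
p4 = dense(128, 10)
total_params = p1 + p2 + p3 + p4
print(shape, flat, total_params)
```

Most of the parameters sit in the first `Dense` layer, so that layer is a natural place to look when trying to shrink the model.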

Train the model:

# Train for 5 epochs; reshape adds the channel axis the CNN expects
model.fit(X_train.reshape(-1, 28, 28, 1), y_train, epochs=5, validation_data=(X_test.reshape(-1, 28, 28, 1), y_test))
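
Training minimizes the loss chosen at compile time, `sparse_categorical_crossentropy`. As a rough illustration of what that loss measures for a single example, here is a pure-Python sketch. The scores are made-up values for a 3-class problem, and real implementations also guard against taking the log of zero, which this sketch leaves out.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sparse_categorical_crossentropy(probs, label):
    """Negative log-probability assigned to the true class index."""
    return -math.log(probs[label])

# Hypothetical scores for a 3-class problem (illustrative values only).
probs = softmax([2.0, 1.0, 0.1])
loss_if_correct = sparse_categorical_crossentropy(probs, 0)
loss_if_wrong = sparse_categorical_crossentropy(probs, 2)
print(loss_if_correct < loss_if_wrong)
```

"Sparse" refers to the label format: the true class is given as an integer index (as in `y_train`) rather than as a one-hot vector.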

Evaluate the model:

# Evaluate on the test set
test_loss, test_acc = model.evaluate(X_test.reshape(-1, 28, 28, 1), y_test)
print(f"Test Accuracy: {test_acc}")

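Visualize the results:

import numpy as np
import matplotlib.pyplot as plt

# Predict class probabilities for the test images
predictions = model.predict(X_test.reshape(-1, 28, 28, 1))

# Show the first test image with its predicted digit
plt.imshow(X_test[0], cmap='gray')
plt.title(f"Predicted: {np.argmax(predictions[0])}")
plt.show()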

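Project 2: Simple Text Classification (IMDb Dataset)

Project goal:

Use a deep learning model to classify the sentiment of movie reviews (positive/negative).

Dataset:

IMDb dataset: 25,000 training reviews and 25,000 test reviews, each labeled positive or negative.

Implementation steps:

Environment setup: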
```
pip install tensorflow keras numpy
```

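Load the data:

from tensorflow.keras.datasets import imdb
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Load the dataset, keeping only the 10,000 most frequent words
(X_train, y_train), (X_test, y_test) = imdb.load_data(num_words=10000)

# Pad or truncate every review to 200 tokens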
X_train = pad_sequences(X_train, maxlen=200)
X_test = pad_sequences(X_test, maxlen=200)
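
By default `pad_sequences` pads at the front and, for sequences longer than `maxlen`, also truncates from the front (`padding='pre'`, `truncating='pre'`, `value=0`). The pure-Python sketch below mimics that default behavior on two toy sequences. `pad_pre` is a hypothetical helper written for illustration, not a Keras function.

```python
def pad_pre(sequence, maxlen, value=0):
    """Mimic pad_sequences defaults: keep the last `maxlen` items, left-pad with `value`."""
    trimmed = sequence[-maxlen:]                         # truncating='pre'
    return [value] * (maxlen - len(trimmed)) + trimmed   # padding='pre'

short_review = [11, 22, 33]
long_review = [1, 2, 3, 4, 5, 6, 7]
print(pad_pre(short_review, 5))  # [0, 0, 11, 22, 33]
print(pad_pre(long_review, 5))   # [3, 4, 5, 6, 7]
```

One consequence is that a long review keeps its ending rather than its beginning, so only the last 200 tokens of each review reach the model here.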

Build the model:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

# Build the LSTM model
model = Sequential([
    Embedding(10000, 128, input_length=200),
    LSTM(128, dropout=0.2, recurrent_dropout=0.2),
    Dense(1, activation='sigmoid')
])

# Compile the model (one sigmoid output, hence the binary loss)
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
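
As a sanity check on the model size, the trainable parameters of this architecture can be counted by hand. The sketch below assumes the standard LSTM formulation with four gates, each holding input weights, recurrent weights, and a bias. Dropout adds no parameters.

```python
vocab_size, embed_dim, lstm_units = 10000, 128, 128

# Embedding: one embed_dim-sized vector per vocabulary index.
embedding_params = vocab_size * embed_dim

# LSTM: 4 gates, each with input weights, recurrent weights, and a bias.
lstm_params = 4 * (embed_dim * lstm_units + lstm_units * lstm_units + lstm_units)

# Dense(1): one weight per LSTM unit plus a bias.
dense_params = lstm_units * 1 + 1

total_params = embedding_params + lstm_params + dense_params
print(embedding_params, lstm_params, dense_params, total_params)
```

The embedding table accounts for the bulk of the parameters, so lowering `num_words` or the embedding dimension is the most direct way to make the model smaller.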

Train the model:

# Train for 3 epochs with mini-batches of 64 reviews
model.fit(X_train, y_train, epochs=3, batch_size=64, validation_data=(X_test, y_test))

Evaluate the model:

# Evaluate on the test set
test_loss, test_acc = model.evaluate(X_test, y_test)
print(f"Test Accuracy: {test_acc}")
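
The two numbers returned by `model.evaluate` here are the binary cross-entropy loss and the accuracy. The following pure-Python sketch shows roughly how both are computed from sigmoid outputs. The labels and probabilities are made-up values, and the 0.5 threshold is assumed to match the default used for binary accuracy.

```python
import math

def binary_crossentropy(y_true, y_prob):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all examples."""
    losses = [-(y * math.log(p) + (1 - y) * math.log(1 - p))
              for y, p in zip(y_true, y_prob)]
    return sum(losses) / len(losses)

def binary_accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of examples whose thresholded prediction matches the label."""
    hits = [int(p > threshold) == y for y, p in zip(y_true, y_prob)]
    return sum(hits) / len(hits)

# Hypothetical labels and sigmoid outputs (illustrative values only).
labels = [1, 0, 1, 0]
sigmoid_outputs = [0.9, 0.2, 0.4, 0.1]
loss = binary_crossentropy(labels, sigmoid_outputs)
accuracy = binary_accuracy(labels, sigmoid_outputs)
print(round(loss, 4), accuracy)
```

The third example (label 1, predicted 0.4) is the only misclassification, and it also contributes the largest share of the loss.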

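Predict a new review:

# Predict the sentiment of a new review
word_index = imdb.get_word_index()  # word -> frequency rank, starting at 1
new_review = "This movie was fantastic! I loved every minute of it."
# Lowercase and strip punctuation so the tokens match the keys in word_index
tokens = new_review.lower().replace("!", " ").replace(".", " ").split()
# load_data() shifts ranks by 3 (0 = padding, 1 = start, 2 = unknown) and caps the vocabulary at num_words
new_review_encoded = [1] + [word_index[w] + 3 if w in word_index and word_index[w] + 3 < 10000 else 2 for w in tokens]
new_review_padded = pad_sequences([new_review_encoded], maxlen=200)
prediction = model.predict(new_review_padded)
print(f"Prediction: {'Positive' if prediction[0][0] > 0.5 else 'Negative'}")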
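
A new review has to be encoded the same way as the training data, otherwise the model sees word indices it was never trained on. With its default arguments, `imdb.load_data` reserves 0 for padding, 1 for the start marker, and 2 for unknown words, and shifts every word rank by 3. The sketch below illustrates that scheme with a toy word index. The dictionary contents and the regex tokenizer are assumptions for illustration; in practice the mapping would come from `imdb.get_word_index()`.

```python
import re

# Toy stand-in for imdb.get_word_index(): word -> frequency rank (1 = most common).
# These ranks are made up for illustration only.
toy_word_index = {"the": 1, "movie": 2, "was": 3, "fantastic": 4}

PAD, START, UNKNOWN, INDEX_FROM = 0, 1, 2, 3  # defaults used by imdb.load_data()

def encode_review(text, word_index, num_words=10000):
    """Encode text with the same index offsets as the training data."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())  # lowercase, drop punctuation
    encoded = [START]
    for token in tokens:
        index = word_index.get(token, -1) + INDEX_FROM
        # Words missing from the index or beyond the vocabulary cap become UNKNOWN.
        encoded.append(index if INDEX_FROM <= index < num_words else UNKNOWN)
    return encoded

print(encode_review("The movie was FANTASTIC! Loved it.", toy_word_index))
```

Mapping out-of-vocabulary words to the unknown index, rather than dropping them, keeps the token positions consistent with how the training reviews were encoded.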

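Summary and Learning Suggestions

Project summary:

Project	Dataset	Model	Goal	Key steps
Handwritten digit recognition	MNIST	CNN	Recognize handwritten digits (0-9)	Data loading, model building, training and evaluation
Text classification	IMDb	LSTM	Sentiment classification (positive/negative)	Data preprocessing, model building, training and evaluation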

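Learning suggestions:

Master the basics: learn the fundamentals of deep learning (such as CNNs and LSTMs).
Practice hands-on: get familiar with the tools and models through real projects.
Read the documentation: consult the official TensorFlow and Keras documentation.
Extend the projects: try improving the models (for example, adding layers or tuning hyperparameters) or applying them to other datasets.

Completing these small projects lets you apply theory in practice and builds a solid foundation for more complex AI projects. Good luck! 🚀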