Content Overview

Project Introduction

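The need for this personal intelligent voice assistant stems from the pursuit of a more convenient, efficient, and intelligent lifestyle. The goal is to let users manage daily tasks, retrieve information, control home devices, run voice searches, and receive personalized suggestions through voice commands. This improves everyday life for individuals and households and meets the demand for smarter, more connected, and more personalized services.
The specific requirements are as follows: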

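Online voice interaction: check the weather, look up information, answer questions, and play music.
Offline use: store audio resources on a TF card and play them back without a network connection.
Built-in lithium battery with charge and discharge management, supporting long standby time.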

Market Applications

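On the market, personal intelligent voice assistants offer a wide range of functions, including schedule management, smart home control, media playback, online shopping, language learning, social media interaction, health tracking, and travel planning. They make these tasks convenient through voice commands, improve users' quality of life, and help make personal and family life smarter and more connected.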

Project Design Approach

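This project uses the ESP32-WROVER-E as the main controller. Because the requirements are fairly complex, the firmware runs on FreeRTOS to keep the system responsive in real time, and it is developed with ESP-ADF.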

Block Diagram and Schematic Explanation

Block Diagram

The block diagram is shared at: https://www.digikey.cn/schemeit/project/story-teller-9d42eb69cb3d4171adabef9b0b8aaa3a
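The system consists mainly of the power circuit, the SD card circuit, the key input circuit, the audio output circuit, and the microphone circuit.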

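The system has two modes, online and offline. In online mode, the device joins the Wi-Fi network through SmartConfig provisioning, and voice interaction relies on wake-word detection and VAD (voice activity detection). Once the user stops speaking, the recorded audio is uploaded to the Baidu Cloud speech recognition API. The recognized text is then sent to the Baidu ERNIE Bot (文心一言) platform to obtain a reply, and a self-hosted TTS server converts the reply text into an audio file for playback. In offline mode, the keys control play/pause, previous track, next track, and volume.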
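The online-mode flow described above (speech recognition, then chat, then TTS) can be sketched as a single turn-handling function. This is only an illustrative Python sketch, not the firmware's code: `handle_utterance` and the three `fake_*` stand-ins are hypothetical names, and the real device calls the Baidu services and the TTS server over the network instead.

```python
from typing import Callable, Optional


def handle_utterance(
    audio: bytes,
    recognize: Callable[[bytes], str],
    chat: Callable[[str], str],
    synthesize: Callable[[str], bytes],
) -> Optional[bytes]:
    """Run one online-mode turn: speech -> text -> reply -> audio."""
    text = recognize(audio).strip()
    if not text:
        # Nothing was recognized, so there is nothing to ask or to play.
        return None
    reply = chat(text)
    return synthesize(reply)


# Hypothetical stand-ins so the flow can be exercised without network access.
def fake_recognize(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_chat(text: str) -> str:
    return "reply to: " + text


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


if __name__ == "__main__":
    print(handle_utterance(b"weather", fake_recognize, fake_chat, fake_synthesize))
```

Passing the three services in as callables keeps the ordering of the steps separate from how each service is reached, which makes the flow easy to test on a PC.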

Schematic

Power Circuit

To keep the supply stable, the power circuit uses two voltage regulator modules that power the ESP32 and the audio circuit separately.


Main Controller Circuit

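The main controller is the ESP32-WROVER-E, a general-purpose Wi-Fi + Bluetooth + Bluetooth LE MCU module. It is powerful and versatile, suitable for anything from low-power sensor networks to highly demanding tasks such as voice encoding, audio streaming, and MP3 decoding. The ESP32-WROVER-E-N8R8 has 8 MB of PSRAM and 8 MB of flash, which provides enough resources to run speech recognition.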

Key Circuit

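The keys are mode switch, play/pause, previous track, next track, volume up, and volume down. They are read as ADC keys, so all six share a single GPIO pin and use very few resources.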
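With ADC keys, each key pulls the shared pin to a different voltage, and the firmware decides which key is pressed by comparing the reading against per-key levels. The decoding logic is sketched below in Python. The voltage levels and the tolerance are made-up illustration values, not the ones from this board, and `decode_key` is a hypothetical helper.

```python
from typing import Optional

# Hypothetical nominal ADC voltages (in millivolts) for each key on the ladder.
KEY_LEVELS_MV = {
    "mode": 380,
    "play_pause": 820,
    "prev": 1110,
    "next": 1650,
    "vol_up": 1980,
    "vol_down": 2410,
}

# Accepted deviation around each nominal level (assumed value). It must stay
# below half the smallest gap between levels so the windows never overlap.
TOLERANCE_MV = 100


def decode_key(reading_mv: int) -> Optional[str]:
    """Map one ADC reading to a key name, or None if no key is pressed."""
    for name, level_mv in KEY_LEVELS_MV.items():
        if abs(reading_mv - level_mv) <= TOLERANCE_MV:
            return name
    return None


if __name__ == "__main__":
    for sample in (400, 1120, 2350, 3300):
        print(sample, decode_key(sample))
```

A reading near the supply rail, such as 3300 mV here, matches no window and is treated as "no key pressed".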

SD Card Circuit

The SD card is driven in 1-line mode, which saves GPIO pins.

Audio Circuit

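The ES7243E ADC chip passes the signal captured by the microphone to the ESP32. The ES7243E is a high-performance multi-bit delta-sigma audio ADC widely promoted by Everest Semiconductor (顺芯), with a THD+N of -90 dB, 24-bit resolution, and sampling rates from 8 to 48 kHz. It also includes automatic level control (ALC) and a noise gate, and it offers very good value for money.
The audio amplifier stage uses the NS4150B, an ultra-low-EMI, filterless 3 W mono Class-D audio power amplifier. Its design greatly reduces EMI across the full bandwidth, which minimizes the impact on other components. The NS4150B has built-in overcurrent, overtemperature, and undervoltage protection that keeps the chip from being damaged under abnormal operating conditions. Its circuit design also uses spread-spectrum techniques, and an efficiency of up to 90% makes it well suited to portable audio products. The filterless PWM modulation architecture and built-in gain reduce the number of external components, the PCB area, and the system cost.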


Introduction to the Designated-Vendor Components Used in the Design

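ESP32-WROVER-E and ESP32-WROVER-IE are two general-purpose Wi-Fi + Bluetooth + Bluetooth LE MCU modules. They are powerful and versatile, suitable for anything from low-power sensor networks to highly demanding tasks such as voice encoding, audio streaming, and MP3 decoding. The ESP32-WROVER-E uses a PCB on-board antenna. The module comes in several variants with different flash and PSRAM sizes. This project uses the variant with 8 MB of flash and 8 MB of PSRAM.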

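ESP32-WROVER-E and ESP32-WROVER-IE are built around the ESP32-D0WD-V3 or ESP32-D0WDR2-V3 chip from the ESP32 series. These chips are scalable and adaptive. The two CPU cores can be controlled individually, and the CPU clock frequency is adjustable from 80 MHz to 240 MHz. The user can power off the CPU and let the low-power coprocessor monitor the peripherals for state changes or for analog values that cross a threshold. The ESP32 also integrates a rich set of peripherals, including capacitive touch sensors, an SD card interface, an Ethernet interface, high-speed SPI, UART, I2S, and I2C.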

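The ESP32 runs FreeRTOS with LwIP and has built-in TLS 1.2 with hardware acceleration. The chip also supports encrypted OTA upgrades, so a product can continue to be upgraded after release.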

Reference document: esp32-wrover-e_esp32-wrover-ie_datasheet_cn

PCB Layout and Fabrication: Problems Encountered and Solutions

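During PCB design, the schematic was not checked carefully enough, so one power line was left unconnected to its chip. A jumper wire served as a temporary fix, and both the schematic and the PCB were corrected afterwards. The lesson is to check the schematic very carefully.
When buying chips, I did not notice that the ES7243 and the ES7243E are two different parts. ESP-ADF supports both, but they are programmed differently, so at first I assumed the board layout or the soldering was at fault again. Carefully comparing the schematic against the code eventually revealed the real problem.
The board was soldered by hand, and most parts use 0805 packages. The ES8311 and ES7243, however, come in QFN packages, which are hard to solder with an iron. This was also my first time soldering QFN chips (my initial attempts with a soldering iron ruined two boards). I recommend a hot-air gun; I used low-temperature lead-free solder and a small DIY hot-air gun.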

Key Code and Explanation

SmartConfig Provisioning

static void smartconfig_task(void * parm)
{
    EventBits_t uxBits;
    ESP_ERROR_CHECK( esp_smartconfig_set_type(SC_TYPE_ESPTOUCH) );
    smartconfig_start_config_t cfg = SMARTCONFIG_START_CONFIG_DEFAULT();
    ESP_ERROR_CHECK( esp_smartconfig_start(&cfg) );
    while (1) {
        uxBits = xEventGroupWaitBits(s_wifi_event_group, CONNECTED_BIT | ESPTOUCH_DONE_BIT, true, false, portMAX_DELAY);
        if(uxBits & CONNECTED_BIT) {
            ESP_LOGI(TAG, "WiFi Connected to ap");
        }
        if(uxBits & ESPTOUCH_DONE_BIT) {
            ESP_LOGI(TAG, "smartconfig over");
            esp_smartconfig_stop();
            vEventGroupDelete(s_wifi_event_group);
            vTaskDelete(NULL);
        }
    }
}

static void wifi_event_handler(void* arg, esp_event_base_t event_base,
                                int32_t event_id, void* event_data)
{
    if (event_base == WIFI_EVENT && event_id == WIFI_EVENT_STA_START) {
        xTaskCreate(smartconfig_task, "smartconfig_task", 4096, NULL, 3, NULL);
    } else if (event_base == WIFI_EVENT && event_id == WIFI_EVENT_STA_DISCONNECTED) {
        esp_wifi_connect();
        xEventGroupClearBits(s_wifi_event_group, CONNECTED_BIT);
    } else if (event_base == IP_EVENT && event_id == IP_EVENT_STA_GOT_IP) {
        xEventGroupSetBits(s_wifi_event_group, CONNECTED_BIT);
    } else if (event_base == SC_EVENT && event_id == SC_EVENT_SCAN_DONE) {
        ESP_LOGI(TAG, "Scan done");
    } else if (event_base == SC_EVENT && event_id == SC_EVENT_FOUND_CHANNEL) {
        ESP_LOGI(TAG, "Found channel");
    } else if (event_base == SC_EVENT && event_id == SC_EVENT_GOT_SSID_PSWD) {
        ESP_LOGI(TAG, "Got SSID and password");

        smartconfig_event_got_ssid_pswd_t *evt = (smartconfig_event_got_ssid_pswd_t *)event_data;
        wifi_config_t wifi_config;

        bzero(&wifi_config, sizeof(wifi_config_t));
        memcpy(wifi_config.sta.ssid, evt->ssid, sizeof(wifi_config.sta.ssid));
        memcpy(wifi_config.sta.password, evt->password, sizeof(wifi_config.sta.password));
        wifi_config.sta.bssid_set = evt->bssid_set;
        if (wifi_config.sta.bssid_set == true) {
            memcpy(wifi_config.sta.bssid, evt->bssid, sizeof(wifi_config.sta.bssid));
        }
        char ssid[33] = { 0 };
        char password[65] = { 0 };
        memcpy(ssid, (char *)evt->ssid, sizeof(evt->ssid));
        memcpy(password, (char *)evt->password, sizeof(evt->password));
        nvs_set_sta(ssid, password);

        ESP_ERROR_CHECK( esp_wifi_disconnect() );
        ESP_ERROR_CHECK( esp_wifi_set_config(WIFI_IF_STA, &wifi_config) );
        esp_wifi_connect();
    } else if (event_base == SC_EVENT && event_id == SC_EVENT_SEND_ACK_DONE) {
        xEventGroupSetBits(s_wifi_event_group, ESPTOUCH_DONE_BIT);
    }
}

void smartconfig_wifi(void)
{
    s_wifi_event_group = xEventGroupCreate();
    esp_wifi_stop();
    esp_wifi_deinit();
    wifi_init_config_t cfg = WIFI_INIT_CONFIG_DEFAULT();
    ESP_ERROR_CHECK( esp_wifi_init(&cfg) );

    ESP_ERROR_CHECK( esp_event_handler_register(WIFI_EVENT, ESP_EVENT_ANY_ID, &wifi_event_handler, NULL) );
    ESP_ERROR_CHECK( esp_event_handler_register(IP_EVENT, IP_EVENT_STA_GOT_IP, &wifi_event_handler, NULL) );
    ESP_ERROR_CHECK( esp_event_handler_register(SC_EVENT, ESP_EVENT_ANY_ID, &wifi_event_handler, NULL) );

    ESP_ERROR_CHECK( esp_wifi_set_mode(WIFI_MODE_STA) );
    ESP_ERROR_CHECK( esp_wifi_start() );
}

Wake Word and VAD Configuration

static void start_recorder()
{
    audio_element_handle_t i2s_stream_reader;
    audio_pipeline_cfg_t pipeline_cfg = DEFAULT_AUDIO_PIPELINE_CONFIG();
    pipeline_rec = audio_pipeline_init(&pipeline_cfg);
    if (NULL == pipeline_rec) {
        return;
    }

    i2s_stream_cfg_t i2s_cfg = I2S_STREAM_CFG_DEFAULT();
    i2s_cfg.i2s_port = CODEC_ADC_I2S_PORT;
    i2s_cfg.i2s_config.use_apll = 0;
    i2s_cfg.i2s_config.sample_rate = CODEC_ADC_SAMPLE_RATE;
    i2s_cfg.i2s_config.bits_per_sample = CODEC_ADC_BITS_PER_SAMPLE;
    i2s_cfg.type = AUDIO_STREAM_READER;
    i2s_stream_reader = i2s_stream_init(&i2s_cfg);

    audio_element_handle_t filter = NULL;
#if CODEC_ADC_SAMPLE_RATE != (16000)
    rsp_filter_cfg_t rsp_cfg = DEFAULT_RESAMPLE_FILTER_CONFIG();
    rsp_cfg.src_rate = CODEC_ADC_SAMPLE_RATE;
    rsp_cfg.dest_rate = 16000;
    filter = rsp_filter_init(&rsp_cfg);
#endif

    raw_stream_cfg_t raw_cfg = RAW_STREAM_CFG_DEFAULT();
    raw_cfg.type = AUDIO_STREAM_READER;
    raw_read = raw_stream_init(&raw_cfg);

    audio_pipeline_register(pipeline_rec, i2s_stream_reader, "i2s");
    audio_pipeline_register(pipeline_rec, raw_read, "raw");

    if (filter) {
        audio_pipeline_register(pipeline_rec, filter, "filter");
        const char *link_tag[3] = {"i2s", "filter", "raw"};
        audio_pipeline_link(pipeline_rec, &link_tag[0], 3);
    } else {
        const char *link_tag[2] = {"i2s", "raw"};
        audio_pipeline_link(pipeline_rec, &link_tag[0], 2);
    }

    audio_pipeline_run(pipeline_rec);
    ESP_LOGI(TAG, "Recorder has been created");

    recorder_sr_cfg_t recorder_sr_cfg = DEFAULT_RECORDER_SR_CFG();
    recorder_sr_cfg.afe_cfg.memory_alloc_mode = AFE_MEMORY_ALLOC_MORE_PSRAM;
    recorder_sr_cfg.afe_cfg.wakenet_init = WAKENET_ENABLE;
    // recorder_sr_cfg.feed_task_core=0;
    // recorder_sr_cfg.fetch_task_core=0;
    recorder_sr_cfg.multinet_init = false;
    recorder_sr_cfg.afe_cfg.aec_init = RECORD_HARDWARE_AEC;
    recorder_sr_cfg.afe_cfg.agc_mode = AFE_MN_PEAK_NO_AGC;

    audio_rec_cfg_t cfg = AUDIO_RECORDER_DEFAULT_CFG();
    cfg.read = (recorder_data_read_t)&input_cb_for_afe;
    cfg.sr_handle = recorder_sr_create(&recorder_sr_cfg, &cfg.sr_iface);
    cfg.event_cb = rec_engine_cb;
    cfg.vad_off = 500;
    recorder = audio_recorder_create(&cfg);
}
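In the recorder configuration above, `cfg.vad_off = 500` appears to set how long silence must last, in milliseconds, before the recorder treats the utterance as finished. That reading of the parameter is an assumption here. The idea can be sketched in Python with a per-frame speech flag; `find_speech_end` and the 30 ms frame length are hypothetical and only illustrate the "hangover" logic, not the ESP-ADF implementation.

```python
from typing import Iterable, Optional


def find_speech_end(
    frames: Iterable[bool],
    frame_ms: int = 30,
    vad_off_ms: int = 500,
) -> Optional[int]:
    """Return the index of the frame where the utterance is declared over.

    Each element of `frames` says whether that frame contains speech.
    The utterance ends once speech has been heard and the silence that
    follows has lasted at least `vad_off_ms`. Returns None otherwise.
    """
    silent_ms = 0
    heard_speech = False
    for index, is_speech in enumerate(frames):
        if is_speech:
            heard_speech = True
            silent_ms = 0  # any speech frame restarts the silence timer
        elif heard_speech:
            silent_ms += frame_ms
            if silent_ms >= vad_off_ms:
                return index
    return None


if __name__ == "__main__":
    # 10 speech frames followed by 20 silent frames of 30 ms each.
    print(find_speech_end([True] * 10 + [False] * 20))
```

A short pause inside a sentence resets the timer as soon as speech resumes, so only a pause of at least 500 ms ends the recording.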

Setting Up the Online TTS Server

from flask import Flask, request, send_from_directory
import json
# import pyttsx4
import zhtts
from pydub import AudioSegment
import os
# Instantiate the app
app = Flask(import_name=__name__)

UPLOAD_PATH = os.path.dirname(__file__)

# Restrict this route to POST requests via methods
@app.route('/tts', methods=["POST"])
def json_request():

    # Receive and parse the JSON request body
    data = json.loads(request.data)  # convert the JSON string to a dict
    tts_str = data['data']
    tokens = request.args["tokens"]
    if tokens == "righttokens":
        # engine = pyttsx4.init()
        # engine.setProperty('voice','Hakka Chinese')
        # engine.setProperty('rate', 150)   # set the speaking rate
        # engine.setProperty('volume',1)  # set the volume
        # engine.save_to_file(tts_str, "res.mp3")
        # engine.runAndWait()
        # Synthesize the text to WAV, then transcode it to a low-bitrate MP3
        tts = zhtts.TTS()
        tts.text2wav(tts_str, "res.wav")
        wav_audio = AudioSegment.from_file("res.wav", format="wav")
        file_handle = wav_audio.export("res.mp3", format="mp3", bitrate="16k")
        return send_from_directory(UPLOAD_PATH, "res.mp3")
    else:
        return "tokens error!"

@app.route('/music', methods=["POST"])
def music_request():

    song = request.args["song"]
    tokens = request.args["tokens"]
    if tokens == "righttokens":
        tts = zhtts.TTS()
        # The original passed an undefined name (tts_str) here, which raises
        # NameError. Synthesizing the requested song name is an assumed fix.
        tts.text2wav(song, "res.wav")
        wav_audio = AudioSegment.from_file("res.wav", format="wav")
        file_handle = wav_audio.export("res.mp3", format="mp3", bitrate="16k")
        return send_from_directory(UPLOAD_PATH, "res.mp3")
    else:
        return "tokens error!"

if __name__ == '__main__':
    app.run(host="127.0.0.1",port=2000)
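For checking the server from a PC, the `/tts` route above expects a POST with a `tokens` query parameter and a JSON body whose `data` field holds the text. A minimal client sketch using only the Python standard library might look like the following. `build_tts_request` and `fetch_tts_mp3` are hypothetical helper names, and the base URL simply mirrors the host and port in the listing.

```python
import json
import urllib.parse
import urllib.request


def build_tts_request(base_url: str, text: str, token: str) -> urllib.request.Request:
    """Build the POST request that the /tts route expects."""
    query = urllib.parse.urlencode({"tokens": token})
    body = json.dumps({"data": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/tts?{query}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_tts_mp3(base_url: str, text: str, token: str, timeout: float = 10.0) -> bytes:
    """Send the request and return the raw response body (MP3 bytes on success)."""
    request = build_tts_request(base_url, text, token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


if __name__ == "__main__":
    # Requires the Flask server from the listing to be running locally.
    audio = fetch_tts_mp3("http://127.0.0.1:2000", "你好", "righttokens")
    with open("reply.mp3", "wb") as out_file:
        out_file.write(audio)
```

Note that the server answers a wrong token with the plain text "tokens error!" rather than an HTTP error status, so a caller should check the body before treating it as audio.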

HTTP Stream Audio Playback

void http_mp3_play(void *arg)
{

    audio_pipeline_handle_t pipeline;
    audio_element_handle_t http_stream_reader, i2s_stream_writer, mp3_decoder;

    ESP_LOGI(TAG, "[ 1 ] Start audio codec chip");
    audio_board_handle_t board_handle = audio_board_init();
    audio_hal_ctrl_codec(board_handle->audio_hal, AUDIO_HAL_CODEC_MODE_DECODE, AUDIO_HAL_CTRL_START);

    ESP_LOGI(TAG, "[2.0] Create audio pipeline for playback");
    audio_pipeline_cfg_t pipeline_cfg = DEFAULT_AUDIO_PIPELINE_CONFIG();
    pipeline = audio_pipeline_init(&pipeline_cfg);
    mem_assert(pipeline);

    ESP_LOGI(TAG, "[2.1] Create http stream to read data");
    http_stream_cfg_t http_cfg = HTTP_STREAM_CFG_DEFAULT();
    http_stream_reader = http_stream_init(&http_cfg);

    ESP_LOGI(TAG, "[2.2] Create i2s stream to write data to codec chip");
    i2s_stream_cfg_t i2s_cfg = I2S_STREAM_CFG_DEFAULT();
    i2s_cfg.type = AUDIO_STREAM_WRITER;
    i2s_stream_writer = i2s_stream_init(&i2s_cfg);

    ESP_LOGI(TAG, "[2.3] Create mp3 decoder to decode mp3 file");
    mp3_decoder_cfg_t mp3_cfg = DEFAULT_MP3_DECODER_CONFIG();
    mp3_decoder = mp3_decoder_init(&mp3_cfg);

    ESP_LOGI(TAG, "[2.4] Register all elements to audio pipeline");
    audio_pipeline_register(pipeline, http_stream_reader, "http");
    audio_pipeline_register(pipeline, mp3_decoder,        "mp3");
    audio_pipeline_register(pipeline, i2s_stream_writer,  "i2s");

    ESP_LOGI(TAG, "[2.5] Link it together http_stream-->mp3_decoder-->i2s_stream-->[codec_chip]");
    const char *link_tag[3] = {"http", "mp3", "i2s"};
    audio_pipeline_link(pipeline, &link_tag[0], 3);

    ESP_LOGI(TAG, "[2.6] Set up  uri (http as http_stream, mp3 as mp3 decoder, and default output is i2s)");
    audio_element_set_uri(http_stream_reader, (char *)arg);

    // Example of using an audio event -- START
    ESP_LOGI(TAG, "[ 4 ] Set up  event listener");
    audio_event_iface_cfg_t evt_cfg = AUDIO_EVENT_IFACE_DEFAULT_CFG();
    audio_event_iface_handle_t evt = audio_event_iface_init(&evt_cfg);

    ESP_LOGI(TAG, "[4.1] Listening event from all elements of pipeline");
    audio_pipeline_set_listener(pipeline, evt);

    ESP_LOGI(TAG, "[ 5 ] Start audio_pipeline");
    audio_pipeline_run(pipeline);

    audio_hal_set_volume(board_handle->audio_hal, 100);

    EventBits_t bits=xEventGroupGetBits(task_event_group);
    while (bits&TASK_RUN_BIT) {
        audio_event_iface_msg_t msg;
        esp_err_t ret = audio_event_iface_listen(evt, &msg, 500/portTICK_PERIOD_MS);
        if (ret != ESP_OK) {
            ESP_LOGE(TAG, "Event interface not ready");
        }else{
            if (msg.source_type == AUDIO_ELEMENT_TYPE_ELEMENT
                && msg.source == (void *) mp3_decoder
                && msg.cmd == AEL_MSG_CMD_REPORT_MUSIC_INFO) {
                audio_element_info_t music_info = {0};
                audio_element_getinfo(mp3_decoder, &music_info);

                ESP_LOGI(TAG, "[ * ] Receive music info from mp3 decoder, sample_rates=%d, bits=%d, ch=%d",
                        music_info.sample_rates, music_info.bits, music_info.channels);

                i2s_stream_set_clk(i2s_stream_writer, music_info.sample_rates, music_info.bits, music_info.channels);
                continue;
            }

            /* Stop when the last pipeline element (i2s_stream_writer in this case) receives stop event */
            if (msg.source_type == AUDIO_ELEMENT_TYPE_ELEMENT && msg.source == (void *) i2s_stream_writer
                && msg.cmd == AEL_MSG_CMD_REPORT_STATUS
                && (((int)msg.data == AEL_STATUS_STATE_STOPPED) || ((int)msg.data == AEL_STATUS_STATE_FINISHED))) {
                ESP_LOGW(TAG, "[ * ] Stop event received");
                break;
            }
        }
        bits=xEventGroupGetBits(task_event_group);
    }
    // Example of using an audio event -- END

    ESP_LOGI(TAG, "[ 6 ] Stop audio_pipeline");
    audio_pipeline_stop(pipeline);
    audio_pipeline_wait_for_stop(pipeline);
    audio_pipeline_terminate(pipeline);

    /* Terminate the pipeline before removing the listener */
    audio_pipeline_unregister(pipeline, http_stream_reader);
    audio_pipeline_unregister(pipeline, i2s_stream_writer);
    audio_pipeline_unregister(pipeline, mp3_decoder);

    audio_pipeline_remove_listener(pipeline);

    /* Make sure audio_pipeline_remove_listener & audio_event_iface_remove_listener are called before destroying event_iface */
    audio_event_iface_destroy(evt);

    /* Release all resources */
    audio_pipeline_deinit(pipeline);
    audio_element_deinit(http_stream_reader);
    //audio_element_deinit(i2s_stream_writer);
    audio_element_deinit(mp3_decoder);
    vTaskDelete(NULL);
}

Feature Demonstration and Explanation

Hardware Showcase


Reflections

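This was my first time taking part in this event, and also my first time designing a complete project on my own, from the initial concept through board layout to soldering and debugging. Many thanks to 硬禾学堂 for giving me this opportunity to practice! I ran into plenty of problems along the way, in the schematic, in the PCB layout, and in the code. The code was the main difficulty: because I was unfamiliar with the ESP-IDF framework, I hit many compilation errors. Fortunately, I resolved them by consulting online resources and books, which deepened my understanding of ESP-IDF programming and sharpened my problem-solving skills. Once again, thank you to 硬禾学堂 for the chance to challenge myself!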