[2019 Machine Learning Study Jam] Speech to Text Transcription with the Cloud Speech API

pizzaplanet

scio 2019. 2. 13. 00:13

Track: Speech to Text Transcription with the Cloud Speech API

GSP048

Overview

With the Cloud Speech API, you can transcribe audio files into text in more than 80 languages.

This lab is very similar to Qwiklab 1, [2019 Machine Learning Study Jam] Google Cloud Speech API: Qwik Start.

This track also includes a French exercise in addition to English.

The URL used to call the API is different.

Qwiklab 1: https://speech.googleapis.com/v1beta1/speech:syncrecognize?key=${API_KEY}

Qwiklab 3: https://speech.googleapis.com/v1/speech:recognize?key=${API_KEY}

According to the Cloud Speech-to-Text release notes, v1beta1 is scheduled to be removed, so v1 is the recommended version.

< Cloud Speech-to-Text API Release notes >

What you'll do

Create a Speech API request and call the API with curl
Call the Speech API with an audio file in a different language

Create An API Key

Follow the Create An API Key section of [2019 Machine Learning Study Jam] Google Cloud Speech API: Qwik Start to create an API key and export it.
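As a small sketch of the export step: the key value below is a placeholder, and `require_api_key` is a hypothetical helper that is not part of the lab. It only checks that the variable is non-empty before any request is sent.

```shell
# Placeholder value: replace it with the key created in the console.
export API_KEY="YOUR_API_KEY"

# Hypothetical helper: fail early if API_KEY is empty or unset.
require_api_key() {
  if [ -z "${API_KEY:-}" ]; then
    echo "API_KEY is not set" >&2
    return 1
  fi
}

require_api_key && echo "API_KEY is set"
```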

Create your Speech API request

1. Create the request_en.json file.

vim request_en.json

{
  "config": {
    "encoding": "FLAC",
    "languageCode": "en-US"
  },
  "audio": {
    "uri": "gs://cloud-samples-tests/speech/brooklyn.flac"
  }
}

config

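encoding : the encoding of the audio file sent to the API
sampleRateHertz : the sample rate of the audio in hertz (called sample_rate in v1beta1). The request above omits it; for FLAC files it is optional because the API can read it from the file header.
languageCode : the language of the audio (called language_code in v1beta1). Korean is "ko-KR". See the list of all supported languages.

audio

uri : the Cloud Storage path of the audio file to transcribe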
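If you would rather not type the file in vim, the same request body can be generated from the shell. This is only a sketch: `make_request` is a hypothetical helper, and it hard-codes the FLAC encoding used in this lab.

```shell
# Hypothetical helper (not part of the lab): write a request body to a file.
# $1: languageCode, $2: Cloud Storage URI of the audio, $3: output file
make_request() {
  cat > "$3" <<EOF
{
  "config": {
    "encoding": "FLAC",
    "languageCode": "$1"
  },
  "audio": {
    "uri": "$2"
  }
}
EOF
}

make_request "en-US" "gs://cloud-samples-tests/speech/brooklyn.flac" request_en.json
cat request_en.json
```

Passing a different language code and URI produces the request for another audio file without editing the JSON by hand.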

Call the Speech API

1. Call the Speech API with curl.

curl -s -X POST -H "Content-Type: application/json" --data-binary \
@request_en.json "https://speech.googleapis.com/v1/speech:recognize?key=${API_KEY}"

--data-binary @request_en.json : sends the contents of request_en.json as the request body. Plain -d/--data treats the data as text and strips carriage returns and newlines when reading from a file, which can corrupt binary data, so --data-binary is used to send the file exactly as it is.
-s : silent mode. Hides the progress meter and error messages, so only the response is printed.
-X POST : sets the HTTP request method to POST.
-H : adds an HTTP header to the request, here Content-Type: application/json.
speech:recognize : calls the recognize method of the Speech API.
key=${API_KEY} : passes the API key that was exported as an environment variable earlier.
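The pieces of the command above can be wrapped in two small functions. Both `speech_url` and `recognize` are hypothetical names for this sketch. Only the URL builder is run here, so nothing is sent over the network.

```shell
# Build the v1 recognize endpoint for a given API key.
speech_url() {
  printf 'https://speech.googleapis.com/v1/speech:recognize?key=%s' "$1"
}

# Hypothetical wrapper around the curl call used in this lab.
# $1: path to the request JSON file
recognize() {
  curl -s -X POST -H "Content-Type: application/json" \
    --data-binary "@$1" "$(speech_url "${API_KEY}")"
}

# Print the URL that would be called (no network access needed).
speech_url "EXAMPLE_KEY"; echo

# To actually send a request: recognize request_en.json
```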

2. The response comes back.

{
  "results": [
    {
      "alternatives": [
        {
          "transcript": "how old is the Brooklyn Bridge",
          "confidence": 0.98267895
        }
      ]
    }
  ]
}

transcript : the speech in the audio file, converted to text
confidence : the confidence score for the result (1 == 100%)
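To pull just these two fields out of the response, a rough sketch with sed works on the pretty-printed output shown above. `get_transcript` and `get_confidence` are hypothetical helpers, and this is plain text matching rather than a real JSON parser, so it assumes each field sits on its own line.

```shell
# Sample response copied from the output above.
response='{
  "results": [
    {
      "alternatives": [
        {
          "transcript": "how old is the Brooklyn Bridge",
          "confidence": 0.98267895
        }
      ]
    }
  ]
}'

# Hypothetical helpers: print the value of one field per matching line.
get_transcript() {
  sed -n 's/.*"transcript": *"\([^"]*\)".*/\1/p'
}
get_confidence() {
  sed -n 's/.*"confidence": *\([0-9.]*\).*/\1/p'
}

printf '%s\n' "$response" | get_transcript
printf '%s\n' "$response" | get_confidence
```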

Speech to text transcription in different languages

1. Create the request_fr.json file.

vim request_fr.json

{
  "config": {
    "encoding": "FLAC",
    "languageCode": "fr"
  },
  "audio": {
    "uri": "gs://speech-language-samples/fr-sample.flac"
  }
}

config

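encoding : the encoding of the audio file sent to the API
sampleRateHertz : the sample rate of the audio in hertz (called sample_rate in v1beta1). The request above omits it; for FLAC files it is optional because the API can read it from the file header.
languageCode : the language of the audio (called language_code in v1beta1), here "fr" for French. Korean is "ko-KR". See the list of all supported languages.

audio

uri : the Cloud Storage path of the audio file to transcribe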
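Because the French request is usually made by copying the English one, a quick sanity check before calling the API can catch a missing field. `check_request` is a hypothetical helper for this sketch; it only greps for the three field names used in this lab and does not validate the JSON itself.

```shell
# Write the French request body (same content as request_fr.json above).
cat > request_fr.json <<'EOF'
{
  "config": {
    "encoding": "FLAC",
    "languageCode": "fr"
  },
  "audio": {
    "uri": "gs://speech-language-samples/fr-sample.flac"
  }
}
EOF

# Hypothetical check: make sure the fields used in this lab are present.
# $1: path to the request JSON file
check_request() {
  for key in encoding languageCode uri; do
    grep -q "\"$key\"" "$1" || { echo "missing field: $key" >&2; return 1; }
  done
  echo "ok: $1"
}

check_request request_fr.json
```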

Call the Speech API

1. Call the Speech API with curl.

curl -s -X POST -H "Content-Type: application/json" --data-binary \
@request_fr.json "https://speech.googleapis.com/v1/speech:recognize?key=${API_KEY}"

--data-binary @request_fr.json : sends the contents of request_fr.json as the request body. Plain -d/--data treats the data as text and strips carriage returns and newlines when reading from a file, which can corrupt binary data, so --data-binary is used to send the file exactly as it is.
-s : silent mode. Hides the progress meter and error messages, so only the response is printed.
-X POST : sets the HTTP request method to POST.
-H : adds an HTTP header to the request, here Content-Type: application/json.
speech:recognize : calls the recognize method of the Speech API.
key=${API_KEY} : passes the API key that was exported as an environment variable earlier.

2. The response comes back.

{
  "results": [
    {
      "alternatives": [
        {
          "transcript": "maître corbeau sur un arbre perché tenait en son bec un fromage",
          "confidence": 0.9710122
        }
      ]
    }
  ]
}

transcript : the speech in the audio file, converted to text
confidence : the confidence score for the result (1 == 100%)
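Since confidence is a number between 0 and 1, it can be compared against a cutoff when deciding whether to trust a transcript. The sketch below uses awk for the floating-point comparison. `confident_enough` is a hypothetical helper, and the 0.9 threshold is an arbitrary value chosen for illustration, not something the lab specifies.

```shell
# Hypothetical helper: succeed when confidence $1 is at least threshold $2.
confident_enough() {
  awk -v c="$1" -v t="$2" 'BEGIN { exit !(c >= t) }'
}

# Confidence taken from the French response above; 0.9 is an assumed cutoff.
if confident_enough 0.9710122 0.9; then
  echo "accept"
else
  echo "review"
fi
```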

Walkthrough video

References

[Google Machine Learning Study Jam guidelines]

[Speech to Text Transcription with the Cloud Speech API]
