
Why Is the Frame Rate Computed by OpenCV Wrong?

OpenCV, video technology

LiveVideoStack, June 27, 2022

Author: Wang Wei (王伟)

Editor: Alex

Introduction

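We run a platform that periodically checks live streams in production, for example black/white-screen detection and static-frame detection. For each check we estimate how many frames to process from the frame rate extracted from the stream: to inspect 5 s of a stream whose frame rate is 20 fps, we process 100 frames. One day the platform suddenly started timing out on a large scale: computations that used to finish in 2 s now took more than 30 minutes. After investigating, we found that OpenCV was reporting a frame rate of 2000, which raised the number of frames to process from 100 to 10000 and caused the timeouts.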
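The effect of the wrong frame rate on the workload is plain arithmetic. A minimal sketch follows; the helper name frames_to_process is hypothetical and is not taken from the platform's code:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper: number of frames to analyse for a window of
// `seconds` seconds at the frame rate `fps` reported for the stream.
std::int64_t frames_to_process(double seconds, double fps) {
    assert(seconds >= 0.0 && fps >= 0.0);
    return static_cast<std::int64_t>(std::llround(seconds * fps));
}
```

With a 5 s window, a reported rate of 20 fps yields 100 frames, while a reported rate of 2000 fps yields 10000.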

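1. How OpenCV computes the frame rate

The problem is described in detail in OpenCV Issues 21006[1]. A clip that simulates the live stream, test.ts, can be downloaded from:

https://pan.baidu.com/share/init?surl=RY0Zk5C_DOEwTXYe2SLFEg (extraction code: x87m).

Reading the fps of test.ts with the following code,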

const double FPS = cap.get(cv::CAP_PROP_FPS);
std::cout << "fps: " << FPS << std::endl;

gives:

fps: 2000

Analysing the same video with ffprobe gives:

codec_name=h264
r_frame_rate=30/1
avg_frame_rate=0/0
……

In opencv/modules/videoio/src/cap_ffmpeg_impl.hpp[2] we find that the fps is computed by CvCapture_FFMPEG::get, with the following logic:

double fps = r2d(ic->streams[video_stream]->avg_frame_rate);
if (fps < eps_zero) {
    fps = 1.0 / r2d(ic->streams[video_stream]->codec->time_base);
}
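To experiment with this fallback without linking against OpenCV or FFmpeg, it can be modelled in a few lines. This is only a sketch: Rational stands in for FFmpeg's AVRational, r2d is assumed to return 0 when either term of the rational is 0, and the value of eps_zero is an assumption.

```cpp
#include <cassert>

// Stand-in for FFmpeg's AVRational (numerator / denominator).
struct Rational {
    int num;
    int den;
};

// Rational to double; assumed to yield 0 for 0/x and x/0.
double r2d(Rational r) {
    return (r.num == 0 || r.den == 0) ? 0.0
                                      : static_cast<double>(r.num) / r.den;
}

// Model of the fallback: use avg_frame_rate when it is usable,
// otherwise take the inverse of the codec time base.
double naive_fps(Rational avg_frame_rate, Rational codec_time_base) {
    const double eps_zero = 0.000025;  // assumed threshold
    double fps = r2d(avg_frame_rate);
    if (fps < eps_zero) {
        assert(codec_time_base.num != 0 && codec_time_base.den != 0);
        fps = 1.0 / r2d(codec_time_base);
    }
    return fps;
}
```

Calling naive_fps({0, 0}, {1, 2000}) takes the fallback branch and returns roughly 2000, whereas a usable avg_frame_rate such as {30, 1} is returned unchanged.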

2. Why the frame rate OpenCV obtains is wrong

Running test_time_base.cpp[3] on test.ts gives:

time_base: 1/2000
framerate: 0/0
avg_framerate: 0/0
r2d(ic->streams[video_stream]->avg_frame_rate) = 0

So OpenCV falls back to:

1.0 / r2d(ic->streams[video_stream]->codec->time_base)

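to compute the fps of this video. Since time_base = 1/2000 here, the resulting fps is 2000.

In other words, the value of AVStream->codec->time_base is what makes OpenCV return an apparently wrong fps. So why does AVStream->codec->time_base have this value, and how does FFmpeg compute this field?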

3. How FFmpeg computes AVCodecContext.time_base

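AVStream->codec->time_base is the time_base field defined in AVCodecContext. According to its definition in libavcodec/avcodec.h[4], time_base is deprecated for decoding and framerate should be used instead. For fixed-frame-rate content, time_base = 1/framerate, but this is not always the case.

Analysing the H.264 bitstream of test.ts with H264Naked[5] gives the following SPS VUI information: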

timing_info_present_flag :1
num_units_in_tick :1
time_scale :2000
fixed_frame_rate_flag :0

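This shows that test.ts is a non-fixed-frame-rate video. According to the output of test_time_base.cpp[3], test.ts has framerate = 0/0 and time_base = 1/2000.

Does that mean time_base and framerate are unrelated for non-fixed-frame-rate video? If they are related, what computation produces this result? How exactly is time_base computed, and does it depend on framerate at all? One question leads to another.

With the source code in front of us, there are no secrets. With these questions in mind, let us walk through how FFmpeg actually handles time_base.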

3.1 avformat_find_stream_info

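In FFmpeg, avformat_find_stream_info() initializes ic->streams[video_stream]->codec, so that is where we start.

According to libavformat/avformat.h[6], avformat_open_input() opens the stream, reads the relevant information from it, and stores it in AVFormatContext. The information obtained at that point is sometimes incomplete, so avformat_find_stream_info() has to be called to obtain more.

Note:

avformat_find_stream_info() tries to obtain the information it needs by decoding some of the video frames.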

/** 
* Read packets of a media file to get stream information. This 
* is useful for file formats with no headers such as MPEG. This 
* function also computes the real framerate in case of MPEG-2 repeat 
* frame mode. 
* The logical file position is not changed by this function; 
* examined packets may be buffered for later processing. 
* 
* @param ic media file handle 
* @param options  If non-NULL, an ic.nb_streams long array of pointers to
*                 dictionaries, where i-th member contains options for 
*                 codec corresponding to i-th stream. 
*                 On return each dictionary will be filled with options that
*                 were not found. 
* @return >=0 if OK, AVERROR_xxx on error 
* 
* @note this function isn't guaranteed to open all the codecs, so 
*       options being non-empty at return is a perfectly normal behavior. 
* 
* @todo Let the user decide somehow what information is needed so that 
*       we do not waste time getting stuff the user does not need. 
*/
int avformat_find_stream_info(AVFormatContext*ic, AVDictionary **options);

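The overall logic of avformat_find_stream_info() is roughly as shown in the figure below; pay particular attention to the 7 steps marked in it:

3.2 Key steps in avformat_find_stream_info()

STEP 1 Sets the thread count, so that multi-threaded H.264 decoding does not fail to extract the SPS/PPS information into extradata.

STEP 2 Sets AVStream *st; st is passed through all subsequent calls down to try_decode_frame().

STEP 3 is straightforward and is not discussed here.

STEP 4 Sets AVCodecContext *avctx to the passed-in st->internal->avctx. This avctx is what gets passed through the subsequent decoding calls, so from this point on FFmpeg uses st->internal->avctx throughout, not st->codec. Pay particular attention to this. The decoding thread count is also set here, for the same reason as in STEP 1.

STEP 5 Because the decoding thread count was set to 1 earlier, this step calls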

ret = avctx->codec->decode(avctx, frame, &got_frame, pkt)

to decode and to compute avctx->framerate. Note that avctx here is actually the passed-through st->internal->avctx. The framerate computation is described in section 4.

STEP 6 Computes avctx->time_base, which is actually st->internal->avctx->time_base, from the framerate obtained by the decoder. As section 4 shows, framerate = {1000, 1} here, and as section 3.6 (AVCodecContext.ticks_per_frame) explains, ticks_per_frame = 2. Therefore avctx->time_base = {1, 2000}:

avctx->time_base = av_inv_q(av_mul_q({1000, 1}, {2, 1})) = {1, 2000}

STEP 7 This step is a sleight of hand. To stay compatible with the old API, it performs a substitution: st->internal->avctx->time_base is assigned to st->codec->time_base, while st->avg_frame_rate is assigned to st->codec->framerate. Therefore:

st->codec->time_base = {1, 2000}
st->codec->framerate = {0, 0}

The value of st->codec->time_base has nothing to do with st->codec->framerate; it is derived from st->internal->avctx->framerate and, ultimately, from sps.time_scale and sps.num_units_in_tick:

st->internal->avctx->time_base.num = sps->num_units_in_tick *
                                     st->internal->avctx->ticks_per_frame;

st->internal->avctx->time_base.den = sps->time_scale *
                                     st->internal->avctx->ticks_per_frame;

which, after reduction, gives:

st->internal->avctx->time_base = {sps->num_units_in_tick, sps->time_scale}
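The STEP 6 arithmetic can be replayed with a small self-contained model. This is a sketch only: Rational, mul_q and inv_q are simplified stand-ins written for this example and are not FFmpeg's AVRational, av_mul_q and av_inv_q.

```cpp
#include <cassert>

// Simplified stand-in for AVRational.
struct Rational {
    int num;
    int den;
};

// Greatest common divisor (Euclid), used to reduce fractions.
int gcd_int(int a, int b) {
    while (b != 0) {
        int t = a % b;
        a = b;
        b = t;
    }
    return a < 0 ? -a : a;
}

Rational reduce(Rational r) {
    int g = gcd_int(r.num, r.den);
    if (g == 0) return r;  // 0/0 stays as it is
    return {r.num / g, r.den / g};
}

// Multiply two rationals and reduce the result.
Rational mul_q(Rational a, Rational b) {
    return reduce({a.num * b.num, a.den * b.den});
}

// Invert a rational.
Rational inv_q(Rational r) {
    return {r.den, r.num};
}

// STEP 6: time_base = 1 / (framerate * ticks_per_frame).
Rational time_base_from_framerate(Rational framerate, int ticks_per_frame) {
    assert(ticks_per_frame > 0);
    return inv_q(mul_q(framerate, {ticks_per_frame, 1}));
}
```

For framerate = {1000, 1} and ticks_per_frame = 2 this returns {1, 2000}; with ticks_per_frame = 1 the same framerate would give {1, 1000}.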

3.3 internal->avctx->time_base & internal->avctx->framerate

So in effect, internal->avctx->time_base is:

avctx->time_base = sps->num_units_in_tick /
sps->time_scale

while internal->avctx->framerate is:

avctx->framerate = sps->time_scale /
(sps->num_units_in_tick * avctx->ticks_per_frame)

Therefore, for an H.264 bitstream, time_base = 1 / (2 * framerate), not 1 / framerate.

This is why libavcodec/avcodec.h[4] says:

* This often, but not always is the inverse of the frame rate or field rate
* for video.

From the analysis above:

avctx->framerate = 1 / (avctx->time_base * avctx->ticks_per_frame)

Therefore, when st->avg_frame_rate = 0, the logic OpenCV uses to compute the fps is wrong.
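The relations in this section can be written compactly. The symbols are introduced here only for brevity: N is sps->num_units_in_tick, S is sps->time_scale, and t is ticks_per_frame.

```latex
\begin{aligned}
\text{framerate} &= \frac{S}{N\,t}, \\
\text{time\_base} &= \frac{1}{\text{framerate}\cdot t} = \frac{N}{S}, \\
\frac{1}{\text{time\_base}} &= \frac{S}{N} = t\cdot\text{framerate}.
\end{aligned}
```

For test.ts (N = 1, S = 2000, t = 2) this gives framerate = 1000 and 1/time_base = 2000, which is the value OpenCV reports.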

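For H.265, ticks_per_frame = 1, so OpenCV does not have this problem with H.265-encoded video. This can be verified by analysing an H.265 bitstream with Zond 265[7] and comparing against the results from OpenCV and FFmpeg.

It is also the substitution in STEP 7 above that produces the test_time_base.cpp[3] result: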

st->codec->framerate: 0/0
st->codec->time_base: 1/2000

3.4 ff_h264_decoder

decode_simple_internal() in libavcodec/decode.c[8] calls the corresponding decoder to do the decoding (STEP 5). As shown earlier, test.ts is an H.264-encoded stream, so the H.264 decoder is called here. In FFmpeg, the H.264 decoder is const AVCodec ff_h264_decoder, defined in libavcodec/h264dec.c[9]:

const AVCodec ff_h264_decoder = {
    .name                  = "h264",
    .type                  = AVMEDIA_TYPE_VIDEO,
    .id                    = AV_CODEC_ID_H264,
    .priv_data_size        = sizeof(H264Context),
    .init                  = h264_decode_init,
    .close                 = h264_decode_end,
    .decode                = h264_decode_frame,
    ......
};

In STEP 5 of the figure above,

ret = avctx->codec->decode(avctx, frame, &got_frame, pkt);

goes through ff_h264_decoder.decode and therefore actually calls:

h264_decode_frame(avctx, frame, &got_frame, pkt);

and avctx here is the st->internal->avctx passed down from try_decode_frame(), that is, STEP 4 in the figure above.

3.5 h264_decode_frame

The overall logic of h264_decode_frame() is shown in the figure below:

3.6 AVCodecContext.ticks_per_frame

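ticks_per_frame is used later to compute framerate, and it was also used in STEP 6 to compute time_base, so it deserves a note of its own. In the H.264 decoder, ticks_per_frame = 2, which can be seen in the following places:

The field documentation in libavcodec/avcodec.h[4]: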

/**
* For some codecs, the time base is closer to the field rate than the frame rate. 
* Most notably, H.264 and MPEG-2 specify time_base as half of frame duration 
* if no telecine is used ... 
* 
* Set to time_base ticks per frame. Default 1, e.g., H.264/MPEG-2 set it to 2.
 */
 int ticks_per_frame;

h264_decode_init() in libavcodec/h264dec.c[9]:

avctx->ticks_per_frame = 2;

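4. How framerate is computed

STEP 1 From the overall flow, h here is actually st->internal->avctx->priv_data from avformat_find_stream_info(). h is passed through every subsequent step; keep this in mind.

STEP 2 The SPS information is fetched first for use in the later computation. Here again is the relevant SPS[10] information for test.ts: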

timing_info_present_flag :1
num_units_in_tick :1
time_scale :2000
fixed_frame_rate_flag :0

STEP 3 Computes framerate from the SPS information; the framerate used to compute time_base in STEP 6 above is computed here. Because timing_info_present_flag = 1, the framerate logic is executed:

avctx->framerate.den = sps->num_units_in_tick * h->avctx->ticks_per_frame = 1 * 2 = 2
avctx->framerate.num = sps->time_scale = 2000
avctx->framerate = (AVRational){1000, 1}

Therefore,

st->internal->avctx->framerate = {1000, 1}
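The STEP 3 computation can be sketched as a standalone function. The SpsTiming struct and the function name are hypothetical; they only mirror the three SPS fields listed above. Returning {0, 0} when no timing information is present is a choice made for this sketch, not FFmpeg's behaviour.

```cpp
#include <cassert>
#include <cstdint>

struct Rational {
    std::int64_t num;
    std::int64_t den;
};

// Hypothetical container for the SPS VUI timing fields.
struct SpsTiming {
    bool timing_info_present_flag;
    std::uint32_t num_units_in_tick;
    std::uint32_t time_scale;
};

// Greatest common divisor (Euclid) for non-negative inputs.
std::int64_t gcd_i64(std::int64_t a, std::int64_t b) {
    while (b != 0) {
        std::int64_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

// framerate = time_scale / (num_units_in_tick * ticks_per_frame), reduced.
// Returns {0, 0} ("unknown") when the SPS has no usable timing info
// (an assumption of this sketch).
Rational framerate_from_sps(const SpsTiming& sps, int ticks_per_frame) {
    assert(ticks_per_frame > 0);
    if (!sps.timing_info_present_flag || sps.num_units_in_tick == 0 ||
        sps.time_scale == 0) {
        return {0, 0};
    }
    std::int64_t num = sps.time_scale;
    std::int64_t den =
        static_cast<std::int64_t>(sps.num_units_in_tick) * ticks_per_frame;
    std::int64_t g = gcd_i64(num, den);
    return {num / g, den / g};
}
```

With the test.ts values (num_units_in_tick = 1, time_scale = 2000) and ticks_per_frame = 2, the function returns {1000, 1}.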

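However, because avctx->time_base = {1, 2000}, OpenCV computes a frame rate of 2000. The inconsistency arises because OpenCV does not take ticks_per_frame into account when it derives the frame rate from codec->time_base. For OpenCV, the correct way to compute the frame rate would therefore be:

double fps = r2d(ic->streams[video_stream]->avg_frame_rate);
if (fps < eps_zero) {
    // time_base counts ticks; one frame lasts ticks_per_frame ticks.
    fps = 1.0 / (r2d(ic->streams[video_stream]->codec->time_base) *
                 ic->streams[video_stream]->codec->ticks_per_frame);
}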
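The corrected fallback can be checked against the test.ts numbers with a self-contained model. As before, this is a sketch: Rational stands in for AVRational, r2d is assumed to return 0 for a rational with a zero term, the eps_zero value is an assumption, and fps_with_ticks is a name chosen for this example.

```cpp
#include <cassert>

// Stand-in for FFmpeg's AVRational.
struct Rational {
    int num;
    int den;
};

// Rational to double; assumed to yield 0 for 0/x and x/0.
double r2d(Rational r) {
    return (r.num == 0 || r.den == 0) ? 0.0
                                      : static_cast<double>(r.num) / r.den;
}

// Fallback that accounts for ticks_per_frame:
// one frame lasts time_base * ticks_per_frame seconds.
double fps_with_ticks(Rational avg_frame_rate, Rational codec_time_base,
                      int ticks_per_frame) {
    const double eps_zero = 0.000025;  // assumed threshold
    double fps = r2d(avg_frame_rate);
    if (fps < eps_zero) {
        const double frame_duration = r2d(codec_time_base) * ticks_per_frame;
        assert(frame_duration > 0.0);
        fps = 1.0 / frame_duration;
    }
    return fps;
}
```

For avg_frame_rate = {0, 0}, time_base = {1, 2000} and ticks_per_frame = 2, the result is roughly 1000, matching st->internal->avctx->framerate rather than the 2000 produced by inverting time_base alone.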

Conclusion

From the analysis above, we know that:

When computing framerate and time_base in AVCodecContext, FFmpeg uses:

o sps.time_scale

o sps.num_units_in_tick

o AVCodecContext.ticks_per_frame

In FFmpeg, framerate and time_base are related by:

o framerate = 1 / (time_base * ticks_per_frame)

o time_base = 1 / (framerate * ticks_per_frame)

For codecs other than H.264/MPEG-2, ticks_per_frame = 1, so framerate and time_base are reciprocals of each other. For H.264/MPEG-2, ticks_per_frame = 2, so they are not. This is why FFmpeg says that time_base is often, but not always, the inverse of the frame rate.

In OpenCV, for H.264/MPEG-2 video, the fps computation has a bug when AVStream.avg_frame_rate = 0.

For decoding, AVCodecContext.time_base is deprecated, and so is AVStream.codec. To stay compatible with the old API, avformat_find_stream_info() fills in AVStream.codec from AVStream.internal.avctx and other information: AVStream.codec.time_base is taken from AVStream.internal.avctx, while AVStream.codec.framerate is taken from AVStream.avg_frame_rate.