manage styles in the output #2

darodi · 2022-01-22T13:42:18Z

First of all, thanks for your script. It was the only converter I found on GitHub that actually works when the input VTT contains styles.

Would it be possible to add font styles to the output? :)
For example, here is an input:


WEBVTT

STYLE
::cue {
    font-family: Verdana, Arial, Tiresias;
    line-height: 125%;
}
::cue(.white) {
    color: #ffffff;
}
::cue(.lime) {
    color: #00ff00;
}
::cue(.cyan) {
    color: #00ffff;
}
::cue(.red) {
    color: #ff0000;
}
::cue(.yellow) {
    color: #ffff00;
}
::cue(.magenta) {
    color: #ff00ff;
}
::cue(.blue) {
    color: #0000ff;
}
::cue(.black) {
    color: #000000;
}
::cue(.bg_black) {
    background: rgba(0, 0, 0, 0.76);
}

sub0
00:00:07.120 --> 00:00:09.480 line:-1
<c.magenta.bg_black>Musique douce</c>

sub1
00:00:09.720 --> 00:00:29.520 align:left line:-1
<c.magenta.bg_black>---</c>

sub2
00:00:32.439 --> 00:00:35.320 line:-1
<c.magenta.bg_black>Musique douce</c>

sub3
00:00:35.560 --> 00:02:25.240 align:left line:-1
<c.magenta.bg_black>---</c>

sub4
00:02:25.480 --> 00:02:27.440 line:-1
<c.white.bg_black>-Stéphane ? Où on se gare ?</c>

The current output is:


1
00:00:07,120 --> 00:00:09,480
Musique douce

2
00:00:09,720 --> 00:00:29,520
---

3
00:00:32,439 --> 00:00:35,320
Musique douce

4
00:00:35,560 --> 00:02:25,240
---

5
00:02:25,480 --> 00:02:27,440
-Stéphane ? Où on se gare ?

The desired output would be:


1
00:00:07,120 --> 00:00:09,480
<font color="#ff00ff">Musique douce</font> 

2
00:00:09,720 --> 00:00:29,520
<font color="#ff00ff">---</font> 

3
00:00:32,439 --> 00:00:35,320
<font color="#ff00ff">Musique douce</font> 

4
00:00:35,560 --> 00:02:25,240
<font color="#ff00ff">---</font> 

5
00:02:25,480 --> 00:02:27,440
-Stéphane ? Où on se gare ?
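
To produce the desired output, the converter first needs to know which color each cue class stands for. The snippet below is only a minimal sketch of that step, using the standard library alone; `parse_cue_colors` and `CUE_RULE` are hypothetical names, not part of the script or of webvtt-py, and the regular expression assumes simple `::cue(.class) { color: ...; }` rules like the ones in the input above.

```python
import re

# Matches rules such as "::cue(.magenta) { color: #ff00ff; }".
# Only a plain "color" declaration is captured; "background" and
# "background-color" are ignored thanks to the lookbehind.
CUE_RULE = re.compile(
    r"::cue\(\.([\w-]+)\)\s*\{[^}]*?(?<![\w-])color\s*:\s*([^;}]+)"
)


def parse_cue_colors(vtt_text):
    """Return a {class_name: color} dict built from ::cue(.class) rules."""
    return {name: value.strip() for name, value in CUE_RULE.findall(vtt_text)}


sample = """WEBVTT

STYLE
::cue {
    font-family: Verdana, Arial, Tiresias;
}
::cue(.magenta) {
    color: #ff00ff;
}
::cue(.white) {
    color: #ffffff;
}
::cue(.bg_black) {
    background: rgba(0, 0, 0, 0.76);
}
"""

colors = parse_cue_colors(sample)
print(colors)
```

Classes that only set a background, such as `bg_black`, are left out of the map on purpose, since the `<font color>` tag has nothing to express them with.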


lbrayner · 2022-01-22T16:32:08Z

Hey, just took a look at the webvtt-py docs and no, this is currently not possible. It really would be a great feature. The most it can do is spit out text as is, that is:

1
00:00:07,120 --> 00:00:09,480
<c.magenta.bg_black>Musique douce</c>

2
00:00:09,720 --> 00:00:29,520
<c.magenta.bg_black>---</c>

3
00:00:32,439 --> 00:00:35,320
<c.magenta.bg_black>Musique douce</c>

4
00:00:35,560 --> 00:02:25,240
<c.magenta.bg_black>---</c>

5
00:02:25,480 --> 00:02:27,440
<c.white.bg_black>-Stéphane ? Où on se gare ?</c>

You made me realize that this might be desirable if the style was in-line, so I'm going to add a flag that would allow that (keep caption text as is).

lbrayner · 2022-01-22T16:40:12Z

Looking at the code I realized the dummy here was importing an unused library (html2text). Pushed a commit removing that requirement.

darodi · 2022-01-23T11:02:10Z

Thanks for your feedback.

When you talked about webvtt-py, I had a look at your code and improved the script to store each colour style and change it in each caption in the loop.
I might create a pull request, but I'm not so happy about it.
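
The approach described above (keep a color per class, then rewrite each caption in the loop) could look roughly like the following. This is a hedged sketch, not the code from the pull request: `COLORS`, `CLASS_SPAN` and `to_srt_markup` are hypothetical names, the color map is hard-coded for illustration, and nested tags are not handled.

```python
import html
import re

# Hypothetical class-to-color map; in practice it would be read from the STYLE block.
COLORS = {"magenta": "#ff00ff", "white": "#ffffff"}

# Matches <c.class1.class2>text</c>; nesting is not handled in this sketch.
CLASS_SPAN = re.compile(r"<c((?:\.[\w-]+)+)>(.*?)</c>", re.DOTALL)


def to_srt_markup(raw_text, colors):
    """Swap WebVTT class spans for SRT-style <font color> tags."""

    def repl(match):
        classes = match.group(1).lstrip(".").split(".")
        text = html.unescape(match.group(2))
        # Use the first class that has a known color; otherwise drop the tag.
        for name in classes:
            if name in colors:
                return '<font color="{}">{}</font>'.format(colors[name], text)
        return text

    return CLASS_SPAN.sub(repl, raw_text)


print(to_srt_markup("<c.magenta.bg_black>Musique douce</c>", COLORS))
```

Passing the color map in as an argument keeps the replacement step independent of how the STYLE block was parsed.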

A better solution would be to modify the webvtt-py library directly so that it can handle font styles.

As described here:
https://webvtt-py.readthedocs.io/en/latest/usage.html#converting-captions
these few lines of code would then be enough:

import webvtt

# save in SRT format
vtt = webvtt.read('captions.vtt')
vtt.save_as_srt()

I'll check their code and create a pull request on their project.

By the way, why did you loop over each caption instead of using what webvtt-py recommends? Was there a reason?

darodi · 2022-01-23T11:11:08Z

I might also have a look at how ffmpeg converts subtitles in its code.
I tried
ffmpeg.exe -i captions.vtt captions.srt
Usually, it works, but with my input file containing styles, I get an empty output.

lbrayner · 2022-01-24T12:02:55Z

...
As said here: https://webvtt-py.readthedocs.io/en/latest/usage.html#converting-captions These few lines of code would just be enough.
import webvtt

# save in SRT format
vtt = webvtt.read('captions.vtt')
vtt.save_as_srt()

Because save_as_srt() still uses webvtt.Caption.raw_text. Tested it last Saturday and just now. The video player I used at the time on Windows machines did not support WebVTT, nor did it parse any tags in SRT files. Moreover, even webvtt.Caption.text sometimes still had tags in it (at least 4 years ago), hence html.unescape.
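
For players that parse no tags at all, the fallback amounts to stripping every tag and decoding HTML entities. The sketch below illustrates that idea with the standard library only; `plain_caption` and `TAG` are hypothetical names and this is not the script's actual code.

```python
import html
import re

# Matches any tag-like span such as <c.magenta.bg_black>, </c> or <i>.
TAG = re.compile(r"<[^>]*>")


def plain_caption(raw_text):
    """Strip cue tags first, then decode entities such as &amp; and &lt;."""
    return html.unescape(TAG.sub("", raw_text))


print(plain_caption("<c.white.bg_black>-Stéphane ? Où on se gare ?</c>"))
print(plain_caption("<i>Tom &amp; Jerry</i>"))
```

Stripping before unescaping matters: an escaped `&lt;` that becomes `<` after decoding should stay in the text rather than be mistaken for the start of a tag.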

darodi · 2022-01-24T23:51:23Z

I've also created a pull request for webvtt-py

glut23/webvtt-py#39

darodi · 2022-01-25T21:57:33Z

Following your remark in glut23/webvtt-py#39 (comment), posted by @lbrayner:
For your information, pull request #3 already HTML-unescapes the text, so there is no problem here.

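import html

# Wrap the matched text in a <font> tag. A plain <c> tag is dropped;
# any other tag name is kept around the <font> tag.
def replace_color(x, tag_name, v):
    return ("" if tag_name == "c" else ("<" + tag_name + ">")) \
           + "<font color=\"" + v + "\">" \
           + html.unescape(x.group(1)) \
           + "</font>" \
           + ("" if tag_name == "c" else ("</" + tag_name + ">"))

# Excerpt from the caption loop: captions without a style tag are unescaped too.
if no_tag_found:
    caption_text = html.unescape(caption.text)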

darodi linked a pull request Jan 24, 2022 that will close this issue

manage styles in the output #3

Open
