The story behind PeerTube's transcription feature

The transcription feature finally landed in PeerTube 6.2, and it has been a long journey!

Subtitles are now automatically generated for your videos thanks to AI.

I'm writing this from my own perspective. Important bits and pieces might be missing, and please correct me if I got any facts wrong; consider the rest my interpretation.

apps.education

The development of this feature was initiated as a PeerTube plugin in 2021 by the 🇫🇷 apps.education team within the 🇫🇷 MENJ.

Once upon a time, I thought government organizations were only led by clueless technocrats. I was amazed to discover so many free-software advocates working there. I took that as a sign of the impending libre revolution toward global collaboration ✊

The apps portal offers access to free (libre) digital services and collaboration tools to all education employees for their everyday needs.

It's great to know that teachers now have alternatives to GAFAM services.

Among the services on offer are many 🇫🇷 PeerTube instances, and the team wanted to make them accessible (a11y). In France, this has been 🇫🇷 legally required for public services since 2005. Video content accessibility criteria are specified in point 4 on “multimedia content” of the 🇫🇷 RGAA, which is heavily inspired by the time-based-media criteria of the WCAG 2.1.

We wanted to achieve the following success criteria on “prerecorded” video content; back then, “live” content was considered out of reach:

Prerecorded Video-only: Either an alternative for time-based media or an audio track is provided that presents equivalent information [...]

Captions (Prerecorded): Captions are provided for all prerecorded audio content in synchronized media [...]

Plugin

OpenAI Whisper wasn't yet a thing, but there were already many options available. Mozilla DeepSpeech and the CommonVoice initiative were very inspiring, but we soon realized the project had been abandoned, and the quality of the French transcripts was disappointing. Vosk, given the quality of its transcripts, was the most promising candidate.

Since a PeerTube plugin is just a Node.js package, we chose the Node.js version of Vosk (C++ bindings). It would be installed automatically with the plugin, since it was listed in the plugin's dependencies. This addressed one of our main constraints: no external dependency had to be installed manually on the server. However, these bindings later turned out to have many unwanted side effects:
– We had to maintain a fork to run it in parallel in a web worker context
– Even though it is a Node.js package, it may require dev dependencies if there's no pre-built binary for your architecture
– Intermittent failures (“segfault”) that were impossible to debug
– Occasional memory starvation that crashed the PeerTube process

Even though it meant relying on an external dependency, driving the Vosk binary as an external process, as the PeerTube codebase does for ffmpeg and youtube-dl, would have been a better tradeoff. However, this would still have left us with other plugin-related issues:
– Reinventing a few wheels
– Notifications
– Job queues
– Finding/adding the right hook to trigger/cancel the transcription process
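To illustrate why an external process is a safer tradeoff than in-process bindings, here is a minimal sketch of running a binary as a child process from Node.js. It is not taken from the PeerTube codebase: `runBinary` and `describeFailure` are hypothetical names, and only the standard `node:child_process` API is assumed. The point is that a crash in the child (a segfault, for instance) surfaces as a rejected promise instead of taking the parent process down.

```typescript
import { spawn } from 'node:child_process'

interface ProcessResult {
  stdout: string
  stderr: string
}

// Hypothetical helper: turn an exit code / signal pair into a readable
// failure reason, or null when the process ended normally.
function describeFailure (code: number | null, signal: string | null): string | null {
  if (signal !== null) return `killed by signal ${signal}`
  if (code !== 0) return `exited with code ${code}`
  return null
}

// Hypothetical helper: run an external binary and collect its output.
// A crash of the child only rejects this promise; the caller keeps running.
function runBinary (command: string, args: string[]): Promise<ProcessResult> {
  return new Promise((resolve, reject) => {
    const child = spawn(command, args)
    let stdout = ''
    let stderr = ''

    child.stdout.on('data', (chunk: Buffer) => { stdout += chunk.toString() })
    child.stderr.on('data', (chunk: Buffer) => { stderr += chunk.toString() })

    // Emitted when the binary cannot be started at all (e.g. not installed)
    child.on('error', reject)

    // Emitted once the process has ended and its streams are closed
    child.on('close', (code, signal) => {
      const failure = describeFailure(code, signal)
      if (failure === null) resolve({ stdout, stderr })
      else reject(new Error(`${command} ${failure}: ${stderr}`))
    })
  })
}
```

A caller would `await runBinary(...)` inside a `try`/`catch` and mark the transcription as failed on rejection, which is much easier to reason about than a native crash inside the server process.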

See all the transcription plugin issues.

Community

We were initially cut off from community feedback because account signup was disabled on the mim-libre forge. This later led us to migrate to GitLab.com. PeerTube has great international exposure, and so did this plugin. Many users provided feedback and raised issues, but unfortunately we lacked the resources to answer them thoroughly.

While we had setbacks with this plugin version, it led us to raise many issues and PRs on PeerTube Core, with many contributions:
– Improvements to the PeerTube plugins API
– Constant management
– New hooks
– Typing fixes
– A typing package that allows you 🫵 to write typesafe plugins
– A simple subtitle editor

By the way, you should have a look at this subtitle editor plugin by @Herover@helvede.net, which looks quite promising.

I was personally touched by how well @Chocobozzz@framapiaf.org welcomed those contributions, and I greatly appreciate our discussions. I'd also like to thank @JohnLivingston, who was kind enough to share some of his experience in developing a PeerTube plugin.

Whisper

Then in 2023, things went crazy with OpenAI Whisper. There are many different flavors of Whisper out there. Given the previous setbacks, the community feedback and some rich discussions with @yassinsiouda@mastodon.doesnotexist.club, we decided to move toward a new plugin version based on “Whisper as a Service”. This was also the year PeerTube 5.2 and the peertube-runner were released.

✨ And then the stars aligned crazily... ✨

FUN

In September 2023, we went ahead and contacted the PeerTube team with our idea. We told them we'd go ahead with our new plugin unless a Whisper-based transcription solution could land in PeerTube core.

@Chocobozzz@framapiaf.org told us he had transcription in mind while designing peertube-runner 💛💛💛

Then in October, Manuel Raynaud from FUN MOOC contacted Framasoft with similar needs! They were already using the peertube-runner to offload some of their transcoding jobs, and they were thinking about using it to generate transcripts as well. They were also quite familiar with AI possibilities. They even organized an AI challenge where students had to automate the creation of video text summaries using generative AI. I know this might sound evil to some, but this might just lead to new exciting FOSS features!

After the time it took to set up the collaboration and find the funding, here we are :)

There is now a @peertube/peertube-transcription package available in the PeerTube codebase.
We contributed to the following repositories along the way:
The @peertube/peertube-transcription package is used in PeerTube and in peertube-runner.

The peertube-plugin-transcription plugin is now considered deprecated. If you're still interested in a Vosk version, you could implement it as a new VoskTranscriber in the @peertube/peertube-transcription package.
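To give an idea of what such a contribution might look like, here is a rough sketch of a pluggable transcriber. Everything in it is an assumption made for illustration: the `Transcriber` interface and `TranscribeOptions` type are hypothetical and are not the actual @peertube/peertube-transcription API, and the `vosk-transcriber` command with its `-l`, `-i`, `-t` and `-o` flags should be checked against the Vosk documentation before relying on it.

```typescript
// Hypothetical options type, NOT the real package API
interface TranscribeOptions {
  mediaFilePath: string
  language: string
  format: 'srt' | 'txt'
  outputPath: string
}

// Hypothetical contract every engine would implement, NOT the real package API
interface Transcriber {
  readonly engineName: string
  buildCommand (options: TranscribeOptions): { binary: string, args: string[] }
}

class VoskTranscriber implements Transcriber {
  readonly engineName = 'vosk'
  private readonly binary: string

  // Assumption: the Vosk CLI is installed and named `vosk-transcriber`
  constructor (binary = 'vosk-transcriber') {
    this.binary = binary
  }

  // Only builds the command line; running it is left to the caller
  buildCommand (options: TranscribeOptions) {
    return {
      binary: this.binary,
      // Assumption: flag names as understood from the Vosk CLI, unverified here
      args: [
        '-l', options.language,
        '-i', options.mediaFilePath,
        '-t', options.format,
        '-o', options.outputPath
      ]
    }
  }
}

// A small registry lets callers pick an engine by name
const transcribers = new Map<string, Transcriber>()
transcribers.set('vosk', new VoskTranscriber())

function getTranscriber (engineName: string): Transcriber {
  const transcriber = transcribers.get(engineName)
  if (transcriber === undefined) throw new Error(`Unknown transcription engine: ${engineName}`)
  return transcriber
}
```

Keeping command construction separate from execution makes each engine easy to unit test without having the underlying binary installed.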

Looking forward to your feedback!

—

Thanks again to everyone who helped make this happen:

🇫🇷 MENJ, 🇫🇷 DNE, @dinum@social.numerique.gouv.fr, @apps@mastodon.mim-libre.fr Benoît Piédallu @nschont@mastodon.mim-libre.fr Nicolas Vignal. Also Nicolas Can from ESUP, Jérôme Louradour & Michel-Marie MAUDET from LinTO.ai & @LINAGORA@framapiaf.org.

Of course, the @peertube@framapiaf.org team: @Chocobozzz@framapiaf.org & @Pouhiou@framapiaf.org from @Framasoft@framapiaf.org. A special thanks to Manuel Raynaud @lunika@mastodon.social from FUN MOOC & also to @nlnet@nlnet.nl.

And also to all the peertube-plugin-transcription contributors: @artlog@linuxrocks.online @chagai95@campaign.openworlds.info @lcaylat@mamot.fr @toby3d@mstdn.io @tr4sk@pouet.chapril.org @mikeletxeberriaokariz@mastodon.eus @phlhardy@mastodon.zaclys.com