IIT Jodhpur researchers build C2VNet framework


Jodhpur: Researchers at the Indian Institute of Technology (IIT) Jodhpur are building a software framework for converting digital comics to video (C2VNet).

Researchers in Digital Humanities (DH) at IIT Jodhpur are developing the Comic-to-Video Network (C2VNet), a framework that revolves around creating an audio-video storybook.

Digital Humanities focuses on, and contributes to, a composite of approaches (ideas and methods) that emphasise preserving, reconstructing, transmitting, and interpreting human records, both historical and contemporary.

In major ways, this attends to epistemological questions about how knowledge is produced, from generating digital data out of material objects to rethinking existing processes of knowledge production.

Advances in technology have given rise to methods and methodologies that can create the desired multimedia content. One such instance is automatic image synthesis, which has attracted considerable attention among researchers. By contrast, audio-video scene synthesis, such as synthesis based on document images, remains challenging and under-researched.

This field of DH lacks sustained analysis of multimodality in automatic content synthesis and its growing impact on digital scholarship in the humanities. The C2VNet is a step towards bridging this gap.

C2VNet works panel by panel through a comic strip and eventually produces a full-length video (with audio) of a digitized or born-digital storybook. The goal was to design and develop software that takes a born-digital or digitized comic book as input and produces an audio-visual animated movie from it.

Along with the software, IIT Jodhpur researchers have proposed a dataset titled "IMCDB: Indian Mythological Comic Dataset of Digitized Indian Comic Storybook" in the English language. The dataset includes complete annotations for panels, binary masks of text balloons, and text files for each speech balloon and narration box within a panel; the team plans to make it publicly available.
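The exact file layout of IMCDB has not been published; as a minimal sketch, a loader for a hypothetical layout (one image, one balloon mask, and one transcript per panel) might look like:

```python
import os

def load_panel_annotations(panel_dir):
    """Pair each panel image with its balloon mask and transcript.

    Assumes a hypothetical IMCDB-style naming scheme (an illustration,
    not the published layout):
        panel_001.png       -- panel image
        panel_001_mask.png  -- binary mask of the text balloons
        panel_001.txt       -- speech balloon / narration box text
    """
    panels = []
    for name in sorted(os.listdir(panel_dir)):
        if name.endswith(".png") and not name.endswith("_mask.png"):
            stem = name[:-len(".png")]
            panels.append({
                "image": os.path.join(panel_dir, name),
                "mask": os.path.join(panel_dir, stem + "_mask.png"),
                "text": os.path.join(panel_dir, stem + ".txt"),
            })
    return panels
```

A loader of this shape keeps the three annotation types aligned per panel, which is the pairing the dataset description implies.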

According to Dr Chiranjoy Chattopadhyay, Assistant Professor, Department of Computer Science and Engineering, IIT Jodhpur, C2VNet has two internal networks to support video creation: a panel extraction model and a speech balloon segmentation model.

“CPENet developed by the team gives over 97% accuracy, and the speech balloon segmentation model SBSNet gives 98% accuracy with fewer parameters,” said Dr Chattopadhyay.

"Both have outperformed state-of-the-art models. C2VNet is the first step towards the big future of automatic multimedia creation of comic books to bring new comic reading experiences," added Dr Chattopadhyay.
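The pipeline described above (panel extraction via CPENet, balloon segmentation via SBSNet, then audio-video assembly) can be sketched as data flow with stand-in functions. All names and data shapes here are illustrative assumptions, not the published implementation:

```python
def extract_panels(page):
    """Stand-in for CPENet: return the page's panels in reading order."""
    return page["panels"]

def segment_balloon_text(panel):
    """Stand-in for SBSNet plus text extraction: return the panel's
    speech balloon / narration box text."""
    return panel["text"]

def synthesize_speech(text):
    """Stand-in for a text-to-speech engine (returns a tagged string
    here; a real system would return an audio waveform)."""
    return f"audio<{text}>"

def comic_to_video(pages):
    """Produce one (image, audio) clip per panel. A real system would
    render frames and mux the audio into a video file."""
    clips = []
    for page in pages:
        for panel in extract_panels(page):
            audio = synthesize_speech(segment_balloon_text(panel))
            clips.append((panel["image"], audio))
    return clips
```

The key design point the article implies is the panel-by-panel decomposition: each panel becomes an independent clip, so the final movie is a straightforward concatenation.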

This is a one-of-a-kind study discussing the automation of audio-visual content creation from scanned document images. Going forward, the team is working on improving the software so that these multimedia books become more immersive and engaging for the target audience. Producing such content manually usually takes considerable time and effort; with this software, it can be done quickly and more interactively.
