While developing my video content generation pipeline, I needed to control the maximum length of generated text so that the final video wouldn’t end up too long.
What are the options? For example, you can set a maximum word count for the generated text. If your starting point is the target video length in minutes, use the formula: number of words = video length in minutes × average reading speed (in words per minute). The average reading speed is about 100–120 words per minute (for English).
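For instance, a three-minute video at roughly 110 words per minute (the middle of the range above) gives a budget of about 330 words. A minimal sketch:

```python
# Word budget from a target video length. The 110 wpm figure is an
# assumption taken from the middle of the 100-120 range for English.
WORDS_PER_MINUTE = 110


def word_budget(video_minutes: float, wpm: int = WORDS_PER_MINUTE) -> int:
    """Maximum number of words that fits into the target video length."""
    return int(video_minutes * wpm)


word_budget(3)  # → 330
```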
So, what about interacting with the neural network itself? There are two main approaches:
Approach one: Explicitly state in the prompt that the generated text must not exceed a certain number of characters. This usually works only moderately well, and I wouldn’t recommend relying on it alone. It also helps to specify an acceptable margin of error so the neural network doesn’t get confused.
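A sketch of what such a prompt fragment might look like; the wording, the 800-character limit, and the ±10% margin are illustrative assumptions, not taken from the pipeline:

```python
# Hypothetical prompt wording for approach one: an explicit character
# limit plus an allowed margin of error.
CHAR_LIMIT = 800
MARGIN = 0.1  # ±10%

prompt = (
    f"Write the narration for the video. "
    f"Keep the text under {CHAR_LIMIT} characters "
    f"(a deviation of up to {int(CHAR_LIMIT * MARGIN)} characters is acceptable)."
)
```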
Approach two: Set the max_tokens parameter, which limits the model’s output. Along with it, I also recommend passing the temperature parameter with a value in the range 0.5–0.8, so the neural network stays focused in its responses and less verbose.
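A minimal sketch of how such request parameters could be assembled, assuming the per-word token ratios introduced in the code further down; `get_tokens_limit_for_words` here is my own illustrative stand-in, not the pipeline’s actual helper:

```python
import math

# Rough tokens-per-word ratios (assumed values, matching the RATIO table below)
TOKENS_PER_WORD = {"en": 1.4, "ru": 1.6}


def get_tokens_limit_for_words(words: int, language: str = "en",
                               margin: float = 0.1) -> dict:
    """Hypothetical helper: API parameters that cap the output length."""
    ratio = TOKENS_PER_WORD[language]
    return {
        # max_tokens must be an integer, hence the rounding up
        "max_tokens": math.ceil(words * ratio * (1 + margin)),
        # mid-range value from the 0.5-0.8 recommendation above
        "temperature": 0.6,
    }
```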
```python
import re

from src.types import LanguageType


class TextUtils:
    END_SENTENCE_CHARS = (".", "!", "?", ",", ";", "。", "\n")
    # Rough tokens-per-word ratios by language
    RATIO = {
        "en": 1.4,
        "ru": 1.6,
        "es": 1.6,
        "de": 1.6,
        "zh": 2.0,
        "ja": 2.0,
        "ko": 2.0,
    }

    @staticmethod
    def get_tokens_per_word(language: LanguageType) -> float:
        value = TextUtils.RATIO.get(language)
        if not value:
            raise ValueError(f"Unsupported language: {language}")
        return value

    @staticmethod
    def split_by_tokens(text: str, max_tokens: int, overlap_tokens: int = 0):
        """
        Splits the text into chunks of max_tokens tokens with an overlap
        of overlap_tokens. Uses tiktoken (cl100k_base).
        """
        if max_tokens <= 0:
            raise ValueError("max_tokens must be > 0")
        if overlap_tokens < 0:
            raise ValueError("overlap_tokens must be >= 0")

        # Imported lazily so tiktoken is only required when chunking is used
        import tiktoken

        enc = tiktoken.get_encoding("cl100k_base")
        tokens = enc.encode(text)
        chunks = []
        start = 0
        n = len(tokens)
        while start < n:
            end = min(start + max_tokens, n)
            chunk_tokens = tokens[start:end]
            chunks.append(enc.decode(chunk_tokens))
            if end >= n:
                break
            start = end - overlap_tokens if overlap_tokens > 0 else end
        return chunks

    @staticmethod
    def group_text_into_sentences(
        text: str, max_words_length: int | None = 20
    ) -> list[list[dict]]:
        new_sent_split_pattern = "|".join(map(re.escape, TextUtils.END_SENTENCE_CHARS))
        raw = [w.strip() for w in re.split(new_sent_split_pattern, text) if w.strip()]
        replacements = {"#": "", "---": ""}
        raw = [{"word": TextUtils.replace(w, replacements)} for w in raw if w.strip()]
        return TextUtils.group_words_dict_into_sentences(raw, max_words_length)

    @staticmethod
    def replace(value: str, replacements: dict) -> str:
        for old, new in replacements.items():
            value = value.replace(old, new)
        return value

    @staticmethod
    def group_words_dict_into_sentences(
        words: list[dict],
        max_words_length: int | None = 20,
    ) -> list[list[dict]]:
        sentences = []
        current_sentence = []
        for word in words:
            current_sentence.append(word)
            has_end_char = (
                word.get("word", "").strip().endswith(TextUtils.END_SENTENCE_CHARS)
            )
            # Note: the limit is measured in characters of the joined sentence
            current_sentence_words = "".join(
                [w.get("word", "") for w in current_sentence if w.get("word")]
            )
            is_length_exceeded = (
                max_words_length and len(current_sentence_words) >= max_words_length
            )
            should_split = has_end_char or is_length_exceeded
            # Start a new sentence at punctuation or once the length limit is hit
            if should_split:
                sentences.append(current_sentence)
                current_sentence = []
        # Add any remaining words
        if current_sentence:
            sentences.append(current_sentence)
        return sentences
```
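The overlap arithmetic in split_by_tokens can be checked in isolation on plain integer ids, without pulling in tiktoken. A small standalone sketch of the same loop:

```python
# Same chunking loop as split_by_tokens, but on raw token ids instead of
# encoded text, to make the overlap behavior easy to see.
def chunk_with_overlap(tokens: list[int], max_tokens: int, overlap: int) -> list[list[int]]:
    chunks = []
    start = 0
    n = len(tokens)
    while start < n:
        end = min(start + max_tokens, n)
        chunks.append(tokens[start:end])
        if end >= n:
            break
        # Each next chunk starts `overlap` tokens before the previous end
        start = end - overlap if overlap > 0 else end
    return chunks


chunk_with_overlap(list(range(10)), 4, 1)
# → [[0, 1, 2, 3], [3, 4, 5, 6], [6, 7, 8, 9]]
```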
```python
# Agent example with those parameters set
def get_agent(self, words_to_generate: int = 300, language: LanguageType = "en"):
    return AtomicAgent[
        VisualSubtitlesMakerInputSchema,
        VisualSubtitlesMakerOutputSchema,
    ](
        config=AgentConfig(
            client=instructor.from_openai(self.client, mode=instructor.Mode.JSON),
            model=Config.gpt_model(),
            model_api_parameters=ModelApiConfig.get_tokens_limit_for_words(
                words_to_generate, language
            ),
            system_prompt_generator=SystemPromptGenerator(
                background=[
                    """You are the director of a cinematic film.""",
                    """Based on the story from the subtitles, you must create
                    atmospheric and visually diverse scenes.""",
                ],
                steps=[
                    """For each scene, produce one visual description:
                    - The key action/object of the scene
                    - The setting (place, atmosphere, time)
                    - The emotions/mood
                    """,
                ],
                output_instructions=[
                    """Write in the style of an image-generation prompt (cinematic).""",
                    """Do not add slides with calls to subscribe, like, or comment.""",
                ],
            ),
        )
    )
```
How do you calculate the number of tokens from the number of words? The code above does this for seven languages. On average, one English word is about 1.3–1.4 tokens (the RATIO table uses 1.4). Note that max_tokens must be an integer: if you pass a fractional value, the neural network will silently ignore it and assume there is no restriction, which can drain your balance. It’s also a good idea to verify the text the neural network actually generates. With this script, I managed to achieve over 90% accuracy. Leave comments and share your experience. Good luck with your development!
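One way to run that check is to compare the generated word count against the target; `length_accuracy` below is an illustrative helper of my own, not part of the pipeline:

```python
def length_accuracy(generated_text: str, target_words: int) -> float:
    """How close the generated text is to the target word count (1.0 = exact)."""
    actual = len(generated_text.split())
    return 1 - abs(actual - target_words) / target_words


# Example: 285 words generated against a 300-word target → 0.95 accuracy
text = " ".join(["word"] * 285)
length_accuracy(text, 300)  # → 0.95
```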
Open for contract collaboration
I am available for contract-based collaboration. If you have an interesting project idea, schedule a call via Calendly.