The official release of OpenAI's Sora on December 10 reshaped the video generation landscape and set off a flurry of reactions among Chinese AI companies. After its preview debut on February 16, Sora was criticized and dismissed as a mere "technological futures projection," yet the released version can produce videos at resolutions up to 1080p and durations of up to 20 seconds. The release marked a pivotal moment for the sector, with OpenAI CEO Sam Altman likening Sora's official launch to the GPT-1 milestone for video generation.
However, unlike the swift follow-up seen during the GPT era, the response of Chinese AI companies to Sora has been far more varied. While some rushed to match Sora, others deliberately went a different way. ByteDance, Kuaishou, Tencent, and AI startups such as Zhipu AI and MiniMax took a proactive approach, announcing their own video generation models soon after Sora's introduction. Many claimed that these models matched or even surpassed the preview version of Sora.
On the other side, firms including Baidu and Baichuan Intelligent declared that they would not pursue Sora-like models. Baidu CEO Robin Li made it clear that however popular Sora became, Baidu would not pursue it. Others in the industry have video generation technology but chose not to prioritize it. This divergence suggests a more nuanced path for China's AI sector than the follow-the-leader pattern set during the rise of the GPT series.
Whether a company aligns with Sora or refuses to marks a stark contrast among firms that are all capable of building general-purpose foundation models. China's video generation landscape reflects different decision-making processes, each with its own view of technological direction and commercial potential.
First, it helps to clarify what domestic tech firms are trying to build when they align with Sora. At its core, Sora combines diffusion models with transformers to generate video from text, image, or video prompts. Any model competing with Sora should therefore offer generalizability, high-quality output, and strong visual consistency. Sora has also forced a strategic choice on companies across the spectrum, and their reactions vary.
Some companies, especially those with video-centric businesses, quickly committed to advancing video generation. After Sora's debut, ByteDance introduced its Dreamina product and Kuaishou unveiled its Kling model, each aiming to carve out a niche. Tencent also joined in with its Hunyuan generative model. Model developers responded quickly as well with tools for video creation: Zhipu AI's Qingying video generation tool came to the forefront in July, offering a user-friendly interface for creating 4K videos from user-defined prompts.
Conversely, some companies took a firmer stance and detached themselves from the Sora narrative entirely. Baichuan Intelligent CEO Wang Xiaochuan publicly rejected the Sora path, maintaining that while the company valued innovation, it would not follow this particular trend. Baidu's Robin Li echoed the sentiment, choosing to focus on areas such as large language models and stating a preference for putting resources into products with clear commercial trajectories rather than chasing the uncertain prospects of video generation.
Then there is a third cohort: those that engage with the technology only lightly. Many domestic businesses, driven by a fear of missing out (FOMO) after Sora's success, felt they needed to prepare for video generation without committing substantial investment. For instance, Alibaba's marketing team released tomovideo to explore video generation for e-commerce, without putting much weight behind it. Similarly, Wangyi Wanwu's entry into B2B markets showed an awareness of the landscape but caution about prioritizing video generation, given the adjustments under way in the entertainment sector.
In essence, if the global emergence of foundation models is a game of poker, the stakes look different with Sora in play. The dynamic has shifted from one in which OpenAI's innovations drew mass emulation to a strategic game in which companies assess their own cards and choose their next moves according to business importance and strategic fit.
This raises a question: why has the game changed with Sora's arrival? The answer lies in the many uncertainties surrounding video generation technology. At present, the field is obscured by three overlapping clouds: technological ambiguity, commercial vagueness, and competitive uncertainty.
The first cloud is technological ambiguity. OpenAI frames Sora as a prospective pathway to artificial general intelligence (AGI) built on its own technical approach, but leading AI researchers question that pathway. Fei-Fei Li, for one, argues that Sora, confined to a two-dimensional framework, lacks the three-dimensional intelligence required for AGI. On this view, its videos of urban environments do not demonstrate essential spatial understanding. Sora's capabilities have drawn attention, but skepticism remains about whether it can deliver meaningful breakthroughs.
The second cloud hangs over commercial feasibility. The return on investment from deploying video generation models remains unclear, prompting many companies to rethink their strategies. Because Sora's model is resource-intensive, how to monetize it is an open question. Companies such as Baidu, which have developed capable video technology of their own, choose to prioritize areas like financial services and education, where returns appear more tangible and immediate.
Finally, the competitive cloud looms over the market. Even though the commercial prospects of video technology invite skepticism, success is likely to depend on significant investment, which fuels competition beneath the surface. Today's foundation model landscape also differs starkly from the one in which GPT rose: re-creating and launching a competitive model is no longer as arduous. Consequently, companies ramping up video generation may doubt their ability to sustain a long-term competitive lead.
As technological trends, commercial expectations, and competitive dynamics all bear on video generation, Sora's arrival makes outcomes harder to predict. Today's video generation environment remains ambiguous, with real uncertainty about the right path to success. Each company weighs the risks by its own metrics while making progress on its own terms.
The evolution of large-model technology remains paramount, yet with Sora's arrival, domestic firms are reluctant to subscribe wholesale to OpenAI's vision. Instead, they are charting their own paths. In practical terms, learning from Sora and observing how OpenAI shapes its narrative remain important for engaging stakeholders.
Indeed, whether or not firms choose to follow Sora's trajectory, they cannot afford to overlook key technological advances. Baidu exemplifies this balance, forgoing development directly tied to Sora while continuing to advance in areas essential to the sector.
By articulating commercial strategies centered on their core businesses, firms can make their priorities clear. The landscape shaped by foundation models is settling into an encouraging rhythm as domestic players find their own pace and pursue routes that reflect their capabilities and market insight.