31 December 2025

Hopes for AI in 2026

Andrew Ng's team recently asked people what their highest hopes for artificial intelligence are for 2026. 2025 is widely considered the year that industry really started adopting AI, and even Sam Altman has said that he isn't sure what direction AI will take, so the responses to the question offer a small snapshot of what people expect to see.

I was honored that one published response was mine.

My response made three major points:

1. Inspiring new directions of thought, to take technology beyond what the Transformer architecture is capable of. Although people are impressed with GPT, the fact remains that many people don't really understand how it works. Students need to build AI from scratch, along with tools to observe how "intelligence" emerges as the AI is trained with more data. Even with a small amount of training data, such efforts (when built with proper experimental tools) could help people understand the basics of how the pseudo-intelligence manifests itself, and how to improve it. This is crucial not just for designing better architectures, but also for designing better primitives. VL JEPA is one line of thinking in this direction, similar to my thesis proposal of the universal event model. Chronos 2 is another technique, which is very similar to my thesis proposal of partial contexts.

2. Fundamental changes in academia: The way academia encourages research needs to change so that scholars can truly explore and invent new technology. This requires better decisions on stipends, a focus on quality over quantity in paper publishing, and access to good computing power. It is crucial to connect researchers from various fields so that they can brainstorm and ideate, based on the innovations each of them has created in their respective fields of hardware, biological research, and software. It would help to incentivize innovation in more directions. Not all of it may be fruitful, but the exploration of ideas is necessary.

3. Corporations and governments creating leisure: While competition is good, the way it is currently practiced causes burnout instead of fostering healthy work environments. The design of the economy itself may need a fundamental change, to enable countries to cooperate and take civilization forward.

Humans have achieved a civilizational leap in technology with the current advancement of AI. I hope the magnanimity of people among us will propel AI to a level that can help us create a heaven on Earth and beyond the solar system. To do that, we need to build a healthy foundation that is very different from the local optimum that evolution is currently stuck at. Perhaps one way to progress would be to incorporate artificial intelligence and machines into the human body and create super-humans.

There were more opinions shared in part 2 of the hopes for AI.

For now, let us see what 2026 has to offer! 

03 December 2025

How to automatically even out the volume across multiple media files while they are being played?

It's not called "normalization" or "equalization". It's called Dynamic Range Compression (DRC). This is the setting to choose in VLC, as I understood from the Geekality blog.

DRC in VLC 

To enable it in VLC, go to Tools > Effects and Filters > Audio Effects > Compressor. Click the "Enable" checkbox and adjust the values to the following:

  • RMS/peak: 0
  • Attack: 50 ms
  • Release: 300 ms
  • Threshold: -20 dB
  • Ratio: 20:1
  • Knee radius: 1 dB    
  • Makeup gain: 12 dB 
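
The same values can probably be passed on the command line instead of being set in the GUI. The following is a minimal sketch that only builds and prints the command. The option names are the ones VLC's compressor module exposes as far as I can tell, so treat them as an assumption and confirm them with `vlc -H` on your installation. The function name and "example.mp4" are hypothetical.

```python
import shlex


def vlc_compressor_command(media_path, rms_peak=0.0, attack_ms=50.0,
                           release_ms=300.0, threshold_db=-20.0,
                           ratio=20.0, knee_db=1.0, makeup_db=12.0):
    """Build (but do not run) a VLC command line with the compressor enabled.

    Option names are assumed from VLC's compressor module; verify with `vlc -H`.
    """
    return [
        "vlc",
        "--audio-filter=compressor",
        f"--compressor-rms-peak={rms_peak}",
        f"--compressor-attack={attack_ms}",
        f"--compressor-release={release_ms}",
        f"--compressor-threshold={threshold_db}",
        f"--compressor-ratio={ratio}",
        f"--compressor-knee={knee_db}",
        f"--compressor-makeup-gain={makeup_db}",
        media_path,
    ]


# Defaults mirror the Geekality values listed above.
command = vlc_compressor_command("example.mp4")
print(shlex.join(command))
```

Printing the command first makes it easy to check the options before actually launching VLC with them.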

These are the values from Geekality, and they worked well. The blog gives a cursory explanation, but not enough to understand what each value does, why it matters, or how changing it would affect VLC's ability to adjust the volume. So I asked Grok and found out:

What the terms mean

  • RMS/Peak: Uses peak level (fastest reaction)
  • Attack: How fast it clamps loud spikes
  • Release: How fast it lets the volume back up
  • Threshold: Anything louder than –20 dB gets squashed
  • Ratio: Every 20 dB over the threshold comes out as only 1 dB over it
  • Knee: Very sharp bend (hard knee)
  • Makeup gain: Boosts everything after squashing

So for the values given above, anything louder than –20 dB is crushed almost flat, parts below the threshold are left untouched, and then the whole track is boosted by 12 dB. This makes music/video feel flat and loud, but pumping (volume breathing) can be heard on songs with big dynamic swings.
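
To make the threshold, ratio, knee and makeup gain concrete, here is a minimal sketch of a textbook static compression curve in Python. It is not VLC's actual implementation: the function name is hypothetical, the knee is assumed to be a quadratic blend spanning one knee radius on either side of the threshold, and attack and release are ignored entirely.

```python
def compressed_level_db(level_db, threshold_db=-20.0, ratio=20.0,
                        knee_radius_db=1.0, makeup_db=12.0):
    """Map an input level in dB to an output level in dB (static curve only)."""
    over = level_db - threshold_db
    if over <= -knee_radius_db:
        # Below the knee: the signal is left untouched.
        out = level_db
    elif over >= knee_radius_db:
        # Above the knee: every `ratio` dB over the threshold becomes 1 dB.
        out = threshold_db + over / ratio
    else:
        # Inside the knee: blend smoothly between the two straight segments.
        out = level_db + (1.0 / ratio - 1.0) * (over + knee_radius_db) ** 2 / (4.0 * knee_radius_db)
    # Makeup gain lifts everything by the same amount after compression.
    return out + makeup_db


for level in (-40.0, -20.0, 0.0):
    print(f"{level:6.1f} dB in -> {compressed_level_db(level):6.2f} dB out")
```

With the default values, a quiet -40 dB passage comes out at -28 dB and a full-scale 0 dB peak comes out at -7 dB, so a 40 dB swing in the input shrinks to 21 dB in the output.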

  • Pumping / Breathing: Volume drops after loud parts, then slowly rises again.
  • Spikes slipping: Short loud bangs escape and surprise your ears. 
  • Transients: The sharp “crack” at the start of drums or claps. 
  • Dynamic swings: Natural loud-quiet changes in music or speech. 
  • Constancy: Everything stays at the same volume. 
  • Hard knee: Compression kicks in suddenly (sharp bend). 
  • Soft knee: Compression fades in gently (smooth curve). 
  • Peak: The single loudest instant. 
  • RMS: The average loudness over a moment.
  • Squashing: Forcing loud parts down to match quiet ones. 
  • Makeup gain: Boosting the whole sound after squashing it flat. 
  • Fast attack: Catches transients (no spikes slip).
  • Slow release: Avoids pumping.
  • High ratio + low threshold: Maximum squashing = maximum constancy.
  • Soft knee + some RMS: Sounds natural, less robotic.

How changing the values affects volume

  • Threshold ↑ (e.g. –12 dB) : Less of the track is compressed → dynamic swings remain → worse constancy. 
  • Threshold ↓ (e.g. –30 dB) : More of the track is compressed → everything flattened → better constancy (but can sound dead). 
  • Ratio ↑ (e.g. 30:1) : Stronger squash → better constancy, but risk of lifeless sound. 
  • Attack ↑ (slower, 200 ms) : Misses fast peaks → sudden loud spikes slip through → worse constancy. 
  • Attack ↓ (faster, 5 ms) : Catches every spike → better constancy, but can distort transients (drum hits). 
  • Release ↑ (slower, 800 ms) : Volume stays low for longer after a loud bit → quiet parts after loud parts sound muffled → worse. 
  • Release ↓ (faster, 100 ms) : Volume recovers quickly → can cause pumping (breathing) → worse if too fast. 
  • RMS/Peak → 1 (full RMS) : Ignores clicks/spikes, reacts to average → smoother, less pumping, slightly less perfect peak control.
  • Knee ↑ (softer, 6–10 dB) : Compression eases in → more natural, slightly less perfect constancy. 
  • Makeup gain ↑ : Whole track louder → helps quiet parts match, but can clip if too high. 
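
The attack and release behaviour described above can be illustrated with a small simulation. This is a minimal sketch assuming a simple one-pole smoother on the gain, which is a common textbook approach and not necessarily what VLC does internally. The function name and the 1000 Hz control rate (one step per millisecond) are hypothetical choices for illustration.

```python
import math


def smooth_gain_db(target_gain_db, attack_ms, release_ms, rate_hz=1000):
    """Smooth a sequence of target gains (dB) with separate attack/release times."""
    attack_coeff = math.exp(-1.0 / (attack_ms * 0.001 * rate_hz))
    release_coeff = math.exp(-1.0 / (release_ms * 0.001 * rate_hz))
    gain = 0.0  # start with no gain reduction
    smoothed = []
    for target in target_gain_db:
        # More reduction needed -> attack phase; otherwise release phase.
        coeff = attack_coeff if target < gain else release_coeff
        gain = coeff * gain + (1.0 - coeff) * target
        smoothed.append(gain)
    return smoothed


# A loud burst asking for 19 dB of reduction for 500 ms, then 1 s of quiet.
targets = [-19.0] * 500 + [0.0] * 1000

fast = smooth_gain_db(targets, attack_ms=5, release_ms=300)
medium = smooth_gain_db(targets, attack_ms=50, release_ms=300)
slow = smooth_gain_db(targets, attack_ms=200, release_ms=300)

for name, curve in (("5 ms", fast), ("50 ms", medium), ("200 ms", slow)):
    print(f"attack {name}: gain after 50 ms = {curve[49]:.2f} dB")
```

After 50 ms, the 5 ms attack has applied almost the full reduction, while the 200 ms attack has applied only a small part of it, which is how spikes slip through. The same smoother with a longer release keeps the gain low for longer after the burst ends.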

Recommended Presets 

Podcast / Speech: Maximum constancy, no pumping

  • RMS/Peak: 0.7 (mostly RMS, ignores clicks)
  • Attack: 15 ms
  • Release: 150 ms
  • Threshold: -24 dB
  • Ratio: 12:1
  • Knee: 6 dB
  • Makeup gain: 10 dB

Pop / YouTube Mix: Loud but still musical

  • RMS/Peak: 0.4
  • Attack: 25 ms
  • Release: 200 ms
  • Threshold: -18 dB
  • Ratio: 8:1
  • Knee: 4 dB
  • Makeup gain: 8 dB

Movies / Drama: Preserve dynamics, just tame peaks

  • RMS/Peak: 0.2
  • Attack: 50 ms
  • Release: 400 ms
  • Threshold: -14 dB
  • Ratio: 4:1
  • Knee: 8 dB
  • Makeup gain: 6 dB

Classical / Jazz: Light touch, almost no squash

  • RMS/Peak: 0.1
  • Attack: 80 ms
  • Release: 600 ms
  • Threshold: -10 dB
  • Ratio: 3:1
  • Knee: 10 dB
  • Makeup gain: 4 dB

Bottom Line

  • Want radio-loud constancy? Lower threshold, high ratio, fast attack, medium release, add makeup.  
  • Want music to breathe? Raise threshold, lower ratio, slower attack/release, softer knee.  

Audio volume adjustments on other software

  • Audacity: It has a compressor and a limiter. 
  • EasyEffects: It appears to be able to adjust various input and output sources. 
  • Other options on Ubuntu: There's pacmd, swh-plugins and Dyson compressor. 

Disclaimer: A lot of the information about the meaning of the audio terms was generated by Grok, so do take it with a pinch of salt.