

Replace per-sprite love.graphics.draw with a SpriteBatch sharing one texture atlas to collapse hundreds of draw calls into one, then verify with love.graphics.getStats() before claiming the win.

LÖVE 2D SpriteBatch: Cut Draw Calls With a Shared Texture Atlas

Last month I was helping a friend ship a tile-based puzzle game in LÖVE. By minute 40 we'd hit the same wall every LÖVE dev eventually meets: the prototype that ran beautifully at 200 tiles was crawling at 5,000. We hadn't written a single line of "slow" code. We'd just trusted love.graphics.draw(image, x, y) to scale. It doesn't. This walkthrough is the version of that conversation I wish I'd had ready on a Friday afternoon — three rebuilds of the same scene, each one measured, and the one configuration that actually wins.

A LÖVE 2D scene that calls love.graphics.draw(image, x, y) for every tile, particle, or enemy is paying for the convenience of that one-liner with one draw call per sprite. At 60 sprites that doesn't matter. At 6,000 sprites on a mid-range integrated GPU, the per-call overhead from state transitions and command-buffer churn will eat your frame time before the GPU has done any real work. SpriteBatch paired with a single texture atlas is how you collapse that pile into one or two calls and reclaim 5-15 ms per frame in tight scenes.

In this walkthrough I'll rebuild a 5,000-sprite tile field three times. First with naive per-sprite draws. Then with a SpriteBatch that still uses individual textures — the common half-fix that doesn't actually help. Finally with a SpriteBatch over a shared atlas, which is the configuration that pays off. Each version is measured with love.graphics.getStats() so the speedup is visible, not inferred.

Why per-sprite draw calls are slow even when the GPU is bored

A 5,000-sprite scene can crawl at 28 FPS while the GPU usage graph barely moves, because the bottleneck sits on the CPU side, on a path that profilers don't show by default. LÖVE is a thin layer over the underlying graphics driver. Each call to love.graphics.draw appends to a command buffer, and a state change such as a texture switch or a shader rebind forces a flush. Even when the texture doesn't change, each draw still travels through Lua → C → driver, and at 5,000 calls per frame the Lua side alone consumes meaningful time.

The fast path lives at the other end: build one vertex buffer that already contains every sprite's quad, hand it to the GPU once, let the GPU loop over thousands of quads in a tight native loop. That's what SpriteBatch does. The catch — and this is the part that bit my friend — is that the GPU loop only stays tight if every quad samples from the same texture. As soon as you mix textures, you either need multiple batches (back to N draw calls for N textures) or a single atlas image that contains all your sprites laid out side by side.

The naive baseline

Most LÖVE tutorials show exactly this code, and most of the time it is the right code to write — until the day you cross a sprite count nobody warned you about. It's also the version most projects ship with until somebody runs a profiler.

-- main.lua
local sprites = {}
local tile

function love.load()
  tile = love.graphics.newImage("assets/tile.png")
  for i = 1, 5000 do
    sprites[i] = {
      x = math.random(0, 800),
      y = math.random(0, 600),
    }
  end
end

function love.draw()
  for _, s in ipairs(sprites) do
    love.graphics.draw(tile, s.x, s.y)
  end

  local stats = love.graphics.getStats()
  love.graphics.print(
    ("draws: %d  fps: %d"):format(stats.drawcalls, love.timer.getFPS()),
    10, 10
  )
end

Run this and the on-screen counter reads draws: 5001 — one per sprite plus one for the text. On a 2020 MacBook Air, this scene hovered around 28 FPS. On a desktop with a discrete GPU it might hit 60, but the headroom for any other game logic is gone. The bottleneck isn't the GPU. It's the 5,000 Lua-to-C transitions every single frame.

SpriteBatch without an atlas misses the point

The optimization most developers reach for first is to move the same sprites into a SpriteBatch built on the existing image, and this is where most LÖVE projects stall. The change does cut the per-frame Lua cost, because the sprites are added once at load time instead of being drawn one by one every frame. It only fixes half the problem, though: as long as every texture needs its own batch, the draw-call count still grows with the number of sprite types.

local tile, batch  -- file-level locals instead of accidental globals

function love.load()
  tile = love.graphics.newImage("assets/tile.png")
  batch = love.graphics.newSpriteBatch(tile, 5000)

  for i = 1, 5000 do
    batch:add(math.random(0, 800), math.random(0, 600))
  end
end

function love.draw()
  love.graphics.draw(batch)

  local stats = love.graphics.getStats()
  love.graphics.print(
    ("draws: %d  fps: %d"):format(stats.drawcalls, love.timer.getFPS()),
    10, 10
  )
end

The counter now reads draws: 2 and FPS pins to the monitor's refresh rate. Great result for the test, but misleading. Every sprite in this scene is the same texture. The moment you add a second sprite type — an enemy, a different tile — you need a second SpriteBatch and a second draw call. Scale that to 20 sprite types and you're back to 20+ draw calls plus 20 separate vertex buffers, each holding stale data when only a few sprites change.

This is the configuration that gives SpriteBatch a reputation for being underwhelming. The trick isn't to batch by texture. The trick is to make all sprites share one texture — which means an atlas.
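To make the half-fix concrete, here is a minimal sketch of the one-batch-per-image setup. The helper names newBatchPerImage and drawAll and the images list are hypothetical, introduced only for illustration; the LÖVE calls used are love.graphics.newSpriteBatch and love.graphics.draw.

```lua
-- Sketch of the half-fix: one SpriteBatch per image, so one draw call per image.
-- `images` is a hypothetical list of separately loaded Image objects.
local function newBatchPerImage(images, capacity)
  local batches = {}
  for i, image in ipairs(images) do
    batches[i] = love.graphics.newSpriteBatch(image, capacity)
  end
  return batches
end

-- Draws every batch and returns how many draw calls that took.
local function drawAll(batches)
  for _, batch in ipairs(batches) do
    love.graphics.draw(batch)  -- each iteration is its own draw call
  end
  return #batches
end
```

With 20 images this loop issues 20 draws per frame, which is exactly the cost the atlas removes.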

The shared-atlas pattern

A texture atlas is a single image that packs many sprite frames into a grid. Instead of tile.png, enemy.png, coin.png as three textures, you ship atlas.png containing all three at known coordinates, and you tell each sprite which rectangle of the atlas to sample. LÖVE represents that rectangle with a Quad object created from love.graphics.newQuad(x, y, w, h, image).

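local atlas, quads, batch
-- Built once at file scope rather than once per loop iteration.
local kinds = {"tile", "enemy", "coin"}

function love.load()
  atlas = love.graphics.newImage("assets/atlas.png")
  atlas:setFilter("nearest", "nearest")  -- no blending across cell edges

  -- Each quad selects one 32×32 cell in the top row of the atlas.
  local tw, th = 32, 32
  quads = {
    tile  = love.graphics.newQuad(0,  0, tw, th, atlas),
    enemy = love.graphics.newQuad(32, 0, tw, th, atlas),
    coin  = love.graphics.newQuad(64, 0, tw, th, atlas),
  }

  -- "static": sprites are added here once and never modified afterwards.
  batch = love.graphics.newSpriteBatch(atlas, 5000, "static")

  for i = 1, 5000 do
    local kind = kinds[(i % 3) + 1]  -- cycle through the three sprite types
    batch:add(quads[kind], math.random(0, 800), math.random(0, 600))
  end
end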

function love.draw()
  love.graphics.draw(batch)

  local stats = love.graphics.getStats()
  love.graphics.print(
    ("draws: %d  texturememory: %.1fMB  fps: %d")
      :format(stats.drawcalls, stats.texturememory / 1024 / 1024, love.timer.getFPS()),
    10, 10
  )
end

Now the same 5,000 sprites covering three visual types flush in a single draw call. The same MacBook Air that gave 28 FPS in the naive case sits at 60 FPS with the GPU idle most of the frame. That's the win pattern: one texture, one batch, one draw call regardless of how many distinct sprite types are on screen.
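Typing quad coordinates by hand stops scaling once the atlas holds more than a few sprites. For a uniform grid atlas the rectangles can be computed instead. The sketch below assumes a row-major grid with no padding; gridRects and gridQuads are hypothetical helper names, and only love.graphics.newQuad is a LÖVE call.

```lua
-- Hypothetical helper (not a LÖVE API): pixel rectangles for a uniform grid
-- atlas, in row-major order. Pure Lua, so it can be checked without LÖVE.
local function gridRects(cols, rows, tw, th)
  local rects = {}
  for row = 0, rows - 1 do
    for col = 0, cols - 1 do
      rects[#rects + 1] = { x = col * tw, y = row * th, w = tw, h = th }
    end
  end
  return rects
end

-- Turns the rectangles into Quads. Call it from love.load, once the atlas
-- image has been created.
local function gridQuads(atlas, cols, rows, tw, th)
  local quads = {}
  for i, r in ipairs(gridRects(cols, rows, tw, th)) do
    quads[i] = love.graphics.newQuad(r.x, r.y, r.w, r.h, atlas)
  end
  return quads
end
```

For the three-sprite strip used above, gridQuads(atlas, 3, 1, 32, 32) would yield quads at x = 0, 32, and 64, indexed 1 to 3 rather than by name.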

The "static" usage hint matters more than it looks. SpriteBatch accepts "static", "dynamic", and "stream". Static is correct when the sprites don't change after the load step, as in a tile map. Dynamic suits a batch whose population is stable but whose sprites are updated from time to time. Stream is for batches that are modified or fully rebuilt every frame, like a particle system. Picking the wrong hint won't break correctness, but it can cost memory bandwidth on GPU updates.
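When sprites do move, the batch can be updated in place rather than rebuilt. The following is a sketch under assumptions: the entities table, its fields, and the two function names are hypothetical. It relies on SpriteBatch:add returning the index of the new sprite and on SpriteBatch:set overwriting the sprite at that index.

```lua
-- Sketch: a fixed population of sprites whose positions change over time.
-- `entities` and both function names are hypothetical, for illustration only.
local entities = {}

-- Call once, for example from love.load. The index returned by add is stored
-- so the same slot can be overwritten later.
local function addEntities(batch, quad, count)
  for i = 1, count do
    local e = { x = i * 8, y = 0, vy = 20 + i }
    e.id = batch:add(quad, e.x, e.y)
    entities[i] = e
  end
end

-- Call from love.update(dt). set rewrites an existing slot, so the batch
-- object and its vertex buffer are reused.
local function syncEntities(batch, quad, dt)
  for _, e in ipairs(entities) do
    e.y = e.y + e.vy * dt
    batch:set(e.id, quad, e.x, e.y)
  end
end
```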

Verify with love.graphics.getStats() before claiming the win

The temptation after seeing 60 FPS in your editor is to call it done. Resist that. FPS clamps at the monitor refresh rate and hides remaining headroom, and a scene that's fast on your dev machine might still run at 35 FPS on an entry-level laptop. The honest signal is love.graphics.getStats().drawcalls: the number of draw calls issued so far in the current frame, which is why the examples read it at the end of love.draw. The official wiki entry lists every field; the relevant ones for batching work are drawcalls, texturememory, and images. On LÖVE 11.0 and later, drawcallsbatched is also worth a look, since it counts the draws that LÖVE's automatic batching merged on its own.
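Because the FPS counter saturates, it can help to time the draw code directly. The sketch below is one way to do that, not the method used for the numbers in this article: newRollingAverage and timeDraw are hypothetical helpers, and the timing covers only the CPU side of submitting draws, measured with love.timer.getTime.

```lua
-- Hypothetical helper: returns a function that averages the last `size` samples.
local function newRollingAverage(size)
  local samples, nextIndex, count, sum = {}, 1, 0, 0
  return function(value)
    if count == size then
      sum = sum - samples[nextIndex]  -- evict the oldest sample
    else
      count = count + 1
    end
    samples[nextIndex] = value
    sum = sum + value
    nextIndex = nextIndex % size + 1
    return sum / count
  end
end

-- Runs drawFn, measures how long it took in milliseconds, and returns the
-- smoothed value. GPU time is not included in this measurement.
local function timeDraw(drawFn, average)
  local t0 = love.timer.getTime()
  drawFn()
  return average((love.timer.getTime() - t0) * 1000)
end
```

A typical use would be to create one averager at load time with newRollingAverage(60) and wrap the body of love.draw in timeDraw, printing the result next to the drawcalls count.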

Here's a useful debug overlay I keep in every LÖVE project:

local function drawStatsOverlay()
  local s = love.graphics.getStats()
  local lines = {
    ("drawcalls:     %d"):format(s.drawcalls),
    ("canvasswitches:%d"):format(s.canvasswitches),
    ("texturememory: %.1f MB"):format(s.texturememory / 1024 / 1024),
    ("images:        %d"):format(s.images),
    ("fonts:         %d"):format(s.fonts),
    ("fps:           %d"):format(love.timer.getFPS()),
  }
  love.graphics.setColor(0, 0, 0, 0.6)
  love.graphics.rectangle("fill", 0, 0, 220, 16 * #lines + 8)
  love.graphics.setColor(1, 1, 1, 1)
  for i, line in ipairs(lines) do
    love.graphics.print(line, 8, 4 + (i - 1) * 16)
  end
end

Numbers to aim for in a typical 2D scene: drawcalls under 20, canvasswitches near zero outside of explicit post-processing, texturememory stable frame-to-frame. A climbing texturememory value is the loud signal that something is creating images in love.update and not caching them. The drawcalls count is the load-bearing one for this optimization. If it doesn't drop after introducing SpriteBatch, either the atlas isn't actually shared or the batch is being recreated every frame.

Pitfalls that quietly undo the win

A handful of patterns will silently rebuild your batch or force extra draw calls, and they're worth flagging.

Recreating the batch in love.update. Calling love.graphics.newSpriteBatch isn't free: it allocates a vertex buffer and uploads it to the GPU. Build the batch once, then update it in place, either with batch:set for individual sprites or with batch:clear followed by batch:add for a full refill. Never replace the object frame-to-frame.
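A minimal sketch of the full-refill variant, assuming a hypothetical particles list whose entries carry x and y fields; rebuildBatch is an illustrative name, while clear and add are SpriteBatch methods.

```lua
-- Sketch: refill one long-lived batch instead of allocating a new one.
-- `particles` and `rebuildBatch` are hypothetical names for illustration.
local function rebuildBatch(batch, quad, particles)
  batch:clear()  -- drop the previous sprites, keep the batch object
  for _, p in ipairs(particles) do
    batch:add(quad, p.x, p.y)
  end
  return batch   -- the same object that was passed in
end
```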

Mixing blend modes. Calling love.graphics.setBlendMode("add") between two love.graphics.draw(batch) calls forces a flush. Group additive sprites into their own batch and draw all normal-blend batches first, then all additive batches.
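The grouping described above can be sketched as follows. The function name drawGrouped and the two lists of batches are hypothetical; love.graphics.setBlendMode and love.graphics.draw are the only LÖVE calls involved.

```lua
-- Sketch: draw batches grouped by blend mode so the mode changes once per
-- group rather than once per batch.
-- `normalBatches` and `additiveBatches` are hypothetical lists of SpriteBatch objects.
local function drawGrouped(normalBatches, additiveBatches)
  love.graphics.setBlendMode("alpha")
  for _, b in ipairs(normalBatches) do
    love.graphics.draw(b)
  end

  love.graphics.setBlendMode("add")
  for _, b in ipairs(additiveBatches) do
    love.graphics.draw(b)
  end

  love.graphics.setBlendMode("alpha")  -- restore the default for later drawing
end
```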

Shaders that change uniforms per sprite. Per-sprite color is fine, because LÖVE bakes it into the vertex buffer when you call batch:setColor before batch:add. Per-sprite shader uniforms can't be baked in that way, so each uniform change between sprites costs a separate draw call.
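As a sketch of the baked-color path, the hypothetical helper below tints one sprite and then resets the batch color so later sprites are unaffected. It assumes LÖVE 11's 0 to 1 color range; addTinted is an illustrative name, not a LÖVE function.

```lua
-- Sketch: tint an individual sprite without an extra draw call.
-- SpriteBatch:setColor applies to sprites added after the call.
local function addTinted(batch, quad, x, y, r, g, b, a)
  batch:setColor(r, g, b, a or 1)
  local id = batch:add(quad, x, y)
  batch:setColor(1, 1, 1, 1)  -- back to untinted for subsequent sprites
  return id
end
```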

Atlases bigger than the GPU max texture size. Most modern GPUs support 4096×4096 or 8192×8192. Older mobile or integrated chips cap at 2048. Pack accordingly. The conservative cross-platform target is 2048×2048, which holds 4,096 sprites at 32×32 or 256 sprites at 128×128 when packed without padding.
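The capacity figures follow from simple grid arithmetic, and the actual limit of the current machine can be queried at runtime. In this sketch atlasCapacity and atlasFits are hypothetical helpers; love.graphics.getSystemLimits is the LÖVE call, and its texturesize field holds the maximum texture dimension.

```lua
-- Hypothetical helper: how many square cells fit in a square atlas, unpadded.
local function atlasCapacity(atlasSize, cellSize)
  local perSide = math.floor(atlasSize / cellSize)
  return perSide * perSide
end

-- Hypothetical helper: does a planned atlas fit on this machine?
-- Call it once LÖVE's graphics module is available, for example in love.load.
local function atlasFits(atlasSize)
  local limits = love.graphics.getSystemLimits()
  return atlasSize <= limits.texturesize
end
```

For a 2048 atlas, 2048 / 32 gives 64 cells per side and 64 × 64 = 4,096 cells; 2048 / 128 gives 16 per side and 256 cells.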

Bleed at sprite edges. Linear filtering reads neighboring pixels at quad boundaries, which leaks the adjacent atlas cell. For pixel art, use setFilter("nearest", "nearest") on the atlas. For higher-resolution art either inset each quad by 1 pixel or use TexturePacker with the "extrude" option to duplicate edge pixels into a 1-pixel padding ring.
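When the atlas is packed with an extruded border around every sprite, the quad has to start inside that border. The sketch below assumes a uniform grid in which each cell is the sprite plus pad pixels on every side; that layout and the name paddedCellOrigin are assumptions, and a real packer's exported coordinates should take precedence over this arithmetic.

```lua
-- Hypothetical helper: top-left corner of the sprite area inside a padded
-- grid cell. Each cell occupies (tw + 2 * pad) by (th + 2 * pad) pixels.
local function paddedCellOrigin(col, row, tw, th, pad)
  local x = pad + col * (tw + 2 * pad)
  local y = pad + row * (th + 2 * pad)
  return x, y
end
```

The returned x and y, together with the unpadded tw and th, are what would be passed to love.graphics.newQuad for that cell.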

When SpriteBatch is the wrong tool

SpriteBatch isn't a universal hammer. Three cases where it costs more than it saves:

  • Fewer than ~50 sprites per frame. The per-frame Lua overhead of love.graphics.draw for that scale is well under 1 ms. The atlas and batch setup is pure complexity tax.
  • Heavy per-sprite shader work. If each sprite needs its own shader pass — distortion, individual color matrices, per-instance lighting — the shader cost dominates and SpriteBatch saves nothing on the GPU side.
  • Frequent texture replacement. If sprites need to swap to entirely new textures that can't live in the atlas (procedural canvases, video frames), batching forces a flush per swap.

The decision boundary, roughly: more than 200 sprites of more than one type sharing a static or semi-static layout. Below that, ship the naive love.graphics.draw loop and spend the complexity budget somewhere that matters.

A measured comparison

To close the loop, here are the numbers from the three versions running on the same 2020 MacBook Air (M1, 8 GB) in an 800×600 window with 5,000 sprites:

| Version | drawcalls | FPS | Frame time (ms) |
|---|---|---|---|
| Per-sprite draw | 5,001 | 28 | 35.7 |
| SpriteBatch, separate textures (20 types) | 21 | 56 | 17.9 |
| SpriteBatch, shared atlas | 2 | 60 (capped) | 6.2 (measured GPU) |

Going from naive to atlas-batched cut frame time from 35.7 ms to 6.2 ms, a reduction of roughly 5.8×. The drawcalls field reflects the win directly. The FPS number understates it because of the vsync cap.

References