PHP How are arrays implemented in PHP, and how does code affect the speed of working with them?

META

Activist
SUPREME
MEMBER
Joined
Mar 1, 2026
Messages
118
Reaction score
379
Deposit
0$
Arrays are the bread and butter of a PHP developer. We use them constantly, but rarely think about how they are implemented internally. Yet their internal structure directly affects the speed and memory usage of our applications. Let’s take a closer look.

Let’s debunk three myths:

Accessing an array element always takes the same amount of time.

In PHP, regular arrays and associative arrays are the same thing.

Using a reference in foreach is faster than simply iterating over the elements.


Imagine the following situation. You create an array and add elements to it using the same operator: $array[] = $value;. The keys become sequential: 0, 1, 2, 3, ... 9999. The last key in the array is 10000. Then you add a new element with the key 100000000. The question is: will the speed of adding this last element be the same as for all the previous ones?

Most people probably wouldn’t even think about it. It seems like there shouldn’t be any difference. But since the question is being asked, maybe there actually is?

We’ll figure out where these differences come from and why they appear later. (Spoiler: in my case, the insertion time increased by 2872 times.)

An array is one of the most frequently used types in PHP. Understanding how it works internally helps to:

avoid unnecessary allocations and hash recalculations;

design data structures more effectively;

predict peak memory consumption;

avoid small performance “micro-pitfalls.”

Two “modes” of arrays: packed and hash

In PHP, the array type can be implemented in different ways. Internally, it has two storage modes: packed and hash table.

Packed — a packed list. This is a list of values with keys 0, 1, 2, … and no string keys. All keys are a sequence of numbers starting from 0. The data is stored as one continuous block of memory containing the values. It has no hash index and no buckets, which makes it memory-efficient and fast.

Hash — a hash table. This structure is used in all other cases: when keys are arbitrary — strings, negative numbers, very large numbers, keys with “gaps,” or mixed types. The data is stored in a hash table where elements are distributed into buckets based on the hash value of the key.

How the storage mode is chosen

In an empty array, the structure is not yet defined. The actual structure is allocated when the first value is written, and that operation determines the storage mode.

The general rule is simple: if the array keys start from 0 and go sequentially, the array will be packed. In all other cases, it becomes a hash array.

Packed arrays may contain small “gaps,” for example after an unset. However, an array may switch to hash mode in cases such as:

A string key appears:
$a['x'] = 1;

A negative key appears:
$a[-1] = 1;

A large jump in the index:
$a[1_000_000] = 1;
(PHP will not allocate a million empty slots.)

Some operations that break the “dense list” structure may also trigger the switch.

<?php

// 1) Empty array: the real structure is not defined yet
$a = [];

// 2) Two consecutive inserts -> packed
$a[] = 'A'; // key 0
$a[] = 'B'; // key 1

// 3) Any string used as a key => conversion to hash
$a['user'] = 'Ivan'; // the array is now in hash mode

// 4) A large jump in the index -> hash
$b = [];
$b[] = 1;
$b[] = 2; // packed
$b[1_000_000] = 3; // becomes hash (PHP will not allocate a million empty slots)

// 5) Negative key -> immediately hash
$c = [];
$c[-1] = 'cat'; // hash

PHP can switch from packed to hash, but never back from hash to packed.

How a hash table works in PHP

Imagine a cabinet with many numbered drawers (buckets). The key of an array element is transformed into a number (a hash), and this number determines which drawer to use — that’s where the element is stored and where it will later be searched for.

How an element is added

1. The hash of the key is calculated and a bucket is selected.


2. If the bucket is empty — the element is placed there.


3. If the bucket already contains something (collision) — the element is placed next to it (added to a chain inside that bucket).


4. If the key already exists — the value is simply updated; no duplicate is created.



How an element is found

1. The hash is calculated again and the corresponding bucket is located.


2. The elements in that bucket are scanned until the one with the matching key is found.



This process is fast — in most cases.

As a result, the time required to access array elements depends on the storage mode used by the array. Moreover, at some point the array may change its storage strategy (switch to hash mode), and that conversion can take a noticeable amount of time.

Let’s test this and verify that the difference really exists.

The script below:

fills an array with 10,000 elements sequentially (packed mode), measuring the total time and the average time to insert a single element;

then adds an element with the key 100000000 (a huge index “jump”), measuring the time for this single insertion — this is where the conversion from packed to hash occurs;

adds one more element after the conversion (a normal insertion already in hash mode) so we have something to compare against;

calculates how many times the insertion with conversion is slower than a normal insertion in packed mode and a normal insertion in hash mode;

runs the scenario several times and shows the averaged results.

<?php

declare(strict_types=1);
gc_disable();

function median(array $xs): float {
sort($xs);
$n = count($xs);

return $n ? ($n % 2 ? $xs[intdiv($n, 2)] : ($xs[$n/2 - 1] + $xs[$n/2]) / 2) : NAN;
}

function ns(callable $fn): int {
$t = hrtime(true);
$fn();

return hrtime(true) - $t;
}

$N = 10000; // array size to fill
$BIG = 100000000; // large index to trigger packed->hash conversion
$RUNS = 9; // number of runs

printf(
"PHP %s | SAPI=%s | JIT=%s | int=%d | N=%d | BIG=%d | RUNS=%d\n",
PHP_VERSION,
PHP_SAPI,
ini_get('opcache.jit') ?: 'off',
PHP_INT_SIZE,
$N,
$BIG,
$RUNS
);

// Build a packed array using only $a[]
$build = function() use ($N): array {
$a = [];
for ($i = 0; $i < $N; $i++) {
$a[] = $i;
}

return $a;
};

// Small warm-up to reduce noise
for ($w = 0; $w < 2; $w++) {
$tmp = $build();
unset($tmp);
}

$fill = $avgAppend = $conv = $hashIns = [];

for ($r = 0; $r < $RUNS; $r++) {
$a = [];

// Filling in packed mode
$tFill = ns(function() use (&$a, $build) { $a = $build(); });
$avg = $tFill / max(count($a), 1); // ns per insertion

// Convert packed -> hash by setting a large index
$tConv = ns(function() use (&$a, $BIG) { $a[$BIG] = -1; });

// Insertion after conversion (already hash structure)
$tHash = ns(function() use (&$a) { $a[] = -2; });

$fill[] = $tFill;
$avgAppend[] = $avg;
$conv[] = $tConv;
$hashIns[] = $tHash;
}

// Aggregation: medians
$mf = median($fill) / 1e9; // seconds
$ma = median($avgAppend); // ns
$mc = median($conv); // ns
$mh = median($hashIns); // ns

echo "\nMedian (after warm-up):\n";
printf("- Filling N=%d: %.6f s; average per insert: %.1f ns\n", $N, $mf, $ma);
printf("- Insertion with large index (packed->hash conversion): %.1f ns\n", $mc);
printf("- Regular insertion after conversion: %.1f ns\n", $mh);
printf("- Conversion vs avg packed insert: ×%.1f\n", $mc / max($ma, 1.0));
printf("- Conversion vs regular hash insert: ×%.1f\n", $mc / max($mh, 1.0));

At the beginning of the article there was a question about adding a new element. The insertion time in that example differs many times precisely because of the conversion.

Also note the fairly large difference even for regular element insertions.

What are buckets and collisions (hash mode)

A bucket is a “slot” in the hash table where elements are placed if they share the same index:

hash(key) mod table_size.

A collision occurs when different keys produce the same index and end up in the same bucket. In that case, the elements inside the bucket are linked into a chain, and during lookup PHP iterates through that chain until it finds the matching key.

An important detail: element lookup is performed not only by the hash, but also by iterating over elements that have the same hash.

On average, element access remains fast (O(1)) because there are many buckets and elements are usually distributed evenly.

Collisions “in real life”: example with integer keys

Let’s try to artificially create collisions (place many elements into the same bucket) and compare the access speed. For this we need an associative array.

For string keys, PHP adds a random salt to the hash for every process/request. Because of this, intentionally creating a series of collisions for strings from user code is practically impossible.

However, with integer keys we can artificially overload a single bucket and observe the slowdown in access time.

Let’s write a script that creates two arrays with different keys. In one of them, we will choose keys in such a way that they mostly fall into the same bucket. Then we will try to access the array elements in a random order.

<?php

function colliding(int $n, int $shift): array {
$a = [];
for ($i = 0; $i < $n; $i++) {
$a[$i << $shift] = $i;
}

return $a;
}

function distributed(int $n): array {
$a = [];
for ($i = 0; $i < $n; $i++) {
$a[($i << 1) | 1] = $i;
}

return $a;
}

function bench(array $a, string $label): void
{
$keys = array_keys($a);
shuffle($keys);

$t0 = hrtime(true);
$s = 0;

foreach ($keys as $k) {
$s += $a[$k];
}

$dt = (hrtime(true)-$t0)/1e9;
printf("%s: %d lookups in %.4f s (sum=%d)\n", $label, count($a), $dt, $s);
}

$n = 10000;
$shift = (PHP_INT_SIZE===8) ? 20 : 12;
$A = colliding($n, $shift);
$B = distributed($n);

bench($A,'colliding');
bench($B,'well-distributed');

Compare the results obtained. The access speed to array elements differs by 260 times (PHP 7.4) and 178 times (PHP 8.2).

How arrays work with memory and how they grow

Packed (vector)

Values are stored sequentially in one continuous block of memory. The array has a capacity.

As long as there is free capacity, new elements are simply appended to the end (O(1) complexity).

When the capacity is exceeded, a larger memory block is allocated and the existing elements are copied into it. This operation is expensive (O(n)), but it happens rarely. On average, insertion still remains O(1).

Hash (hash table)

The structure tracks the load factor.

When the number of elements becomes too large, the number of buckets is increased (usually ×2), and all elements are redistributed across the new table.

Deletions leave “holes” (slots marked as empty). PHP tries to reuse them, and occasionally the table may be cleaned up.

The load factor is the ratio of the number of elements to the number of buckets. When this value exceeds a certain threshold, PHP increases the size of the table to reduce the number of collisions.

The Copy-on-Write principle

Imagine you have an array with one million records and you pass it to a function. To improve performance, PHP does not copy the array immediately — instead, it creates a reference to the same array.

However, if you attempt to modify an element, the copying process is triggered, which may take noticeable time.

The same thing happens with regular copying:

$b = $a;

This does not copy the elements immediately. Both variables point to the same underlying array (but the reference counter becomes greater than 1). The first modification of either copy triggers the actual duplication of the data.

This operation may cost O(n).

Loops with count() and loops with foreach

count()

You can often see code like this:

for ($i = 0; $i < count($a); $i++) { ... }

Here count($a) is placed directly in the loop condition.

You can make the code run faster (especially with large arrays) simply by storing count($a) in a separate variable. For arrays, count() itself is an O(1) operation. The number of elements is stored inside the hash table/list structure (in Zend HashTable it is the field nNumOfElements) and is updated during insertions and deletions. The function simply reads this ready value.

However, calling the function on every iteration still adds some overhead.

foreach

You may also encounter code like this:

foreach ($a as &$v) { ... }

Here $v is passed by reference. Sometimes people comment that this approach should be faster because a reference is used. But in reality, it’s not that simple.

A foreach by value does not copy the element data. It is a cheap operation: the engine only increases the reference counter of the zval. The actual bytes (strings/subarrays) are not copied unless you modify them.

A foreach by reference turns the elements into references. Even if you don’t modify them, the engine has to work with different semantics, which creates additional overhead and may break the packed mode for some operations. Therefore, you should not use & unless it is actually needed.

If you need to modify the original array in place, iterating by reference is a good option. But remember: if there are other references to the array (for example after $b = $a), the first modification will trigger a full array copy due to the copy-on-write mechanism.

What we will compare

Let’s try to compare different loop variants:

for with count() inside the loop vs for with count() stored in a variable;

foreach by value vs foreach by reference (when the reference is not actually needed).

<?php

ini_set('memory_limit', '1024M');

printf(
"PHP %s, int-size=%d, JIT=%s\n",
PHP_VERSION,
PHP_INT_SIZE,
ini_get('opcache.jit') ?: 'off'
);

function bench(string $label, callable $fn): void
{
// warm-up
$fn();

$t0 = hrtime(true);
$res = $fn();
$dt = (hrtime(true) - $t0) / 1e9;

printf("%-35s %.6f s (checksum=%s)\n", $label . ':', $dt, $res);
}

function makeList(int $n): array {
// Sequential keys 0..n-1 (packed array)
$a = [];
for ($i = 0; $i < $n; $i++) {
$a[$i] = $i;
}

return $a;
}

$n = 500_000;
$a = makeList($n);

// 1) for with count() in the condition
bench("for (count() in each iteration)", function() use ($a) {
$sum = 0;

for ($i = 0; $i < count($a); $i++) { // function call on each iteration
$sum += $a[$i];
}

return $sum;
});

// 2) for with count() stored in a variable
bench("for (count() cached)", function() use ($a) {
$sum = 0;
$len = count($a); // computed once

for ($i = 0; $i < $len; $i++) {
$sum += $a[$i];
}

return $sum;
});

// 3) foreach by value
bench("foreach by value", function() use ($a) {
$sum = 0;

foreach ($a as $v) {
// only the zval container is copied (refcount++), data is not copied
$sum += $v;
}

return $sum;
});

// 4) foreach by reference
bench("foreach by reference (no modification)", function() use ($a) {
$sum = 0;

foreach ($a as &$v) {
// elements become references; some optimizations are lost
$sum += $v;
}

unset($v); // important: clear the reference after foreach

return $sum;
});


The results show that using count() provided a slight speed boost. However, it's important to understand that the measurement was conducted on a large array.

On the other hand, using foreach by reference versus by value demonstrated a fourfold difference. That's quite significant.

Sometimes it's worth manually converting an array from a hash to a packed array. For this, you can use the array_values() function.

Let's compare how much memory each type of array requires. Below is an example.

<?php

function usage()
{
printf("mem=%.1f MB\n", memory_get_usage()/1024/1024);
}

$n = 200_000;
$a = [];
usage();

for ($i = 0; $i < $n; $i++) {
$a[] = $i; // packed
}

echo " packed: ";
usage();

$a['x'] = 1; // conversion to hash
echo "Switching to hash: ";
usage();

// Return to packed
$a = array_values($a);
echo "After array_values (back to packed): ";
usage();


This example illustrates the difference in memory usage. Note how much more efficient PHP 8.2 is. Another nuance is that memory is allocated in blocks. If in the above example you set it to 200K elements instead of 250K, the memory test results will not change.

### How PHP Increases Memory as Array Size Grows
Let's examine when and by how much PHP increases memory and copies the array as the number of elements in it increases.

<?php
$arr = [];
$prev = memory_get_usage();

echo "Element\tMemory\t\tDelta\n";
echo "==========================================\n";

for ($i = 0; $i < 100000; $i++) {
$arr[] = $i;
$curr = memory_get_usage();
$delta = $curr - $prev;

if ($delta > 0 && $i > 0) {
echo "$i\t\t" . number_format($curr) . " B\t" . number_format($delta) . " B\n";
}

$prev = $curr;
}
?>

An array in PHP is either a fast list (packed) or a universal hash table (hash).

By choosing keys and the insertion method, you influence the internal structure, which affects speed and memory.

Understanding when expansion or hash updates occur, and how copy-on-write works, helps to write predictable and efficient code.
 
Top Bottom