Methodology – The Swing Is On

Foundation

The classification system

The central unit across all three models is the classification – a grouping of candidates by their political positioning and expected preference behaviour. Seven classifications are used. They were designed to separate voter groups with meaningfully different preference flow patterns while remaining tractable given the available data.

Classification	Includes
Labor	Australian Labor Party
Coalition	Liberal Party, National Party, Liberal National Party (QLD), Country Liberal Party (NT)
Greens	Australian Greens
One Nation	Pauline Hanson's One Nation
Other, Left-leaning	Animal Justice Party, Legalise Cannabis Party, Climate 200 and Muslim Votes Matter-backed candidates, other left-leaning parties and independents whose preferences flowed predominantly to left-of-centre candidates
Other, Right-leaning	Trumpet of Patriots, Family First, Libertarian Party, other right-leaning candidates and independents whose preferences flowed predominantly to right-of-centre candidates
Other, Unaligned	Candidates with no clearly discernible ideological positioning, or where limited publicly available information existed on their political beliefs, and whose preference flows showed no consistent directional pattern

How candidates were classified for 2025

Major-party candidates (Labor, Coalition, Greens, One Nation) were assigned by party membership. The three "Other" classifications were assigned to independents and minor parties using a sequential four-step process.

For independents, the steps were:

1. If 65% or more of the candidate's preferences flowed to a left-leaning candidate over a right-leaning candidate in the seat's final 2CP, assign Other, Left-leaning – and vice versa for Other, Right-leaning. This 2CP-based step is primarily useful for sorting past candidates where detailed policy information may no longer be readily available. For future candidates in similar moulds, the same thresholds are indicative of likely classification, but steps 2 and 3 (endorsements, stated positions) should take precedence where that information exists.
2. If still undetermined, consider endorsements or funding from ideologically aligned groups (e.g. Climate 200, Muslim Votes Matter).
3. If still undetermined, review the candidate's ABC candidate statement or linked website for positions on immigration, climate action, social issues, and similar.
4. If still undetermined, classify as Other, Unaligned.

For minor parties, the same four steps applied, but step 1 used the party's two-party preferred (2PP) flow between Labor and Coalition nationally, rather than an electorate-specific 2CP result.
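The four-step sequence can be sketched as a short function. This is a minimal illustration only: the function name, the "left"/"right" encoding of endorsements and stated positions, and the use of a single preference share as input are assumptions made for the example, not the project's actual code.

```python
from typing import Optional

LEFT = "Other, Left-leaning"
RIGHT = "Other, Right-leaning"
UNALIGNED = "Other, Unaligned"


def classify_other(
    left_pref_share: Optional[float],
    endorsement_lean: Optional[str] = None,
    stated_lean: Optional[str] = None,
    threshold: float = 0.65,
) -> str:
    """Apply the steps in order; a later step runs only if earlier ones were inconclusive.

    left_pref_share: share (0 to 1) of preferences that flowed to the left-leaning
        candidate (seat 2CP for independents, national 2PP for minor parties),
        or None if no flow data exists.
    endorsement_lean / stated_lean: "left", "right" or None (hypothetical encoding).
    """
    # Step 1: preference-flow threshold in either direction.
    if left_pref_share is not None:
        if left_pref_share >= threshold:
            return LEFT
        if (1.0 - left_pref_share) >= threshold:
            return RIGHT
    # Step 2 (endorsements or funding), then step 3 (stated positions).
    for lean in (endorsement_lean, stated_lean):
        if lean == "left":
            return LEFT
        if lean == "right":
            return RIGHT
    # Step 4: nothing conclusive.
    return UNALIGNED
```

For example, `classify_other(0.55, endorsement_lean="left")` falls through step 1 and is resolved at step 2.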

The Coalition grouping covers four separate entities – Liberal, National, LNP, and CLP – because they effectively operate as a joint force at the federal level, rarely contest House of Representatives seats against each other, and have sufficiently similar preference-receiving patterns to be treated as one classification in the model.

The three "Other" classifications exist because aggregate preference behaviour differs meaningfully between them. Within each group, individual candidates' preference flows still vary meaningfully, and the model uses group averages that will not perfectly represent any single candidate.

Run a Race

Preference distribution model

The preference simulator eliminates candidates one by one, lowest to highest, and at each round distributes the eliminated candidate's votes to the remaining candidates using a twelve-rule hierarchy. Rules are tried in order from highest to lowest confidence; the first that produces a valid result is used. The badge in each round's output shows which rule fired and why.

The key design insight is that preference flows depend on who is eliminated and who remains – not just the eliminated candidate's classification in isolation. A Greens voter chooses differently when the remaining field is [Labor, Coalition] than when it is [Labor, One Nation, Coalition].
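The elimination loop described above can be sketched as follows. This is a simplified illustration under stated assumptions: one candidate per classification, a hypothetical `flow_fn` interface standing in for the full rule hierarchy, and toy flow figures (the 18.5% to Coalition is simply the remainder implied by 81.5% to Labor; the One Nation figures are the rounded values quoted under Rule 7).

```python
from typing import Callable, Dict, List

# flow_fn(eliminated, remaining) returns the share of the eliminated votes that
# goes to each remaining classification (shares sum to 1). Hypothetical interface.
FlowFn = Callable[[str, List[str]], Dict[str, float]]


def run_count(primary: Dict[str, float], flow_fn: FlowFn) -> Dict[str, float]:
    """Eliminate the lowest classification each round until two remain."""
    tally = dict(primary)
    while len(tally) > 2:
        loser = min(tally, key=tally.get)
        transfer = tally.pop(loser)
        remaining = sorted(tally)
        # The flow depends on who was eliminated AND on who is still in the field.
        for cls, share in flow_fn(loser, remaining).items():
            tally[cls] += transfer * share
    return tally


# Toy flow table keyed by (eliminated, remaining field); illustrative values only.
TOY_FLOWS = {
    ("One Nation", ("Coalition", "Greens", "Labor")):
        {"Coalition": 0.62, "Greens": 0.20, "Labor": 0.18},
    ("Greens", ("Coalition", "Labor")):
        {"Labor": 0.815, "Coalition": 0.185},
}


def toy_flow_fn(eliminated: str, remaining: List[str]) -> Dict[str, float]:
    return TOY_FLOWS[(eliminated, tuple(remaining))]


result = run_count(
    {"Labor": 35.0, "Coalition": 40.0, "Greens": 15.0, "One Nation": 10.0},
    toy_flow_fn,
)
```

In this toy race One Nation is eliminated first, then Greens, and the final two-candidate tally still sums to the original 100 points.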

Rule 1

2025 AEC data – exact match (n ≥ 5)

The model searches 133 observed combinations from the 2025 AEC distribution of preferences data for an exact match with five or more instances (n ≥ 5) – the same eliminated classification and the same multiset of remaining classifications. If found, the observed preference flows are applied directly.

The most common combinations: Greens eliminated, [Coalition, Labor] remaining (n=72, 81.5% to Labor); One Nation eliminated, [Coalition, Greens, Labor] remaining (n=69, 61.9% to Coalition).

The n ≥ 5 threshold reflects the principle that more observations produce more reliable estimates. Confidence: 78–95%, scaling with n and observed variance.
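An exact-match lookup of this kind might be keyed on the eliminated classification plus a sorted tuple of the remaining classifications, which preserves duplicates and so behaves as a multiset. The sketch below is an assumption about structure, not the project's code; `Scenario`, `field_key` and `exact_match` are hypothetical names, and the `max_n` parameter is included only to show how the same search could serve the smaller-sample rules.

```python
from typing import Dict, Iterable, NamedTuple, Optional, Tuple


class Scenario(NamedTuple):
    n: int                    # number of observed instances
    flows: Dict[str, float]   # share of transferred votes per remaining classification


Key = Tuple[str, Tuple[str, ...]]


def field_key(remaining: Iterable[str]) -> Tuple[str, ...]:
    # A sorted tuple keeps repeated classifications, so it acts as a multiset key.
    return tuple(sorted(remaining))


def exact_match(
    table: Dict[Key, Scenario],
    eliminated: str,
    remaining: Iterable[str],
    min_n: int = 5,
    max_n: Optional[int] = None,
) -> Optional[Scenario]:
    scenario = table.get((eliminated, field_key(remaining)))
    if scenario is None or scenario.n < min_n:
        return None
    if max_n is not None and scenario.n > max_n:
        return None
    return scenario


# One entry taken from the text; the Coalition share is the implied remainder.
TABLE: Dict[Key, Scenario] = {
    ("Greens", ("Coalition", "Labor")): Scenario(72, {"Labor": 0.815, "Coalition": 0.185}),
}
```

`exact_match(TABLE, "Greens", ["Labor", "Coalition"])` returns the n=72 scenario regardless of the order in which the remaining field is listed.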

Rule 2

2025 AEC data – nearest match (n ≥ 5)

If no exact n ≥ 5 match exists, the model finds the nearest match using the Jaccard index on the multisets of remaining classifications, requiring at least 60% similarity. A higher similarity score is preferred; ties are broken by observation count.

A coverage threshold also applies: the matched scenario must direct at least 50% of its flow to classifications present in the current field. This applies to Rules 2, 4 and 6.

The 60% Jaccard floor – roughly "more elements in common than not" – was calibrated empirically; a lower floor risks matching structurally different fields. Confidence: 52–75%, scaling with similarity score and observed variance.
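A multiset Jaccard index and the coverage threshold can be sketched with `collections.Counter`. The definition used here (sum of minimum counts over sum of maximum counts) is one common multiset generalisation and is an assumption about the model's exact formula; function names are illustrative, and what happens to any flow directed at absent classifications is not covered.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence, Tuple

Candidate = Tuple[List[str], int, Dict[str, float]]  # (remaining field, n, flows)


def multiset_jaccard(a: Sequence[str], b: Sequence[str]) -> float:
    ca, cb = Counter(a), Counter(b)
    inter = sum((ca & cb).values())   # element-wise minimum of counts
    union = sum((ca | cb).values())   # element-wise maximum of counts
    return inter / union if union else 1.0


def coverage(flows: Dict[str, float], current_field: Sequence[str]) -> float:
    """Share of the matched scenario's flow that lands on classifications still present."""
    present = set(current_field)
    return sum(share for cls, share in flows.items() if cls in present)


def nearest_match(
    candidates: Sequence[Candidate],
    current_field: Sequence[str],
    min_sim: float = 0.60,
    min_cov: float = 0.50,
) -> Optional[Candidate]:
    best_key, best = None, None
    for cand in candidates:
        remaining, n, flows = cand
        sim = multiset_jaccard(remaining, current_field)
        if sim < min_sim or coverage(flows, current_field) < min_cov:
            continue
        key = (sim, n)  # prefer higher similarity, then more observations
        if best_key is None or key > best_key:
            best_key, best = key, cand
    return best
```

For instance, [Coalition, Labor] against [Coalition, Labor, One Nation] scores 2/3 and clears the 60% floor, while [Coalition, Greens, Labor] against the same field scores 2/4 and does not.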

Rule 3

2025 AEC data – exact match (n = 2–4)

If no match with n ≥ 5 exists, the model repeats the exact-match search in the 2–4 observation range. A coverage check applies: if the matched scenario assigns no share to a classification present in the current field, the result is discarded and the search continues.

Confidence: ~55–68%.

Rule 4

2025 AEC data – nearest match (n = 2–4)

If no exact n = 2–4 match exists, the model runs a Jaccard nearest-match search restricted to 2–4 observation scenarios, again requiring ≥ 60% similarity. The same coverage check applies.

Confidence: ~40–65%.

Rule 5

2025 AEC data – exact match (n = 1)

A final exact-match search at the single-observation level. These flows derive from one observed instance; the coverage check still applies.

Confidence: ~47–60%. Single-observation exact matches carry high variance and should be treated with caution.

Rule 6

2025 AEC data – nearest match (n = 1)

A Jaccard nearest-match search restricted to n = 1 scenarios. Both the sparse sample and the indirect match make results here the least reliable of the data-driven rules.

Confidence: ~28–58%. Treat results from this rule as highly approximate.

Rule 7

Bloc-equivalent swap

No data match is available at any sample size. The model substitutes ideologically equivalent classifications – Labor↔Greens (left-bloc swap), Coalition↔One Nation (right-bloc swap), or both simultaneously – finds a match on the swapped scenario, then re-labels outputs back. Example: Coalition eliminated in [Greens, Labor, One Nation] has no direct data, but One Nation eliminated in [Coalition, Greens, Labor] does (n=69). The right-bloc swap yields: One Nation ~62%, Greens ~20%, Labor ~18%.

Confidence: 45%. The left-bloc assumption (Labor ≈ Greens) is stronger than the right-bloc (Coalition ≈ One Nation).
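The swap-and-relabel step can be sketched as below, reproducing the right-bloc example from the text with its rounded figures. The `lookup` callable and all function names are hypothetical stand-ins for the data-driven rules.

```python
from typing import Callable, Dict, List, Optional

RIGHT_SWAP = {"Coalition": "One Nation", "One Nation": "Coalition"}
LEFT_SWAP = {"Labor": "Greens", "Greens": "Labor"}

Lookup = Callable[[str, List[str]], Optional[Dict[str, float]]]


def swap(cls: str, mapping: Dict[str, str]) -> str:
    return mapping.get(cls, cls)


def bloc_swap_lookup(
    lookup: Lookup, eliminated: str, remaining: List[str], mapping: Dict[str, str]
) -> Optional[Dict[str, float]]:
    # 1. Relabel the scenario into its bloc equivalent.
    swapped_elim = swap(eliminated, mapping)
    swapped_field = [swap(c, mapping) for c in remaining]
    # 2. Look for data on the swapped scenario.
    flows = lookup(swapped_elim, swapped_field)
    if flows is None:
        return None
    # 3. Relabel the outputs back to the original classifications.
    return {swap(cls, mapping): share for cls, share in flows.items()}


def toy_lookup(eliminated: str, remaining: List[str]) -> Optional[Dict[str, float]]:
    # Rounded figures from the worked example in the text.
    if eliminated == "One Nation" and sorted(remaining) == ["Coalition", "Greens", "Labor"]:
        return {"Coalition": 0.62, "Greens": 0.20, "Labor": 0.18}
    return None


flows = bloc_swap_lookup(toy_lookup, "Coalition", ["Greens", "Labor", "One Nation"], RIGHT_SWAP)
```

Here `flows` assigns roughly 62% to One Nation, 20% to Greens and 18% to Labor, matching the example above.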

Rule 8

Ideological proxy

For Other, Left-leaning, the model uses Labor's observed flows as a proxy (falling back to Greens). For Other, Right-leaning, it uses Coalition's flows (falling back to One Nation).

Confidence: 30–35%. Other Left-leaning and Other Right-leaning cohorts are known to have high preference variance.

Rule 9

Unaligned interpolation

Only for Other, Unaligned eliminated candidates. Rules 1–8 are run independently for Other, Left-leaning and Other, Right-leaning as proxies and the two distributions are averaged.

Confidence: 30%.

Rule 10

Same-class loyalty

No match exists via any data search or equivalent swap, but at least one remaining candidate shares the eliminated candidate's classification. 80% of the eliminated votes are assigned to same-classification candidates; the remaining 20% are split equally among candidates of all other classifications.

Confidence: 25%.
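The 80/20 split can be sketched as follows. The text does not say how the 80% is divided when more than one same-classification candidate remains; an equal split is assumed here, and the function name is illustrative.

```python
from typing import Dict, List, Optional, Tuple


def same_class_loyalty(
    eliminated_cls: str,
    remaining: List[Tuple[str, str]],   # (candidate name, classification)
    loyal_share: float = 0.80,
) -> Optional[Dict[str, float]]:
    same = [name for name, cls in remaining if cls == eliminated_cls]
    other = [name for name, cls in remaining if cls != eliminated_cls]
    if not same:
        return None            # rule does not apply
    if not other:
        loyal_share = 1.0      # nobody else to receive the remainder
    shares = {name: loyal_share / len(same) for name in same}
    for name in other:
        shares[name] = (1.0 - loyal_share) / len(other)
    return shares
```

With one same-classification candidate and two others remaining, the shares come out at 80%, 10% and 10%.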

Rule 11

Ideological bloc alignment

No same-classification candidates remain in the field. Votes are distributed using calibrated bloc weights: left-leaning classifications (Labor, Greens, Other, Left-leaning) receive higher weight when a left-leaning candidate is eliminated; right-leaning classifications (Coalition, One Nation, Other, Right-leaning) when a right-leaning one is. The weights (1.0 within-bloc : 0.43 unaligned : 0.18 cross-bloc) are applied at the bloc level and then divided equally among candidates within each bloc.

Confidence: 20%.
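A minimal sketch of the bloc-weight arithmetic follows. It assumes the three weights are normalised across whichever blocs are actually present in the remaining field, which the text implies but does not state; names are illustrative.

```python
from typing import Dict, List, Optional, Tuple

LEFT_BLOC = {"Labor", "Greens", "Other, Left-leaning"}
RIGHT_BLOC = {"Coalition", "One Nation", "Other, Right-leaning"}
W_WITHIN, W_UNALIGNED, W_CROSS = 1.0, 0.43, 0.18


def bloc_of(cls: str) -> str:
    if cls in LEFT_BLOC:
        return "left"
    if cls in RIGHT_BLOC:
        return "right"
    return "unaligned"


def bloc_alignment(
    eliminated_cls: str, remaining: List[Tuple[str, str]]
) -> Optional[Dict[str, float]]:
    home = bloc_of(eliminated_cls)
    if home == "unaligned":
        return None  # the rule as described covers left- or right-leaning eliminations
    members: Dict[str, List[str]] = {}
    for name, cls in remaining:
        members.setdefault(bloc_of(cls), []).append(name)
    weights = {
        bloc: W_WITHIN if bloc == home else W_UNALIGNED if bloc == "unaligned" else W_CROSS
        for bloc in members
    }
    total = sum(weights.values())
    # Weight is assigned at bloc level, then divided equally within the bloc.
    return {
        name: weights[bloc] / total / len(names)
        for bloc, names in members.items()
        for name in names
    }
```

For a right-leaning elimination with One Nation, Coalition and Greens remaining, the two right-bloc candidates share 1.0 / 1.18 of the votes equally and the Greens candidate receives 0.18 / 1.18.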

Rule 12

Equal split

No other rule produces a valid result. The eliminated candidate's votes are split equally across all remaining candidate classifications. This fires only in highly unusual field configurations where all other rules are inapplicable.

Confidence: 10%.

Manual Override

SA 2026 state election calibration

Six specific scenarios that lack sufficient 2025 federal data but have reliable preference flow data from the 2026 South Australian state election (computed from official SAEC distribution of preferences data) are available as calibration overrides. They are applied by default in Model an Election, Run a Race, and Track a Poll, and can be toggled off in the "Edit preference flows" panel:

Coalition→[Labor, One Nation]: 33.4% to Labor, 66.6% to One Nation (n=11)
Greens→[Labor, One Nation]: 80.4% to Labor, 19.6% to One Nation (n=12)
Labor→[Coalition, One Nation]: 71.5% to Coalition, 28.5% to One Nation (n=2)
Coalition→[Greens, Labor, One Nation]: 18.9% to Greens, 28.2% to Labor, 52.9% to One Nation (n=13)
Labor→[Coalition, One Nation, Other Left-leaning]: 19.1% to Coalition, 13.2% to One Nation, 67.7% to Other, Left-leaning (n=2)
Labor→[Coalition, One Nation, Other Right-leaning]: 32.5% to Coalition, 21.5% to One Nation, 46.0% to Other, Right-leaning (n=2)

In opinion polls, One Nation's share of the national primary vote has increased substantially from its 2025 level, making these scenarios materially more common. All six lack a qualifying n ≥ 5 match in 2025 federal data, so the overrides do not displace any observed federal data. Compulsory preferential voting data from the Victorian state election (November 2026) will be considered for incorporation into the model as it becomes available.

Confidence: 70%.

Note that the data input here is the transfer of votes when a candidate is eliminated, which includes both their primary votes and any votes they received from previously eliminated candidates. For example, a Coalition candidate who received an unusually high number of Greens preferences may, upon elimination, transfer a greater share of votes to a left-leaning candidate than expected.

These skews matter more when the eliminated candidate has received a greater portion of their votes (at that count) from other candidates. They are less significant for candidates who already had large primary votes, and have no impact on candidates eliminated in the first round. When preference flows are aggregated across many seats, these skews should smooth out (especially in more common elimination patterns), but they remain something to watch for. A term like “One Nation voters” in this preference distribution context might more literally be translated as “voters who preferred One Nation above all other remaining candidates”.

Internal consistency – 2025 federal election

Running the model back over all 150 House of Representatives seats from the 2025 federal election – using the actual primary vote counts as inputs – shows how consistently it produces results that match the preference flows it was trained on. Because the model is calibrated on the same data, this is a consistency check rather than an independent accuracy test.

Metric	Result
Correct final 2CP pair identified	144 / 150 seats (96.0%)
Correct winner within correctly paired seats	143 / 144 (99.3%)
Correct winner overall	148 / 150 seats (98.7%)
2CP within ±1pp of actual result	79 / 144 correctly-paired seats (55%)
2CP within ±2pp of actual result	114 / 144 correctly-paired seats (79%)
2CP within ±5pp of actual result	144 / 144 correctly-paired seats (100%)
Mean absolute error (2CP)	1.19pp
Systematic bias (2CP)	−0.54pp (model slightly underestimates winner's margin)

The slight negative bias – the model marginally underestimates the winner's margin on average – reflects a structural feature of how preference flows are measured. The observed data flows are cross-seat averages: in safe seats the winner tends to attract stronger-than-average preferences, while in marginal seats they attract weaker ones. When the model applies the average flow to each seat, it systematically underestimates comfortable winners. Because Australian federal seats skew towards comfortable wins rather than marginal contests, the net effect is a small but consistent negative bias – a form of regression to the mean built into the averaging process.

The 2 incorrect winner predictions – Cowper and Forrest – were both extremely tight contests, both Low confidence, and both involved Other Left-leaning independents in rural or regional seats. The model slightly over-estimated how strongly progressive preferences would flow to these candidates compared to the urban and suburban independents that dominate the Other Left-leaning category nationally.

Important note: The model was calibrated on the same 2025 election data used in this check. These figures measure internal consistency – how closely the model's outputs reflect the data it was built from – not out-of-sample predictive accuracy. Real-world performance on future elections may differ.

Reliability rating

Each simulation produces a High, Medium, or Low confidence rating based on three factors: a vote-weighted average of rule quality scores across all elimination rounds; penalties for tight elimination margins at the critical penultimate round (which determines the final pair); and a penalty for a very small predicted final margin.

Rating	Correct 2CP pair	Correct winner	Seats in 2025 backtest
High confidence	98/98	98/98	98
Medium confidence	37/40	40/40	40
Low confidence	9/12	10/12	12

Model an Election

Swing modelling

The swing model applies user-specified vote swings to the actual 2025 primary vote distribution across all 150 House of Representatives seats, then runs the preference simulator on the resulting adjusted votes to produce a modelled national seat count.

Demographic data

Each swing rule can target a specific demographic group or all voters. Demographic fractions are drawn from the 2021 Australian Census at the electorate level, covering 17 variable groups and 82 individual characteristics – including combined age-and-sex cohorts, ancestry, country of birth, language spoken at home, religion, educational attainment, labour force status, personal income, family household composition, housing tenure type, and household income.

The swing algorithm

Each swing rule specifies five parameters:

Swing percentage – the share of the specified voter group to shift
Demographic variable – which Census characteristic to target (or "all voters")
Geographic scope – all seats, a specific state, or a geography type
Source classification – which party's votes to move from
Destination classification – which party to move them to

For each seat in scope, when "all voters" is selected:

votes_shifted = (swing_pct / 100) × source_votes_in_seat

When a demographic variable is selected, the model instead estimates how many of the source classification's actual voters belong to that demographic group, using location-calibrated vote rates from the same ecological regression model that powers Build a Voter:

votes_shifted = (swing_pct / 100) × min(demographic_fraction × total_votes × P(source | demographic, location), source_votes_in_seat)

Here demographic fraction is the proportion of the seat's enrolled voters in the specified Census category, and P(source | demographic, location) is the estimated probability that a voter in that demographic group, in that location, voted for the source classification. These rates are pre-computed per demographic category for each of the 150 electorates, eight states and territories, and four geography types. The product of the three terms estimates how many voters in a given seat are both in the target demographic and voted for the source classification – for example, a Greens swing among 18–24 year olds correctly accounts for the fact that Greens voters skew substantially younger than the electorate as a whole, and that this skew varies meaningfully by seat. The shifted votes are deducted from the source classification and added to the destination, capped at available source votes.
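Both versions of the formula can be written as one small function. Parameter names mirror the formulas above; the function itself is an illustrative sketch rather than the project's implementation, and the figures in the usage note are invented.

```python
from typing import Optional


def votes_shifted(
    swing_pct: float,
    source_votes_in_seat: float,
    demographic_fraction: Optional[float] = None,
    total_votes: Optional[float] = None,
    p_source_given_demo: Optional[float] = None,
) -> float:
    """Votes moved from the source classification in one seat."""
    if demographic_fraction is None:
        # "All voters": the pool is every source voter in the seat.
        pool = source_votes_in_seat
    else:
        # Estimated source voters inside the target demographic, capped at the
        # source classification's actual votes.
        estimated = demographic_fraction * total_votes * p_source_given_demo
        pool = min(estimated, source_votes_in_seat)
    return (swing_pct / 100.0) * pool
```

With made-up inputs, a 25% swing among a group that is 12% of a 100,000-vote seat and votes for the source at a 30% rate moves about 900 votes: 0.25 × (0.12 × 100,000 × 0.30).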

Multiple overlapping swing rules

When multiple rules target the same source classification in the same seat, all rules are applied simultaneously rather than sequentially. The combined fraction removed from the source is:

f_total = 1 − ∏(1 − f_i)

where f_i is the effective fraction for rule i (the swing percentage scaled by the demographic rate where applicable). Because each f_i lies between 0 and 1, f_total can never exceed 1, so over-extraction from the source is mathematically impossible regardless of how many rules are stacked. The total votes removed are then distributed across destination classifications in proportion to each rule's individual fraction.

Note that this formula treats demographic groups as independent. If two rules target the same source with different demographic filters (for example, young voters and renters), voters who belong to both groups may be counted in both fractions. The model flags this with a warning when it occurs.
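The combination formula and the proportional split can be sketched in a few lines. The rule representation (a list of destination and fraction pairs) is an assumption made for the example.

```python
from math import prod
from typing import Dict, List, Tuple


def combine_rules(
    source_votes: float, rules: List[Tuple[str, float]]
) -> Tuple[float, Dict[str, float]]:
    """rules: (destination classification, effective fraction f_i) for one source in one seat."""
    f_total = 1.0 - prod(1.0 - f for _, f in rules)
    removed = source_votes * f_total
    weight_sum = sum(f for _, f in rules)
    moved: Dict[str, float] = {}
    for dest, f in rules:
        # Each rule receives a share of the removed votes proportional to its own fraction.
        share = removed * f / weight_sum if weight_sum > 0 else 0.0
        moved[dest] = moved.get(dest, 0.0) + share
    return removed, moved
```

Two rules of 10% and 20% against a 10,000-vote source remove 1 − 0.9 × 0.8 = 28% (2,800 votes), which is less than the 30% a naive sum would take, and the 2,800 is split 1:2 between the two destinations.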

Estimating swings from a target vote distribution

The swing model includes a secondary tool that inverts the normal process: rather than building swing rules manually, you enter a complete target primary vote distribution for your chosen demographic or geographic scope, and the tool works out which combination of swings most plausibly produced it. The resulting swing cards are added to the canvas and behave identically to hand-crafted rules.

To do this, the tool first computes a baseline – the estimated primary vote distribution for the selected scope under 2025 conditions – using the same voter profiling model as Build a Voter. For "all voters" with no demographic filter, the baseline is simply the population-weighted average of actual 2025 primary vote shares across the seats in scope. When a demographic group is selected, the model draws on location-calibrated rates for that category; when two characteristics are selected simultaneously, the same logit-additive formula applies (see Voter profiling). Classifications with no candidates in scope are excluded.

Once the baseline is established, each classification is labelled a loser or a gainer based on whether its target share is below or above baseline:

Δ_k = t_k − b_k

where b_k is the baseline share and t_k the target share. Differences smaller than 0.005 percentage points are treated as rounding noise and ignored.

When there are multiple losers and multiple gainers, the swing is not uniquely determined by the aggregate changes alone – the model also needs to decide how much of each loser's decline went to each gainer. It resolves this using the preference flow data as a guide: for each loser–gainer pair (L, G), it calculates a propensity – the observation-weighted average fraction of L's eliminated preferences that historically flowed to G in 2025 AEC data:

p(L→G) = Σ_j [ f_j(G) × n_j ] / Σ_j n_j

where j indexes AEC observations where L was the eliminated classification and G was still in the field. This reflects how naturally L's voters gravitate toward G when L is no longer an option. Where no AEC data exists for a given pair, a uniform prior is used instead.
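The propensity is an observation-weighted mean, sketched below. The representation of an observation as `(n_j, flows_j)` and the way the uniform prior is passed in are assumptions; the second observation in the usage note is invented purely to show the weighting.

```python
from typing import Dict, List, Tuple

Observation = Tuple[int, Dict[str, float]]  # (n_j, flows for scenario j with L eliminated)


def propensity(observations: List[Observation], gainer: str, uniform_prior: float) -> float:
    """Observation-weighted average share of L's eliminated votes that flowed to `gainer`."""
    num = den = 0.0
    for n_j, flows_j in observations:
        if gainer in flows_j:          # G was still in the field in scenario j
            num += flows_j[gainer] * n_j
            den += n_j
    return num / den if den > 0 else uniform_prior
```

For example, combining the n=72 Greens scenario (81.5% to Labor) with a hypothetical n=10 scenario sending 60% to Labor gives (72 × 0.815 + 10 × 0.60) / 82, or about 0.789.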

These propensities seed a transfer matrix T[L][G] – the percentage points of primary vote moving from each loser L to each gainer G. The matrix must satisfy two constraints simultaneously: each row must sum to L's total loss, and each column must sum to G's total gain:

Σ_G T[L][G] = |Δ_L| for each loser L (row sums match total loss)
Σ_L T[L][G] = Δ_G for each gainer G (column sums match total gain)

Starting from T₀[L][G] = p(L→G) × |Δ_L|, the algorithm alternately rescales columns to meet the gain constraints and rows to meet the loss constraints, repeating until convergence (the Sinkhorn algorithm, also known as iterative proportional fitting). The result is the transfer matrix most consistent with the propensity priors that also exactly matches the target gains and losses.

Each cell is then converted to a swing card expressed as a percentage of L's 2025 baseline vote:

pct[L→G] = T[L][G] / b_L × 100

This puts it in the same format as any other swing rule – "X% of L's voters moved to G" – for the chosen scope. Swings involving classifications with no candidates in scope are not generated.
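The alternating rescaling and the conversion to swing cards can be sketched in pure Python. This is an illustrative sketch: the function names, the iteration cap and the tolerance are assumptions, and it presumes all propensities are positive and that total losses equal total gains.

```python
from typing import Dict, Tuple

Matrix = Dict[str, Dict[str, float]]


def fit_transfers(
    losses: Dict[str, float],        # |Delta_L| per loser, in percentage points
    gains: Dict[str, float],         # Delta_G per gainer, in percentage points
    prop: Matrix,                    # prop[L][G] = p(L -> G)
    iters: int = 500,
    tol: float = 1e-12,
) -> Matrix:
    # Seed: T0[L][G] = p(L -> G) * |Delta_L|
    T = {L: {G: prop[L][G] * losses[L] for G in gains} for L in losses}
    for _ in range(iters):
        for G in gains:                              # rescale columns to the gains
            col = sum(T[L][G] for L in losses)
            if col > 0:
                for L in losses:
                    T[L][G] *= gains[G] / col
        for L in losses:                             # rescale rows to the losses
            row = sum(T[L].values())
            if row > 0:
                for G in gains:
                    T[L][G] *= losses[L] / row
        err = max(abs(sum(T[L][G] for L in losses) - gains[G]) for G in gains)
        if err < tol:
            break
    return T


def to_swing_cards(T: Matrix, baseline: Dict[str, float]) -> Dict[Tuple[str, str], float]:
    # pct[L -> G] = T[L][G] / b_L * 100
    return {(L, G): T[L][G] / baseline[L] * 100.0 for L in T for G in T[L]}
```

After fitting, each row of `T` sums to that loser's total loss and each column to that gainer's total gain, so the resulting swing cards reproduce the target distribution when applied to the baseline.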

Track a Poll

Polling average

The Track a Poll page shows a rolling average of published opinion polls since the 2025 federal election. Rather than a simple moving average, each poll's contribution to the average at any given date decays smoothly with time using a Gaussian kernel – so polls from three weeks ago still count, just less than polls from yesterday.

Gaussian kernel

The kernel has a standard deviation (σ) of 14 days. At exactly 14 days from the target date, a poll carries about 61% of the weight it would have if dated exactly on that day; at 28 days, about 14%. This produces a smooth average that is meaningfully reactive to recent polling without discarding older data abruptly.

The 2025 AEC election result is included as an anchor observation with three times the weight of a standard poll. This ensures the average starts at the known result rather than jumping erratically when only one or two polls have been published.
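The kernel weights quoted above follow directly from the Gaussian formula, as this sketch shows. The symmetric treatment of the day difference and the `(day, value, base_weight)` tuple layout are assumptions; the anchor is represented simply as an observation with a base weight of 3, and the values in the usage note are invented.

```python
from math import exp
from typing import List, Optional, Tuple

SIGMA_DAYS = 14.0


def kernel_weight(days_apart: float, sigma: float = SIGMA_DAYS) -> float:
    """Gaussian kernel, normalised so that a poll dated on the target day has weight 1."""
    return exp(-0.5 * (days_apart / sigma) ** 2)


def rolling_average(
    observations: List[Tuple[float, float, float]],  # (day, value, base_weight)
    target_day: float,
) -> Optional[float]:
    num = den = 0.0
    for day, value, base_weight in observations:
        w = base_weight * kernel_weight(target_day - day)
        num += w * value
        den += w
    return num / den if den > 0 else None
```

`kernel_weight(14)` evaluates to exp(−0.5) ≈ 0.61 and `kernel_weight(28)` to exp(−2) ≈ 0.14. With an anchor of 30.0 at day 0 (base weight 3) and a single poll of 34.0 on the same day, the average is (3 × 30 + 34) / 4 = 31.0.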

Frequency damping

Pollsters vary in how often they publish. A house that releases weekly would otherwise crowd out less frequent competitors even when all are equally reliable. To prevent this, each poll is assigned a frequency weight of 1/√n, where n is the number of polls from the same pollster (or pollster family) published within a 28-day window centred on that poll. When a pollster has four polls in the same window, each carries half the weight of a lone poll (1/√4 = 0.5).

MRP releases (flagged separately) are treated as distinct from standard polls from the same house, reflecting the different methodology and sample size.
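Frequency damping can be sketched as below. It assumes the count n includes the poll itself and that a 28-day centred window means 14 days either side; the `(pollster, day)` layout is illustrative, and an MRP release could be kept distinct by giving it its own pollster key.

```python
from math import sqrt
from typing import List, Tuple


def frequency_weights(polls: List[Tuple[str, float]], window_days: float = 28.0) -> List[float]:
    """polls: (pollster key, day). Returns 1/sqrt(n) for each poll, in input order."""
    half = window_days / 2.0
    weights = []
    for pollster, day in polls:
        # n counts this poll plus same-pollster polls within the centred window.
        n = sum(1 for p, d in polls if p == pollster and abs(d - day) <= half)
        weights.append(1.0 / sqrt(n))
    return weights
```

A pollster with four releases inside one window gets a weight of 0.5 on each, while a pollster with a single release keeps a weight of 1.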

Outlier damping

Different pollsters use different questionnaire designs – some prompt minor parties more explicitly, some collapse small parties into "Other" differently, and some use different conventions for handling undecided respondents. These structural differences can leave one pollster's figures sitting consistently outside the range of everyone else's on a particular field.

To reduce (but not eliminate) the influence of these outlying readings, each poll is compared to a leave-one-out local mean: the average of all other polls published within 28 days of it. If a poll deviates from this local consensus by more than 3 percentage points on any party field, its weight is reduced by a smooth penalty:

penalty = 1 / (1 + max(0, deviation − 3) / 5)

A poll 5pp off the local consensus retains about 71% of its base weight. A poll 10pp off retains about 42%. No poll is excluded entirely. The penalty is computed using the largest deviation across all five party fields, so a poll that is consistently close to the field on four parties but far off on one still incurs a penalty.

This does not fully correct for systematic house effects – if a pollster consistently reads 4pp higher than everyone else on One Nation across every poll in the dataset, outlier damping will penalise each individual poll modestly but will not remove the house effect from the average altogether. It is a smoothing mechanism, not a house-effect correction.
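The penalty formula and the leave-one-out comparison can be sketched as follows. The dictionary layout for a poll and the handling of a poll with no neighbours (no penalty) are assumptions.

```python
from typing import Dict, List


def outlier_penalty(deviation_pp: float, tolerance: float = 3.0, scale: float = 5.0) -> float:
    """penalty = 1 / (1 + max(0, deviation - 3) / 5)"""
    return 1.0 / (1.0 + max(0.0, deviation_pp - tolerance) / scale)


def poll_penalty(poll: Dict[str, float], others: List[Dict[str, float]]) -> float:
    """poll: party field -> value; others: the other polls within 28 days (poll itself excluded)."""
    if not others:
        return 1.0
    worst = 0.0
    for field, value in poll.items():
        local_mean = sum(o[field] for o in others) / len(others)
        worst = max(worst, abs(value - local_mean))   # largest deviation across fields
    return outlier_penalty(worst)
```

A deviation of 3pp or less leaves the weight untouched, and a 5pp deviation gives 1 / 1.4, or about 0.71.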

Seat projection

The simulate button on Track a Poll applies the rolling average at the selected date as a uniform national swing from 2025, then runs that swing through the same preference simulator used by Model an Election. The "Other" total is decomposed into Left-leaning (52.7%), Right-leaning (43.3%) and Unaligned (4.0%) in proportion to the 2025 composition before preferences are distributed.

The seat totals should be treated as illustrative. The model applies the same swing uniformly to every seat; it does not account for local factors, candidate effects, or the fact that national swing is rarely evenly distributed geographically. Seats near the margin are particularly sensitive to small movements in the average.

Build a Voter

Voter profiling

Build a Voter uses ecological regression to estimate vote probabilities for a hypothetical voter defined by their location and demographic characteristics. The model operates in three stages: regression, national calibration, and per-location calibration. The final output is a set of pre-computed rates – one per demographic category per electorate – that directly reflect the actual 2025 primary vote distribution in each location.

Stage 1 – Ecological regression

For each demographic variable (age, ancestry, income bracket, etc.), a weighted multinomial logistic regression is run on 2025 booth-level data. The regression models the relationship between the proportion of a booth's enrolled citizens in a given demographic category and that booth's vote shares across all seven classifications. Booth size is used as the regression weight.

The regression is specified in logit space and fitted independently for each demographic category using L-BFGS-B optimisation. The core output for each (demographic category × classification) pair is a predicted vote rate at the 95th-percentile population proportion for that category across all booths – denoted rate‑p95. This within-range prediction is used in preference to the extreme rate-in (proportion = 100%) or rate-out (proportion = 0%) values, which are extrapolations beyond the observed data range.

Stage 2 – National calibration

The raw regression estimates are adjusted to correct for ecological confounding. Because certain demographic groups are geographically concentrated – young people in inner-city seats, renters in high-Greens electorates – the regression can capture location effects that have nothing to do with the demographic characteristic itself. Without correction, a model trained purely on booth-level data would systematically overestimate, for example, the Greens vote among young voters simply because young voters happen to live in Greens-leaning areas.

The calibration anchors each rate in logit space so that the model's implied prediction at the national average proportion for a category exactly matches the actual 2025 national primary vote shares. For each category c and classification k:

logit(rate‑p95‑cal[k]) = logit(rate‑p95[k]) + logit(ActualNat[k]) − logit(NatImplied[k])

where NatImplied is the model's raw prediction evaluated at the national average proportion.
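The calibration is a shift in logit space, which can be sketched directly from the formula. Function names are illustrative and the numbers in the usage note are invented.

```python
from math import exp, log


def logit(p: float) -> float:
    return log(p / (1.0 - p))


def inv_logit(x: float) -> float:
    return 1.0 / (1.0 + exp(-x))


def calibrate_national(rate_p95: float, actual_nat: float, nat_implied: float) -> float:
    """logit(cal) = logit(rate_p95) + logit(ActualNat) - logit(NatImplied)"""
    return inv_logit(logit(rate_p95) + logit(actual_nat) - logit(nat_implied))
```

If the raw model implies a national share of 15% for a classification that actually polled 12%, a raw rate of 30% is pulled down to roughly 25%; if the implied and actual shares agree, the rate is unchanged.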

Stage 3 – Location calibration

The nationally calibrated rates are then further adjusted for each specific electorate (or other selected geography). For each location, an offset vector – one value per classification, in logit space – is found iteratively such that the weighted average of the location-adjusted demographic rates, across all demographic categories at their local proportions, equals the actual 2025 primary vote distribution for that location.

This means the demographic rates for a seat like Indi – which recorded 45% Other, Left-leaning – are anchored to that 45% base, not the national 8% average. A 25–34 year old in Queensland is predicted relative to Queensland's actual Labor vote, not the national Labor share. The adjustment converges in a small number of iterations and is run for every electorate, state, geographic type, and the national aggregate.

The resulting per-category, per-location rates are pre-computed and committed to the repository. The tool retrieves the appropriate row for the user's selected location at runtime – no calculation is performed in the browser beyond the final multiplicative combination.
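The iterative offset search can be illustrated for a single classification and a single demographic variable. This is a simplified sketch under loud assumptions: the real procedure solves for an offset vector across all classifications, whereas this version treats one classification in isolation and uses a simple fixed-point update that may differ from the actual solver. All names and numbers are hypothetical.

```python
from math import exp, log
from typing import List


def logit(p: float) -> float:
    return log(p / (1.0 - p))


def inv_logit(x: float) -> float:
    return 1.0 / (1.0 + exp(-x))


def implied_share(rates: List[float], proportions: List[float], offset: float) -> float:
    """Weighted average of the offset-adjusted category rates at their local proportions."""
    return sum(w * inv_logit(logit(r) + offset) for r, w in zip(rates, proportions))


def location_offset(
    rates: List[float],         # nationally calibrated rate per category
    proportions: List[float],   # local share of each category (sums to 1)
    target: float,              # actual local vote share for this classification
    iters: int = 200,
    tol: float = 1e-12,
) -> float:
    offset = 0.0
    for _ in range(iters):
        # Move the offset by the remaining gap, measured in logit space.
        step = logit(target) - logit(implied_share(rates, proportions, offset))
        offset += step
        if abs(step) < tol:
            break
    return offset
```

With category rates of 5%, 10% and 12% and a local target of 45%, the offset lifts all three rates until their weighted average matches 45%, which is the sense in which a seat like Indi is anchored to its own base rather than the national one.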

Sex and age variables

The ecological regression approach cannot identify sex-based voting differences because the male/female ratio is nearly constant across all booths nationally (R² ≈ 0.0003) – there is simply no variation in the predictor from which to estimate an effect. Sex rates are instead drawn from the 2025 Australian Election Study (AES), a post-election survey of a nationally representative sample of voters. The AES estimates are then location-calibrated using the same iterative offset method as the ecological regression variables, so Male and Female rates for each electorate are anchored to that electorate's actual vote distribution.

Age and age-by-sex variables also use AES survey rates rather than ecological regression at the national level. The ecological regression for Age correctly identifies that younger demographics lean left and older demographics lean right, but cannot separate Labor from Greens support within the same geography – young people in inner-city seats are concentrated in areas that lean Labor overall, so the regression systematically over-attributes their left lean to Labor and under-estimates Greens. AES cross-tabulated data directly captures this split at the individual level, giving materially more accurate age-based predictions nationally.

Two-characteristic model

When two characteristics are selected, the model uses a multiplicative combination of both location-calibrated rates, with the local base vote as the reference point:

rate‑AB_L,k ∝ rate‑A_L,k × rate‑B_L,k / V_L,k

where V_L is the actual 2025 primary vote distribution for the selected location, and rate‑A_L and rate‑B_L are the location-calibrated rates for each characteristic. The result is normalised to sum to 100%. This formulation ensures that selecting a single characteristic returns its calibrated rates directly, and that both characteristics adjust the base proportionally. Cross-tabulation data from the 2021 Census is used to estimate the joint prevalence of the two characteristics for pool size estimation.
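The combination formula can be sketched as below; the dictionaries map classification to rate for one location, and the function name is illustrative. Classifications with a zero base rate are dropped, consistent with their not appearing in the result.

```python
from typing import Dict


def combine_two(
    rate_a: Dict[str, float], rate_b: Dict[str, float], base: Dict[str, float]
) -> Dict[str, float]:
    """rate_AB[k] is proportional to rate_A[k] * rate_B[k] / V[k], normalised to sum to 1."""
    raw = {k: rate_a[k] * rate_b[k] / base[k] for k in base if base[k] > 0}
    total = sum(raw.values())
    return {k: v / total for k, v in raw.items()}
```

Setting `rate_b` equal to the base distribution returns `rate_a` unchanged, which is the single-characteristic property described above.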

The result is deterministic – the same profile always produces the same result.

Geography selection

Location calibration is built into the pre-computed rates, not applied as a post-hoc scaling factor. When a specific electorate is selected, the tool retrieves rates that were calibrated to that electorate's actual 2025 primary vote distribution. When a state or geographic type (inner metropolitan, provincial, etc.) is selected, rates calibrated to the aggregate vote distribution for that group are used. Selecting "All seats" uses rates calibrated to the national primary vote. A classification that did not field a candidate in the selected geography has a zero base rate and will not appear in the result.

Data sources

The 80 demographic variables used in the voter profiling model are drawn from the 2021 Australian Census, counting persons by place of usual residence where possible, and by place of enumeration where required (for variables set at the dwelling level), covering Australian citizens aged 18 or over. Cross-tabulation conditional probabilities for the two-characteristic logit-additive model are also derived from the 2021 Census at the national level.

Regression models are estimated from booth-level data compiled from the 2025 AEC first preference results matched to 2021 Census SA1 and SA2 geographic boundaries via a spatial weighting scheme. Each booth is assigned a weighted mix of SA1 areas that fall within its polling catchment, and these weights are used to aggregate Census proportions to the booth level. A total of 574 regression models were estimated, covering 82 demographic categories across 15 Census variables. Sex, age, and age-by-sex categories (20 categories in total) are sourced from the 2025 AES rather than the Census regression. National calibration offsets and per-location calibration offsets are pre-computed for all 163 locations (150 electorates, 8 states and territories, 4 geographic types, and the national aggregate).

Variables not fully tracked or adjusted

Several variable groups include only a subset of their possible response options in the user interface – either because the full list is very large (ancestry, language used at home, country of birth) or because several options have very small population shares (religious affiliation, tenure type). Where options have been excluded from the interface, they were still retained in the underlying demographic proportions used to compute per-seat totals: the tracked options were not rescaled to fill the gap. Ecological regression models were simply not run for the excluded categories.

The only responses actually removed from the dataset are "not stated" and "inadequately described" responses, which appear across multiple Census variables and carry no meaningful demographic signal. After their removal, the remaining categories were renormalised to sum to 100%.
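The removal and renormalisation step can be sketched as below. The function name renormalise, the category labels, and the counts are illustrative assumptions rather than the project's actual code or data.

```python
from typing import Dict

# Responses treated as carrying no demographic signal (labels are illustrative).
EXCLUDED = {"Not stated", "Inadequately described"}


def renormalise(counts: Dict[str, float]) -> Dict[str, float]:
    """Drop excluded responses and rescale the rest to sum to 100%."""
    kept = {name: n for name, n in counts.items() if name not in EXCLUDED}
    total = sum(kept.values())
    if total <= 0:
        raise ValueError("no valid responses left after exclusion")
    return {name: 100.0 * n / total for name, n in kept.items()}


if __name__ == "__main__":
    # Made-up counts for a single variable in a single area.
    tenure = {"Owned": 60, "Rented": 30, "Not stated": 10}
    print(renormalise(tenure))
```

Note the contrast with options that are merely hidden from the interface: those stay in the denominator, so the tracked options are not rescaled to fill the gap.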

In addition, several response options were merged where it was intuitive to do so. This applies to age, country of birth of parents, highest educational attainment, labour force status, personal income, family household composition, and household income.

Variables not included in the model

Several Census variables were considered but excluded from the model:

Marital status, student status, Defence Force service, unpaid child care, and need for assistance with core activities: excluded due to ecological confounding – booth-level proportions for these variables are highly concentrated, with most booths recording very similar values, leaving little variation from which to estimate meaningful effects.
Indigenous Status (INGP): not included as a standalone variable to avoid overlap with the Aboriginal Australian ancestry option. It is acknowledged that Indigenous Status is more expansive than the ancestry variable – it also captures Torres Strait Islander peoples and those who identify as Aboriginal but did not nominate one of the major ancestry options (since the Census ancestry question allows a maximum of two responses).
Employment variables (public/private sector, industry, occupation, method of travel to work): excluded because they apply only to employed people, and a significant portion of the population is either unemployed or not in the labour force (both captured by the Labour force status variable).
Long-term health conditions: excluded due to the very wide and disparate range of possible conditions, making aggregate modelling unreliable.

Each of these decisions may be revisited in future versions of the model as the data sources and methods are refined.

Methodological notes

Ecological inference

This method is a form of ecological inference – inferring individual behaviour from group-level statistics. Each regression uses only booth-level aggregate data: the proportion of a booth's population with a given characteristic, and the booth's vote shares. The assumption is that these correlations, observed across thousands of booths, reflect genuine demographic voting patterns. This is imperfect: renters in a high-renting, high-Greens electorate are assumed to be more likely to vote Greens, but not every renter in that electorate did so. The model reflects observed correlations at the booth level, not certainty about any individual voter.

Dwelling variables

Three variable groups – family household composition, tenure type, and household income – are set at the dwelling level in the Census, meaning all usual residents of a dwelling share the same value for these variables. Person-level counts are available (the Census records how many people live in each household type), and these are used to derive accurate population totals for each category. However, the underlying attribute still reflects the household rather than the individual: a person in a high-income household is recorded as high-income regardless of their own earnings, and a renter is recorded as renting regardless of whether they are the leaseholder. These variables should be read as indicators of the household environment rather than confirmed individual characteristics.

Known limitations

Where the model is most likely to be wrong

Voter participation rates

The voter profiling model assumes that enrolment, turnout, and formal voting rates among eligible people (Australian citizens aged 18+) are uniform across all demographic groups. In practice, these participation rates vary by demographic group. The AEC publishes electorate-level enrolment, turnout, and informal vote rates for the 2025 federal election, and it is likely that participation varies systematically with demographic characteristics – for example, lower enrolment and turnout rates among voters with Aboriginal Australian ancestry, and higher rates of informal voting among voters with lower educational attainment.

However, AEC participation data is reported at the electorate level and does not map neatly onto Census demographic variables. Given this granularity limitation, no adjustment is made for differential participation rates. This means the model's results reflect shares of the Census citizen population aged 18+, not actual voter composition – which will differ to the extent that participation is uneven across demographic groups.

Within-category variance in "Other" classifications

Each "Other" category covers a range of candidates whose preference flows vary meaningfully within the group. The model uses group averages and cannot distinguish between sub-types without additional manual input. Results in seats with unusual "Other" candidates – particularly new independent candidacies with no established preference pattern – should be interpreted with this in mind. The reliability rating will typically flag these as Medium or Low due to higher rule variance, but not always.

Other considerations

Unusual candidate field configurations – the twelve-rule hierarchy covers the vast majority of scenarios observed in 2025, but new party formations or unusual multi-candidate fields may fall through to lower-confidence rules. Check which rule fired in each round – the rule badge and confidence note in the simulation output indicate where the model is drawing on thinner data.
Compulsory preferential voting only – the model assumes all votes reach the final two candidates, as required under compulsory preferential voting (CPV) in Australian federal House of Representatives elections. It cannot model optional preferential voting (OPV) systems – used in some state lower houses – where votes may exhaust before the count resolves.
Primary votes are an input, not an output – the simulator models what happens to votes after the count begins. Predicting primary vote shares at a future election is a separate problem, and uncertainty in that prediction is not captured in the reliability rating. The swing model partially addresses this by letting you specify hypothetical primary vote shifts – but determining what those shifts should be is the user's judgement call.
Preference flows assumed constant under swing scenarios – the model assumes that preference flows from each classification remain unchanged even when the composition of a classification's primary vote changes due to swings. It is ultimately a modelling tool trained on historical data, not predictive of future changes in preference behaviour. Preference flows can be manually overridden if desired.
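The constant-flow assumption in the last point can be illustrated with a deliberately simplified sketch. It collapses the count into a single step with one fixed flow rate per classification, whereas the simulator itself works round by round through its rule hierarchy. The function names apply_swing and two_candidate_preferred, the flow rates, and the vote shares are all hypothetical.

```python
from typing import Dict, Tuple


def apply_swing(primary: Dict[str, float], swing: Dict[str, float]) -> Dict[str, float]:
    """Shift primary vote shares (%) by the given swings and rescale to 100%."""
    shifted = {c: max(v + swing.get(c, 0.0), 0.0) for c, v in primary.items()}
    total = sum(shifted.values())
    if total <= 0:
        raise ValueError("swing leaves no primary votes to distribute")
    return {c: 100.0 * v / total for c, v in shifted.items()}


def two_candidate_preferred(
    primary: Dict[str, float], flow_to_a: Dict[str, float]
) -> Tuple[float, float]:
    """Return (A, B) final shares in % given fixed flow rates to candidate A.

    flow_to_a: classification -> fraction (0 to 1) of its votes ending with A.
    Candidate A's own classification has flow 1.0, candidate B's has 0.0.
    """
    total = sum(primary.values())
    to_a = sum(primary[c] * flow_to_a[c] for c in primary)
    return 100.0 * to_a / total, 100.0 * (total - to_a) / total


if __name__ == "__main__":
    # Made-up seat: Labor (A) versus Coalition (B).
    primary = {"Labor": 35.0, "Coalition": 40.0, "Greens": 15.0, "One Nation": 10.0}
    flows = {"Labor": 1.0, "Coalition": 0.0, "Greens": 0.8, "One Nation": 0.3}
    print(two_candidate_preferred(primary, flows))
    # Same flow rates reused after a swing from Labor to the Greens.
    swung = apply_swing(primary, {"Labor": -5.0, "Greens": 5.0})
    print(two_candidate_preferred(swung, flows))
```

The flows dictionary is passed unchanged to both calls, which is the assumption in question: the swing changes how many votes each classification starts with, not where those votes go. A manual override corresponds to editing an entry in flows before the second call.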