Technical annex

6 Data analysis

6.1 Questionnaire processing

As in previous years, questionnaire data collected online is logically prevented from containing data contrary to the questionnaire instructions (such as multiple responses to a question requiring a single answer).

Paper questionnaires are returned in supplied freepost Business Reply Envelopes (2nd class) to the scanning house. Envelopes are machine opened and questionnaires collated, guillotined and prepared for scanning. Any other items of correspondence are set aside for review and response by Ipsos or NHS England, as appropriate.

Paper questionnaires are scanned and processed using barcode recognition and Optical Mark Recognition technology, with operator verification of uncertain entries. All marks on the forms are recognised at this stage, regardless of whether they are in accordance with the questionnaire instructions.

Paper questionnaires were accepted and included if they were received by 1 April 2025, with the online survey closing the same day.

6.2 Inclusions and exclusions

The rules and protocols used for delivering the data for the 2025 reports are unchanged from 2024 and are as follows:

  • All completed online responses, along with all paper questionnaires received with identifiable reference numbers allowing linkage to a GP practice, are eligible for inclusion.

  • Returned questionnaire figures are based only on those qualifying for inclusion in the dataset as described in this document.

  • The published response rates are based on all completed, valid questionnaires returned and all questionnaires sent. They have not been adjusted to exclude questionnaires which did not reach the patient, e.g. where envelopes were returned undelivered. However, weighted and adjusted response rates, which take into account selection likelihood and undelivered questionnaires, have also been included in Chapter 7: Response rates. The following are excluded from the reports:

    • All questionnaires marked as completed by under-16s.
    • All questionnaires where there is only data for a limited number of questions (e.g. only the first page was completed).
    • All questionnaires where the barcode number was not in the valid range for the live wave of the survey.
    • All blank questionnaires.

Questionnaire data are combined from online and scanned data sources. Where duplicates between mode of completion exist, the data used are selected according to the case that is the most complete (i.e. with the fewest unanswered questions). If there is no difference in completeness, the data used are then selected according to a priority order with online data having precedence. Where paper duplicates exist, the earliest return is included.
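The duplicate resolution rules above can be sketched as follows. This is an illustrative sketch, not the production pipeline; the field names (`mode`, `received`, `answers`) are hypothetical.

```python
from datetime import date

def completeness(case):
    """Number of answered questions in a case."""
    return sum(1 for v in case["answers"].values() if v is not None)

def resolve_duplicates(cases):
    """Pick one case per respondent: most complete first, then online
    over paper on ties, then the earliest paper return."""
    def sort_key(case):
        return (
            -completeness(case),                    # most complete wins
            0 if case["mode"] == "online" else 1,   # online precedence on ties
            case.get("received", date.max),         # earliest paper return
        )
    return sorted(cases, key=sort_key)[0]

# Example: a paper case that is more complete than its online duplicate
paper = {"mode": "paper", "received": date(2025, 3, 10),
         "answers": {"q1": 1, "q2": 3, "q3": 2}}
online = {"mode": "online", "answers": {"q1": 1, "q2": None, "q3": 2}}
print(resolve_duplicates([paper, online])["mode"])  # paper (more complete)
```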

6.3 Quality assurance

A number of checks were undertaken at key stages of the survey, including during the sample preparation and data cleaning stages. These help to identify obvious errors in the sample and response data, such as the inclusion of ineligible patients or incorrect coding.

6.4 Editing the data

The data editing procedures are unchanged for 2025, following the same principles as used previously.

For the completed paper questionnaires, there is a degree of completion error that occurs (e.g. ticking more than one box when only one response is required, answering a question that is not relevant, or missing questions out altogether). Therefore, it is necessary to undertake a certain amount of editing to ensure the data is logical. For example:

  • If a patient ticks more than one box where only one answer is required, then their reply for that question is excluded.

  • Where patients are allowed to select more than one box for a particular question, the reply for that question is excluded if they select two conflicting answers – for example, at Q5 (‘Which of the following online GP services have you used in the last 12 months?’), if a patient ticks any of the first seven options as well as ‘None of these’, then their response for that question is excluded. The following list shows the questions this applies to, as well as the response options that are treated as single code only:

  • Q5 ‘Which of the following online GP services have you used in the last 12 months?’ – ‘None of these’
  • Q14 ‘How did your GP practice deal with your request?’ – ‘I don’t know or I can’t remember’
  • Q15 ‘What did you do when you couldn’t contact your GP practice or didn’t know what the next step would be?’ – ‘I didn’t do anything’
  • Q18 ‘Did you do any of the following before trying to get an appointment with your GP practice?’ – ‘I didn’t do anything before trying to get an appointment with my GP practice’
  • Q19 ‘Were you offered the following choices?’ – ‘I didn’t need a choice’ and ‘I can’t remember’
  • Q30 ‘What was the outcome of the appointment on this occasion?’ – ‘No further action was needed’ and ‘I can’t remember’
  • Q33 ‘In the last 12 months, have you contacted or used an NHS service when you wanted care or advice from a healthcare professional at your GP practice but it was closed?’ – ‘No’
  • Q34 ‘Which of the following services did you contact or use on that occasion?’ – ‘I can’t remember’
  • Q37 ‘Have you experienced any of the following in the last 12 months?’ – ‘None of these’
  • Q39 ‘Which of the following long-term conditions or illnesses do you have?’ – ‘I don’t have any long-term conditions’
  • Q47 ‘Thinking about the last 12 months, which of the following services have you used a pharmacy for?’ – ‘None of these’
  • Q51 ‘Were you able to get an NHS dental appointment?’ – ‘Yes’ and ‘I can’t remember’

  • If all boxes are left blank, the reply for that question is excluded.

  • If a patient fails to tick the relevant answer for a filter question any responses are excluded from the subsequent questions relating to the filter question. For example, if a patient responds to Q7 (‘How often do you get to see or speak to your preferred healthcare professional when you ask to?’) without having first responded ‘Yes’ at Q6 (‘Is there a particular healthcare professional at your GP practice you usually prefer to see or speak to?’), their response to Q7 is removed.

  • For the question on whether they have any physical or mental health conditions or illnesses (Q38), patients who initially answer other than 'Yes' have their answer recoded to 'Yes' if they went on to select any physical or mental health conditions or illnesses at Q39.
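A hedged sketch of these editing rules, assuming each case is a simple dictionary of answers. The question codes, option names and data layout are illustrative; only Q5's exclusive option is shown, but the other questions listed above would be handled in the same way.

```python
# Single-code options within otherwise multi-code questions (illustrative)
EXCLUSIVE = {"q5": {"none_of_these"}}

def edit_case(answers):
    edited = dict(answers)
    # Multi-code questions: exclude the answer if an exclusive option
    # is ticked alongside any other option.
    for q, exclusive_opts in EXCLUSIVE.items():
        ticked = edited.get(q) or set()
        if ticked & exclusive_opts and ticked - exclusive_opts:
            edited[q] = None
    # Filter questions: remove the Q7 response unless Q6 was answered "yes".
    if edited.get("q6") != "yes":
        edited["q7"] = None
    # Q38 recode: set to "yes" if any condition was selected at Q39.
    if edited.get("q39"):
        edited["q38"] = "yes"
    return edited

case = {"q5": {"online_booking", "none_of_these"},  # conflicting -> excluded
        "q6": "no", "q7": "always",                 # filter fails -> removed
        "q38": "no", "q39": {"diabetes"}}           # recoded to "yes"
print(edit_case(case))
```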

6.5 Weighting strategy

The overall approach to weighting for the 2025 survey was consistent with the previous waves, with one notable difference. This year, a small number of additional questions were asked online-only (and included in the accessible formats), but were not included in the paper survey (see Chapter 2: Questionnaire and material design). The online responses and responses received via accessible formats represent 83.3% of the total number of completed surveys.

As a result, a separate weight was created for these questions, because previous testing had shown the demographic profile of respondents who participated in the survey online was different to the profile of respondents to the survey overall. These weights were designed using the same overall approach but with the following differences:

  • the ‘all respondents’ weight, used for analysis of all questions asked via all formats (i.e. online, paper, large print, telephone), treats everyone who responded to the survey as a valid response for weighting.

  • the ‘online-only’ weight, used for analysis of questions asked online and via an accessible format, treats only those who responded using an eligible format (i.e. online, large print, telephone but not on paper) as a valid response.

As this is the first time that the ‘online only’ questions (i.e. those not included on the paper questionnaire) are being published, they have been classified as ‘Official Statistics in development’. The trustworthiness, quality and value of these will be evaluated with a view to removing the ‘in development’ label in time.

Both weights were generated using the same approach, to correct for the sampling design and to reduce the impact of non-response bias. These weights were calculated using the following three stages:

  • Step 1: creation of design weights to account for the unequal probability of selection.

  • Step 2: generation of non-response weights to account for differences in the characteristics of responders and non-responders.

  • Step 3: generation of calibration weights to ensure that the distribution of the weighted responding sample across practices resembles that of the population of eligible patients, and that the age and gender distribution within each Integrated Care System (ICS) matches the population of eligible patients within the ICS.

Design weights were computed to correct for the disproportionate sampling of patients by GP practice, as the inverse of the probability of selection, i.e. by dividing the total number of eligible patients in the practice at the time of sampling by the number sampled. This was the same for both the ‘all respondents’ and ‘online-only’ weights.
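The design weight calculation can be expressed directly; the practice sizes below are made-up examples.

```python
def design_weight(eligible_in_practice: int, sampled_in_practice: int) -> float:
    """Inverse probability of selection: eligible patients in the
    practice divided by the number sampled there."""
    return eligible_in_practice / sampled_in_practice

# e.g. a practice with 8,000 eligible patients from which 400 were sampled
print(design_weight(8000, 400))  # 20.0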

Non-response weights were constructed using a model-based approach to estimate the probability of taking part in the survey. These were calculated separately for the 'all respondents' and 'online-only' weights. These models, created using the current year’s data, estimated the probability of responding based on the age and gender of the patient and the socio-economic characteristics of the neighbourhood in which the patient lived. These weights aim to reduce the demographic and socio-economic differences between respondents and non-respondents.

Data from the GPPS sampling frame (patient’s age, gender, and region) was linked to external data using the home postcode of the patient. This consisted of measures from the 2021 Census: output area aggregated measures of ethnicity, marital status, overcrowding, household tenure and employment status, as well as the indicator of multiple deprivation score (IMD) and ACORN group.

The probability of response was estimated using a logistic regression model, with eligible response (or not) as the outcome measure and the measures described above included as covariates. Standardised design weights were applied when running the model to obtain unbiased estimates for the coefficients, ensuring that design effects (e.g. the oversampling of smaller practices) were accounted for within the model.
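A minimal sketch of such a weighted logistic model, fitted here by Newton-Raphson on synthetic data. The single covariate and the coefficient values are invented for illustration; the real model uses the full set of covariates described above, with the standardised design weights supplied as case weights.

```python
import numpy as np

def weighted_logit(X, y, w, iters=25):
    """Weighted logistic regression via Newton-Raphson."""
    X = np.column_stack([np.ones(len(y)), X])  # add intercept
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-X @ beta))
        grad = X.T @ (w * (y - p))
        hess = (X * (w * p * (1 - p))[:, None]).T @ X
        beta += np.linalg.solve(hess, grad)
    return beta

# Synthetic data: response becomes more likely with age, mirroring the
# pattern reported below; the real covariates also include gender,
# region, Census measures, IMD and ACORN group.
rng = np.random.default_rng(0)
age = rng.uniform(16, 90, 5000)
y = rng.binomial(1, 1 / (1 + np.exp(-(-3 + 0.04 * age))))
w = np.ones(5000)  # stand-in for standardised design weights
beta = weighted_logit(age[:, None], y, w)
pred_prob = 1 / (1 + np.exp(-(beta[0] + beta[1] * age)))  # predicted response probabilities
```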

The models allow us to identify patterns in non-response behaviour: female patients were more likely to respond than male patients, and younger patients were less likely to respond than older patients. There were also some differences by region, with response lowest in London and the North West and highest in the South West. Response was also lower in ACORN groups T (‘Constrained Pensioners’), S (‘Cash-strapped Families’) and U (‘Challenging Circumstances’).

Response also decreased for patients living in Census Output Areas (OAs) with the following characteristics:

  • higher levels of deprivation based on IMD scores;

  • a higher proportion of people from ethnic minority backgrounds;

  • a higher proportion of single, separated, or divorced people;

  • a higher proportion of households with three or more people;

  • a higher proportion of privately rented households; and/or

  • a lower proportion of employees.

The non-response weights were calculated as the reciprocal of the predicted probability of response estimated from the model. To avoid very large weights, the non-response weights were capped for the 1% largest values. The non-response weights were multiplied by the design weight to obtain the starting weights for the calibration.
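These steps can be sketched as follows on synthetic values, with the 1% cap applied as the 99th percentile of the non-response weights. The input distributions are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
pred_prob = rng.uniform(0.02, 0.6, 10_000)  # model-predicted response probabilities (synthetic)
design_w = rng.uniform(5, 40, 10_000)       # design weights (synthetic)

nr_w = 1 / pred_prob                        # reciprocal of predicted probability
cap = np.quantile(nr_w, 0.99)               # cap the 1% largest values
nr_w = np.minimum(nr_w, cap)

starting_w = nr_w * design_w                # starting weights for the calibration
```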

The starting weights were then calibrated to practice population counts, and to population counts by age / gender within each ICS. These calibrations were run separately for the 'all respondents' and 'online-only' weights. The population totals used for the calibration were estimated from the sampling frame.

To avoid very large weights, the ratio of the calibration weights to their starting weights was trimmed at a value of 2.5. Finally, the weights were separately standardised to sum to their relevant sample size.
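A sketch of the final trimming and standardisation steps. The values are synthetic, and the calibration itself (raking to practice and ICS population totals) is stood in for by random adjustment factors.

```python
import numpy as np

rng = np.random.default_rng(2)
starting_w = rng.uniform(5, 40, 1_000)
calibrated_w = starting_w * rng.uniform(0.5, 4.0, 1_000)  # stand-in for raking output

ratio = np.minimum(calibrated_w / starting_w, 2.5)   # trim the ratio at 2.5
trimmed_w = starting_w * ratio

# Standardise so the weights sum to the sample size
final_w = trimmed_w * len(trimmed_w) / trimmed_w.sum()
print(round(final_w.sum()))  # 1000
```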

6.6 Unweighted and weighted respondent profiles

The tables below show the unweighted and weighted profiles of patients taking part in the survey. These are based on self-reported responses to the survey, with the exception of Index of Multiple Deprivation (IMD) which is derived from the individual’s postcode.

  Unweighted N Unweighted % Weighted N Weighted %
Which of the following best describes you?
Female 400,408 57.1% 360,765 51.4%
Male 292,693 41.8% 329,652 47.0%
Non-binary 1,134 0.2% 2,366 0.3%
Prefer to self-describe 1,013 0.1% 1,457 0.2%
I would prefer not to say 5,811 0.8% 7,163 1.0%
Is your gender identity the same as the sex you were registered at birth?
Yes 686,823 98.3% 684,042 97.8%
No 3,831 0.5% 6,038 0.9%
I would prefer not to say 7,904 1.1% 9,619 1.4%
Age
16-24 25,108 3.6% 63,011 9.0%
25-34 50,568 7.2% 115,433 16.5%
35-44 80,865 11.5% 125,511 17.9%
45-54 109,975 15.7% 117,517 16.7%
55-64 157,835 22.5% 115,547 16.5%
65-74 149,075 21.3% 82,719 11.8%
75-84 99,152 14.1% 57,777 8.2%
85+ 23,863 3.4% 19,003 2.7%
I would prefer not to say 4,997 0.7% 5,201 0.7%
Ethnicity
White 559,234 80.7% 535,903 77.1%
Mixed or multiple ethnic groups 10,652 1.5% 14,830 2.1%
Asian or Asian British 68,726 9.9% 80,637 11.6%
Black, Black British, Caribbean or African 37,393 5.4% 42,802 6.2%
Other ethnic group 10,065 1.5% 11,766 1.7%
I would prefer not to say 7,223 1.0% 8,793 1.3%
Sexuality
Heterosexual or straight 629,757 90.5% 613,319 87.9%
Gay or lesbian 11,146 1.6% 16,464 2.4%
Bisexual 9,585 1.4% 16,757 2.4%
Other 7,648 1.1% 9,591 1.4%
I would prefer not to say 37,513 5.4% 41,600 6.0%
Religion
No religion 215,312 30.8% 258,839 37.0%
Buddhist 4,704 0.7% 5,160 0.7%
Christian 376,462 53.9% 320,956 45.9%
Hindu 14,788 2.1% 16,850 2.4%
Jewish 3,738 0.5% 3,209 0.5%
Muslim 42,228 6.0% 49,700 7.1%
Sikh 6,681 1.0% 6,378 0.9%
Any other religion 8,047 1.2% 8,847 1.3%
I would prefer not to say 26,313 3.8% 29,351 4.2%
Carer
Carer 124,642 17.9% 113,350 16.2%
IMD deprivation quintiles
Most deprived 1 144,543 20.6% 148,054 21.1%
2 141,795 20.2% 147,684 21.1%
3 143,731 20.5% 140,855 20.1%
4 139,596 19.9% 133,618 19.1%
Least deprived 5 131,221 18.7% 130,314 18.6%
Disability
Disability 273,402 38.9% 249,128 35.4%
Number of long-term conditions
1 214,028 32.5% 209,348 31.9%
2 123,862 18.8% 104,513 15.9%
3+ 128,671 19.5% 98,793 15.0%

 

6.7 Confidence intervals

Estimates from the GPPS are based on a sample of the population and are therefore subject to some uncertainty. This uncertainty is expressed using confidence intervals: ranges within which we can be fairly confident (at the 95% level) that the true population value would lie had everyone eligible for the survey been sampled and returned a questionnaire.

The table below gives examples of what the confidence intervals look like for a practice, PCN and ICS with an average number of responses, as well as the confidence intervals at the national level, based on weighted data.

Table 6.1: Confidence interval examples for practices, PCNs, ICSs and national data

             Average number of     Approximate confidence intervals for
             responses on which    percentages at or near these levels
             results are based     Level 1:       Level 2:       Level 3:
                                   10% or 90%     30% or 70%     50%
                                   +/- (pp)       +/- (pp)       +/- (pp)
National     702,837               0.11           0.16           0.18
ICS          16,734                0.68           1.03           1.13
PCN          539                   3.42           5.22           5.70
Practice     113                   6.86           10.48          11.43

 

For example, in an ICS where 16,734 people responded (the average number of responses at ICS level) and where 30% give a particular answer, the confidence interval is +/- 1.03 percentage points from that survey estimate (i.e. between 28.97% and 31.03%). The confidence intervals published here are a guide to the size of the confidence intervals around the GPPS data. Confidence intervals are also affected by weighting and are wider where results are based on a smaller number of responses.

Lower and upper limits for confidence intervals for a selection of questions are presented in the practice, PCN and ICS Excel reports on the Surveys and Reports page of the website.

Within the context of GPPS, where some satisfaction scores are around 99%, there is more scope for a survey estimate to fall below 99% than above it, simply because there are far more possible lower scores. The confidence interval has to take this limit into account, and in such circumstances the lower limit is expected to be larger than the upper limit. Wilson’s method is therefore used to calculate confidence intervals: it accounts for this and permits asymmetric intervals, i.e. the lower and upper limits can be unequal in size (unlike some other methods)2.
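The asymmetry can be seen directly in the Wilson score formula. Note this unweighted sketch will give narrower intervals than Table 6.1, since the published intervals also reflect weighting (design effects).

```python
from math import sqrt

def wilson_interval(p_hat, n, z=1.96):
    """95% Wilson score interval for a proportion p_hat from n responses."""
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half = (z / denom) * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    return centre - half, centre + half

# A 99% score from 113 responses: the gap below the estimate is much
# larger than the gap above it (about 0.040 vs 0.008).
lo, hi = wilson_interval(0.99, 113)
print(round(0.99 - lo, 4), round(hi - 0.99, 4))
```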

Power calculations are carried out to estimate the size of the real effect that would be needed for the statistical test performed to be likely to find a statistically significant difference. This level of likelihood is called “power”, and the acceptable level is usually set at 80%, i.e. on average the difference would be found significant in 80% of tests if the survey were repeated. Like the confidence intervals, the power calculations are based on weighted data. The following table shows the size of the real percentage point (pp) difference in the population, between a pair of ICSs, a pair of PCNs, and a pair of practices with an average number of responses, that would be detected with 80% power in the survey data3.

Table 6.2: Power calculations for PCNs, ICSs and practices

             Average number of     Difference between the two estimates
             responses on which    Level 1:          Level 2:          Level 3:
             results are based     lower estimate    lower estimate    lower estimate
                                   = 10%             = 30%             = 50%
                                   +/- (pp)          +/- (pp)          +/- (pp)
ICS          16,734                1.4               2.1               2.3
PCN          539                   8.0               11.0              11.4
Practice     113                   18.2              22.8              22.5

 

As an example, comparing two practices with the same number of responses (113): if 50% of patients at the first practice said their experience of making an appointment was good (‘very good’ or ‘fairly good’), the percentage at the second practice would need to be at most 27.5% or at least 72.5% for a statistically significant difference to be identified between the two practices with an acceptable level of statistical power (80%), i.e. 22.5 percentage points lower or higher, as shown in the table above.
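A simple unweighted version of this calculation can be sketched as below: searching for the smallest difference between two proportions that a two-sided 5% test would detect with 80% power. The weighted figures in Table 6.2 are larger, because weighting inflates the effective variance.

```python
from math import sqrt

Z_ALPHA, Z_BETA = 1.96, 0.8416   # two-sided 5% significance, 80% power

def detectable_difference(p1, n, step=1e-4):
    """Smallest p2 - p1 detectable with 80% power, n responses per group
    (normal approximation for a two-sample proportion test)."""
    d = 0.0
    while True:
        d += step
        p2 = p1 + d
        p_bar = (p1 + p2) / 2
        se0 = sqrt(2 * p_bar * (1 - p_bar) / n)        # SE under the null
        se1 = sqrt((p1*(1-p1) + p2*(1-p2)) / n)        # SE under the alternative
        if Z_ALPHA * se0 + Z_BETA * se1 <= d:
            return d

# Two practices with 113 responses each, first practice at 50%:
# an unweighted detectable difference of about 18 pp, versus the
# 22.5 pp shown in Table 6.2 once weighting is accounted for.
print(round(100 * detectable_difference(0.50, 113), 1))
```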

2Standard confidence interval testing uses the Wald method.

3Power calculations apply a statistical test to protect against the risk of false negatives. False negatives occur when a difference that does exist is declared as not existing.