Comparison of averages by level of statistical significance. Basic terms and concepts of medical statistics

Today it's really too easy: you can walk up to a computer and, with little to no knowledge of what you're doing, create intelligence and nonsense with truly astonishing speed. (George Box)

Basic terms and concepts of medical statistics

In this article we present some key concepts of statistics relevant to medical research. The terms are discussed in more detail in the corresponding articles.

Variation

Definition. The degree of dispersion of data (attribute values) over the range of values.

Probability

Definition. Probability is the degree of possibility of the occurrence of a certain event under certain conditions.

Example. Let us explain the definition using the sentence "The probability of recovery when using the drug Arimidex is 70%." The event is "recovery of the patient", the condition is "the patient takes Arimidex", and the degree of possibility is 70% (roughly speaking, out of 100 people taking Arimidex, 70 recover).

Cumulative probability

Definition. The cumulative probability of surviving to time t is the proportion of patients still alive at that time.

Example. If it is said that the cumulative probability of survival after a five-year course of treatment is 0.7, then this means that of the group of patients under consideration, 70% of the initial number remained alive, and 30% died. In other words, out of every hundred people, 30 died within the first 5 years.

Time to event

Definition. Time to event is the time, expressed in some units, that has passed from some initial point in time until the occurrence of some event.

Explanation. Days, months, and years typically serve as the units of time in medical research.

Typical examples of initial times:

    start monitoring the patient

    surgical treatment

Typical examples of the events considered:

    disease progression

    occurrence of relapse

    patient death

Sample

Definition. The portion of a population obtained by selection.

Conclusions about the entire population are drawn from the results of the analysis of the sample, and this is valid only if the selection was random. Since truly random selection from a population is practically impossible, efforts should be made to ensure that the sample is at least representative of the population.

Dependent and independent samples

Definition. Independent samples are samples in which the study subjects were recruited independently of each other. The alternative to independent samples is dependent (connected, paired) samples.

Hypothesis

Two-sided and one-sided hypotheses

First, let us explain the use of the term hypothesis in statistics.

The purpose of most research is to test the truth of some statement. The purpose of drug testing is most often to test the hypothesis that one drug is more effective than another (for example, Arimidex is more effective than Tamoxifen).

To ensure the rigor of the study, the statement being verified is expressed mathematically. For example, if A is the number of years that a patient taking Arimidex will live, and T is the number of years that a patient taking Tamoxifen will live, then the hypothesis being tested can be written as A>T.

Definition. A hypothesis is called two-sided if it asserts the equality of two quantities.

An example of a two-sided hypothesis: A = T.

Definition. A hypothesis is called one-sided (1-sided) if it asserts an inequality between two quantities.

Examples of one-sided hypotheses: A > T and A < T.

Dichotomous (binary) data

Definition. Data that take only two admissible, alternative values.

Example. The patient is "healthy" or "sick"; edema is "present" or "absent".

Confidence interval

Definition. The confidence interval for a quantity is the range around its estimated value in which the true value of the quantity lies (with a given level of confidence).

Example. Let the quantity under study be the number of patients per year. On average there are 500 of them, and the 95% confidence interval is (350, 900). This means that, most likely (with probability 95%), no fewer than 350 and no more than 900 people will visit the clinic during the year.

Notation. A very common abbreviation: 95% CI is a confidence interval with a confidence level of 95%.
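As an illustration, here is a minimal sketch in Python of computing a 95% confidence interval for a mean, based on Student's t distribution; the visit counts are hypothetical numbers invented for the example.

```python
import numpy as np
from scipy import stats

# Hypothetical sample: number of patients arriving per week
visits = np.array([480, 510, 495, 520, 470, 505, 490, 515])

mean = visits.mean()
sem = stats.sem(visits)  # standard error of the mean

# 95% confidence interval based on Student's t distribution
ci_low, ci_high = stats.t.interval(0.95, len(visits) - 1, loc=mean, scale=sem)
print(f"mean = {mean:.1f}, 95% CI = ({ci_low:.1f}, {ci_high:.1f})")
```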

Reliability, statistical significance (p-level)

Definition. The statistical significance of a result is a measure of confidence in its "truth".

Any research is carried out on the basis of only a part of the objects. A study of the effectiveness of a drug is carried out not on the basis of all patients on the planet, but only on a certain group of patients (it is simply impossible to conduct an analysis on the basis of all patients).

Let's assume that as a result of the analysis a certain conclusion was made (for example, the use of Arimidex as an adequate therapy is 2 times more effective than Tamoxifen).

The question that needs to be asked is: “How much can you trust this result?”

Imagine that we conducted a study on only two patients. Of course, in this case the results should be treated with caution. If a large number of patients were examined (the numerical meaning of "large" depends on the situation), then the conclusions drawn can already be trusted.

So, the degree of confidence is determined by the p-level value (p-value).

A higher p-level corresponds to a lower level of confidence in the results obtained from the analysis of the sample. For example, a p-level of 0.05 (5%) indicates that there is only a 5% probability that the conclusion drawn from the analysis of the group is merely a random feature of these particular objects.

In other words, with a very high probability (95%) the conclusion can be extended to all objects.

Many studies consider 5% an acceptable p-level. This means that if, for example, p = 0.01, the results can be trusted, but if p = 0.06, they cannot.

Study

Prospective study is a study in which samples are selected on the basis of an initial factor, and some resulting factor is analyzed in the samples.

Retrospective study is a study in which samples are selected on the basis of a resulting factor, and some initial factor is analyzed in the samples.

Example. The initial factor is whether the pregnant woman is younger or older than 20; the resulting factor is whether the child is lighter or heavier than 2.5 kg. We analyze whether the child's weight depends on the mother's age.

If we recruit 2 samples, one with mothers under 20 years of age, the other with mothers older, and then analyze the mass of children in each group, then this is a prospective study.

If we recruit 2 samples, one of mothers who gave birth to children lighter than 2.5 kg and the other of mothers whose children were heavier, and then analyze the age of the mothers in each group, then this is a retrospective study (naturally, such a study can be carried out only after the experiment is complete, i.e., all the children have been born).

Outcome

Definition. A clinically significant phenomenon, laboratory indicator or sign that serves as an object of interest to the researcher. When conducting clinical trials, outcomes serve as criteria for assessing the effectiveness of a therapeutic or preventive intervention.

Clinical epidemiology

Definition. The science that makes it possible to predict a particular outcome for a specific patient on the basis of studying the clinical course of the disease in similar cases, using rigorous scientific methods of studying patients to ensure the accuracy of the forecasts.

Cohort

Definition. A group of study participants united by some common feature at the time of its formation and studied over a long period of time.

Control

Historical control

Definition. A control group formed and examined in the period preceding the study.

Parallel control

Definition. A control group formed simultaneously with the formation of the main group.

Correlation

Definition. A statistical relationship between two characteristics (quantitative or ordinal), showing that in a certain proportion of cases a larger value of one characteristic corresponds to a larger value of the other characteristic in the case of a positive (direct) correlation, or to a smaller value in the case of a negative (inverse) correlation.

Example. A significant correlation was found between the levels of platelets and leukocytes in the patient’s blood. The correlation coefficient is 0.76.

Risk ratio (RR)

Definition. The risk ratio is the ratio of the probability of the occurrence of some (“bad”) event for the first group of objects to the probability of the occurrence of the same event for the second group of objects.

Example. If the probability of developing lung cancer in non-smokers is 20%, and in smokers 100%, then the RR will be equal to one fifth. In this example, the first group of objects is non-smokers, the second group is smokers, and the occurrence of lung cancer is considered the "bad" event.

It is obvious that:

1) if RR = 1, then the probability of the event occurring is the same in both groups

2) if RR > 1, then the event occurs more often in objects from the first group than in those from the second

3) if RR < 1, then the event occurs more often in objects from the second group than in those from the first
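A trivial sketch of the computation in Python (the function name is ours, not a standard API):

```python
def risk_ratio(p_event_group1: float, p_event_group2: float) -> float:
    """Ratio of the probability of a ("bad") event in group 1 to that in group 2."""
    return p_event_group1 / p_event_group2

# The lung-cancer example from the text: non-smokers (group 1) vs smokers (group 2)
print(risk_ratio(0.20, 1.00))  # 0.2, i.e. one fifth: RR < 1, the event is rarer in group 1
```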

Meta-analysis

Definition. A statistical analysis that summarizes the results of several studies investigating the same problem (usually the effectiveness of treatment, prevention, or diagnostic methods). Pooling the studies provides a larger sample for analysis and greater statistical power for the combined studies. It is used to increase the evidence for, or confidence in, a conclusion about the effectiveness of the method under study.

Kaplan-Meier method (Kaplan-Meier product-limit estimator)

This method was invented by the statisticians E. L. Kaplan and Paul Meier.

The method is used to calculate various quantities associated with the observation time of a patient. Examples of such quantities:

    probability of recovery within one year when using the drug

    probability of relapse within three years after surgery

    cumulative probability of five-year survival among patients with prostate cancer after removal of the organ

Let us explain the advantages of using the Kaplan-Meier method.

In "conventional" analysis (without the Kaplan-Meier method), the quantities are calculated by dividing the time interval under consideration into subintervals.

For example, if we are studying the probability of a patient's death within 5 years, the interval can be divided into 5 parts (less than 1 year, 1-2 years, 2-3 years, 3-4 years, 4-5 years), into 10 parts (six months each), or into some other number of intervals. The results will differ for different partitions.

Choosing the most appropriate partition is not an easy task.

Estimates obtained using the Kaplan-Meier method do not depend on any division of the observation time into intervals; they depend only on the survival time of each individual patient.

Therefore, it is easier for the researcher to carry out the analysis, and the results are often better than the results of “conventional” analysis.

The Kaplan-Meier curve is the graph of the survival curve obtained using the Kaplan-Meier method.
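To make the idea concrete, here is a minimal sketch of the product-limit (Kaplan-Meier) estimator in Python; the follow-up times and event indicators are hypothetical.

```python
import numpy as np

def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times  -- follow-up time of each patient
    events -- 1 if the event (e.g. death) was observed, 0 if censored
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    order = np.argsort(times)
    times, events = times[order], events[order]

    at_risk = len(times)
    survival = 1.0
    curve = []  # (time, S(t)) at each observed event time
    for t in np.unique(times):
        mask = times == t
        deaths = events[mask].sum()
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= mask.sum()
    return curve

# Hypothetical follow-up data (years); 0 marks a censored observation
print(kaplan_meier([1, 2, 2, 3, 5, 5, 6], [1, 1, 0, 1, 0, 1, 0]))
```

Note how the estimate is updated only at the observed event times, exactly as described above, rather than at the boundaries of an arbitrary partition.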

Cox model

This model was invented by Sir David Roxbee Cox (b. 1924), a famous English statistician and the author of more than 300 articles and books.

The Cox model is used in situations where the quantities studied in the survival analysis depend on functions of time. For example, the probability of relapse after t years (t=1,2,...) may depend on the logarithm of time log(t).

An important advantage of the method proposed by Cox is the applicability of this method in a large number of situations (the model does not impose strict restrictions on the nature or shape of the probability distribution).

Based on the Cox model, an analysis can be performed (called Cox analysis) whose result is the value of the risk ratio and a confidence interval for it.
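As a sketch, fitting a Cox proportional hazards model can look like this with the third-party lifelines package; the data frame and its columns (time, event, age, drug) are hypothetical values invented for the example.

```python
import pandas as pd
from lifelines import CoxPHFitter

df = pd.DataFrame({
    "time":  [5, 8, 12, 3, 9, 14, 7, 11],   # follow-up time, years
    "event": [1, 0, 1, 1, 0, 1, 1, 0],      # 1 = event observed, 0 = censored
    "age":   [62, 55, 70, 66, 49, 73, 58, 61],
    "drug":  [0, 1, 0, 1, 0, 1, 1, 0],      # 0 = Tamoxifen, 1 = Arimidex
})

cph = CoxPHFitter()
cph.fit(df, duration_col="time", event_col="event")
# The summary reports exp(coef), i.e. the hazard (risk) ratio for each
# covariate, together with its confidence interval
cph.print_summary()
```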

Nonparametric statistical methods

Definition. A class of statistical methods that are used primarily for the analysis of quantitative data that does not form a normal distribution, as well as for the analysis of qualitative data.

Example. To identify the significance of differences in the systolic pressure of patients depending on the type of treatment, we will use the nonparametric Mann-Whitney test.
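A sketch of such a comparison with SciPy; the pressure values are invented for the example.

```python
from scipy.stats import mannwhitneyu

# Hypothetical systolic pressure (mm Hg) in two treatment groups
group_a = [142, 138, 150, 145, 160, 135, 148]
group_b = [130, 128, 141, 125, 133, 137, 129]

stat, p = mannwhitneyu(group_a, group_b, alternative="two-sided")
print(f"U = {stat}, p = {p:.4f}")  # p < 0.05 would suggest a significant difference
```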

Characteristic (variable)

Definition. A characteristic of the object of study (observation). Characteristics may be qualitative or quantitative.

Randomization

Definition. A method of randomly assigning research objects to the main and control groups using special means (tables of random numbers or a random number generator, coin tossing, and other methods of randomly assigning a group number to an included observation). Randomization minimizes differences between the groups on known and unknown characteristics that could potentially influence the outcome being studied.
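A minimal sketch of simple randomization in Python (the helper function is ours, not a standard API): shuffle the list of subjects and split it in half.

```python
import random

def randomize(patients, seed=None):
    """Randomly split a list of patients into main and control groups."""
    rng = random.Random(seed)
    shuffled = patients[:]
    rng.shuffle(shuffled)       # random permutation of the subjects
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

main_group, control_group = randomize(list(range(1, 21)), seed=42)
print(main_group)
print(control_group)
```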

Risk

Attributable risk is the additional risk of an unfavorable outcome (for example, disease) due to the presence of a certain characteristic (risk factor) in the subject of the study. It is the portion of the risk of developing the disease that is associated with, explained by, and can be eliminated by eliminating the risk factor.

Relative risk is the ratio of the risk of an unfavorable condition in one group to the risk of this condition in another group. It is used in prospective and observational studies, where the groups are formed in advance and the condition being studied has not yet occurred.

Rolling examination (leave-one-out validation)

Definition. A method for checking the stability, reliability, and performance (validity) of a statistical model by sequentially removing observations and refitting the model. The more similar the resulting models are, the more stable and reliable the model is.

Event

Definition. The clinical outcome observed in the study, such as the occurrence of a complication, relapse, recovery, or death.

Stratification

Definition. A sampling technique in which the population of all participants who meet the inclusion criteria for a study is first divided into groups (strata) based on one or more characteristics (usually sex and age) that potentially influence the outcome of interest, and then participants are recruited from each of these groups (strata) independently into the experimental and control groups. This allows the researcher to balance important characteristics between the experimental and control groups.

Contingency table

Definition. A table of the absolute frequencies (counts) of observations, whose columns correspond to the values of one characteristic and whose rows correspond to the values of another characteristic (in the case of a two-dimensional contingency table). The absolute frequencies are located in the cells at the intersections of rows and columns.

Let's give an example of a contingency table. Aneurysm surgery was performed in 97 patients, and the severity of edema in the patients before surgery is known.

Edema \ Outcome      Survived   Died   Total (n_i)
No edema                  20        6       26
Moderate edema            27       15       42
Pronounced edema           8       21       29
Total (m_j)               55       42       97

Thus, of the 26 patients without edema, 20 survived the surgery and 6 died. Of the 42 patients with moderate edema, 27 survived and 15 died, and so on.

Chi-square test for contingency tables

To determine the significance (reliability) of differences in one characteristic depending on another (for example, the outcome of an operation depending on the severity of edema), the chi-square test for contingency tables is used:

$\chi^2 = \sum_{i}\sum_{j}\frac{(n_{ij} - \hat{n}_{ij})^2}{\hat{n}_{ij}}$, where $\hat{n}_{ij} = \frac{n_i m_j}{N}$,

$n_{ij}$ is the observed frequency in the cell at row i and column j, $n_i$ and $m_j$ are the row and column totals, and N is the total number of observations. The statistic has (r − 1)(c − 1) degrees of freedom, where r and c are the numbers of rows and columns.
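Applied to the edema table above, the test can be computed with SciPy; a sketch (the function also returns the expected frequencies):

```python
from scipy.stats import chi2_contingency

# Contingency table from the example: rows = severity of edema,
# columns = (survived, died)
table = [
    [20,  6],   # no edema
    [27, 15],   # moderate edema
    [ 8, 21],   # pronounced edema
]

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p:.4f}")
```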


Chance (odds)

Let the probability of some event be equal to p. Then the probability that the event will not occur is 1-p.

For example, if the probability that a patient will remain alive after five years is 0.8 (80%), then the probability that he will die during this time period is 0.2 (20%).

Definition. The chance (odds) is the ratio of the probability that an event will occur to the probability that the event will not occur.

Example. In our example (about the patient), the chance is 4, since 0.8/0.2 = 4.

Thus, the probability of surviving is 4 times greater than the probability of dying.

Interpretation of the value:

1) If Chance=1, then the probability of an event occurring is equal to the probability that the event will not occur;

2) if Chance >1, then the probability of the event occurring is greater than the probability that the event will not occur;

3) if Chance < 1, then the probability of the event occurring is less than the probability that the event will not occur.

Odds ratio

Definition. The odds ratio is the ratio of the odds for the first group of objects to the odds for the second group of objects.

Example. Let us assume that both men and women undergo some treatment.

The probability that a male patient will remain alive after five years is 0.6 (60%); the probability that he will die during this time period is 0.4 (40%).

Similar probabilities for women are 0.8 and 0.2.

The odds ratio in this example is (0.6/0.4) / (0.8/0.2) = 1.5/4 = 0.375.

Interpretation of the value:

1) If the odds ratio = 1, then the chance for the first group is equal to the chance for the second group

2) If the odds ratio > 1, then the chance for the first group is greater than the chance for the second group

3) If the odds ratio < 1, then the chance for the first group is less than the chance for the second group
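A sketch of the computation in Python (the helper name is ours):

```python
def odds(p: float) -> float:
    """Odds: the probability the event occurs over the probability it does not."""
    return p / (1.0 - p)

# The example from the text: five-year survival of men (group 1) and women (group 2)
print(odds(0.6) / odds(0.8))  # (0.6/0.4) / (0.8/0.2) = 1.5 / 4 = 0.375
```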

Task 3. Six preschoolers are given a test. The time taken to solve each task is recorded. Will statistically significant differences be found between the times taken to solve the first three test items?


Reference material

This assignment is based on the theory of analysis of variance. In general, the task of analysis of variance is to identify the factors that have a significant influence on the result of an experiment. Analysis of variance can be used to compare the means of several samples when there are more than two samples; one-way analysis of variance is used for this purpose.

To solve such problems, the following approach is adopted: if the variance of the obtained values of the optimization parameter when the factor is acting differs from the variance of the results in the absence of the factor, then the factor is considered significant.

As can be seen from the formulation of the problem, methods for testing statistical hypotheses are used here, namely the comparison of two empirical variances. Analysis of variance is therefore based on comparing variances using Fisher's test. In this task, it is necessary to check whether the differences between the times in which each of the six preschoolers solved the first three test tasks are statistically significant.

The hypothesis put forward is called the null (main) hypothesis, H0. Its essence comes down to the assumption that the difference between the compared parameters is zero (hence the name of the hypothesis) and that the observed differences are random.

The hypothesis that contradicts the null hypothesis is called the competing (alternative) hypothesis, H1.

Solution:

Using one-way analysis of variance at a significance level of α = 0.05, we will test the null hypothesis (H0) that there are no statistically significant differences between the times in which the six preschoolers solved the first three test tasks.

Let's look at the table of the task conditions, from which we find the average time to solve each of the three test tasks.

[Table: for each subject, the factor levels are the times (in seconds) to solve the first, second, and third test tasks, together with the group average for each task; the numerical data are not reproduced here.]

Finding the overall average:

$\bar{x} = \frac{1}{pq}\sum_{j=1}^{p}\sum_{i=1}^{q} x_{ij}$,

where $x_{ij}$ is the time of subject i on task j.

In order to take the significance of the time differences in each test into account, the total sample variance is divided into two parts, the first of which is called the factor (between-group) variance and the second the residual (within-group) variance.

Let's calculate the total sum of squared deviations from the overall average using the formula

$S_{\text{total}} = \sum_{j=1}^{p}\sum_{i=1}^{q}(x_{ij} - \bar{x})^2$, or equivalently $S_{\text{total}} = \sum_{j=1}^{p}\sum_{i=1}^{q} x_{ij}^2 - pq\,\bar{x}^2$,

where p is the number of time measurements for solving the test tasks and q is the number of test takers. To do this, let's create a table of squares.

[Table of squares: the same layout as the source-data table, with each time value squared.]
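Since the original numbers were not preserved, here is a sketch of the same one-way analysis of variance in Python on invented solution times for six subjects; scipy.stats.f_oneway computes the F statistic (Fisher's test) and the p-value directly.

```python
from scipy.stats import f_oneway

# Hypothetical solution times (seconds) of six subjects for the three test tasks
task1 = [38, 42, 35, 40, 37, 41]
task2 = [45, 48, 43, 47, 44, 46]
task3 = [50, 53, 49, 52, 51, 55]

F, p = f_oneway(task1, task2, task3)
print(f"F = {F:.2f}, p = {p:.4f}")
# If p < 0.05, reject H0: the mean solution times differ across the tasks
```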

The main features of any relationship between variables

We can note the two simplest properties of a relationship between variables: (a) the magnitude of the relationship and (b) the reliability of the relationship.

- Magnitude. The magnitude of a dependence is easier to understand and measure than its reliability. For example, if every man in the sample had a white blood cell count (WCC) higher than that of every woman, then you can say that the relationship between the two variables (gender and WCC) is very strong. In other words, you could predict the values of one variable from the values of the other.

- Reliability ("truth"). The reliability of an interdependence is a less intuitive concept than the magnitude of a dependence, but it is extremely important. The reliability of a relationship is directly related to the representativeness of the sample on the basis of which conclusions are drawn. In other words, reliability refers to how likely it is that the relationship will be rediscovered (in other words, confirmed) using the data of another sample drawn from the same population.

It should be remembered that the ultimate goal is almost never to study this particular sample of values; a sample is of interest only insofar as it provides information about the entire population. If the study satisfies certain specific criteria, then the reliability of the found relationships between sample variables can be quantified and presented using a standard statistical measure.

The magnitude and reliability of a dependence represent two different characteristics of the dependences between variables. However, it cannot be said that they are completely independent: the greater the magnitude of the relationship (connection) between variables in a sample of ordinary size, the more reliable it is (see the next section).

The statistical significance of a result (p-level) is an estimated measure of confidence in its "truth" (in the sense of "representativeness of the sample"). More technically, the p-level is a measure that decreases as the reliability of the result increases. A higher p-level corresponds to a lower level of confidence in the relationship between variables found in the sample. Namely, the p-level represents the probability of error involved in extending the observed result to the entire population.

For example, a p-level of 0.05 (i.e., 1/20) indicates that there is a 5% probability that the relationship between the variables found in the sample is just a random feature of that sample. In many studies, a p-level of 0.05 is considered an "acceptable margin" for the level of error.

There is no way to avoid arbitrariness in deciding what level of significance should truly be considered "significant". The choice of a certain significance level above which results are rejected as false is quite arbitrary.



In practice, the final decision usually depends on whether the result was predicted a priori (i.e., before the experiment was carried out) or discovered a posteriori as a result of many analyzes and comparisons performed on a variety of data, as well as on the tradition of the field of study.

Typically, in many fields, a result of p ≤ .05 is an acceptable cutoff for statistical significance, but keep in mind that this level still admits a fairly large probability of error (5%).

Results significant at the p ≤ .01 level are generally considered statistically significant, while results at the p ≤ .005 or p ≤ .001 level are often considered highly significant. However, it should be understood that this classification of significance levels is quite arbitrary and is just an informal convention adopted on the basis of practical experience in a particular field of study.

It is clear that the more analyses are carried out on the totality of the collected data, the more significant (at the selected level) results will be discovered purely by chance.

Some statistical methods that involve many comparisons, and thus have a considerable chance of this kind of error, make a special adjustment or correction for the total number of comparisons. However, many statistical methods (especially simple exploratory data analysis methods) do not offer any way of solving this problem.
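One common correction of this kind is Bonferroni's: with m comparisons, each individual test is judged against the level α/m. A minimal sketch on hypothetical p-values:

```python
# Bonferroni correction: with m comparisons, compare each p-value to alpha / m
alpha = 0.05
p_values = [0.003, 0.020, 0.049, 0.210]  # hypothetical results of m = 4 tests
m = len(p_values)

for p in p_values:
    verdict = "significant" if p < alpha / m else "not significant"
    print(f"p = {p:.3f}: {verdict} at the corrected level {alpha / m:.4f}")
```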

If the relationship between variables is "objectively" weak, then there is no way to test it other than to study a large sample. Even if the sample is perfectly representative, the effect will not be statistically significant if the sample is small. Likewise, if a relationship is "objectively" very strong, then it can be detected with a high degree of significance even in a very small sample.

The weaker the relationship between variables, the larger the sample size required to detect it significantly.

There are many different measures of the relationship between variables. The choice of a particular measure in a particular study depends on the number of variables, the measurement scales used, the nature of the relationships, and so on.

Most of these measures, however, follow a general principle: they attempt to estimate the observed relationship by comparing it with the "maximum conceivable relationship" between the variables in question. Technically, the usual way to make such an estimate is to look at how the values of the variables vary and then calculate how much of the total variation present can be explained by the "common" ("joint") variation of two (or more) variables.

Significance depends mainly on the sample size. As already explained, in very large samples even very weak relationships between variables will be significant, while in small samples even very strong relationships are not reliable.

Thus, in order to determine the level of statistical significance, a function is needed that represents the relationship between the "magnitude" and the "significance" of a relationship between variables for each sample size.

Such a function would indicate exactly "how likely it is to obtain a dependence of a given magnitude (or greater) in a sample of a given size, assuming that no such dependence exists in the population". In other words, this function would give the significance level (p-level) and, therefore, the probability of erroneously rejecting the assumption that the dependence is absent in the population.
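This dependence on sample size is easy to see in a simulation: below, the same moderate "true" relationship is sampled at several sizes n, and the p-value of the correlation typically shrinks as n grows (the data are synthetic).

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)

# The same moderate "true" relationship, observed in samples of different sizes
for n in (10, 50, 500):
    x = rng.normal(size=n)
    y = 0.3 * x + rng.normal(size=n)  # weak-to-moderate true dependence
    r, p = pearsonr(x, y)
    print(f"n = {n:4d}: r = {r:+.2f}, p = {p:.4f}")
# The underlying effect is the same, but significance grows with n
```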

This "alternative" hypothesis (that there is no relationship in the population) is usually called null hypothesis.

It would be ideal if the function that calculates the probability of error were linear and merely had different slopes for different sample sizes. Unfortunately, this function is much more complex and is not always exactly the same. However, in most cases its form is known and can be used to determine significance levels for studies of samples of a given size. Most of these functions are associated with the class of distributions called normal.

Let's consider a typical example of the application of statistical methods in medicine. The creators of a drug suggest that it increases diuresis in proportion to the dose taken. To test this hypothesis, they give five volunteers different doses of the drug.

Based on the results of the observations, a graph of diuresis versus dose is plotted (Fig. 1.2A). The dependence is visible to the naked eye. The researchers congratulate each other on the discovery, and the world on the new diuretic.

In fact, the data only allow us to state reliably that a dose-dependent diuresis was observed in these five volunteers. That this dependence will manifest itself in all people who take the drug is no more than an assumption. It cannot be said that the assumption is groundless; otherwise, why carry out experiments at all?

But the drug went on sale. More and more people take it in the hope of increasing their urine output. And what do we see? We see Figure 1.2B, which indicates the absence of any connection between the dose of the drug and diuresis. The black circles indicate the data from the original study. Statistics has methods for estimating the likelihood of obtaining such an "unrepresentative", and indeed confusing, sample. It turns out that in the absence of a connection between diuresis and the dose of the drug, the observed "dependence" would appear in approximately 5 out of 1000 experiments. So in this case the researchers were simply unlucky. Even the most advanced statistical methods would not have prevented them from making a mistake.

We gave this fictitious, but not at all far-fetched, example not to point out the uselessness of statistics. It illustrates something else: the probabilistic nature of its conclusions. As a result of applying a statistical method, we do not obtain the ultimate truth, but only an estimate of the probability of a particular assumption. In addition, each statistical method is based on its own mathematical model, and its results are correct to the extent that this model corresponds to reality.


Research usually begins with some assumption that requires verification using facts. This assumption - a hypothesis - is formulated in relation to the connection of phenomena or properties in a certain set of objects.

To test such assumptions against facts, it is necessary to measure the corresponding properties of their bearers. But it is impossible to measure anxiety in all women and men, just as it is impossible to measure aggressiveness in all adolescents. Therefore, research is confined to a relatively small group of representatives of the relevant populations.

A population is the entire set of objects in relation to which a research hypothesis is formulated.

For example, all men; or all women; or all the inhabitants of a city. The general populations about which the researcher intends to draw conclusions from the results of the study may be more modest in size, for example, all the first-graders of a given school.

Thus, the general population, although not infinite in number, is as a rule inaccessible to exhaustive study: it is a set of potential subjects.

A sample, or sample population, is a group of objects limited in number (in psychology: subjects, respondents) specially selected from the general population in order to study its properties. Accordingly, studying the properties of the general population by means of a sample is called a sampling study. Almost all psychological studies are sampling studies, and their conclusions are extended to general populations.

Thus, after a hypothesis has been formulated and the corresponding populations have been identified, the researcher faces the problem of organizing a sample. The sample should be such that generalizing the conclusions of the sample study, extending them to the general population, is justified. The main criteria for the validity of research conclusions are the representativeness of the sample and the statistical reliability of the (empirical) results.

Representativeness of the sample is, in other words, its ability to represent the phenomena under study sufficiently fully, from the point of view of their variability in the general population.

Of course, only the general population can give a complete picture of the phenomenon being studied, in the full range and nuances of its variability. Therefore, representativeness is always limited to the extent that the sample is limited. And it is the representativeness of the sample that is the main criterion in determining the boundaries of the generalization of research findings. Nevertheless, there are techniques for obtaining a sample whose representativeness is sufficient for the researcher (these techniques are studied in the course "Experimental Psychology").


The first and main technique is simple random (randomized) selection. It involves ensuring conditions under which each member of the population has an equal chance of being included in the sample. Random selection ensures that a wide variety of representatives of the general population can end up in the sample, and special measures are taken to prevent any pattern from emerging during selection. This allows us to hope that, in the end, the property being studied will be represented in the sample, if not in all, then in its maximum possible diversity.

The second way to ensure representativeness is stratified random sampling, or selection based on the properties of the general population. It involves first determining those qualities that may influence the variability of the property being studied (these could be sex, level of income, or education); then the percentage ratio of the groups (strata) differing in these qualities in the general population is determined, and an identical percentage ratio of the corresponding groups is ensured in the sample; finally, subjects are selected into each subgroup of the sample according to the principle of simple random selection, as shown in the sketch below.
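A sketch of stratified random selection in Python (the function and data are hypothetical): subjects are grouped by stratum, and a fixed fraction is drawn from each stratum by simple random sampling, preserving the strata's proportions.

```python
import random
from collections import defaultdict

def stratified_sample(population, strata, fraction, seed=None):
    """Select `fraction` of subjects from each stratum by simple random sampling.

    population -- list of subject identifiers
    strata     -- parallel list of stratum labels (e.g. sex, age group)
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for subject, stratum in zip(population, strata):
        groups[stratum].append(subject)
    sample = []
    for members in groups.values():
        k = round(len(members) * fraction)
        sample.extend(rng.sample(members, k))
    return sample

subjects = list(range(100))
sexes = ["F"] * 60 + ["M"] * 40
print(stratified_sample(subjects, sexes, fraction=0.1, seed=1))  # ~6 F and ~4 M
```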

The statistical reliability, or statistical significance, of the results of a study is determined using the methods of statistical inference.

Are we insured against mistakes when making decisions and drawing conclusions from research results? Of course not. After all, our decisions are based on the results of studying a sample, as well as on the level of our psychological knowledge. We are not completely immune from error. In statistics, such errors are considered acceptable if they occur no more often than in one case out of 1000 (probability of error α = 0.001, with the associated confidence probability of a correct conclusion p = 0.999), in one case out of 100 (α = 0.01, p = 0.99), or in five cases out of 100 (α = 0.05, p = 0.95). It is at the last two levels that decisions are usually made in psychology.

Sometimes, when talking about statistical significance, the concept of the "significance level" (denoted α) is used. The numerical values of p and α complement each other to 1, a complete set of events: either we drew the correct conclusion, or we made a mistake. These levels are not calculated; they are set in advance. The significance level can be understood as a kind of "red line", the crossing of which allows us to speak of the event as non-random. In every good scientific report or publication, the conclusions drawn should be accompanied by an indication of the p or α values at which they were drawn.

The methods of statistical inference are discussed in detail in the course "Mathematical Statistics". For now, we simply note that they impose certain requirements on the number of subjects, that is, on the sample size.

Unfortunately, there are no strict guidelines for determining the required sample size in advance. Moreover, the researcher usually receives the answer to the question of the necessary and sufficient size too late: only after analyzing the data of an already-surveyed sample. Nevertheless, some general recommendations can be formulated:

1. The largest sample size is required when developing a diagnostic technique - from 200 to 1000-2500 people.

2. If it is necessary to compare 2 samples, their total number must be at least 50 people; the number of samples being compared should be approximately the same.

3. If the relationship between any properties is being studied, then the sample size should be at least 30-35 people.

4. The greater the variability of the property being studied, the larger the sample size should be. Variability can be reduced by increasing the homogeneity of the sample, for example with respect to sex or age, but this, of course, reduces the possibility of generalizing the conclusions.

Dependent and independent samples. A common research situation is when a property of interest is studied in two or more samples for the purpose of comparison. These samples can stand in different relations to each other, depending on how they were organized. Independent samples are characterized by the fact that the probability of selecting any subject into one sample does not depend on the selection of any of the subjects into the other sample. In contrast, dependent samples are characterized by the fact that each subject from one sample is matched, according to a certain criterion, with a subject from the other sample.

In general, dependent samples involve pairwise selection of subjects into compared samples, and independent samples imply an independent selection of subjects.

It should be noted that cases of “partially dependent” (or “partially independent”) samples are unacceptable: this unpredictably violates their representativeness.

In conclusion, we note that two paradigms of psychological research can be distinguished.

The so-called R-methodology involves studying the variability of a certain (psychological) property under the influence of a certain influence, factor, or other property. Here the sample is a set of subjects.

Another approach, Q-methodology, involves studying the variability of a subject (an individual) under the influence of various stimuli (conditions, situations, etc.). It corresponds to the situation where the sample is a set of stimuli.