Authors:
(1) Antoine Loriette, IRCAM, CNRS, Sorbonne Universite, Paris, France ([email protected]);
(2) Baptiste Caramiaux, Sorbonne Universite, CNRS, ISIR, Paris, France ([email protected]);
(3) Sebastian Stein, School of Computing Science, University of Glasgow, Glasgow, Scotland, United Kingdom ([email protected]);
(4) John H. Williamson, School of Computing Science, University of Glasgow, Glasgow, Scotland, United Kingdom ([email protected]).
5 Modelling User Behaviour
We propose to investigate probabilistic modelling of behavioural characteristics observed during gameplay. The goal is, first, to fit a probabilistic model to baseline behavioural data. Second, we propose to use the fitted model to assess the likelihood, under that model, of behavioural data observed when the input modality or the design parameters change.
5.1 Behavioural Features
Prior to modelling, we need to specify which behavioural features to consider. Some domain knowledge may be needed to choose these features. Here, our reference model includes the actions the player produces, measured by the interkey interval (IKI), and the resulting effect in the game, measured by Pac-Man’s turning time (PTT). IKI is computed as the number of frames between two consecutive issued commands. PTT is computed as the number of frames in which Pac-Man stands motionless between two consecutive turns. While IKI is common in the literature related to typing (Dhakal et al., 2018), the choice of PTT is rooted in insights from game sessions with the prototype and is backed by the qualitative feedback from the experiment: when the input affords poor control, producing rapid turns with Pac-Man becomes very hard. Both IKI and PTT can be computed independently of the input modality considered (keyboard or tracker).
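As an illustration of the two definitions above, the following minimal sketch computes IKI and PTT from a per-frame log. The log layout (`command_frames`, `turn_frames`, `is_motionless`) and the half-open interval convention between two turns are assumptions made for the example, not the format used in the experiment.

```python
from typing import List, Sequence


def inter_key_intervals(command_frames: Sequence[int]) -> List[int]:
    """IKI: number of frames between each pair of consecutive commands."""
    return [b - a for a, b in zip(command_frames, command_frames[1:])]


def pacman_turning_times(
    turn_frames: Sequence[int], is_motionless: Sequence[bool]
) -> List[int]:
    """PTT: number of motionless frames between each pair of consecutive turns.

    Assumption: frames are counted on the half-open interval [turn_a, turn_b).
    """
    return [sum(is_motionless[a:b]) for a, b in zip(turn_frames, turn_frames[1:])]


# Hypothetical log: frame indices of issued commands and of Pac-Man's turns,
# plus one flag per frame telling whether Pac-Man was standing still.
command_frames = [3, 10, 14, 30]
turn_frames = [4, 12, 16]
is_motionless = [False] * 20
is_motionless[6:9] = [True] * 3   # Pac-Man stuck on frames 6, 7 and 8
is_motionless[13] = True          # and again on frame 13

iki = inter_key_intervals(command_frames)
ptt = pacman_turning_times(turn_frames, is_motionless)
print(iki, ptt)
```

Because both functions only need frame indices of commands and turns, the same code applies whether the commands come from the keyboard or from the tracker.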
5.2 Gameplay Reference Model
The data needed to fit a reference model can be collected from baseline sessions. These baseline sessions were recorded for that purpose during the PRE-TEST and POST-TEST conditions on KEYBOARD in the previous experiment.
We made the following assumptions. Both IKI and PTT were modelled as continuous random variables. Even though they take discretised positive values measured in frames, they represent a time measurement. We also assumed that IKI and PTT generated independent events at some constant average rate.
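One distribution that is consistent with independent events occurring at a constant average rate is the exponential distribution over the time between events. The sketch below fits one rate per feature by maximum likelihood and sums the log likelihoods of the two features. This is an assumed reading of the model, given for illustration only: Equation 5 is not reproduced in this section and may differ, and the sample values and function names are hypothetical.

```python
import math
from typing import Sequence


def fit_rate(samples: Sequence[float]) -> float:
    """Maximum-likelihood rate of an exponential distribution: 1 / mean."""
    return len(samples) / sum(samples)


def exp_log_likelihood(samples: Sequence[float], rate: float) -> float:
    """Sum of log(rate * exp(-rate * x)) over all samples."""
    return sum(math.log(rate) - rate * x for x in samples)


# Hypothetical baseline observations, in frames.
baseline_iki = [7.0, 4.0, 16.0, 9.0]
baseline_ptt = [3.0, 1.0, 2.0, 2.0]

rate_iki = fit_rate(baseline_iki)
rate_ptt = fit_rate(baseline_ptt)


def game_log_likelihood(iki: Sequence[float], ptt: Sequence[float]) -> float:
    """LL of one game under the baseline model, treating IKI and PTT as independent."""
    return exp_log_likelihood(iki, rate_iki) + exp_log_likelihood(ptt, rate_ptt)


# Slower commands and longer stalls are less likely under the baseline model.
ll_typical = game_log_likelihood([7.0, 4.0], [3.0, 1.0])
ll_slow = game_log_likelihood([70.0, 40.0], [30.0, 10.0])
print(ll_typical, ll_slow)
```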
5.2.1 Normalisation
The values for the log likelihood (LL) were computed according to Equation 5. As for SCORE, one value per game was obtained. We also computed a normalised value for LL, denoted NLL in the following, by taking the inverse of LL and multiplying it by the average value of LL obtained over the PRE-TEST and POST-TEST levels, NLL = mean(LL)/LL, similarly to how NSCORE was computed.
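The normalisation can be sketched as follows. The LL values are hypothetical, and whether the baseline mean is taken per participant or over all participants is not specified here, so the sketch simply takes a list of baseline values as input.

```python
from statistics import mean
from typing import Sequence


def normalise_ll(ll: float, baseline_ll: Sequence[float]) -> float:
    """NLL = mean(LL over baseline games) / LL of the game considered."""
    return mean(baseline_ll) / ll


# Hypothetical per-game log likelihoods from PRE-TEST and POST-TEST games.
baseline_ll = [-100.0, -120.0, -80.0]

# A game that is half as likely (in log terms) as the baseline average.
nll_test = normalise_ll(-200.0, baseline_ll)
nll_baseline = [normalise_ll(v, baseline_ll) for v in baseline_ll]
print(nll_test, nll_baseline)
```

Since log likelihoods are negative here, the ratio is positive: a game whose LL equals the baseline average has an NLL of 1, and less likely games have an NLL below 1.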
5.3 Comparing NLL to NSCORE
We inspected the likelihood of IKI and PTT observations per participant from PRE-TEST and POST-TEST. Figure 6 reports the results. We observed a clear difference between the standard deviations of NSCORE and NLL, computed within and across participants: 0.22 and 0.06, respectively. For NSCORE, five participants (2, 3, 5, 7 and 9) have an average NSCORE that lies further than one standard deviation from the overall mean, while for NLL only two participants (7 and 8) show the same deviation from the mean. This indicates that NLL is less subject to inter-user variability than NSCORE.
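The one-standard-deviation criterion used above can be sketched as follows. The per-participant values are hypothetical, and the use of the population standard deviation over all games is an assumption; the text does not say which estimator was used.

```python
from statistics import mean, pstdev
from typing import Dict, List, Sequence


def outlying_participants(values: Dict[int, Sequence[float]]) -> List[int]:
    """Participants whose average lies further than one s.d. from the overall mean."""
    pooled = [v for games in values.values() for v in games]
    overall_mean = mean(pooled)
    overall_sd = pstdev(pooled)  # assumption: population s.d. over all games
    return [
        participant
        for participant, games in values.items()
        if abs(mean(games) - overall_mean) > overall_sd
    ]


# Hypothetical normalised values, two games per participant.
nscore = {1: [1.0, 1.0], 2: [1.0, 1.0], 3: [2.0, 2.0]}
print(outlying_participants(nscore))
```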
Then we inspected the relationship between NLL and NSCORE. We computed the Pearson correlation between NLL and NSCORE through linear regression. The test revealed a significant correlation of 0.62 (p < 0.001, with slope = 0.83, intercept = 0.14, stderr = 0.07), corresponding to 39% of explained variance. A scatter plot (Figure 7) of their associated values for all measures except for the keyboard condition illustrates this relationship.
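For reference, the quantities reported above (correlation coefficient, slope, intercept and explained variance) can be obtained from paired observations as in the sketch below. The data points are hypothetical, and which of the two variables was used as the regressor is an assumption.

```python
import math
from typing import Sequence, Tuple


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two paired samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def least_squares_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Slope and intercept of the ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


# Hypothetical paired values, one pair per game; NLL is assumed to be the regressor.
nll = [0.5, 0.8, 1.0, 1.2]
nscore = [0.6, 0.7, 1.1, 1.0]

r = pearson_r(nll, nscore)
slope, intercept = least_squares_line(nll, nscore)
explained_variance = r ** 2  # share of NSCORE variance explained by the linear fit
print(r, slope, intercept, explained_variance)
```

In simple linear regression the explained variance is the square of the correlation coefficient, which is how a correlation of about 0.62 corresponds to a little under 40% of explained variance.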
This correlation indicates that the statistical model based on the frequency of command inputs and their effect on game states is, to some extent, an indicator for user performance. The statistical model does not capture all the information necessary to predict score, such as the Pac-Man position in the maze or an understanding of the player’s tactics.
However, it is reasonable to expect that the user's ability to issue commands and control the avatar, modelled by IKI and PTT, is a factor influencing success in the game.
5.4 Effect of Design Parameters on NLL
Previous results showed that NLL exhibits a linear relationship with NSCORE. Here we inspected whether the design parameters (spread and time rate) impact NLL. To do so, we computed IKI and PTT features from the data collected using both the keyboard input modality and the movement-based input modality.
Figure 8 depicts the statistics computed on NLL values for each condition. The figure reports the likelihoods computed using data from the movement-based input modality under the six testing conditions in the middle and rightmost columns.
5.5 Sampling Period
Finally, one of the reasons for designing such a model was the availability of many more samples of the variables IKI and PTT than observations of SCORE per unit of time. We measured from the TRACKER condition the time elapsed between samples for SCORE and LL (Figure 9). Obtaining one sample for SCORE took on average 2210 (s.d. 1057) frames, while one sample for LL was available every 40 (s.d. 31) frames on average. The measure of user behaviour based on low-level gameplay variables thus provides about 55 times more samples per frame than its counterpart based on SCORE, and so exhibits a lower latency.
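The factor of 55 follows directly from the two average sampling periods reported above, as this short check shows (variable names are illustrative only):

```python
# Average number of frames needed to obtain one sample, as reported in the text.
frames_per_score_sample = 2210
frames_per_ll_sample = 40

# Ratio of sampling periods, equal to the ratio of samples obtained per frame.
ratio = frames_per_score_sample / frames_per_ll_sample
print(ratio)
```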