Thomas Arnold

52 Weeks of Data Pattern Analysis: Week 9: Paradigms

With Week 9, I would like to discuss paradigms again.

In 2012, after several years of intense effort and a series of breakthroughs, I created a developmental lag solution to the age crime curve puzzle that seemed to do a pretty good job of explaining why the age crime curve occurs. Excited, I spent about a year writing a book on the solution, which I called “The Criminological Puzzle.” Click the link below for a free copy.

The Criminological Puzzle Book

I sent “The Criminological Puzzle” to 250 developmental criminologists and no one got back to me with any feedback. I have since asked several people to look at it, and I have not received any substantive feedback on the major points of my theory. The few comments I was able to get have ranged from “I’m too busy to look at this right now” to “You should focus on citing existing material in the criminological literature” and “You should not write about the difficulty of understanding this material.”

If I describe the general points of my age crime curve theory to ordinary people, I get general agreement that my ideas make sense. However, when I try to explain the mathematical details, I get glassy-eyed stares and a lack of interest.

Where am I going wrong? This all makes perfect sense to me. It seems important. I used parts of this theory to build a health risk model that was 50% more accurate than commercial models. The ideas seem sound.

To be honest, I had given up on trying to explain this to anyone and was going to work on it when I retire in 5 years. The solution appears to require several paradigm shifts, and expecting someone to follow all of these shifts at once seemed unrealistic. I had come to the conclusion that I needed to publish a series of papers, or a better version of my book, to provide some stepping stones to the solution that people might be able to follow.

There are multiple challenges. The math is hard. My solution uses a form of probability calculus that is generally not taught in school. Then, once you get past the math, the theory is based on human development over the life course, which very few people seem to understand and on which even fewer are collecting data. Finally, my theory seems to make little logical sense at first, because the solution to the age crime curve has very little to do with the causes of crime. This is a macro theory that does not fit with any prevailing scientific models. It seems to be a whole new way to think about things.

My position on waiting to work on this after I am 70 and retired has started to change recently.  A few months ago, I met some people on LinkedIn who encouraged me to pursue my goal of trying to explain this.  Their encouragement, plus the fact that there are some data scientists on LinkedIn who understand some of the probability calculus I am using, is the reason I started the 52 weeks of data pattern analysis blog posts. 

Here are the eight 52 Weeks blog posts I created so far.

Week 1: Introduction

Week 2: A Primer on Paradigms

Week 3: A Paradigm Conundrum

Week 4: What is Your Epistemology of Change?

Week 5: The Healthcare Dilemma

Week 6: Theory Driven Data Science

Week 7: Cumulative Distribution Functions

Week 8: The Male Age Crime Curve

Thanks again for all of the wonderful feedback I have received so far.

My Age Crime Curve Solution

When Quetelet first started looking at the age crime curve in 1831, he proposed that the reasons for the age crime curve were obvious. He suggested that strength and passion developed before wisdom. Because people developed high levels of strength and passion before wisdom was fully developed, crime rose rapidly in youth and young adulthood. After wisdom started to grow in young adulthood, crime began to decline, until the wisest, oldest people committed almost no crimes.

Quetelet’s theory is essentially a developmental lag theory, and my theory is similar.  I have eliminated passion from the model and just focus on the trajectory of strength and mental capacity over the life course.  I have been able to put the math related to normal cumulative distribution functions into the model, which makes everything work out. 

My solution to the age crime curve involves looking at the trajectories of strength and mental capacity over the life course. There appears to be a 5-year lag between the development of peak strength (at about age 25) and the development of peak intelligence (at about age 30). My theory is that this developmental lag causes the age crime curve. It is the intersection of these two curves that causes the reversal of the crime trend at age 18.

The basic premise of my theory is that mental capacity reduces the chance of crime and strength increases the capacity for crime. The idea that increases in intelligence cause a drop in criminal tendency from a young age appears to fit the data provided by Tremblay on the development of aggression over the life course. Apparently two-year-old children are the most aggressive humans, hitting, biting, screaming, etc., but they are too weak to do much damage.

His fascinating work can be found here.

Tremblay on Aggression Over the Life Course

My theory is that, even though aggression is declining from a young age, 18-year-olds are committing more crimes because they are getting stronger and can do more damage.

The effects of strength and mental capacity on the propensity for crime over the life course are plotted in the image at the top of the page. In order to accurately visualize what is happening to these two trajectories, we have to flip the effects of mental capacity upside down. When I add the combined effects of strength and mental capacity together over the life course, I get the projected age propensity curve shown below. This is simple, straightforward addition of the two curves.
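For readers who prefer code to prose, here is a minimal sketch of that curve-addition step. The curve shapes and parameters below are hypothetical placeholders rather than my fitted trajectories; only the lag structure (strength leveling off near age 25, mental capacity near age 30, mental capacity flipped before adding) comes from the description above, and the later-life decline is omitted.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical developmental trajectories built from normal cumulative
# distribution functions. The means and spreads are placeholders chosen only
# so that strength levels off near age 25 and mental capacity near age 30.
ages = np.arange(0, 85)
strength = norm.cdf(ages, loc=12.5, scale=5)         # growth curve, near peak by ~25
mental_capacity = norm.cdf(ages, loc=15.0, scale=6)  # growth curve, near peak by ~30

# Mental capacity reduces the propensity for crime, so its effect is flipped
# (subtracted) before the two curves are added together.
propensity = strength - mental_capacity              # projected age propensity curve
```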

Recall from week 8 that the age crime curve is related to the age propensity curve through a normal transformation from a percentage to a Z Score. This model gives a 99.995% R Squared over the first and last parts of the age crime curve.

I modeled criminal propensity by transforming crime percentages to Z scores and then to Probits (Probit = Z + 5). Crime is measured as the percentage of the population at each age that is participating in crime. If we transform the criminal propensity Z scores (Probits) from the previous plot back to crime percentages using the normal distribution, we get the projected estimate of the age crime curve shown below.

This particular version is not as accurate as it could be, but you get the idea.  The model predicts the age crime curve with a high degree of accuracy.  The model provides a “proof of concept” that the developmental curves for strength and mental capacity can be added together to produce a criminal propensity by age curve.  The criminal propensity by age curve then can be transformed to a cumulative trajectory that reflects the age crime curve. 
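For those who prefer code, here is a compact sketch of the Probit bookkeeping and the final transformation back to crime percentages. It assumes a `propensity_z` array holding one criminal propensity Z score per age (for example, a rescaled version of the combined curve from the earlier sketch).

```python
from scipy.stats import norm

# Assumes propensity_z holds one criminal propensity Z score per age,
# e.g. a rescaled version of `propensity` from the earlier sketch.
probit = propensity_z + 5                # Probits (Z + 5) are what the propensity plot shows

# The share of the population past the crime threshold at each age is the
# area under the standard normal curve selected at that Z score, which is
# what bends the straight propensity lines into the familiar age crime curve.
crime_pct = norm.cdf(propensity_z) * 100
```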

Confused?  Perhaps my problem with explaining this becomes a little clearer.

Paradigms

Does any of this make sense?  What does not?  I can only guess at where I am losing people.  Your input is welcome.

I’m looking at the paradigms I had to overcome in my own mind to develop this solution, and it seems like there are a whole bunch of them. These include the following.

  1. The age crime curve should be related to the causes of crime.
  2. You should use real data to do scientific research.
  3. Statistical models should focus on observable data.

I had to replace these paradigms in order to build the model I developed. 

  1. The age crime curve is not directly related to the causes of crime.
  2. I can create my own data.
  3. The models have to use the data we can see to figure out what is happening with the data we can’t see.  

I will stop here.  I would appreciate any input from readers.  Which paradigms would you need to change for this to make sense?

Posted by Thomas Arnold

52 Weeks of Data Pattern Analysis: Week 8: The Male Age Crime Curve

I would like to show the results of a major breakthrough in the study of crime.  Recall that the age crime curve was discovered by Quetelet in 1831.  Since then, many people have tried to understand why it occurs.  These attempts have resulted in failure because the age crime curve is a “cluster problem.”  That is, there are multiple major conceptual issues that need to be addressed to solve the age crime puzzle.  I was able to find the solution to one piece of the puzzle, which I will discuss here, and this permits one to find the solution to the rest of the age crime puzzle.

The plot shown above has the results from my age crime curve calculations.  The blue line is the actual age crime curve.  The estimates I found for the age crime curve are represented by a white dashed line.  Note that the age crime curve estimates are almost exactly the same as the actual age crime curve.  The accuracy of my model exceeds an R Squared of .9995.  I will go into the reasons why below.

I am not bragging when I call these findings a major breakthrough.  The age crime curve is like a Rosetta stone for criminology.  In 1983, Hirschi and Gottfredson, two of criminology’s top scholars, noted that “When attention shifts to the meaning or implications of the relation between age and crime, that relation easily qualifies as the most difficult fact in the field.”  If we can’t explain the age crime curve, we are missing a major piece of the causal puzzle of crime.

The Age Crime Curve is a Cumulative Distribution Function

My theory about the cause of the age crime curve was that the age crime curve was a developmental artifact.  This was also Quetelet’s theory in 1831. He suggested that this conclusion was obvious.

I developed this conclusion separately while studying life course development. Research had shown that crime is inversely related to intelligence. Research had also shown that muscular people commit more crimes than non-muscular people. Might these two facts help explain crime by age? We know that strength and intelligence are both changing over the life course. What if the development of intelligence lags the development of strength?

This was my working hypothesis: The development of mental capacity over the life course lagged the development of strength over the life course, and the developmental lag between strength and mental capacity caused the age crime curve. 

I will dive deeper into my lagged developmental theory in future weeks.  There was a problem that I had to address first.  The data made no sense.

The problem I had with the actual age crime curve data was that my theory predicted crime rates should be rising linearly from ages 0-18 and falling linearly from 46-84. If you look at the actual age crime data in the plot above, you will see that the plot sections from 0-18 and from 46-84 are rising and falling curved lines, not straight lines. Why?

I eventually realized that the lack of straight-line plots was related to the fact that the propensity for crime by age from 0-84 is a set of 84 normal probability distributions, and the age crime curve is a set of 84 points, one drawn from each age’s cumulative distribution function.

Recall that in Week 7, I demonstrated how probability density functions are related to cumulative distribution functions.

Week 7: Cumulative Distribution Functions

A cumulative distribution function is a sigmoid curve that results when a probability density function passes a threshold.  The value of the cumulative distribution function is the area under the curve that is selected.  See the animated version of the creation of cumulative distribution functions below.

In the case of the age crime curve, the threshold is the perceived difference between crime and non-crime. Note that, theoretically, crimes are severe harms. There are many types of harm that people commit. When the harm exceeds a certain level, the legal system generally categorizes that behavior as crime. In this case, the propensity for crime (harming others) is normally distributed and the criminal justice system selects the acts that it calls “crimes,” creating a threshold.

If you look at the male age crime curve shown above, you see that the sections of the age crime curve from 0-18 and from 46-84 are sigmoid shaped plots.  My question was, are these sections sigmoid curves representing a cumulative distribution function?  Was there a propensity/selection process?

To avoid confusion, please note that the section of the age crime curve from 19-45 also seems to be sigmoid in shape, but it is not sigmoid; it is an exponential decay curve. I will discuss the reasons for this exponential decay in future weeks. This week I want to focus on the age crime curve from ages 0-18 and 46-84.

The Age Crime Curve Math

To test my theory that the propensity for crime rose linearly from 0-18 and dropped linearly from 46-84, I set up an Excel spreadsheet with the age crime rates by year. I split the age crime curve into three sections: 1) 0-18, 2) 19-45, and 3) 46-84. Sections 0-18 and 46-84 were estimated using linear formulas, and section 19-45 was estimated by minimizing the error by year.

I started out trying to find the linear solution for Z.  Recall that a straight line formula for Z is Z = A + BX.  In my case, A was the Intercept, B was the Slope, and X was Age.

I played with the slopes and intercepts for the Z score formulas from 0-18 and 46-84 until I found the best fits. The linear Z score formulas that fit best were as follows.

  • 0-18: Z by Age = -5 + .3316*Age
  • 46-84: Z by Age = -.79 - .0548*Age

I then calculated the crime rate for each Z score using Excel.

I found that the best fit required a constant in the formula. This was not unexpected, since I was using the percentage of crime by age rather than actual crime rates. The constant was 13.778, and the estimated crime rate formula became:

Estimated Crime Rate by Age = p(Z by Age) / Constant = p(Z by Age) / 13.778
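For readers who would rather see the calculation as code than as a spreadsheet, here is a sketch using the fitted values quoted above. In this sketch, p(Z by Age) is taken to be the standard normal cumulative probability, in keeping with the cumulative distribution function framing of this post.

```python
import numpy as np
from scipy.stats import norm

CONSTANT = 13.778

ages_young = np.arange(0, 19)            # 0-18: Z rises linearly with age
ages_old = np.arange(46, 85)             # 46-84: Z falls linearly with age

z_young = -5 + 0.3316 * ages_young
z_old = -0.79 - 0.0548 * ages_old

# Estimated crime rate by age = p(Z by age) / constant, with p(Z) read as the
# cumulative normal probability of the propensity Z score.
est_rate_young = norm.cdf(z_young) / CONSTANT
est_rate_old = norm.cdf(z_old) / CONSTANT
```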

The results of these calculations are shown below.  Note again that ages 19-45 were calculated manually and were not part of the model.  The model generates an R Squared of .999465 from 0-18 and .995629 from 46-84.

The Probability of Crime by Age

The results from the calculations described above appear to demonstrate convincingly that crime is normally distributed and that the age crime curve is a cumulative distribution function.  If we transform the Z scores to Probits so that we can see the shape of the probability plot of crime by age, we get the following.

Note that the plot below is shown in Probits.  The Method of Probits is a Z score transformation that was developed by Bliss in 1934.  A Probit is a “probability unit” that is calculated by adding 5 to the Z score.  The formula for a Probit is as follows.

Probit = Z + 5

The Probit transformation shifts the mean of a standard normal distribution from 0 to 5. The purpose is to avoid negative values, since Z scores for the standard normal distribution effectively range from -5 to 5.

As an amusing aside, the Method of Probits was developed to make it easier to determine germ kill rates in the presence of antiseptics. This raises a question: what do germs and criminals have in common?

If my mathematical model for the age crime curve is correct, the probability of crime rises linearly from 0-18, drops in a curvilinear fashion from 19-45, and then drops linearly from 46-84.  See below.

Conclusion

I hope the description of my discovery about the mathematical nature of the age crime curve makes some sort of sense.  The age crime curve is a cumulative distribution function that is created by 84 individual normal distributions.  One normal propensity distribution for each year of age.

Note that some people might question the high R Squared values I got.  I have gone over this and it should not be a problem.  I am using age crime data aggregated over millions of people over 10 years.  This is population level data and is missing the variation present in individual data.  This model shows the mathematical form of the age crime curve at the population level.

The point of my thesis is that logically, the age crime curve from 0-84 is a cumulative distribution function with 84 data points representing the crime rate for each age.  If crime at each age is normally distributed, the crime rate at each age should be a point on a normal cumulative distribution function.  That appears to be what I found.

These findings are logically consistent and they support a developmental theory of the age crime curve.  More to come about that.

Note that this same model works as well for the female age crime curve.  This model also fits the cumulative number of chronic conditions in a health propensity model. 

These findings provide a statistical Rosetta stone in the sense that these data tell us that criminal propensity and health are normally distributed.

Questions?

Posted by Thomas Arnold

52 Weeks of Data Pattern Analysis: Week 7: Cumulative Distribution Functions

This week, I would like to introduce the issues related to cumulative distribution functions. The scientific literature seems to pay almost no attention to the topic of cumulative distribution functions, yet understanding them is essential to understanding the world around us.

The material for this week builds upon the material presented in the past two weeks of data pattern analysis, so if you have not read those posts, you probably should look them over.  In particular, you should look at the Week 5 post on the healthcare dilemma.  The Week 5 post provides a visual guide for the processes involved in creating cumulative distribution functions.  These are variation and selection.

52 Weeks of Data Pattern Analysis: Week 5: The Healthcare Dilemma

The lack of understanding related to cumulative distribution functions is a problem because we can’t solve the conceptual problems related to the age crime curve or the health cost curve without understanding cumulative distributions.  Most of the data we consume consists of points on a cumulative distribution function, yet few people actually consider the properties of cumulative distributions.

A Quick Primer on the Math

Cumulative distributions are built by calculating sums of occurrences.  In terms of the Calculus involved, the formula for a cumulative distribution function is an Integral.  While it is not absolutely imperative that one understands the math, a quick primer may help the reader who is not familiar with this topic. 

The Wikipedia article on Cumulative Distribution Functions is recommended.

Cumulative Distribution Functions on Wikipedia
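As a quick numerical illustration of that integral (not required for anything that follows), summing thin slices of the normal density reproduces the cumulative distribution function:

```python
import numpy as np
from scipy.stats import norm

# The CDF F(x) is the accumulated area under the density f(t) for t <= x.
x = np.linspace(-5, 5, 2001)
dx = x[1] - x[0]

pdf = norm.pdf(x)                     # probability density function
cdf_numeric = np.cumsum(pdf) * dx     # running area under the density

# The running area at x = 0 is about 0.5, matching the exact CDF.
print(cdf_numeric[x >= 0][0], norm.cdf(0))
```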

Two Types of Cumulative Distributions

There are two types of cumulative distributions. Issues related to “the process of selection” distinguish the two types, yet these issues seem to be almost universally ignored when cumulative distributions are discussed.

In particular, we need to be aware of whether there is selection with replacement or selection without replacement.

  1. Cumulative distributions using selection without replacement
  2. Cumulative distributions using selection with replacement

I have not seen anyone write about cumulative distributions using selection with replacement, so I will save that for a future week.  When working with cumulative distributions using selection without replacement, the cumulative distribution for the normal probability distribution is a sigmoid curve. 

The outcome variable for cumulative distribution functions in selection without replacement is the sum of a binary variable coded 1 (selected) or 0 (not selected). I understand that some would use logistic regression here. The outcomes are similar, but the theoretical explanations differ.

Thinking in Areas Under the Curve

In both types of cumulative distribution function, the outcome is expressed as a rate. The cumulative rate can be modeled as the area under a curve. In the case of selection without replacement, the cumulative distribution can be modeled as the area under the normal distribution curve. This process is shown in the animated GIF above. The outcome for normal distributions is a sigmoid curve.

The sigmoid curve is only studied in a few disciplines. The disciplines that come to mind where people use cumulative distributions from the normal distribution are the following. If you want to add some, that would be welcome.

  1. Germ kill rates in the presence of antiseptics
  2. Diffusion of innovations
  3. Item response theory

I am going to argue that almost all disciplines that study living systems should use cumulative distributions to model processes. The problem that seems to be ignored with cumulative distributions is nonlinearity in outcomes when selection rates are small. If you follow the animation over several iterations, you will notice that the rate of accumulation is greatest in the middle of the normal curve. The mean can shift by several Z scores with very little change in the selection volume. This causes real problems when comparing rates of participation.

To illustrate the problems when comparing rates of participation, it will help to look at an example.

Male and Female Data Scientists

The problem with rate comparisons using cumulative distributions is that we need to think in terms of areas.  The sum of the probabilities is the area under the probability distribution that is selected.  See the animated GIF at the top of the page.

There is a highly nonlinear relationship between changes in the rate expressed as a percentage and changes in the probability expressed as a mean change in the Z score. This is especially true when rates are small. To illustrate this, let’s look at the differences in the rates of male and female data scientists.

Approximately 80% of data scientists are estimated to be male, which is roughly a 4-to-1 ratio in the percentages. See the sample article below.

Why the World Needs More Women Data Scientists

If we calculate data scientist participation for males and females at the population level, using the total US workforce over 20 years old as the denominator, we see that the participation rates for both males and females are very low. Seven hundredths of a percent of the male US workforce (7 in 10,000 workers) are data scientists, and two hundredths of a percent of the female US workforce (2 in 10,000 workers) are data scientists.

Comparing the Probability Density Functions

If we look at the differences between the probability density functions for males and females, we find that the differences are small. There is only about a three-tenths difference in the Z scores for the male and female populations in terms of the cumulative probability of becoming a data scientist.

Note that you can verify this for yourself using the NORM.S.INV function in Excel. Any inverse normal transformation function will work.

The mean female probability of becoming a data scientist sits at -3.5 standard deviations, and the mean male probability of becoming a data scientist sits at -3.2 standard deviations.
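The same check in Python, using the participation rates quoted above (norm.ppf is the equivalent of Excel’s NORM.S.INV):

```python
from scipy.stats import norm

male_rate = 0.0007      # 7 in 10,000 male workers
female_rate = 0.0002    # 2 in 10,000 female workers

print(norm.ppf(male_rate))                           # about -3.2 standard deviations
print(norm.ppf(female_rate))                         # about -3.5 standard deviations
print(norm.ppf(male_rate) - norm.ppf(female_rate))   # a gap of roughly a third of a standard deviation
```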

Female Probability of Becoming a Data Scientist

Male Probability of Becoming a Data Scientist

The Unobserved Phenomena

Can you see the problem?

When people look at things like gender differences in the rates of employment for data scientists, they usually just look at the data they have on participation as a data scientist. They don’t look at the data on non-participation. The vast majority of the population (well over 99.9%) has chosen not to become a data scientist. Probabilistically, the difference between the male and female populations in the probability of becoming a data scientist is very small.

Ignoring the cumulative distribution function in population data happens almost all of the time, in almost every dataset that is presented. For example, I had to dig down into the US Bureau of Labor Statistics data to get actual counts of data scientists, and even then, I had to estimate the percentages of males and females. If you search Google for gender differences in data science employment, it is almost impossible to find actual numbers. Almost all you can find are percentages.

Note that the scale shown above ranges from -10 to 10 standard deviations.  This is because for a normal distribution with almost no participation, the normal probability distribution ranges from -10 to 0. 

Think about this.  Almost all of the data is unobserved.

Conclusion

This was intended to be an introduction to cumulative distribution functions.  I hope that I have aroused some curiosity.  How does this help explain the age crime curve or the health cost curve?  How can we use this in “theory driven data science?”

It took me many years to wrap my head around this.  Who thinks in areas?  The only way that I was able to make sense of this was to visually explore the data.  

Data Pattern Analysis.

Any input is welcome.  Does this make sense?  Can you see why this is important?  What does not make sense?

Posted by Thomas Arnold

52 Weeks of Data Pattern Analysis: Week 6: Theory Driven Data Science

I’ve been trying to build a base literature for “data pattern analysis.”  Data pattern analysis helps build better data science models by providing clues about latent processes that we cannot observe.  A theoretical model of these latent processes can be used to develop a “theory driven data science.”

In general, as shown in the graphic above, the practice of data science is largely devoid of theoretical input. This was pointed out in Chris Anderson’s post titled “The End of Theory: The Data Deluge Makes the Scientific Method Obsolete.” He argued that there is so much data available that all one needs is a good data extraction model to get value from the data.

There is some merit to his observations. The approaches used in data science do provide inference without theory and produce better predictive models than traditional science. However, the basic premise that theory is not needed is flawed.

The problem with lack of theory is the “missing data problem” (52 Weeks of Data Pattern Analysis: Week 5: The Healthcare Dilemma).  In many important areas where data science is used, we are observing latent processes where over 90% of the data is missing.  Theory has the power to enhance data science by providing a model of this missing data. I suggest that throwing out theory in the shift to data science is like “throwing the baby out with the bath water.”

First, theory can dramatically improve the predictive power of data science models.  In my own experience, a more accurate theory regarding the nature of health helped increase the explained variance in a population health ranking model by 50%. 

Second, theory provides an enhanced ability to create prescriptive models in addition to predictive models.   Data science without theory is unlikely to provide prescriptive solutions.  If we actually want to fix something, having an accurate theoretical understanding of how it works will make it more likely that we will succeed.

The Nature of Theory Driven Data Science

If we examine the problems related to traditional scientific theories, one issue becomes clear.  Traditional scientific theories tend to be “micro theories.”  One causal factor, or at most a few factors, are proposed to “cause” something. 

For example, in criminology, “low self-control” is proposed as a “cause” of crime.  There is a correlation between low self-control and the commission of criminal acts.  Therefore, “control theory” is a popular scientific theory of criminal behavior.  

The problem is that crimes are the result of an infinite variety of genetic and environmental factors. A good data science model will out-predict a micro-theory model every time. Micro-theories are typically too restrictive to be of much use outside of academia.

The type of theory that I propose for “theory driven data science” is a macro-theory.  We need to understand the “nature of nature.” How do living systems function?  How might that knowledge help us build better data science predictive models?  How might this knowledge help us fix problems that arise?

A Physics of Living Systems (The Facts of Life)

I propose that a theory driven data science should provide a macro focus rather than a micro focus.  I have narrowed the theoretical issues that are most important for data science with living systems down to three main areas that I call “the facts of life.”  Life is 1) variable, 2) dynamic, and 3) selective.

Life is Infinitely Variable

First, life is variable and has an infinite variety of possibilities. Life is a function of gene-environment interactions (GxE). Tens of thousands of genes interact with an infinitely variable environment. The number of possible interactions is 2^N, where N is the number of individual factors (https://en.wikipedia.org/wiki/Interaction_(statistics)). I argue that since the smallest genome found so far has 182 genes and the human genome contains over 3 billion base pairs (https://en.wikipedia.org/wiki/Genome), for all intents and purposes, life is infinitely variable.

Life is Always Dynamic

Living systems are constantly fluctuating. There are two important dynamics that need to be addressed by data science.  First, life is a function of high dimensional chaos.  Living systems are constantly shifting their behavior in response to the environment.  Second, living systems develop over time.  They have a beginning, growth and decline, and an end.  The first factor leads to short term fluctuation, and the second leads to long term variation in the behavior of the system.  I will argue that a theory driven data science will address the issues related to dynamic variability.

Life is Selective

Living systems operate within normal ranges, and we seldom consider their function until the system falls outside the “normal” range of functioning. This poses a big problem in criminology, health care, and many other disciplines. Since we usually don’t observe “normal” behavior and tend to focus on abnormal behavior, we are typically missing over 90% of the data. This is an under-explored area that needs to be addressed if model accuracy is to be improved.

Conclusion

I propose that a “theory driven data science” provides dramatically better prediction and enhances the possibility for prescriptive data science.  By focusing on “the facts of life” one can build better data science models.

The work on a theory driven data science is based on 17 years of data analysis, intense study, and a lot of deep thinking.  Most of the physical work is buried in a mass of spreadsheets, statistical models, and written documents on my hard drives.

I have been trying to pull this work out and craft it into some sort of organized set of ideas and proof-of-concept documents. If you go back over my LinkedIn posts and the blogs on https://www.datapatternanalysis.com, you will find some bits and pieces.

The previous work I posted is based on the three facts of life that I proposed above.

Posted by Thomas Arnold

A Breakthrough in Health Risk Ranking

I created a health risk ranking system that is 50% more accurate than commercial models.  Technically, commercial models rank cost, not risk.  My model ranks health levels.

I wanted to make the white papers available to anyone who is interested. I need to rewrite these, but I will err on the side of presenting something OK now rather than waiting for something great.

I started by building a traditional health ranking model. 

See the white paper here.

Risk Model V25.9 White Paper

And the Technical Guide here.

Risk Model V25.9 Technical Guide

In 2017, I was able to get a 50% improvement in risk ranking accuracy through the use of a rank based inverse normal transformation.  

See the explanation here.

Technology Breakthrough.
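For anyone curious what a rank-based inverse normal transformation looks like in practice, here is a minimal sketch (a Blom-style transform). It illustrates the general technique named above, not the exact formula used in the white papers.

```python
import numpy as np
from scipy.stats import norm, rankdata

def rank_inverse_normal(values, c=3.0 / 8.0):
    """Map a skewed outcome (e.g., annual cost) to normal scores via ranks."""
    ranks = rankdata(values)                   # ranks 1..n, ties averaged
    n = len(values)
    quantiles = (ranks - c) / (n - 2 * c + 1)  # push ranks into the open interval (0, 1)
    return norm.ppf(quantiles)                 # inverse normal of the rank quantiles

# Example: a heavily skewed cost distribution becomes an approximately
# normal, linear outcome variable suitable for ranking models.
costs = np.random.default_rng(0).lognormal(mean=8, sigma=2, size=1000)
normal_scores = rank_inverse_normal(costs)
```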


Posted by Thomas Arnold

52 Weeks of Data Pattern Analysis: Week 5: The Healthcare Dilemma

The Dynamic Normal Distribution with Threshold

I am going to change direction this week.  My previous posts did not seem to be very effective in engaging reader participation, so I want to try a different approach.  I want to discuss something I call the healthcare dilemma.  The healthcare dilemma I am referring to is the problem with missing data.

Recall that I had suggested that we need to understand how the healthcare cost distribution arises if we want to be able to effectively change the cost structure. The problem is that we have an overabundance of data on a few patients and no data on a sizable portion of the patients. In the data I was using from a medium-sized healthcare system, 45% of the patients seen in the past 3 years did not show up in the last year. See below.

The Missing Data Problem

The fact that 45% of the patients have no data, and most patients have limited data, creates a missing data problem. I have been working on this problem for some time, and the most accurate healthcare model I have been able to devise is something I call the “dynamic normal” distribution. It is based on the premise that health is both dynamic and normal.

If you look at the plot on the upper left of this post, you will see the instantaneous health of a sample of patients with the names Red, Blue, Green, Cyan, and Magenta.  Their health levels are constantly fluctuating within the parameters of a normal distribution.  You should be able to relate to this.  Some days you feel great, and some days you don’t.  Your health normally fluctuates.

Most of the time, your health is pretty good and you don’t need a doctor. If you get seriously ill, you might “cross the line” and have to interact with the health care system. I expanded the health care interaction case on the upper right. Every so often, one of our patients gets ill enough that they have to interact with the healthcare system.
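A toy simulation makes the idea concrete. Every parameter below is illustrative rather than fitted: each patient’s health fluctuates around a personal mean, and a healthcare interaction is recorded only when health drops below a threshold.

```python
import numpy as np

rng = np.random.default_rng(42)
n_patients, n_days = 5, 365

# Each patient has a personal mean health level; daily health adds a
# normally distributed fluctuation around that mean.
patient_means = rng.normal(loc=0.0, scale=1.0, size=n_patients)
daily_health = patient_means[:, None] + rng.normal(0.0, 1.0, size=(n_patients, n_days))

THRESHOLD = -2.5                                  # "crossing the line" means seeing a doctor
visits = (daily_health < THRESHOLD).sum(axis=1)   # healthcare interactions per patient
print(visits)                                     # some patients may log no visits at all
```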

The Key to Improving Risk Ranking

The key to improving risk ranking is to recognize that health is “dynamic normal.”  The annual health care cost plot shown above is generated from a dynamic normal distribution.  It is a relatively simple matter to transform the nonlinear cost values to a linear patient rank variable and use that linear variable as the outcome rather than the cost.  In regression, linear and normal is always better than nonlinear and highly skewed.

Transforming the nonlinear skewed cost to a normal linear patient rank variable increases the patient health ranking accuracy by 50%.  I have been struggling to get people to understand this.

Health is dynamic normal.  This creates a health care measurement dilemma.  We need to understand the nature of health before we try to change it.

There is one more piece to this puzzle that is related to differences in fluctuation levels. The health of high-risk patients fluctuates more than the health of low-risk patients. I will try to address this topic in future posts.

Posted by Thomas Arnold

52 Weeks of Data Pattern Analysis: Week 4: What is Your Epistemology of Change?

What is Your Epistemology of Change?

Catching up?  The posts at http://www.datapatternanalysis.com/ have the previous week’s posts.  The activity section on my LinkedIn profile has them as well.  https://www.linkedin.com/in/thomas-arnold-phd/

This week, I will continue exploring the work I did with criminal offender risk scores.  Recall from Week 3, I had finished my Master’s thesis with a “paradigm conundrum.”  The Level of Service Inventory-Revised (LSI-R) was supposed to be able to measure changes in recidivism risk.  Based on the prevailing theory, my results opened up the possibility that the LSI-R could not measure changes in risk.

The average person might not have thought too much about this, but I had been thinking about this a lot.  See my post on the “slow change method.”

https://www.linkedin.com/feed/update/urn:li:activity:6817818955353243649/

Part of the reason that I had thought about this so much was due to a counseling class I took that consisted of looking at various types of counseling methods through the question “What is Your Epistemology of Change?”  The class was part of a series of classes in Family Systems counseling that were taught by the best teacher I ever had.  His name was Michael (Mick) Mayhew, and the classes were part of the Family Systems curriculum at St. Cloud State University in Minnesota, US. 

I don’t know if I ever completely wrapped my head around the meaning of “epistemology” in this context.  The word seems to be generally associated with “knowledge” and so I translated this as “how does change occur and how will we know when change has occurred?”  This was a great exercise for anyone interested in promoting change.  What is Your Epistemology of Change?

A year spent analyzing change

After I finished my Master’s Thesis, I was fortunate to get a year of tutoring in the statistics of life course sociology from Ross Macmillan at the University of Minnesota.  The results of that year of analysis are posted at

The-Nonlinear-Dynamics-of-Criminal-Behavior

I apologize for the academic nature of that paper.  For those not well versed in life course criminology, the references might seem obscure.  I will need to rewrite this if I want to reach a broader audience.

How can we assess change when something is always changing?

If you could take one thing away from this paper, it would be the images on Page 30. In those analyses, I looked at changes in risk scores over three measurement cycles. The first measurement cycle contained a “get to know you” change due to raters getting to know the offenders better. The changes in risk scores between assessments 2-3 and assessments 3-4 show that change is constant.

Change is CONSTANT!

This is so important.  People are constantly changing.  This challenges our current scientific paradigms. 

How do you create change in a constantly changing system? We don’t study the changes that occur because we tend to focus on two-point-in-time measurements. We almost totally ignore the constant hum of “natural change” in our scientific studies.

For an exception to this trend, read Fleeson’s 2001 paper.

https://personality-project.org/revelle/syllabi/classreadings/fleeson.2001.pdf

Fleeson shows that personality traits are not stable.  They are constantly shifting over the course of the day.  My work shows that traits are shifting over months as well.

We need to change our research paradigms

We need to stop focusing on single-point-in-time measures. We need to start looking at how traits change over time.


Posted by Thomas Arnold

52 Weeks of Data Pattern Analysis: Week 3: A Paradigm Conundrum

This week I wanted to provide a deeper exploration into the concept of paradigms.  Recall that in Week 1, I wrote about the Age Crime Curve and Health Cost Curve and I indicated that I had found the reasons that these curves are shaped the way they are.  After several failed attempts to explain these curves, I wanted to try again with 52 weeks of data pattern analysis. 

In week 2, I tried to go back to the beginning, where I started to develop the solution to these curves.  I provided an overview of the concept of “paradigms” and went over some discoveries that I had made while working on my master’s thesis.  The short version of the story regarding my master’s thesis was that the results of my calculations had created a “paradigm conundrum.”

I Have a Plan!

I recognize that this is going deeper into the weeds of criminal justice research than most of you probably want to go.  There is a point however, and if you can bear with me, I will take you there.  I have a plan.

The problem with understanding the age crime curve and the health cost curve is that our scientific paradigms are flawed and we must start thinking differently.  The solution to these curves is about 5 paradigm shifts deep.  I know how to navigate these shifts because they are in a decade’s worth of research I did and never published.

I am claiming that we have the following situation where there are multiple paradigm shifts required.

  1. Paradigm 1: Shift your thinking
  2. Paradigm 2: Shift your thinking again
  3. Paradigm 3: Shift your thinking again
  4. Paradigm 4: Shift your thinking again
  5. Paradigm 5: Shift your thinking again

If I try to discuss paradigm shift 5, I doubt that anyone will understand what I am writing because I have glossed over a connected series of four previous unpublished discoveries that each involved a separate major paradigm shift. 

Paradigm shift 1 is hard enough to understand.  In paradigm shift 1, I will be discussing how we think about traits like the propensity for crime or the propensity for health.  There is an almost universal misunderstanding of the nature of traits.  People tend to think of traits as stable, but traits are highly dynamic. Understanding the nature of traits is essential for anyone who is interested in changing themselves, or facilitating change in others.

A Paradigm Conundrum

After I did my master’s thesis, I was facing a paradigm conundrum. The paradigm conundrum was created because my analyses raised questions about some existing criminal justice paradigms. These paradigms have been driving criminal justice research on the concept of offending risk since the early 1970s. Probably no one outside of criminal justice has heard about these issues, so I should provide a basic explanation to help you understand the context of what follows.

In the 1970s, we had a crime bump in the US and several Western nations that was largely due to the baby boom generation having lots of children in the 1950s. These children all reached their peak offending ages, according to the age crime curve, in the 1960s and 1970s. Note that some will argue that this crime bump is not due to demographic shifts in population age intersecting with the age crime curve, but I have some analyses that support my claims. More paradigm shifts are needed before I can cover that research, and I will cover those paradigm shifts as well.

In 1974, Robert Martinson wrote a critique of criminal justice research called “What works?—questions and answers about prison reform.”  This critique was essentially a critique of the lack of rigor in the research methods being used in criminal justice.  Because of the lack of rigor, he argued that we really were not sure if treatment was effective.  However, rather than focusing on his critique regarding research quality, his message was construed as “nothing works” in criminal offender rehabilitation.

You can read a little about Robert Martinson’s work on Wikipedia.

https://en.wikipedia.org/wiki/Robert_Martinson

If you do a Google search for “robert martinson what works” without the quotes, you can read more about the fuss that was caused by the Robert Martinson article.

The combination of the crime bump, the Martinson article, and some other things that happened during the 1970s and 1980s led to a decision by policy makers to dramatically curtail efforts to rehabilitate prisoners. There was a “lock ’em up and throw away the key” mentality at the time. Rehabilitation efforts were largely eliminated and replaced with long-term fixed sentences. Our prison populations exploded from about 100 prisoners per 100,000 people in the US population to around 700 per 100,000.

The What Works Paradigm

In response to the Martinson article and other events at the time, an effort was made to revive the offender rehabilitation paradigm.  The concept of “what works” was developed. The effort to find the things that worked to rehabilitate offenders was promoted in part by Don Andrews and James Bonta in Canada in the 1970s and early 1980s.  Sadly, Don Andrews has passed away.  Dr. James Bonta is on LinkedIn, so I will tag him on this. 

In response to the perception in the 1970s that criminal risk can’t be changed, Andrews and Bonta had suggested that not enough attention was devoted to discovering “what works” to reduce recidivism risk.  They argued that with the proper treatment, criminal recidivism risk can be changed for the better.  They wrote numerous articles and several books about the topic of what works in offender rehabilitation. 

The three pillars of efforts by Andrews and Bonta to promote what works were to focus on 1) static recidivism risk, 2) dynamic treatment needs, and 3) responsivity to treatment considerations.  In order to quantify static recidivism risk and dynamic treatment needs, Andrews and Bonta developed a “dynamic risk assessment instrument” called the “Level of Service Inventory-Revised” (LSI-R).  They suggested that the LSI-R could measure levels of static offender risk and changes in dynamic treatment needs.

The Dynamic Predictive Validity Test

If you recall, my master’s thesis involved testing the “dynamic predictive validity” of a criminal offender risk assessment.  The instrument scores I was testing were generated with the LSI-R, and “dynamic predictive validity” was a psychometric test that had been invented by Andrews and Bonta in the 1980s and 1990s. 

The basic premise behind the dynamic predictive validity thesis was as follows.

  1. Criminal recidivism risk is dynamic (it is changing over time).
  2. The LSI-R is a “dynamic” risk assessment instrument capable of measuring changes in recidivism risk.
  3. Prediction accuracy improves from the first to the second assessment with the LSI-R.
  4. The improvement in the LSI-R accuracy from the first to second assessment is because 1) offender recidivism risk changed between the first and second assessments with the LSI-R and 2) the LSI-R detected the changes in recidivism risk and that was why the second LSI-R assessment score was more accurate.

The Paradigm Conundrum

The results of my master’s thesis created a paradigm conundrum.  First, my results replicated all of the previous research. Prediction accuracy improved from the first to second assessment with the LSI-R. 

However, since prediction did not improve from the second to the third assessment or from the third to the fourth assessment, it appeared that some other mechanism was at work. Why would prediction improve only between the first two assessments?

I had several possible explanations for my findings.

  1. The work in my master’s thesis was flawed.
  2. The dynamic predictive validity thesis was flawed. It was possible that the LSI-R could not measure change and something else was causing the improvement in prediction accuracy for the LSI-R from the first to the second assessment.
  3. Offender risk was not changing significantly.

Regarding number 1, there were all sorts of possible problems with my thesis.  The samples were getting smaller.  Perhaps something happened with the smaller samples.  However, my smaller samples were bigger than those that had been used in previous research. Something was wrong, but it was not clear whether it was my data or methods. My methods seemed to be exactly like those used in the previous research.

Number 2 was also a possibility. Andrews had pointed out that there was another reason that the LSI-R could become more accurate between the first and second assessments. He had indicated that rater accuracy could be improving between assessments because the rater was getting to know the offender better after the first assessment. This was a distinct possibility that would explain my results. If the rater had already spent 6 months working with the offender between assessments one and two, the rater might have hit the top of the learning curve and not gotten much better on the third and fourth assessments. How would one determine if rater improvement was causing the results?

Number 3 seemed to be unlikely since there were changes in the LSI-R scores between assessments.  However, were the changes in score big enough to produce measurable changes in offending rates?  Were the changes in score insignificant?  How does one tell if a risk score change is big enough to be significant?

The Solution is Coming!

I will try to explain how I went through each of these points in a step-by-step fashion to resolve this paradigm conundrum. The process involved rigorous step-by-step analyses with hundreds of different tests. I did not know how to do these analyses, and I seemed to be in uncharted territory, so I invented a process.

The process I developed has direct implications for our understanding of human traits.

I promise to skip some of the boring parts and stick to the parts that you should care about.  More to come …

Posted by Thomas Arnold

52 Weeks of Data Pattern Analysis: Week 2: A Primer on Paradigms


Last week, I kicked off 52 weeks of data pattern analysis. This is week 2. My goal for this series is to see if I can explain some new and unexpected findings from my scientific research in criminal justice, criminology, and health. My past attempts to explain these results seem to have failed to generate any clear understanding. In my first post for Week 1, I mentioned the age crime curve and health cost curve puzzles as two examples of solutions I have found but can’t seem to explain, and there are several other discoveries that I would like to be able to discuss as well.

I promised that I would try to keep the overall explanation simple enough for the general reader to understand.  So for this week, I would like to discuss the concept of “paradigms.”  A paradigm is a shared consensus about how science is supposed to be conducted. 

https://en.wikipedia.org/wiki/Paradigm

Thomas Kuhn popularized the concept of paradigms in his book on the structure of scientific revolutions.

https://en.wikipedia.org/wiki/The_Structure_of_Scientific_Revolutions

Kuhn suggested that science does not progress smoothly, but advances in an uneven fashion, with slow and gradual advances for several years in a row, and then fast change.  During the slow and steady advance phase, which Kuhn calls “normal science”, there is a set of “paradigms” or shared understandings of the generally accepted theories and research practices. 

https://en.wikipedia.org/wiki/Normal_science

During the “normal science” phase the scientific paradigms are able to provide a set of relatively coherent solutions to the major scientific puzzles scientists are trying to solve. Then, over time, some new puzzles emerge which can’t be solved using the existing paradigms, and it becomes clear that the existing paradigms are flawed.  When the flaws in the existing paradigms become obvious enough, a new approach is developed and a “paradigm shift” occurs.

https://en.wikipedia.org/wiki/Paradigm_shift

In my research, I discovered several instances where the prevailing scientific paradigms were flawed, and I had to make personal paradigm shifts in order to proceed.  The process of finding flawed paradigms began with my master’s thesis. 

I think that it is important to explain what happened with my thesis, even though most of you couldn’t care less about the subject I was studying, because the findings from my thesis set off my 10-year, million-dollar odyssey into the analysis of change and the solution to the age crime and health cost curves. In my thesis, I found a flawed paradigm.

http://www.datapatternanalysis.com/wp-content/uploads/2021/08/Dynamic-changes-in-the-Level-of-Service-Inventory-Revised-LSI-R-and-the-effects-on-prediction-accuracy.pdf

My thesis was based on a paradigm called “dynamic predictive validity” that was popular in the criminal justice literature at the time.  The dynamic predictive validity paradigm states that if you measure something twice, and whatever you are measuring has changed between measurements, the second measurement will be a better predictor of outcome than the first measurement in the period after the second measurement.  In my thesis, I was measuring the risk of reoffending for criminal offenders on parole, and testing to see if repeated assessments improved the prediction of reoffending rates.

The dynamic predictive validity paradigm had been tested many times and never failed to generate better predictions on the second assessment.  In my thesis, I changed the test a little, adding tests for improved prediction between the 2nd and 3rd assessment, and the 3rd and 4th assessments.  In these cases, prediction failed to improve.  The paradigm seemed to be flawed somehow. 
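To make the test itself concrete, here is a generic sketch of how one might compare the predictive accuracy of two successive assessments. This illustrates the idea rather than the actual thesis analysis, and the column names are hypothetical.

```python
import pandas as pd
from sklearn.metrics import roc_auc_score

def compare_waves(df: pd.DataFrame, earlier: str, later: str, outcome: str = "reoffended"):
    """Return how well each of two successive assessment scores predicts the outcome."""
    return roc_auc_score(df[outcome], df[earlier]), roc_auc_score(df[outcome], df[later])

# e.g. compare_waves(df, "lsi_r_wave2", "lsi_r_wave3")
# Under the dynamic predictive validity paradigm, the later score should be
# the better predictor whenever risk truly changed between the two waves and
# the instrument detected that change.
```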

Why should prediction improve between the 1st and 2nd assessments and not between the next two pairs of assessments?  Was the propensity for criminal offending really changing or not?  Scores changed.  Did risk change?

I would like to stop here and continue with the next phase of my research next week. In that post, I will discuss some research I did on the nature of change. In the years after I finished my thesis, I became obsessed with finding the answers to the puzzles it generated. This was where I truly began to question the accuracy of our paradigms about the nature of traits like the risk of criminal offending.

Posted by Thomas Arnold

Newly Unemployed! Opening the Oyster!

I was laid off on Tuesday (August 3, 2021) from my job, which has been my source of income for the past four and a half years.  I have already had some condolences, and I wanted to try to explain my situation so that I don’t get any more.  I really do appreciate any concerns that people might have, but this is not an unexpected event.  This is really for the best.

I will be OK.  I am getting some severance and I can collect unemployment if needed.  My wife and I own three houses, and I am planning to sell one to pay the loan and get some cash.  I have a lot of skills and I will find something to do until I retire.  Since I am 65, I could retire now, but I am hoping to stay working for another 5 years.

I feel a need to provide an explanation.

I had a really great time working at this job, but I was in a position that made no sense for them or me.  I had been hired as a senior data analyst, and I was overjoyed to get this job.  I live in rural MN and there are not a lot of jobs doing data science.  I would have had to drive to Minneapolis, which is 70 miles away.  This job was only 30 miles away, and it was doing something I love, which is playing with data.

Over the next couple of years, I was amazingly innovative.  I created several mission critical systems.

  1. A complete Medicare ACO management system
  2. A complete Medicaid management system
  3. A world class population health ranking system
  4. A grant management system
  5. A system for program evaluation using propensity score matching
  6. More custom deliverables

The problem was that I was the only one who knew how all of these systems worked.  I was becoming a data technician instead of a data scientist.  What was worse for them was that if something happened to me, they would not be able to keep these systems running.

I told them two years ago that this situation was a ticking time bomb and they had two choices.

  1. Hire more people internally to support my systems.
  2. Hire an outside firm to manage these systems.

They chose option 2, hiring an outside firm to help them manage their data analysis, and I fully support their decision.  Rural MN does not have the talent pool that they needed.  Building an internal team would have been very difficult.

It took them two years to find, hire, and get the outside firm up and running.  They are using a company called LightBeam and it seems to be working for them.

I became redundant, and I was fortunate to be laid off gently.  So, thanks for any concerns. 

Look for some exciting new blog posts!

Posted by Thomas Arnold