kemenaran 2 years ago

As noted by the other investigating organization (La Quadrature du Net), the problem is worse than just the question of the algorithm implementation.

The problem is that the CNAF deliberately targets small unintentional errors rather than large-scale intentional fraud.

Why? Because intentional fraud is harder to detect and to prove (you have to provide evidence that it was intentional).

So smaller and poorer families are disproportionately affected by the controls, because the algorithm was designed to do so. No computational adjustment can fix that: it is the initial intent that is broken.

Source: https://www-laquadrature-net.translate.goog/2023/11/27/notat... (auto-translated in english)

  • addcommitpush 2 years ago

    > The problem is that the CNAF deliberately targets small unintentional errors rather than large-scale intentional fraud.

    The French welfare system is incredibly complex - see for instance this [0] simplified description of housing allowances, which is 80 (!) pages long. And this is not the most complex part of the system. With such a system, there are massive amounts of errors, both too-much-money-given and not-enough-money-given. The scale is so large that the French Court of Accounts refused to certify the CNAF accounts last year [1]: those errors represent about 7.5% of the CAF budget.

    So basically the probability of having an error is just a function of how complex your situation is, and thus the "algorithm" targets more complex situations: a change in your marital situation, having adult children (which may or may not need to be taken into account when applying for benefits, depending on a bazillion variables), and so on, all increase your probability of being targeted.

    [0] https://www.ecologie.gouv.fr/sites/default/files/Brochure-ba...

    [1] https://www.ccomptes.fr/fr/publications/certification-des-co...

    • TheRoque 2 years ago

      As a French person, I don't get why no politician ever talks about simplifying those things. It sounds so easy and such a quick win, leading to more visibility on the budget and people getting easy access to the money they're due. But I think I know the answer: the current government doesn't want to make welfare easy to access; they want to deter people from using it unless they absolutely need it.

      • brnt 2 years ago

        If France is anything like the Netherlands, the benefits system isn't used by a few poor people; it's used by the majority of the country, in lieu of adapting the tax code. Simplifying it is going to (unintentionally and intentionally) hurt some groups, and nobody wants to be on the hook for that, despite it having been an increasingly large election theme.

        France at least has a culture of constitutional reboots.

      • WarOnPrivacy 2 years ago

        > current government doesn't want to make welfare easy to access, they want to actually deter people to use it unless they absolutely need it.

        The more complex a system, the more skill and resources (ability, time, finances) it takes to navigate it.

        Eventually, complexity serves to exclude everyone but people who are able to make a career out of pursuing benefits - who are also more likely to be fraudsters.

      • croisillon 2 years ago

        some cabinets have had a "ministère de la simplification" (ministry of simplification)

      • addcommitpush 2 years ago

        It's quite simple I think:

        1. for any given budget, there is no Pareto-improving reallocation: if you want to give more money to someone, then the money has to come from someone else.

        2. given the current complexity, there are a _lot_ of edge cases to account for. If you do not want to create any losers with a reform, and not have as many edge cases, then you'd need to pump a lot of money to a lot of people so that "edge case people" who become "average joes" do not lose out. See for instance the people who end up with less disposable income when their pension is raised (https://www.alternatives-economiques.fr/vrais-faux-gagnants-...) (!). See also the kerfuffle about the "montant net social": now the exact income you need to report on welfare applications (basically net salary + a bunch of things your employer pays for you that are counted as income) is written on pay slips. Nice simplification, right? People who had been reporting the wrong income (generally only net salary) were upset, seeing it as a plot to decrease welfare.

        3. people genuinely, genuinely love special cases. Hence the government's tradeoff between addressing a special case and adding more complexity always ends up producing more special cases and more complexity.

        Some examples: since housing is expensive, people want to help renters with cash payments (of course it can't be bundled with the basic income, it has to be its own benefit), but they also want some public housing with below-market rents. Now you need to account for the in-kind benefit of a below-market rent in the rules of the housing benefit if you want to be relatively fair between those two populations.

        Recently, the computation of the cash benefit for handicapped people was changed - it now depends only on the recipient's income and not on the household's income (the main argument was that using household income as an input makes handicapped people less autonomous on one hand, and decreases their incentive to work on the other). This means that some benefits are now computed at the individual level and others at the household level. Of course there is a transition period where people can be grandfathered into the old rules, so as not to create any losers.

        And so on and so forth.

        Large families need their own benefits, because they have unique(tm) needs; you just can't make a per-child benefit that simply scales.

        And so on and so forth.

        Did I mention that you want to help overseas territories with special fiscal rules?

        And so on and so forth.

        I think the two most prominent examples of this were the two failed Macron reforms: the first pension reform (a universal public pension fund instead of several) and the basic income (revenu universel d'activité - basically a merger of at least APL+RSA+PA). There is always a special category that loses out when its own complexity-inducing special case is ironed out.

        4. because of points 1-3, no one understands anything, and thus there is a strong suspicion that the government is there to rob you of [your benefits | your pension | etc] whenever there is a reform proposal.

  • dorfsmay 2 years ago

    Same issue as with tax fraud. Governments prefer dealing with mistakes and small frauds: they are cheap, fast successes, simple to detect, with no lawyers fighting back. Large frauds are the opposite.

    This was explained to me by an accountant when I wondered why they wanted me to fix what felt like insignificant errors, while a big corporation was in the news for what was clearly tax fraud and the government was dragging its feet about it.

    • ethbr1 2 years ago

      It's also, generally, a complexity problem.

      Corporate accounting is more complicated than family accounting. Even if the corporation isn't trying to do anything complicated!

      Consequently, there are more edge cases and grey areas. As an accountant friend said to me, it's more like law than science -- knowing lots of laws and regulations, plus history, and deciding how to mostly correctly classify various things.

      So something like trying to write the most efficient assembly algorithm possible, while Congress is modifying the ISA every year.

      (Which isn't to say that loopholes don't exist, corporations don't abuse them, or corporate tax attorneys don't delay enforcement actions... but is to say that even in best case, family accounting is much simpler than corporate)

      • wslh 2 years ago

        > It's also, generally, a complexity problem.

        I don't think so, at least not in 2023. I worked several years at a tax agency and it was mainly a problem of "motivation". I have a friend who pursued "data warehouse" work there for 30 years... nowadays you can crunch all the information and find patterns. I would even suggest that tax agencies should anonymize data and create data bounties to help them, in the same way DARPA creates cyber challenges [1].

        [1] https://www.darpa.mil/about-us/timeline/cyber-grand-challeng...

        • ethbr1 2 years ago

          How much of internal accounting state leaks into tax filings?

          I assume when you're looking at forensic tax accounting, you're identifying present-year vs previous-year discrepancies?

          Or is there enough required in filings to generate something like a complete shadow accounting for a company?

          • wslh 2 years ago

            There are relationships between agencies and corporations to link information. It is not only your filings that are at stake.

        • Ridj48dhsnsh 2 years ago

          I think it'd be impossible to anonymize the data in such a way that it's still useful but not easily re-identifiable using public or partial private information.

  • wslh 2 years ago

    > The problem is that the CNAF deliberately targets small unintentional errors rather than large-scale intentional fraud.

    I think your observation is the key one: you don't target "Madoffs or SBFs", you just pick the low-hanging fruit, looking at simple probabilistic causality A => B instead of (or alongside) targeting big malicious actors. Big and/or complex crimes and corruption are protected.

    The irony is that many times the crimes committed by big actors would be simpler to analyze, based on tax and financial information.

  • xjay 2 years ago

    Related: The AI Incident Database [1]

    > The AI Incident Database is dedicated to indexing the collective history of harms or near harms realized in the real world by the deployment of artificial intelligence systems. Like similar databases in aviation and computer security, the AI Incident Database aims to learn from experience so we can prevent or mitigate bad outcomes.

    [1] https://incidentdatabase.ai/

  • RileyJames 2 years ago

    Are they taking this approach only in the negative form?

    I.e., if they detect you've made a mistake and not claimed something you are eligible for, or paid some tax that you are actually eligible to deduct/offset somehow, does it notify you or automate the process of fixing that error?

    I'm not opposed to the idea of automating to ensure compliance with the law / regulations. But enforcement should go both ways. The goal of automating policy should be for the policy to be maximally effective. It shouldn't just penalise those who incorrectly claim, but rather maximise eligible claims - and, by that same measure, ensure those ineligible are rejected.

    Obviously, without any penalties this would lead to over-claiming and reliance on the system to reject. But I think the system should be resilient enough to handle that.

    While it’s not to such an extreme extent, there is a polar opposite attitude / policy towards entitlement of government benefits in Australia and New Zealand. In Australia you may be eligible for something, but it’s your responsibility to know that and claim it. And the government takes a ‘we defend the benefits from those who are ineligible’ attitude (more so if the Liberal party is in power, see Robo Debt scandal).

    In New Zealand they take a ‘these benefits must go to those who are eligible’ attitude. And they model the extent to which they fulfil that objective based on how many claim vs how many their model states are eligible.

    The outcome is that in NZ they call after you’ve had a baby to ensure you’re receiving additional tax benefits, services, etc. In Australia you’d only receive a call / letter to notify you that your benefits are being cut off, and there’s some difficult-to-impossible method of re-applying, such as contacting a call centre whose queue is full by 9:30, at which point it stops accepting calls…

  • mjhay 2 years ago

    The purpose of a system is what it does.

knallfrosch 2 years ago

> which groups are more likely to be flagged. Did I miss the part where they checked whether the risk assessment was correct?

I see a lot of studies like this: »'Male drivers aged 18-25 leaving clubs at night' are a group that is overrepresented in police DUI (Driving Under the Influence of alcohol) checks.«

Stopping there is okay if you want to claim that any such targeting is wrongful. But the more useful information would be whether these groups are justly targeted.
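
A minimal sketch in Python of what that check could look like (the data here is entirely made up for illustration): over-representation in checks alone says nothing until you compare confirmation rates across groups.

    from collections import defaultdict

    # (group, fraud_confirmed) for a set of closed, targeted checks;
    # hypothetical records, just to show the shape of the comparison
    closed_cases = [
        ("young_male_drivers", True), ("young_male_drivers", False),
        ("young_male_drivers", True), ("other_drivers", False),
        ("other_drivers", True), ("other_drivers", False),
    ]

    hits, totals = defaultdict(int), defaultdict(int)
    for group, confirmed in closed_cases:
        totals[group] += 1
        hits[group] += confirmed  # a bool counts as 0/1

    # Similar rates => heavy targeting isn't justified by outcomes;
    # a much higher rate in one group => the targeting tracks actual risk.
    for group, n in totals.items():
        print(f"{group}: {hits[group] / n:.2f} confirmed per check")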

  • jwie 2 years ago

    This is a significant omission from the article.

    The other part is how do you really measure efficacy of these systems? Unless you did some secret shopping, which in this case would be paying people to defraud public services occasionally, how would you really know how effective these systems were at reducing fraud costs to public services?

    • occamrazor 2 years ago

      You also do random audits, to check that the targeting follows the actual risk.

  • mistrial9 2 years ago

    no - add race, spoken language, education, social status, income, politics... those indicators are always present, and are controversial today as differentiators for street-level law enforcement

    • snowpid 2 years ago

      The French republic and many other European nations don't collect data on race. And "racial thinking in Europe" does not map very well onto American thinking.

      Where are you from?

      • bashinator 2 years ago

        > French republic and many other European nations dont collect on race.

        Which is a convenient way for them not to track patterns of racial discrimination in their social systems.

        • snowpid 2 years ago

          Why should the government empower pseudo-biological bullshit? For me it is very weird to write a human race into a driver's licence, census, or university application.

          • Ridj48dhsnsh 2 years ago

            Because perceived race is a big factor in how most people treat strangers, so having that information would likely be useful in identifying unfair bias in enforcement.

            • snowpid 2 years ago

              Also, the first sentence is wrong. Human societies have had lots of categories other than just race.

            • snowpid 2 years ago

              Racism doesn't start with race. You can trace racist behaviour to well before people invented "human races".

              • Ridj48dhsnsh 2 years ago

                Racism definitely is based on race, hence the name. Maybe you're referring to some other kind of out-group bias based on tribal or familial status?

          • bashinator 2 years ago

            Why should a government investigate whether its employees practice pseudo-biological bullshit?

      • marcinzm 2 years ago

        As someone else said, it makes it so much easier to ignore when it's used to go after people:

        https://www.hrw.org/news/2023/10/19/french-high-court-recogn....

        • snowpid 2 years ago

          I don't see how state-collected race data helps in these situations.

          Also, racism in Europe is quite convoluted. American race hierarchies don't help.

          • digging 2 years ago

            > Also racism is very well convoluted in Europe. American race hierarchies don't help.

            Then use European race dynamics. Or even better, use local national race dynamics. But you can't do that if you don't collect data or make any effort to understand national race dynamics.

            It absolutely does not follow that "Because American race dynamics =/= European race dynamics", therefore "there's no point in considering race dynamics in police action."

            Worst case scenario, you spend tax money proving that people don't practice racism in France. That'd be a hell of a discussion point.

      • mardifoufs 2 years ago

        There's more racism in France than in the US in my experience. The difference is as you said that France would rather ignore racism or point at the US than care about it. So your comment is quite ironic.

      • mistrial9 2 years ago

        California here, born American

zelda-mazzy 2 years ago

I read their report on Rotterdam's welfare fraud prediction system a few months ago and it was fascinating. Really opened my eyes to a lot of things, specifically how fast any bias can create a feedback loop that causes one demographic to be investigated more often than others.

  • nonrandomstring 2 years ago

    Seems two things are at play here, the model, and how the model is used. The article was quite dry, refreshingly analytical and not overly critical of the expected ground truth that the poor are punished for being poor.

    It's great these models are legible and open. Looks like civil service doing a job as well as possible.

    The other half of the story is how, in practice, these models are interpreted and what actions follow. Are further modifications of the model a direct result of that, and how iterative is it?

    Doing that aggressively changes a bare "model" into an investigative tool, not simply a thresholding utility. Sorry if I missed it, but I don't remember reading anything about how it feeds back. What I did read was unsettling though - entering people's homes at random, quizzing neighbours - pretty fascist stuff. Am I wrong?

    I guess my point is that if you look at a bare data set, or even an algorithm, sure you can probably infer a lot about biases and intent that might be built in, but you can't see the bigger model within which this functions - and that's the real story.

    Is it a persecutory investigative tool?

    Like Hicks said, we could use cruise missiles to drop food into the mouths of hungry people... a benefit system that could identify people who are struggling (of which crime is an indicator) and, I dunno, give them some money and help? That would be preachy, and quite possible imho.

    • sam_lowry_ 2 years ago

      > a benefit system that could identify people who are struggling (of which crime is an indicator), and, I dunno, give them some money and help?

      This is how the system should function, but then you get public outcry when the worst of us are found abusing the system of subsidies while perpetrating their crimes. I think specifically of Marc Dutroux here.

      On a side note, after the mass influx of Ukrainian refugees and wide public support, it is much more obvious that those in the greatest need tend to demand less, not more.

  • VoodooJuJu 2 years ago

    I have limited resources for investigating robberies. I need to allocate these resources efficiently. 5x more robberies are reported in zone A than zone B. Why would I allocate an equal amount of resources to each zone?

    I have limited resources for investigating fraud. I need to allocate those resources efficiently. If profile A is more likely to commit fraud than profile B, why would I allocate an equal amount of resources for investigating fraud between them?

    I'm going on a hunting trip and have limited time and resources for hunting. I need to plan my trip accordingly. If deer are more abundant for hunting in Colorado, why would I go looking for them in the Sahara?

    Am I wrong to look for deer in Colorado rather than the Sahara? Am I biased for investigating fraud where it's more likely to occur? Am I evil for investigating robberies where they're reported?

    I have little time and resources on this planet and want to allocate my time and resources as efficiently as I can in order to do the most Good. Why would I allocate my time and resources inefficiently?

    • kamma4434 2 years ago

      They actually do both - they spend three quarters of their time in Colorado, but they also sample random places and use that data to decide where the next Colorado is.

  • Biologist123 2 years ago

    The model is never the reality.

    • godelski 2 years ago

      This is often stated, but I think it is often forgotten what counts as a model. It's not just "the algorithm." The metric is a model (the Wasserstein distance between norm layers of a discriminator is not image quality; entropy isn't language). The data is a model (scraping the whole internet isn't human language, nor thought). It's important to always remember that these things too are models - often proxies for things that are intractable. They are fine to use, but it is not fine to forget that they are not perfectly aligned. They are maps, and to use another cliché, the map is not the territory (no matter how detailed the map is).

  • troupo 2 years ago

    And this is the actual danger of AI, not the doom and gloom of "AI will wipe us out".

    • ilkke 2 years ago

      You don't need AI, just a government willing to abuse the most vulnerable. Check out the Australian "robodebt" fiasco [1]

      [1] https://en.m.wikipedia.org/wiki/Robodebt_scheme

      • godelski 2 years ago

        > You don't need AI

        No one thinks this. People who are worried are worried __because__ you don't need AI. AI is just a more powerful tool.

        • repelsteeltje 2 years ago

          The thing with enforcement and AI (or more generally: automation) is that it speeds things up and amplifies biases. The silver lining of these mass-profiling accidents might be that they mirror and reveal design flaws in law and lore that might stay under the radar when applied only by individual public servants. It lifts "incidents" to "categorical failures".

        • bonton89 2 years ago

          AI potentially adds "Computer says no" to an already bad situation. And since it is blackboxy most people can't even pick it apart when it is wrong.

          • godelski 2 years ago

            It's mostly about scale, but there are many issues tbh

            - Enables significant increase in surveillance

            - Enables highly specific targeted action (i.e. micro-targeted advertising, but not necessarily ads)

            - Enables said action to be performed at high speeds with low latency and for cheap.

            - Obfuscates decision making processes making it harder to point to who is doing wrong and even how they are doing wrong (because burden of proof is on those being wronged. More black box = more better)

            - Enables easier deflection and laziness, as one can claim it is just math and pretend that math is objective and not shaped by how it is used. You can deflect claims of targeting specific groups because you do not use a variable explicitly defining those groups - you just use some or all of the strongly correlating variables.

            There's plenty more here too. Of course, not all of this is even necessary. I work in ML and I have a passion for it. These tools can be great and do a lot to make human lives better. I really do think that with them we are capable of amazing feats, such as reaching a mostly post-scarcity society within our lifetimes. You could have them grow the food, pick the food, transport the food, stock the shelves, be the cashiers, and even transport the people (or products) to their desired locations. We could liberate most humans from most labor. That is on the horizon - an outcome which would allow humans to be more human than they've ever had the ability to be in all of history, letting them pursue arts, sciences, community development, and all sorts of things without needing to worry about sustaining their survival via their labor. But the transition to this post-scarcity world is not just a technological challenge, since those jobs won't be homogeneously displaced and we must adapt to that world. And I'm sure everyone would absolutely love robot butlers (non-sentient ones - sentient machines shouldn't be forced laborers, but that's a whole other conversation, because sentience isn't required for this; narrow AI can already do many of these things at high competency levels). So even the purely good side is still treacherous.

            I'm just saying we need to think deeply and carefully. The nuance is not just digging in the weeds; it is the critical aspect. Ignoring it isn't "good enough" when ignoring the details is what specifically leads to trouble. But it's common to hear such concerns dismissed as pedantry.

      • DeathArrow 2 years ago

        How does an investigation equal abuse? If you did no wrong, you don't have to fear any investigation about what you did with the money, and whether you deserve it or not.

        By requesting welfare money, you agree that the government can investigate you.

        • danielheath 2 years ago

            Robodebt wasn't "an investigation". It was "our algorithm (which we know is very frequently wrong) guessed you might owe money, so we've retroactively cancelled your benefits and will come after you for fraud until you can prove your innocence via an appeal process that's been deliberately obfuscated".

            The government's own lawyers advised them, in advance, that their plans were illegal.

          • DeathArrow 2 years ago

              That sounds nasty. I was talking in the context of France's profiling algo.

          • joshxyz 2 years ago

            guilty until proven innocent, yikes

        • quadcore 2 years ago

          > If you did no wrong, you don't have to fear any investigation about what you did with the money, and whether you deserve it or not.

          First-hand experience here. It doesn't work like that.

          > How does an investigation equal abuse?

          It does not in theory, but in practice that is the vehicle. You have no idea what I've been through lately via KYC in France. I was worried I'd never see my money while they were making a fool out of me. It's the vehicle, and if one's not a discrimination target, indeed it's invisible.

        • dragoncrab 2 years ago

          Being the target of an investigation can have enormous costs.

          I've been there several times.

          I needed to provide piles of documents that took time to assemble, and I needed to take part in in-person hearings, once in a town 200 km from my residence.

          Yes, they pay for the travel, and by law my employer needs to give me extra vacation for those days, but being the guy who needs to take 2 more days off again doesn't necessarily boost your career, and I'm sure my family would have appreciated it if I had spent the weekends with them instead of reading up on legislation.

          In neither case had I done anything wrong, and I was always very cooperative.

          I broke when I was fined for tax evasion, presented evidence at 3 hearings that they were wrong, and, after they revoked the fine, still needed to pay interest: according to the law I should have paid the fine while the appeal was in progress, and since I didn't, I had to pay a surcharge for the delay. Given that the entire case was a huge government error, I could have appealed the surcharge, but I didn't want to go through 3 more hearings.

          You can absolutely bankrupt entire families both financially and emotionally with this shit.

        • pixl97 2 years ago

          Ok, I'm going to investigate you for child pornography as the state.

          Don't worry about it when I ask your friends and coworkers about your internet browsing habits.

          And oh, after all that I'll not tell anyone "Sorry, we had a mistake in the algorithm and DeathArrow had nothing to do with it at all".

    • quickthrower2 2 years ago

      They are both arguably dangers. They are very different types of risks.

    • xjay 2 years ago

      Indirection is the root of all manipulation.

dr_kiszonka 2 years ago

I am curious why they used a logistic regression. Given that their "IVs" (independent variables) are likely highly correlated, the interpretability of the model is not great, so they could have selected something more non-linear and possibly of higher predictive power. However, they used SAS, which - in the US - would suggest to me that some sort of statistician was involved, so they would have known about the interpretability issue. (Or maybe I need to read up on logistic regression; it has been a few years.)

> An important limitation of our approach is that we do not know the relationship between the different variables that we analysed. For example, having a child between 12 and 18 (which increases risk scores) is likely correlated to being above 34 (which decreases risk scores).

Doesn't France publish demographic data that would help determine these relationships?

Anyway, it was a good read and many things were very clearly explained, which I appreciated.

  • extr 2 years ago

    It's most certainly done for interpretability reasons. A non-linear model would provide huge lift here. Another option could be to run a non-linear model off to the side and understand the most powerful interaction terms and incorporate that into the linear model. This same problem is faced by insurance companies vs insurance regulators.

    • dr_kiszonka 2 years ago

      Right, but they can't really interpret this model, even if it does a good job at prediction. Two references, in case they ever read this :)

      1. https://www.tandfonline.com/doi/abs/10.1080/09720502.2010.10... [1051 citations]

      2. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4888898/ [1032 citations]

      • pas 2 years ago

        For those less blessed with a priori knowledge or time (and/or motivation to chew through these articles), can I ask you for a tl;dr? Maybe you could show an example where the linear model fails, and what non-linear model would overcome the problem?

        (Also, I have no idea whether it's related, but... my very sparse knowledge of this field just regurgitates the Statmodeling blog (by Andrew Gelman and co.), which is basically something something Monte Carlo + hierarchical models + calibration... but of course it's easy to shout buzzwords.)

        • extr 2 years ago

          The tl;dr is multicollinearity is a big problem for interpreting models like this. Aka you have variables in the model that are highly correlated and because of this the model does not necessarily assign weights to the variables in a way that means anything. Eg: In the article they show "Age: 34 or Older" = -.11 and also that "Married" = -.49. Being over 34 and married is -.60. But as far as we know the model doesn't account for the fact that people older than 34 are more likely to be married in the first place. Perhaps the model would have worked just as well if "Age: 34 or Older" = -.30 and also that "Married" = -.30 (still -.60). And so on with every possible combination of attributes (they are all correlated).
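
          A minimal sketch of that instability in Python (synthetic data; assumes scikit-learn >= 1.2 for penalty=None): two nearly identical features get arbitrary individual weights across bootstrap resamples, while their sum barely moves.

              import numpy as np
              from sklearn.linear_model import LogisticRegression

              rng = np.random.default_rng(0)
              n = 5000
              x1 = rng.normal(size=n)
              x2 = x1 + rng.normal(scale=0.05, size=n)  # nearly a copy of x1
              y = (x1 + x2 + rng.normal(size=n) > 0).astype(int)
              X = np.c_[x1, x2]

              for _ in range(3):
                  idx = rng.choice(n, size=n)           # bootstrap resample
                  m = LogisticRegression(penalty=None).fit(X[idx], y[idx])
                  w1, w2 = m.coef_[0]
                  print(f"w1={w1:+.2f}  w2={w2:+.2f}  w1+w2={w1 + w2:+.2f}")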

          This isn't the end of the story. In situations where interpretability really matters or you have the time/resources, it's possible to mostly prevent this by building the model slowly and diagnosing multicollinearity issues as they arise. Perhaps you purposefully leave out variables because you see they are already highly correlated with existing ones and leave the model weights unstable. It requires exhaustive analysis and necessarily involves subjectivity, though. The researchers here could have done something like this, it's certainly done in industry for the aforementioned financial regulatory environments.
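
          One standard diagnostic for that workflow is the variance inflation factor; here is a minimal plain-NumPy sketch on synthetic data (regress each predictor on the others and compute VIF = 1 / (1 - R^2); values far above ~5-10 are the usual red flag):

              import numpy as np

              def vif(X: np.ndarray, j: int) -> float:
                  """Variance inflation factor of column j of X."""
                  A = np.c_[np.ones(len(X)), np.delete(X, j, axis=1)]
                  coef, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
                  resid = X[:, j] - A @ coef
                  r2 = 1.0 - resid.var() / X[:, j].var()
                  return 1.0 / (1.0 - r2)

              rng = np.random.default_rng(0)
              a = rng.normal(size=1000)
              X = np.c_[a, a + rng.normal(scale=0.1, size=1000),
                        rng.normal(size=1000)]
              # First two columns are nearly collinear: huge VIFs.
              print([round(vif(X, j), 1) for j in range(3)])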

          Non-linear models like trees don't have multicollinearity problems because they implicitly model interactions between variables with the branching behavior. Instead the problem is interpretation, it's hard to sum up all those branches in a way that is intuitive and meaningful.

  • addcommitpush 2 years ago

    > I am curious why they used a logistic regression.

    CNAF is not a competitive employer.

    French public service is essentially split in two: the State, and the various "Social Security" bodies which manage most of the welfare system (among them the Caisse Nationale des Allocations Familiales, which manages part of social & family welfare).

    The "State" side has a very competitive track for stats/econ/ML employees, namely the national statistics office corps (INSEE). As the name does _not_ indicate, they work everywhere in the government, not just at INSEE.

    So if you're good at stats and in public service, the odds that you end up on the CNAF team instead of at the statistical office of the Social Ministry are about zero.

    Hence SAS. Hence logit. And so forth.

  • benob 2 years ago

    Could it be that they are mandated by law, like the French banking sector for credit scoring, to use an interpretable method?

    • fmajid 2 years ago

      The EU as a whole requires models used for automated decision-making to be interpretable, and logistic regression is highly so.

      Benefits fraud is a big problem in France, and this investigation shows an agency doing its job properly.

      • orwin 2 years ago

        Social fraud in France, combining all types of social fraud, is estimated at 1.1 billion (probably more, because healthcare fraud is massively underestimated and growing, due to private clinics helping a lot with producing misleading claims and documents - but considering the total healthcare budget, it cannot be that much higher).

        Corporate tax fraud is 27 billion.

        Social security contribution fraud is estimated at 14 billion (compare that to the 200 million share that is social security benefit fraud).

        Income tax fraud is 17 billion (and wages are automatically declared, so this fraud is mostly on non-work income).

        • CalRobert 2 years ago

          Perhaps it's related to visibility and a feeling of "fairness". I completely understand all of the things you wrote, and the corporate tax fraud is a MUCH bigger problem.

          But for some reason when my neighbour told me she'd been lying about not being able to work for the last 12 years so she could collect Jobseeker's allowance (basically "unemployment insurance" for those in the US) it really stung and felt like I, personally, was paying for her.

          I got a lot more conservative after I moved to the countryside and realized how casual fraud and tax evasion were (I was routinely asked to pay for several-thousand-euro jobs in cash, and tradespeople balked at me for wanting a receipt)

          • orwin 2 years ago

            Ahah, I'm from the countryside, I know what you're talking about.

            Weirdly, social contribution fraud and income tax fraud ('travail au noir', undeclared work) are present both in really big cities (mostly because the cost of getting caught is quite low compared to the margins of the construction business in Paris/Lyon) and in rural areas, because no one ever checks (it's changing, slowly).

            > felt like I, personally, was paying for her.

            You also helped fund VLC, and probably half of the French startups that exist today. I agree that this is not fair (I do not ask for unemployment insurance when I'm between jobs, because I have the means to ignore it now), but also, it will last two years at most.

            • CalRobert 2 years ago

              Well, I was in Ireland, not France, but it sounds similar.

              I think the main difference is that when I hear my neighbour committing fraud, I think "well why the hell am I not doing that too?", but when I hear about a big company doing it, I don't feel like I could be doing the same.

              Funny enough, I _did_ recently move to the Netherlands and found myself as the director of both an Irish and a Dutch company at the same time. I briefly looked in to Base Erosion and Profit Shifting (Dutch Sandwich, etc.) but suffice it to say as the only employee of both of these companies I didn't have the scale to make it worth the trouble.

  • teruakohatu 2 years ago

    The sad state of the industry is that terrible, compromised models are delivered all the time by people who should know better.

    I sat through a presentation on a national government AML (anti-money-laundering) machine learning system; they discovered after building it that the training data was all wrong. Luckily it hasn’t been put into production yet.

    • alexey-salmin 2 years ago

      I don't at all see from the article how it's "terrible" or "compromised".

  • ramraj07 2 years ago

    Probably a physics PhD data scientist :)

Onavo 2 years ago

> Finally, the raw score is run through a squasher function that outputs a probabilistic risk score between 0 and 1, where 1 is the highest risk of fraud. A raw score of 0 corresponds to a risk score of 0.5.

I vote that machine learning practitioners shall henceforth refer to the softmax formula as the "squasher function". It's so much more descriptive, kudos to the French!

  • blt 2 years ago

    It's a sigmoid function, not softmax. Softmax is for multi-class classification.
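
    For the curious, a minimal Python sketch of the distinction: a sigmoid squashes one raw score into (0, 1) - matching the article's "a raw score of 0 corresponds to a risk score of 0.5" - while softmax squashes a vector of scores into a multi-class distribution.

        import math

        def sigmoid(z: float) -> float:
            return 1.0 / (1.0 + math.exp(-z))

        def softmax(zs: list[float]) -> list[float]:
            m = max(zs)  # subtract the max for numerical stability
            exps = [math.exp(z - m) for z in zs]
            s = sum(exps)
            return [e / s for e in exps]

        print(sigmoid(0.0))              # 0.5, as the article describes
        print(softmax([1.0, 2.0, 3.0]))  # three probabilities summing to 1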

sylware 2 years ago

So, they concluded that the people most likely to commit fraud are those under the most economic pressure?

They need an algorithm to do that? As far as I know this is expected behavior of normal human beings. Geniuses all over the board.

We all know the core of the issue is the following balance:

- expecting money to circulate in a sane society - namely one without a massive amount of BS jobs - in a way that gives a decent living to all households is just utopia. Thinking otherwise is actually dangerous.

- one of the bad sides of human nature: some "nasty" and/or really "hard" "essential" jobs have to be done, and most people will try to avoid them (many wouldn't be able to do them anyway). Not to mention that, for a "developed" country, the level of efficiency is very high, and only a small number of people rotating through those jobs (though they need to build experience anyway) is actually "needed" to make things work properly.

  • pas 2 years ago

    the problem is that this is an important system (it directs ~80% of investigations) without transparency or audits; and on top of this, when these groups started to ask questions, the government's attitude was - surprise, surprise - to try to deny their requests for transparency

    and on top of (or beneath) all this, there's the question of model performance, calibration, etc., and the cost-benefit of this whole investigation department.

    as in the recent case of California welfare fraud, ~90% (or more) was perpetrated by organized crime groups (which is absolutely not surprising; after all, it takes a big operation to file hundreds of thousands of completely fake but seemingly valid claims). Their claim is that the model targets frauds of ~600 EUR/month or more, which covers ~98% of cases (based on some past historic data), but there's no mention of how much sense this makes. (It's quite likely they could catch ~90% of fraud value with drastically less effort spent investigating people who are already likely to turn out to be rightful recipients, etc.)

    so all in all, there ought to be a very strong bias against the classic "looking for the keys under the streetlamp" mistake, which in this case means there's likely a real negative cost-benefit to spending most investigative resources on those who are already very likely to be rightful recipients of welfare.

    (not to mention the usual completely avoidable - and thus totally infuriating and idiotic - discontinuity problems: a single parent with a kid who just turned 18, or whatever the limit is, suddenly sees a huge drop in income - a huge motivation to try to smudge the numbers and somehow get more months of payments, etc.)

    of course, hard data is always welcome, I'm just a regular keyboard warrior

    • sylware 2 years ago

      Organized crime brings this issue to another level.

      "Basic controllers" will prefer to control not dangerous people (assuming they can tell them appart), to avoid to expose themselves to obvious danger. Because dealing with organized crime requires a whole other apparatus: the justice departement with the help of "services" (ahem...). Usually, those have to deal with the real nasty: killers, human traffickers, etc. They are very limited in human work force with often not enough means to do that work: so the "could you nuke those people because they are frauding the welfare system", ahem, maybe once in a while for posture. That makes me think about the movie and music industry asking for state-grade mass-monitoring of internet for their own benefit (in my country, they got that, hence you cannot trust the gov at not resisting such unreasonable lobbying).

      To try to explain the shadows around this: with fraud and organized crime, beautiful ideological ideas do not work anymore: this is the hard reality, and it is often ugly. Only one thing has to really monitored: if the reality induced bias-es are not significantly self-maintaining/increasing that reality with some sort of feedback loop, presuming it is even possible to quantify that... Everybody wishes to never be on the bad sides of those reality induced bias-es. That's for the the "good" part, the "nasty" part is when it is not reality induced bias-es anymore but transforms into something ideological.

DeathArrow 2 years ago

Well, if investigations are triggered only by algorithms, then if you intend to commit fraud, you can run their statistical model against your future data and see if the alarms fire. If so, just massage the data until they don't.
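
A minimal Python sketch of that probing (the age and married weights are the ones quoted from the published model elsewhere in this thread; "low_income" and the threshold are made up for illustration):

    import math

    # Two weights quoted from the published model elsewhere in this
    # thread; "low_income" and THRESHOLD are hypothetical.
    WEIGHTS = {"age_34_plus": -0.11, "married": -0.49, "low_income": 0.80}
    THRESHOLD = 0.6

    def risk_score(profile: dict[str, int]) -> float:
        raw = sum(WEIGHTS[k] * v for k, v in profile.items())
        return 1.0 / (1.0 + math.exp(-raw))  # the article's "squasher"

    score = risk_score({"age_34_plus": 1, "married": 0, "low_income": 1})
    print(score, "flagged" if score > THRESHOLD else "not flagged")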

  • a_dabbler 2 years ago

    I don't think those likely to commit welfare fraud are going to be worrying about the statistical model, and most of these variables aren't easily modifiable (except "months since last emailed"), assuming the source of the information is reliable.

  • makomk 2 years ago

    The big practical obstacle to this is that the things that flag people as potentially committing fraud look to be the same things an actual fraudster would manipulate in order to get payments they're not actually entitled to - stuff like income, living situation, kids, etc. So the obvious route to not setting off fraud alarms would likely involve not committing fraud, or at least not the most obvious and profitable kinds.

  • pas 2 years ago

    there are these random sampling checks (I guess to get a reference baseline)

    > The CNAF trains its models on households who were randomly investigated as part of the Paiement à bon droit Fraudes (PBDF), an annual survey

bertil 2 years ago

I have serious reservations about the system, but the key ones are:

- It’s wrongly presented as a tool against fraud: it’s a tool to identify complex cases that are more likely to need human review. It is presented as unfair because files with mistakes can lead to people being asked to pay back money they received in error, but it’s a big moral leap to see that as the state abusing its power.

- There’s no attempt to match that risk to the actual complexity of cases and decide whether the model is fair. It feels unprofessional not to connect the model with observations and bias: a model of where the police make arrests perpetuates bias because it doesn’t take into account crimes without arrests in un-patrolled neighborhoods. If that were the concern, the article would focus on the weight in training of the sample audited at random, or on the number of families suspected of being confused that were not (less than zero).

The argument “they are targeting confused people in need of help rather than big fraud cases” doesn’t make much sense: those are never very large sums. Significant fraud cases would have to be spread across many recipients who are told how to abuse the system and from whom the fraudsters collect.

It is fair to make that point against large tax frauds, but

1. That's a different administration and

2. You don’t want a model estimating if the recipient is confused. Fraudsters are not confused; they know exactly what they are doing and why.

anonyme-honteux 2 years ago

I'm French and I clicked on this link worried that I'd learn about mass surveillance programs, NSA or communist China style.

But no, this time it was about the welfare state doing its job, doing it mostly well, with details more impressive than I anticipated, and also opening its data to external analysts to keep improving it.

That was quite an emotional ride!

  • nfeutry 2 years ago

    They did not want to open it; they were forced to by the "Commission d'Accès aux Documents Administratifs" (the commission for access to administrative documents). And the published code is probably outdated.

  • boudin 2 years ago

    It's not the whole truth, though: it's thanks to associations like La Quadrature du Net (https://www.laquadrature.net/) that they had to share their algorithm.

    It also shows a bias towards single families with low income, adding pressure to families already in a state of precarity. That doesn't mean it's intentional, but it is still an issue.

    The fact that there is no proper audit of these algorithms before they are allowed to be used looks quite bad in my opinion. They can have quite an impact on people's lives.

    • anonyme-honteux 2 years ago

      I missed the context where it was associations that put the issue on the table. In the end we have an audit that shows a bias against single families with low income. The bias itself is bad, but having an audit is good; it's the first step to improving things somewhat.

  • smallpipe 2 years ago

    Did you read the article? The code was obtained using a FOI request, and the model targets vulnerable people making mistakes in claiming benefits rather than actual fraud.

    Which bit is the one you're impressed with? The part where you personally will be fine?

    • logicalmonster 2 years ago

      > the model targets vulnerable people making mistakes in claiming benefits rather than actual fraud

      Please keep in mind that this algorithm doesn't actually punish anybody for wrongdoing, it's just a starting point for a human-led investigation.

      Personally, I'd be more interested to know how a human-led investigation works. Is the process fair and reasonable? Do they go after poor people harshly for minor failures to follow the rules? Is there a presumption of innocence, and do they have an opportunity to appeal and a fair shake in arguing their case?

switch007 2 years ago

If only the welfare departments employed the same rigour and determination when it comes to ensuring people receive benefits that they are entitled to but are not claiming.

Will they spend the same effort to ensure nobody goes hungry because of a benefit they're entitled to but aren't claiming...? I doubt it.

——

Going after welfare claimants is wildly popular in the UK, thanks to years of propaganda on TV dedicated to shows about people cheating the system. There is a widespread view that anyone claiming benefits is lazy and a cheat. This gives the departments carte blanche to introduce inhumane claims processes, algorithms, etc.

  • skywal_l 2 years ago

    In the conclusion of the article:

    > It also pointed out that some investigations started by the model conclude that the CNAF actually owes the beneficiary money.

    • switch007 2 years ago

      That’s a defensive statement from CNAF when they were asked to comment.

      What do they actually do with that data in those circumstances?

      It’s a completely different endeavour to look for fraud and to look for under-claimed benefits. I can’t believe the same project by the same people would be equally concerned with both tasks

kamma4434 2 years ago

If, as they say, "the CNAF trains its models on households who were randomly investigated", then the model is fair in assessing the risk of fraud.

The fact that a street junkie and a Google engineer have different propensities for mugging elderly ladies, partially explained by the variable household_income, is not economic discrimination.

You'd have a problem if you ran regressions on a sample selected by the results of previous regressions, but this is not the case. In that setup you would be looking at a biased sample of high-risk individuals.

Roark66 2 years ago

Sadly, all benefit systems that pay out on certain conditions are subject to fraud. For me, as long as there is a human involved in the decision and the benefit recipient has the right to appeal or sue, there is no problem in using ML models for "finding the most likely candidates for fraud".

I don't know about France, but I assume it is similar to my country, Poland, where one can sue the state very easily and with minimal expense. My parents did when I was a child and we were quite poor (self-represented and all that).

AndrewKemendo 2 years ago

These kinds of profile-based forecasting systems *literally* encode the social biases embedded in the law into a measurement system that perfectly alienates individuals.

The laws and structure of society ENSURE this is the outcome of any hierarchical capital/property based system.

So it’s not the algorithm that’s the problem. The problem is having a system like this to begin with.

  • skrebbel 2 years ago

    > the social biases embedded in the law

    I’m not sure what you mean by this, do you have any examples?

    • AndrewKemendo 2 years ago

      Example:

      Redlining was a racist social bias embedded in the city infrastructure planning and implementation process.

      Or

      “slaves are legally 3/5th of a human” was encoded into the US constitution

      By encoding a social bias into the legal structure, and then using that legal precedent as the goal state, implemented via orthogonal behavior measures, you create a measurement system that will enforce whatever goal condition the law encodes, including any biases that were encoded in the original lawmaking process

      [1] https://en.m.wikipedia.org/wiki/Redlining

      [2] https://en.m.wikipedia.org/wiki/Three-fifths_Compromise

      • skrebbel 2 years ago

        To be fair I think it’s a bit odd that you only quote American examples (and century old ones at that) in a discussion about a profiling system in France. I can vaguely imagine that France may have laws, today, that are biased too, but I struggle to imagine that they’re anywhere near as bad as the blatantly racist laws America grew up with.

        Agree that it dehumanizes citizens though.

        • AndrewKemendo 2 years ago

          I find it odd that you find it odd. This is a website primarily used by Americans, which I am, and therefore the context I have the most experience with.

          I am less familiar with French laws, and you agree that there are likely laws which encode bias.

          So we agree that the examples I used of bias encoding are invariant to the source, and you accept the premise.

          So the only thing your response does is take a snide jab at what you perceive as relative foundational inequities in the law.

          Was that your intent? To simply shit on America?

logicalmonster 2 years ago

Anybody notice that all of the word choices in the headline make this sound particularly dark and ominous?

Wouldn't a more accurate headline be something like "Here is the math behind how France's fraud detection system works"?

  • advisedwang 2 years ago

    But it's not a fraud detection system. It's a "poor/disabled/divorced/etc" detector. Even if it is true that statistically more disabled people have received benefit overpayments, subjecting disabled people to more investigation is not OK.

    And the training data is all overpayments, even when they are not fraud.

    • logicalmonster 2 years ago

      > subjecting disabled people to more investigation is not OK

      We all feel sympathy for poor disabled people who have a harder lot in life than most. But investigating a poor disabled person for fraud is only really a problem if the investigation is malicious, biased, unfair, disrupts their life, presumes their guilt, etc. How are these investigations actually run? Are they run fairly or not? That's what I'm really curious about.

      So personally, I think you're conflating 2 different things. Simply a basic check for fraud in and of itself is a separate thing from a malicious investigation that's run horribly.

      PS: Let's not forget that many poor people pay various kinds of taxes. Being a wise steward of their tax dollars and trying to fight fraud is not in and of itself anti-poor disabled people.

  • pyrale 2 years ago

    As it should be.

      This system targets these people and not other types of fraud that are a larger source of lost money. That correlates strongly with political discourse, in terms of who gets accused of being "problematic".

    • logicalmonster 2 years ago

      > This system targets these people

      Let's keep in mind that this algorithm isn't in and of itself punishing people. It's just the starting point for a human-led investigation. Isn't that a proper application of technology?

      My bigger concern is that any human-led investigation is fair and reasonable and doesn't go hog-wild trying to punish people for minor misunderstandings of the rules and policies that are in place. (Somebody in this thread said that the rules just for how housing allowances can be used are like 80 pages long, so it's probably the case that it's pretty much impossible for any random citizen to know all of these rules)

      > other types of fraud which are a larger source of fraud in terms of lost money

      Surely you're not suggesting that welfare fraud shouldn't be investigated so long as say Military-Industrial-Complex fraud or Big-Pharma fraud is even greater?

      • pyrale 2 years ago

        > Surely you're not suggesting that welfare fraud shouldn't be investigated so long as say Military-Industrial-Complex fraud or Big-Pharma fraud is even greater?

        No, I'm talking about other types of fraud affecting the social system's funds - some of them by individuals, some by companies.

        > Isn't that a proper application of technology?

        It's hard to apply technology properly when you don't start with good intent.

        • logicalmonster 2 years ago

          Maybe we can just agree to disagree, but I'm not seeing how looking into the possibility of welfare fraud, so long as it's done in a fair and reasonable manner, isn't a good intention.

extr 2 years ago

Honestly impressive depth into the model. I also liked that the page didn't really politicize the issue.

  • DeathArrow 2 years ago

    > I also liked that the page didn't really politicize the issue.

    Are we looking at the same page?

    >For vulnerable recipients who struggle to navigate the complex rules of the French social security system, the risk of their benefits being reclaimed or terminated is high — even if they unintentionally committed mistakes.

    • pas 2 years ago

      how would you present this finding without "politicization"?

thriftwy 2 years ago

The whole idea of a welfare-driven society is nuts and untenable.

Welfare-driven countries should deleverage their economies and suppress price growth until they no longer need to pay welfare on a mass scale, because people would be able to afford housing without much strain.