andrzejn: (Default)
[personal profile] andrzejn
It's quiet over in Osokorky; I'm fine.

Waiting for victory.

Curious...

March 15th, 2026 07:04 am
silent_gluk: (pic#4742428)
[personal profile] silent_gluk
The big coverlet is at 105 rows, row 106 is underway, 5 rows to go until the next fitting. In total, by my calculations, there should be 150 rows so that the difference from the width comes to 75 cm. Plus a small border. But those calculations are exactly what the fittings will test.
[personal profile] nz
On stage: a war plant.
A clock hangs on the plant's wall.
The clock shows 16:58.
Under the clock, poised at the starting line, stands a huddle of assholes.
The assholes look back and forth between the clock and the door.

ASSHOLE IN A PUFFER JACKET
Come on, let's just go already.

ASSHOLE IN AN USHANKA
Nah, it's against the rules. Let's wait, just in case.

ASSHOLE IN A PUFFER JACKET
(Rolls his eyes)
What's the difference? By the time we reach the gate it'll be five anyway.

ASSHOLE IN GLASSES
That's how my brother lost his bonus.

A long silence, during which the assholes can be heard shifting impatiently from foot to foot.

With particular cynicism, the clock still shows 16:58.

ASSHOLE IN A PUFFER JACKET
What the hell, maybe the clock is broken?
(Checks it against his phone)

ASSHOLE IN GLASSES
What, you can't stand still?

The clock shows 16:59.

(CURTAIN)

(no subject)

March 14th, 2026 04:48 pm
cali4nickation: (Default)
[personal profile] cali4nickation
At this rate the Trump administration will release
the full unredacted Epstein files to distract from Iran.

At the shabbiest gas station around, a regular gallon goes for $5.50. At the Shell across the street, $5.99. Goyda!
[syndicated profile] bor_odin_feed

An amusing story about the well-known Soviet and American film actor, dancer, and choreographer Boris Sichkin (1922-2002), as told by the writer and journalist Sergei Dovlatov (1941-1990).

The famous performer Boris Sichkin was staying at the Russian hotel "Pine" near Monticello. One day we ran into each other on the lakeshore. I said:
- My wife and I would like to come visit you.

- Great. When?

- Tonight. But how will we find you?

- What do you mean, how will you find me? - Sichkin bristled. - What's the problem?

- Well, - I say, - it's a big hotel.

Sichkin was even more astounded:

- That's like walking into the Mausoleum and asking: "Where can I find Vladimir Ilyich Lenin?"

(Title photo: Boris Sichkin as Buba Kastorsky in "The New Adventures of the Elusive Avengers", 1968)


Friends, subscribe to my Telegram channel. It is one of the few social networks where you can speak the truth and which is still accessible to everyone. There is plenty of interesting material there.


(no subject)

March 14th, 2026 04:22 pm
cali4nickation: (Default)
[personal profile] cali4nickation
You already live in Rome post-collapse. 
The coins are worth nothing and no one
is fixing the concrete. The people are
foreign and the land is being looted. 
The news just hasn’t reached our village yet.

"Boris Cherny, creator of Claude Code, ships 20-30 pull requests per day. Major code changes, not typo fixes. He runs five parallel AI instances, each on a separate branch. Compare that to a traditional engineer : 3 PRs per week. Cherny isn’t 10% more productive. He’s 30x more productive."

I'm nagged by vague doubts: not only do agents arouse no interest in me, but if it came down to it, I wouldn't manage anything remotely close to this. And the daily melancholy that goes with it. I can theoretically imagine this in a startup of two developers doing full-stack. I have never seen anything like such positions at serious corporations with 401(k)s, RSUs, and 3+ weeks of vacation.

And another thing, by the way. In a parallel universe where thirty years of thieving bosses hadn't torn everything down to the foundation and then burned what remained in a war, Claude Code would have been a Ukrainian, or perhaps a domestic, product of an RSFSR/UkrSSR/BSSR union.

#killmenow

Fool's Gold

March 14th, 2026 04:01 pm
cali4nickation: (Default)
[personal profile] cali4nickation
Work, work, pass over to the chatbot.

"Technology companies are adding a fourth component to engineering compensation : salary, bonus, options, & inference costs. Levels.fyi pegs the 75th percentile software engineer salary at $375k. Add $100k in inference & the fully loaded cost is $475k. That’s 21% in tokens.

The question CFOs will pose : what am I getting for all this inference spend? Can I do it cheaper? If the metric for a new cloud is gross profit per GPU hour, the employee equivalent is : productive work per dollar of inference.

For me, the answer is 31 tasks a day at $12k annually. The engineer still burning $100k? They’d better be 8x more productive! Will you be paid in tokens? In 2026, you likely will start to be.
"

Speaking of which. Sure, I can imagine a Linux core contributor, or some other specialist in squeezing microseconds out of FPGAs in HFT. Or people writing a distributed database engine. How many of those are there, one percent even within FAANG? Where do such salaries even come from, let alone at the height of the agent era? What do these people know how to do that's so unique? The place where I glimpsed that kind of money out of the corner of my eye required nothing of the sort. Fire half of them outright, cut the other half's pay in two down to market level, and suddenly offshoring doesn't look so attractive.
[personal profile] toi_samyi
Everyone knows the Germans fought the war on synthetic gasoline. Everyone knows the Germans had no diesel tanks. Why? The usual answer (on the internet): "They couldn't make synthetic diesel," or something like that. The next iteration: "They got Fischer-Tropsch synthesis running too late, and coal hydrogenation won't give you diesel." Now warspot is ripping off the veils: it turns out the Germans did have diesel tanks, they just never put them into series production. They debunk the myths about diesel fuel being unavailable in Germany, but they do it without due understanding. Or else they are deliberately falsifying; hard to tell.
Of course, few warspot authors love chemistry, let alone chemical engineering. Yes, Wikipedia has good articles on both the Bergius process and Fischer-Tropsch, but there are too many words there, and it's hard to follow, especially since "now" is not "then" and not in Germany. So: numbers only.

The United States Strategic Bombing Survey: Summary Report (European War) 1945

The bombing of the plants began in May 1944. The first quarter of 1944 was the last in which they operated more or less normally.

Hydrogenation (the Bergius process) yielded seven times (7.44 times) more product in early 1944 than the Fischer-Tropsch process did. Yes, the Bergius process is the older one, but the first Fischer-Tropsch plant started up back in 1935, and by 1944 there were already nine such plants. The Bergius process is far more complex and energy-intensive; put simply, it is much more expensive. So why was this "old" process used on such a scale? The answer is in the table.

Bergius hydrogenation yielded 53.2% (of total output) aviation gasoline; the Fischer-Tropsch process yielded none at all.
I could stop right there, but I'll continue anyway :-)

Yields (share of total output):
  • Motor gasoline: Bergius 14.6%, Fischer-Tropsch 52%
  • Diesel: Bergius 16%, Fischer-Tropsch 20.5% (a negligible difference)
  • Fuel oil: Bergius 4%, Fischer-Tropsch none
  • Lubricating oil: Bergius 1.2%, Fischer-Tropsch 2.4%
  • Synthesis gas: Bergius 10.3%, Fischer-Tropsch 9.4%
  • Other: Bergius 0.4%, Fischer-Tropsch 15.7% (I don't know what this is; my guess is oxygen-containing organics such as fatty acids)

If you don't need aviation gasoline, the Fischer-Tropsch process is your pick.
And both routes yield nearly the same amount of diesel.
On the other hand, synthetic benzene and alcohol can serve as motor fuel (and quite good motor fuel at that). Which is exactly what the Germans did.
[syndicated profile] r_bloggers_feed

Posted by rprogrammingbooks

[This article was first published on Blog - R Programming Books, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Bayesian linear regression is one of the best ways to learn Bayesian modeling in R because it combines familiar regression ideas with a more realistic treatment of uncertainty. Instead of estimating a single fixed coefficient for each parameter, Bayesian methods estimate full probability distributions. That means we can talk about uncertainty, prior beliefs, posterior updates, and credible intervals in a way that is often more intuitive than classical statistics.

In this tutorial, you will learn how to fit a Bayesian linear regression model in R step by step. We will start with the theory, build a dataset, choose priors, fit a model with brms, inspect posterior distributions, evaluate diagnostics, perform posterior predictive checks, and generate predictions for new observations. We will also look at several R packages that belong to a practical Bayesian workflow.

What you will learn in this tutorial:
  • How Bayesian linear regression works
  • How priors and posteriors differ from classical estimates
  • How to fit a model in R using brms
  • How to inspect convergence and model quality
  • How to use related packages such as tidybayes, bayesplot, and rstanarm

Why Bayesian Linear Regression Matters

In classical linear regression, the model estimates coefficients such as the intercept and slope as fixed unknown values. In Bayesian regression, those same coefficients are modeled as random variables with prior distributions. Once data is observed, those priors are updated into posterior distributions.

This gives us several advantages:

  • We can incorporate prior knowledge into the model
  • We get full uncertainty distributions, not just point estimates
  • Predictions naturally include uncertainty
  • Bayesian methods scale well into multilevel and hierarchical models
  • The interpretation of intervals is often more direct

If you are working in predictive analytics, experimental analysis, or sports modeling, this framework is especially useful because it lets you update beliefs as new data arrives.

The Bayesian Formula Behind Linear Regression

A simple linear regression can be written as:

y = β0 + β1x + ε

Where:

  • y is the response variable
  • x is the predictor
  • β0 is the intercept
  • β1 is the slope
  • ε is the error term, typically assumed to be normally distributed

In the Bayesian version, we add priors:

β0 ~ Normal(0, 10)
β1 ~ Normal(0, 5)
σ  ~ Student_t(3, 0, 2.5)

After seeing the data, we compute:

Posterior ∝ Likelihood × Prior

That one line is the core of Bayesian inference.
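To make that proportionality concrete, here is a minimal base-R grid approximation for a single slope parameter. The data, the fixed error sd, and the prior are all made up for illustration; the brms workflow below handles this machinery automatically.

```r
# Grid approximation of Posterior ∝ Likelihood × Prior for one slope beta1.
# Toy data, purely illustrative; sigma is treated as known here.
set.seed(1)
x <- rnorm(50)
y <- 2 * x + rnorm(50)                      # true slope is 2
beta_grid <- seq(0, 4, length.out = 401)
log_lik <- sapply(beta_grid, function(b)
  sum(dnorm(y, mean = b * x, sd = 1, log = TRUE)))
log_prior <- dnorm(beta_grid, mean = 0, sd = 5, log = TRUE)
log_post <- log_lik + log_prior             # log(Likelihood × Prior)
post <- exp(log_post - max(log_post))       # subtract max for numerical stability
post <- post / sum(post)                    # normalize over the grid
beta_grid[which.max(post)]                  # posterior mode, close to 2
```

MCMC samplers like Stan avoid the grid entirely, but the quantity they explore is exactly this unnormalized product.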

R Packages You Should Know for Bayesian Regression

Before fitting models, it helps to understand the ecosystem. Bayesian modeling in R is not just about one package. It is usually a workflow involving model fitting, posterior extraction, diagnostics, and visualization.

brms

High-level Bayesian regression modeling with formula syntax similar to lm() and glm().

rstanarm

Bayesian applied regression with an interface that feels familiar to many R users.

tidybayes

Extracts and visualizes posterior draws in a tidy format for easy analysis and plotting.

bayesplot

Useful for trace plots, posterior predictive checks, and MCMC diagnostics.

posterior

Helpful for working with draws, summaries, and posterior diagnostics in a standardized way.

cmdstanr

R interface to CmdStan, useful for users who want more direct Stan workflows and model control.

loo

Widely used for approximate leave-one-out cross-validation and model comparison.

ggplot2

Still essential for custom data exploration and clean visualization of posterior summaries.

Installing the Required Packages

For this tutorial, we will focus on brms for model fitting, while also using a few companion packages for diagnostics and visualization.

install.packages(c(
  "brms",
  "tidyverse",
  "tidybayes",
  "bayesplot",
  "posterior",
  "loo",
  "rstanarm"
))

Then load the packages:

library(brms)
library(tidyverse)
library(tidybayes)
library(bayesplot)
library(posterior)
library(loo)
library(rstanarm)

Tip: If you want a lower-level interface to Stan, you can also explore cmdstanr. For most readers, however, brms is a better starting point because it keeps the syntax concise while still giving access to advanced Bayesian models.

Creating a Simple Dataset

To make the tutorial reproducible, we will simulate a small dataset where one predictor explains a continuous response.

set.seed(123)

n <- 120

advertising_spend <- rnorm(n, mean = 15, sd = 4)

sales <- 20 + 3.5 * advertising_spend + rnorm(n, mean = 0, sd = 8)

df <- data.frame(
  advertising_spend = advertising_spend,
  sales = sales
)

head(df)

In this synthetic example, higher advertising spend tends to increase sales. The true slope used in the simulation is 3.5, but in a real modeling situation we would not know that value ahead of time.

Exploring the Data First

It is always a good idea to inspect the data visually before fitting any Bayesian model.

summary(df)

ggplot(df, aes(x = advertising_spend, y = sales)) +
  geom_point(alpha = 0.7) +
  geom_smooth(method = "lm", se = FALSE) +
  theme_minimal()

This scatterplot helps confirm that the relationship is roughly linear. Even when using Bayesian methods, the basic logic of exploratory data analysis still applies.

A Quick Classical Baseline with lm()

Before fitting the Bayesian model, it is useful to compare it with a standard linear regression.

lm_model <- lm(sales ~ advertising_spend, data = df)
summary(lm_model)

This gives us a baseline estimate for the intercept and slope. Later, we will compare it to Bayesian posterior summaries.

Choosing Priors

Priors are a defining part of Bayesian modeling. A prior reflects what we believe about a parameter before seeing the current data. Priors can be weakly informative, informative, or strongly regularizing depending on the context.

In many practical applications, weakly informative priors are a good default.

priors <- c(
  prior(normal(0, 20), class = "Intercept"),
  prior(normal(0, 10), class = "b"),
  prior(student_t(3, 0, 10), class = "sigma")
)

priors

This prior specification says:

  • The intercept is centered around 0 with broad uncertainty
  • The slope is also centered around 0 with a wide standard deviation
  • The residual standard deviation is positive and given a weakly informative prior

In real projects, priors should reflect domain knowledge whenever possible. For example, in marketing, finance, or sports analytics, prior expectations often come from previous seasons, experiments, or historical model performance.

Fitting the Bayesian Linear Regression Model

Now we can fit the Bayesian model using brm().

bayes_model <- brm(
  formula = sales ~ advertising_spend,
  data = df,
  prior = priors,
  family = gaussian(),
  chains = 4,
  iter = 4000,
  warmup = 2000,
  seed = 123
)

Here is what the most important arguments mean:

  • chains = 4: run four independent Markov chains
  • iter = 4000: total iterations per chain
  • warmup = 2000: burn-in samples used for adaptation
  • family = gaussian(): assume a normal likelihood for the response

Reading the Model Summary

summary(bayes_model)

The summary output typically reports:

  • Posterior mean and standard error for each parameter
  • Credible intervals
  • R-hat values for convergence
  • Effective sample sizes

A good sign is when R-hat values are close to 1.00. That suggests the MCMC chains mixed well and converged.
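For intuition about what R-hat measures, here is a hand-rolled split R-hat on simulated chains, in base R. This is only a sketch of the idea; in practice you would read R-hat from summary(bayes_model) or the posterior package, which use rank-normalization and other refinements.

```r
# Minimal split R-hat on an (iterations x chains) matrix of draws.
split_rhat <- function(chains) {
  half <- nrow(chains) %/% 2
  sub <- cbind(chains[1:half, , drop = FALSE],          # split each chain
               chains[(half + 1):(2 * half), , drop = FALSE])
  n <- nrow(sub)
  B <- n * var(colMeans(sub))                           # between-chain variance
  W <- mean(apply(sub, 2, var))                         # within-chain variance
  sqrt(((n - 1) / n * W + B / n) / W)
}

set.seed(42)
mixed <- matrix(rnorm(4000), nrow = 1000, ncol = 4)     # 4 well-mixed chains
split_rhat(mixed)                                       # close to 1.00
stuck <- mixed + rep(c(0, 0, 0, 3), each = 1000)        # shift one chain
split_rhat(stuck)                                       # clearly above 1
```

A chain stuck in a different region inflates the between-chain variance, which is exactly what pushes R-hat above 1.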

Understanding the Posterior Output

Suppose the slope posterior is centered near 3.4 with a 95% credible interval from 3.0 to 3.8. In Bayesian terms, that means the model assigns high posterior probability to the slope being in that interval. This is one reason many analysts find Bayesian intervals easier to interpret.

In practical language, we could say:

Based on the model and the observed data, higher advertising spend is strongly associated with higher sales, and the posterior distribution indicates that the effect is likely positive and substantial.
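Quantities like that credible interval can be read straight off posterior draws. Here is a base-R sketch using simulated draws standing in for the model's actual posterior (the real draws are extracted in the next section):

```r
# 95% credible interval from posterior draws. The draws here are simulated
# (Normal around 3.4) purely to illustrate; with the fitted model you would
# use draws$b_advertising_spend instead.
set.seed(3)
slope_draws <- rnorm(8000, mean = 3.4, sd = 0.2)
quantile(slope_draws, probs = c(0.025, 0.975))  # roughly 3.0 to 3.8
mean(slope_draws > 3)                           # posterior P(slope > 3)
```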

Extracting Posterior Draws

One of the strengths of Bayesian modeling is that you can work directly with posterior draws.

draws <- as_draws_df(bayes_model)
head(draws)

This lets you explore parameter distributions, uncertainty, and custom probabilities.

mean(draws$b_advertising_spend > 0)

The code above estimates the posterior probability that the slope is greater than zero. That is a very natural Bayesian quantity.

Visualizing Posterior Distributions

plot(bayes_model)

The default plot gives a quick view of posterior densities and chain behavior. You can also visualize intervals more explicitly:

mcmc_areas(
  as.array(bayes_model),
  pars = c("b_Intercept", "b_advertising_spend", "sigma")
)

This is where bayesplot becomes especially useful.

Checking Convergence with Trace Plots

Trace plots help determine whether the MCMC chains have mixed properly.

mcmc_trace(
  as.array(bayes_model),
  pars = c("b_Intercept", "b_advertising_spend", "sigma")
)

Healthy trace plots should look like fuzzy horizontal bands rather than trending lines or stuck sequences.

Posterior Predictive Checks

Posterior predictive checks are one of the most important parts of a Bayesian workflow. They compare the observed data to data simulated from the fitted model.

pp_check(bayes_model)

If the simulated data looks broadly similar to the observed data, that is a sign the model captures the main structure reasonably well.

You can also try more specific checks:

pp_check(bayes_model, type = "dens_overlay")
pp_check(bayes_model, type = "hist")
pp_check(bayes_model, type = "scatter_avg")

Using tidybayes for Tidy Posterior Workflows

The tidybayes package is extremely useful when you want to extract posterior draws into tidy data frames and build custom visualizations with ggplot2.

tidy_draws <- bayes_model %>%
  spread_draws(b_Intercept, b_advertising_spend, sigma)

head(tidy_draws)

For example, you can visualize the slope distribution:

tidy_draws %>%
  ggplot(aes(x = b_advertising_spend)) +
  geom_density(fill = "steelblue", alpha = 0.4) +
  theme_minimal()

This makes posterior analysis much more flexible than relying only on built-in summary output.

Generating Predictions for New Data

One of the biggest reasons to use regression is prediction. Bayesian models make this especially valuable because predictions come with uncertainty intervals.

new_customers <- data.frame(
  advertising_spend = c(10, 15, 20, 25)
)

predict(bayes_model, newdata = new_customers)

You can also generate expected values without residual noise:

fitted(bayes_model, newdata = new_customers)

The difference is important:

  • predict() includes outcome uncertainty
  • fitted() focuses on the expected mean response
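The gap between the two can be sketched with made-up posterior draws (the names and values below are invented stand-ins, not taken from the fitted model): the expected-mean interval propagates only parameter uncertainty, while the predictive interval also adds residual noise.

```r
# Toy contrast between fitted()-style and predict()-style uncertainty
# for one new observation; b0, b1, sigma are invented posterior stand-ins.
set.seed(7)
b0 <- rnorm(2000, mean = 20, sd = 1.5)      # intercept draws
b1 <- rnorm(2000, mean = 3.5, sd = 0.2)     # slope draws
sigma <- 8                                  # residual sd, fixed here
x_new <- 20
mu_draws   <- b0 + b1 * x_new                    # expected mean (fitted-style)
pred_draws <- mu_draws + rnorm(2000, 0, sigma)   # adds outcome noise (predict-style)
c(fitted_sd = sd(mu_draws), predict_sd = sd(pred_draws))  # prediction is wider
```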

Visualizing the Regression Line with Uncertainty

conditional_effects(bayes_model)

This is a quick way to visualize the fitted relationship and credible intervals. It is particularly useful when presenting results to readers who are new to Bayesian modeling.

Comparing the Classical and Bayesian Models

Aspect                 Classical lm()            Bayesian model
Parameter estimates    Single point estimate     Full posterior distribution
Intervals              Confidence intervals      Credible intervals
Prior knowledge        Not included directly     Included through priors
Predictions            Often point-centered      Naturally uncertainty-aware
Interpretation         Frequentist framework     Probability-based framework

Alternative Approach with rstanarm

If you want a very approachable alternative to brms, you can fit a similar model with rstanarm.

rstanarm_model <- stan_glm(
  sales ~ advertising_spend,
  data = df,
  family = gaussian(),
  chains = 4,
  iter = 4000,
  seed = 123
)

print(rstanarm_model)

This package is especially attractive for users who want Bayesian estimation with minimal syntax changes from familiar regression workflows.

Model Comparison with loo

For more advanced workflows, model comparison is often done with approximate leave-one-out cross-validation.

loo_result <- loo(bayes_model)
print(loo_result)

This becomes particularly useful when comparing multiple Bayesian models with different predictors or structures.

Common Beginner Mistakes in Bayesian Regression

  • Using priors without thinking about the scale of the data
  • Ignoring convergence diagnostics such as R-hat and trace plots
  • Skipping posterior predictive checks
  • Confusing credible intervals with classical confidence intervals
  • Treating Bayesian modeling as only a different fitting function rather than a full workflow

When Bayesian Linear Regression Is a Great Choice

Bayesian linear regression is especially useful when:

  • You want to express uncertainty directly
  • You have prior knowledge from previous studies or historical data
  • Your sample size is not huge and regularization helps
  • You plan to expand into hierarchical or multilevel models later
  • You need probabilistic predictions rather than just fitted coefficients

From Linear Regression to Real-World Prediction

Once you understand Bayesian linear regression, you can move into more realistic applications such as multilevel models, logistic regression, time series forecasting, and domain-specific predictive systems. In practice, many analysts first learn Bayesian methods through regression, then extend them into richer workflows for decision-making and forecasting.

If your interest goes beyond introductory examples and into real prediction workflows, Bayesian methods are especially valuable in sports modeling, where uncertainty, updating, and probabilistic forecasts matter a lot.

Those kinds of projects often build on the same foundations covered here: priors, posterior updating, uncertainty-aware prediction, and iterative model improvement.

Conclusion

Bayesian linear regression in R is one of the best entry points into Bayesian statistics because it combines familiar regression ideas with a much richer treatment of uncertainty. Instead of asking only for a coefficient estimate, you ask for a distribution of plausible values. Instead of pretending uncertainty is secondary, you put it at the center of the analysis.

In this tutorial, we covered the full process:

  • Building a dataset
  • Understanding priors
  • Fitting a model with brms
  • Inspecting posterior summaries
  • Checking convergence and fit
  • Generating predictions
  • Using additional packages from the Bayesian R ecosystem

Once you are comfortable with these steps, the next natural move is to explore Bayesian logistic regression, hierarchical models, and domain-specific forecasting systems.

The post Bayesian Linear Regression in R: A Step-by-Step Tutorial appeared first on R Programming Books.

To leave a comment for the author, please follow the link and comment on their blog: Blog - R Programming Books.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Current views on generative AI

March 13th, 2026 11:00 pm
[syndicated profile] r_bloggers_feed

Posted by François - f@briatte.org

[This article was first published on R / Notes, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

This post contains my current views on generative artificial intelligence, and Large Language Models in particular. The context is mostly academia, which is about research and teaching.

Personal context

Generative AI is slowly creeping into my professional workflow, not because I am using it myself (I don’t, although I guess that I will, at some point), but because everyone around me is.

My students use ChatGPT and other tools like, I believe, NotebookLM and Perplexity Comet. My RSS news feed (that's how old I am) recently had an article on Claude, and I use Google applications, so I keep getting passive-aggressively asked to use Gemini, which I might do one day via Scholar Labs.

My workplace, which is a university, has taken a very basic stance on generative AI: unless stated otherwise, students are to follow LSE Position 1 (no use of generative AI in graded work), which I suppose goes both ways (no use of generative AI in grading, either).

I do not know of any equivalent position on generative AI in research. It seems like everyone wants to discuss the topic and play around with whatever is available for free online, but no one wants to make hard decisions about it yet, possibly due to upcoming EU-level regulations.

Risks for teaching and learning

From a teaching perspective, generative AI is only useful to me if it helps students go through the following process:

  1. Learn
  2. Draft
  3. Revise
  4. Submit
  5. Defend

Part of what I teach is code, and code is the topic of this blog. As it happens, generative AI is already very good with code, and I am confident that it can be put to good use to go through Steps 1–3 of the process above.

There are, however, at least four reasons why I am currently taking ‘LSE Position 1’ on using generative AI in graded work that relies on code:

  1. Many students are using AI to bypass the learning process rather than enhance it. This creates security risks, and violates academic ethics in the same way that hiring an external party would. This comes on top of other breaches of student ethics, such as plagiarism.
  2. The two issues mentioned in the previous point cannot be defended against at my level, at least not with my current resources. I can spot security risks, but I cannot reliably detect AI-generated code, which is neither watermarked nor detectable by anti-plagiarism tools.
  3. The software that I use in class is mostly open-source, and reproducibility is part of the core principles that I teach in class. As far as I understand, and unless proven otherwise, the kind of generative AI technology used by my students does not enforce these principles.
  4. To make things worse, most generative AI also violates intellectual property, rather than reconfigure it around the ‘copyleft’ and ‘creative commons’ principles that many of us have spent years defending and advocating within fields such as academic publishing.

I have not been exposed to any argument that makes any attempt at solving the ethical, logistical, moral and eventually legal issues that I have outlined above. Until I do, I will treat generative AI as a form of doping, and will keep banning it.

The analogy above with doping is not an innocent one. There is, in my view, a very real rhetorical arc that goes from generative AI to the Enhanced Games. Higher education does not approve of students taking Adderall, and neither do I.

Risks for scientific research

From a research perspective, generative AI is only useful to me if it helps me go through the following process:

  1. Compile existing evidence
  2. Collect meaningful data
  3. Produce meaningful measures
  4. Formulate correct interpretations
  5. Enhance existing knowledge

There is no doubt that generative AI can help with every step above, especially perhaps at the level of data collection and, in the case of ‘big data’ or whatever people call it today, classification. I am also very interested in what it can contribute with regards to compiling scientific studies, in the same way that it is already helping with mathematical problems.

The risks that I have heard about so far when it comes to generative AI and social science research (which is what I do) are the following:

  1. Generative AI can poison the evidence base (Bail 2024) through the mass production of low-quality academic output, or by compromising data such as online surveys (Westwood 2025, Westwood and Frederick 2026). This is already happening.
  2. Generative AI does not yet produce reliable data annotations for the kind of data that I am interested in (Yang et al. 2025), and even if its coding reliability improves, it will require additional effort to mitigate related issues (Baumann et al. 2025).
  3. Relatedly, generative AI cannot improve organically if it maintains its human bias towards evidence produced in the Global North (Ramirez-Ruiz and Senninger 2025), mostly by ‘WEIRD’ individuals (Atari et al. 2023). This will be hard and slow to solve.
  4. Last but not least, generative AI will be used to erode scientific authority, to the profit of those who are interested in attacking the contribution that scientific (and higher education) institutions make to society. This is of course far from a trivial issue.

The issues listed are all real, hard to solve, and are controversial insofar as some people have a vested interest in seeing them not addressed, at least not in the short term.

None of these issues will stop me from installing and trying out ellmer one day. However, I do expect this to happen within a scientific environment that will have acknowledged each issue in one way or another, and formulated guidelines to address them.

Are we there yet?


This post was inspired by the /ai ‘manifesto’, which I discovered thanks to Andrew Heiss. I obtained some of the cited references through Jessica Hullman’s ‘New course on generative AI for behavioral science’ blog post.

