stevenae's comments (Hacker News)

The quantitative UX research team at Google was created for exactly this problem: a service that became popular before the right metrics existed, meaning the metrics had to be derived first, then optimized. We would observe users in person, read their logs, generate experiments to improve the behavior as measured by logs, then return to see whether the experiment improved the in-person experience. There were not many of us, and we are still around :)


I worked with Boris in the past, and in my experience he cares deeply about the customer. I'd vouch that he genuinely cares about the issues people are running into.


But no other user has yet come and said "I worked with ajma in the past ..." so how can we trust your judgement about Boris?


I saw this guy named Claude saying ajma is a genius!



Nice try, Boris


[flagged]


Anthropic can't win in this case.

If they don't use Claude Code, they get accused of not even trusting it themselves.

If they do use Claude Code, they get accused of shipping slop.

I think dogfooding is known to be a legitimate approach here.


The idea is that Claude Code is surprisingly buggy and unrefined for something created by the very tool and processes that are supposed to be replacing us as we speak.


The idea is that sculpted ideal code is rarely the best choice.


At the same time, sloppy code (human- or AI-generated) is rarely the best choice either. The best is somewhere in between.


I don’t see (nor do I care about) the code and how sculpted it is. I perceive the tool as buggy.


And they don't use our version of CC, or our settings. They have internal-only flags.


> Anthropic can't win in this case.

Sure they can. The solution is pretty simple and in your own post. Pick one:

* Make the product good to the point code is no longer slop and shit.

* Stop hyping the quality when it isn’t there.

* Do a hybrid approach. Use their own product but actually have competent humans in the loop to make the code good.

This is not hard. Be honest and humble and that criticism goes away. It's no one's fault but Anthropic's that they hype their product beyond what it can do and use it carelessly to build itself. It's not a no-win scenario if you're the one causing your own obviously avoidable problems.


Google products' UX is widely acknowledged to be a steaming pile of shit, though, so I am not sure you should follow their example.

Many of the metrics they use are obviously actively user hostile.


Metrics and quantitative UX result in really bad software, making it rigid while optimizing for the wrong things.

The most obvious example is Google splitting login into multiple steps, where you have to enter your password only after you've entered your username.

I wonder what metric led to that decision, or whether it was a political decision to make it seem like their "old" software has some new feature.


If you mean Google website login, that step is needed because the email address is used to determine which identity provider to use. E.g. I have three different accounts that branch off from that same initial login flow.

One is my personal "gmail.com" account, and the other two go through enterprise identity providers tied to my employment and their G Suite licenses. So after I enter one of these three email addresses, I get prompted for the appropriate next step. Only one of them involves giving a password to a Google server. The other two are redirects to completely separate login systems operated by my employers.
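What that branching looks like, as a minimal sketch (the domains, provider names, and routing table here are all made up for illustration, not Google's actual logic):

```python
# Toy identity-provider routing keyed on the email domain.
# Only after this lookup do we know whether to ask for a password
# or redirect to an external SSO system.
IDP_BY_DOMAIN = {
    "gmail.com": "google-password",   # native Google login
    "example-corp.com": "corp-saml",  # redirect to employer's SSO
    "example-edu.org": "edu-oidc",    # redirect to a second provider
}

def next_login_step(email: str) -> str:
    domain = email.rsplit("@", 1)[-1].lower()
    provider = IDP_BY_DOMAIN.get(domain, "google-password")
    if provider == "google-password":
        return "prompt-for-password"
    return f"redirect-to:{provider}"
```

The point is just that the username has to be collected first, because the next screen depends on it.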


I mean, I get that it logically makes sense. But it still seems like a waste of time for the sake of a small percentage of use cases.

Maybe a better approach: as you type your login, automatically detect whether it requires an identity provider, gray out the password field to signal that a password isn't necessary, and redirect automatically.

Less clicking, no broken flow, and a smoother solution.


This helped me, coming from an ml background: https://randomrealizations.com/posts/xgboost-explained/


Others mentioned county data. If you can get that, you can build something like I did for DC -- https://colab.research.google.com/drive/1Kep_9j_PN_SxX85PYHE...


To clarify, you'd prefer rmsle?


Short answer: I use multiple metrics, never rely on just one.

Long answer: Is the metric for people with subject-matter knowledge? Then (weighted) RMSSE, or the MASE alternative for a median forecast. WRMSSE is very nice: it can deal with zeroes, is scale-invariant, and is symmetrical in penalizing under- and over-forecasting.

The above metrics are completely uninterpretable to people outside the forecasting sphere, though. For those cases I tend to just stick with raw errors; if a percentage metric is really necessary, then a weighted MAPE/RMSE. The weighting is still graspable for most, and it doesn't explode with zeroes.

I've also been exploring FVA (Forecast Value Added), compared against a second decent forecast. FVA is very intuitive, if your base measures are reliable at least. Aside from that I always look at forecast plots. It's tedious, but they often tell you a lot that gets lost in the numbers.

RMSLE I haven't used much. From what I've read it looks interesting, though more for very specific scenarios (many outliers, high variance, nonlinear data?).
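For what it's worth, the weighted MAPE I mean is simple enough to sketch in a few lines (toy numbers, including a zero actual that would break plain MAPE):

```python
def wmape(actuals, forecasts):
    """Weighted MAPE: total absolute error over total actual volume.
    Unlike plain MAPE, an individual zero actual doesn't blow it up."""
    abs_err = sum(abs(a - f) for a, f in zip(actuals, forecasts))
    total = sum(abs(a) for a in actuals)
    return abs_err / total

actuals = [0, 10, 20, 30]    # note the zero: plain MAPE would divide by it
forecasts = [2, 8, 22, 27]
wmape(actuals, forecasts)    # (2 + 2 + 2 + 3) / 60 = 0.15
```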


MAPE can also be a problem when rare excursions are what you want to predict and the cost of missing an event is much higher than that of predicting a non-event. A model that just predicts no change would have a very low MAPE because most of the time nothing happens. When the event does happen, however, the error of predicting the status quo ante is much worse than the small baseline errors.
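A toy illustration of that scenario (made-up series): a status-quo forecast that entirely misses the one spike still scores a low MAPE, because the many near-zero errors dilute the one big miss.

```python
def mape(actuals, forecasts):
    """Plain mean absolute percentage error (actuals must be nonzero)."""
    return sum(abs(a - f) / abs(a)
               for a, f in zip(actuals, forecasts)) / len(actuals)

# Mostly-flat series with one rare spike; the naive "no change" forecast
# nails nine periods and completely misses the event.
actuals = [100] * 9 + [200]
forecasts = [100] * 10
mape(actuals, forecasts)  # (9 * 0 + 0.5) / 10 = 0.05
```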


My reading of this situation is that MAPE would do the opposite. Means are skewed towards outliers.


Thanks for the reply! I am outside the forecasting sphere.

RMSLE gives proportional error (so, scale-invariant) without MAPE's systematic under-prediction bias. It does require all-positive values, for the logarithm step.
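A quick sketch of what I mean (using log1p so zero actuals stay legal; values still must be non-negative), showing the rough scale-invariance:

```python
import math

def rmsle(actuals, forecasts):
    """Root mean squared log error: penalizes the *ratio* of the miss,
    not its absolute size."""
    se = [(math.log1p(f) - math.log1p(a)) ** 2
          for a, f in zip(actuals, forecasts)]
    return math.sqrt(sum(se) / len(se))

# Same 2x overshoot at very different scales gives a similar score.
small = rmsle([10], [20])
large = rmsle([10_000], [20_000])
```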


> Lately, I just steal embeddings from big models and slap a dumb classifier on top. Works better, runs faster, less drama.

You may know this but many don't -- this is broadly known as "transfer learning".
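A toy sketch of the pattern (the "embeddings" here are made-up 2-D vectors standing in for a big model's output; the "dumb classifier" is a nearest-centroid rule):

```python
def fit_centroids(X, y):
    """Per-class mean vector over frozen embeddings."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        dim = len(rows[0])
        centroids[label] = [sum(r[i] for r in rows) / len(rows)
                            for i in range(dim)]
    return centroids

def predict(centroids, x):
    """Classify by nearest class centroid (squared Euclidean distance)."""
    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(centroids, key=lambda lab: dist2(centroids[lab], x))

# Pretend these vectors came from a frozen foundation model.
X = [[0.9, 0.1], [1.0, 0.0], [0.1, 0.9], [0.0, 1.0]]
y = ["cat", "cat", "dog", "dog"]
cents = fit_centroids(X, y)
predict(cents, [0.8, 0.2])  # "cat"
```

All the heavy lifting happened upstream in whatever produced the embeddings; the classifier on top can be this simple.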


Is it, even when applied to trivial classifiers (possibly "classical" ones)?

I feel that we're wrong to be focusing so much on the conversational/inference aspect of LLMs. The way I see it, the true "magic" hides in the model itself. It's effectively a computational representation of understanding. I feel there's a lot of unrealized value hidden in the structure of the latent space itself. We need to spend more time studying it, make more diverse and hands-on tools to explore it, and mine it for all kinds of insights.


For this and sibling -- yes. Essentially, using the output of any model as an input to another model is transfer learning.


I agree. Isn't this just utilizing the representation learning that's happened under the hood of the LLM?


How accurate is his claim that Augustus became emperor through (my paraphrasing) democratic means and promises to fix real problems for Romans?


I recommend "Augustus: First Emperor of Rome" by Adrian Goldsworthy.

Given that "democratic" didn't mean the same thing then as it does now (with suffrage limited to a small group of the uber-rich), and that some of the problems he was fixing were "the threat of this army I happen to have" and "this war I actively participated in", I don't think it is wrong. He wasn't in Rome when the Senate awarded him power and the Vestal Virgins drank in his name, which isn't something that could have been commanded.

After decades of war and strife and food shortages, peace under one warlord looked more appealing than having three who would likely eventually be at each other's throats.


Thank you!


Well, first, you have to consider that the Roman Republic was never really democratic; it was in the hands of a small aristocracy which sometimes, but rarely, had the interests of the Romans at heart, let alone those of the non-citizen inhabitants.

But even then, it is definitely not accurate. Augustus gained power through the numerous conflicts which followed Caesar's murder, at a time when the Republic was already challenged, thanks to legions he more or less inherited (an oversimplification) from Caesar. He was given powers by the Senate through what we would call rubber-stamping, only after his military power was inescapable.


This still strikes me as escapism.


What are you escaping from in this case?


Awareness of your mortality


You have to do something with your life anyway, right? I always envied people who have a calling they are good at and work on essentially until they die (especially in academia and art). I'm not sure I have one, and if I do (designing 4X god-sim games?), I'm unlikely to be paid for it even if I were good, which is itself also unlikely.

Then there's also the case where following your passion is near impossible without a large organization, anything from space to medicine.

But even forgetting all that, there is no reason the engineering challenges, team dynamics, and sense of accomplishment on a work project can't be greater than on the personal projects you'd do by yourself. Granted, most jobs aren't like that (for me or for most people), but some of my most challenging and exciting projects were at work.

If you're gonna spend the time until you die doing tech things, you might as well get paid for it. The less you need the pay, the pickier you can be, with doing your own thing becoming *another option* at some point.



There was a saying at Google: I code for free; they pay me for XYZ (literally everything else).


The '90s just called: they want their lame IT jokes back.


Disagree with the first piece of advice about only using the top 0.1%. I grew up (through my 20s) shooting on a Pentax K1000, a cheap workhorse of a camera, and I preferred its ergonomics to the top-end mirrorless cameras I use today.


The K1000 is generally considered among the best film SLRs ever made, especially for the price, and easily falls into the top 0.1% category in my mind. There's a reason why it was in continuous production for 20 years with hardly any changes to its design.


I guess my quibble is with the percentage, then. A good, cheap, plentiful camera belies the idea that only the top 0.1% of cameras were good.

