To whoever does that, I hope that there is a special place in hell where they force you to do type safe API bindings for a JSON API, and every time you use the wrong type for a value, they cave your skull in.
Sadly it doesn’t fix the bad documentation problem. I often don’t care that a field is special and gives either a string or a number. This is fine.
What is not fine, and which should sentence you to eternal punishment, is to not clearly document it.
Don’t you love it when you publish a crate, having tested it on thousands of returned objects, only for the first issue to be “field is sometimes null/other type?” You really start questioning everything about the API, and sometimes you’d rather parse it as serde_json::Value and call it a day.
The worst thing is: you can’t even put an int in a json file. Only doubles. For most people that is fine, since a double can function as a 32 bit int. But not when you are using 64 bit identifiers or timestamps.
That’s an artifact of JavaScript, not JSON. The JSON spec states that numbers are a sequence of digits with up to one decimal point. Implementations are not obligated to decode numbers as floating point. Go will happily decode into a 64-bit int, or into an arbitrary precision number.
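A small Python sketch of the same point, assuming a made-up payload with a single `id` field: the standard `json` module decodes integer literals into arbitrary-precision `int`, and only loses digits if integers are forced through a double.

```python
import json

# An id just above 2**53, where doubles stop holding every integer exactly.
raw = '{"id": 9007199254740993}'

# Python's json module parses integer literals into int (arbitrary precision).
exact = json.loads(raw)["id"]

# Forcing every integer through float mimics a doubles-only parser.
as_double = json.loads(raw, parse_int=float)["id"]

print(exact)           # 9007199254740993
print(int(as_double))  # 9007199254740992
```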
Unless you’re dealing with some insanely flexible schema, you should be able to know what kind of number (int, double, and so on) a field should contain when deserializing a number field in JSON. Using a string does not provide any benefits here unless there’s some big in your deserialization process.
What’s the point of your schema if the receiving end is JavaScript, for example? You can convert a string to BigNumber, but you’ll get wrong data if you’re sending a number.
I’m not following your point so I think I might be misunderstanding it. If the types of numbers you want to express are literally incapable of being expressed using JSON numbers then yes, you should absolutely use string (or maybe even an object of multiple fields).
I am not sure what could be the example, my point was that the spec and the RFC are very abstract and never mention any limitations on the number content. Of course the implementations in the language will be more limited than that, and if limitations are different, it will create dissimilar experience for the user, like this: Why does JSON.parse corrupt large numbers and how to solve this
This is what I was getting at here programming.dev/comment/10849419 (although I had a typo and said big instead of bug). The problem is with the parser in those circumstances, not the serialization format or language.
I disagree a bit in that the schema often doesn’t specify limits and operates in JSON standard’s terms, it will say that you should get/send a number, but will not usually say at what point will it break.
This is the opposite of what the C language does, being so specific that it is not even Turing complete (in a theoretical sense; in practice it is).
Then the problem is the schema being underspecified. Take the classic pet store example. It says that the id is int64. petstore3.swagger.io/#/store/placeOrder
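As a rough illustration of what an int64 constraint buys the receiver, here is a Python sketch. The helper name `parse_order_id` is hypothetical and the check is hand-written, not generated from the pet store schema.

```python
import json

INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1

def parse_order_id(body: str) -> int:
    """Parse a JSON body and enforce that "id" is an integer in int64 range."""
    value = json.loads(body)["id"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError(f"id must be an integer, got {type(value).__name__}")
    if not INT64_MIN <= value <= INT64_MAX:
        raise ValueError("id does not fit in int64")
    return value
```

With a documented range, the point at which things break is an explicit error rather than a silently rounded value.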
If some API is so underspecified that it just says “number” then I’d say the schema is wrong. If your JSON parser has no way of passing numbers as arbitrary length number types (like BigDecimal in Java) then that’s a problem with your parser.
I don’t think the truly truly extreme edge case of things like C not technically being able to simulate a truly infinite tape in a Turing machine is the sort of thing we need to worry about. I’m sure if the JSON object you’re parsing is some astronomically large series of nested objects that specifications might begin to fall apart too (things like the maximum amount of memory any specific processor can have being a finite amount), but that doesn’t mean the format is wrong.
And simply choosing to “use string instead” won’t solve any of these crazy hypotheticals.
As if I had a choice. Most of the time I’m only on the receiving end, not the sending end. I can’t just magically use something else when that something else doesn’t exist.
Heck, even when I’m on the sending end, I’d use JSON. Just not bullshit ones. It’s not complicated to only have static types, or having discriminant fields
You HAVE to. I am a Rust dev too and I’m telling you, if you don’t convert numbers to strings in json, browsers are going to overflow them and you will have incomprehensible bugs. Json can only be trusted when serde is used on both ends
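A minimal Python sketch of the "send it as a string" workaround, assuming a hypothetical 64-bit identifier called `snowflake`:

```python
import json

snowflake = 2**63 - 1  # hypothetical 64-bit identifier

# Sender: emit the id as a JSON string so a doubles-only client cannot round it.
wire = json.dumps({"id": str(snowflake)})

# Receiver: convert back explicitly; int() raises ValueError on junk input.
decoded = int(json.loads(wire)["id"])

print(wire)     # {"id": "9223372036854775807"}
print(decoded)  # 9223372036854775807
```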
This is understandable in that use case. But it’s not everyday that you deal with values in the range of overflows. So I mostly assumed this is fine in that use case.
Well, apart from float numbers and booleans, all other types can only be represented by a string in JSON. Date with timezone? String. BigNumber/Decimal? String. Enum? String. Everything is a string in JSON, so why bother?
Well, the issue is that JSON is based on JS types, but other languages can interpret the values in different ways. For example, Rust can interpret a number as a 64 bit int, but JS will always interpret a number as a double. So you cannot rely on numbers to represent data correctly between systems you don’t control or systems written in different languages.
No problem with strings in JSON, until some smart developer you get JSONs from decides to interchangeably use String and number, and maybe a boolean (but only false) to show that the value is not set, and of course null for a missing value that was supposed to be optional all along but go figure that it was
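For what it’s worth, the receiving end of such an API tends to end up with a normalizer like this Python sketch. The field name `amount` and the function name are made up for illustration.

```python
from typing import Any, Optional

def normalize_amount(record: dict[str, Any]) -> Optional[float]:
    """Collapse a sloppy "amount" field into a float or None."""
    value = record.get("amount")  # absent key and explicit null both give None
    if value is None or value is False:
        return None               # null, missing, or "false means unset"
    if isinstance(value, bool):
        raise TypeError("true is not a valid amount")
    if isinstance(value, (int, float)):
        return float(value)
    if isinstance(value, str):
        return float(value)       # raises ValueError if the string is not numeric
    raise TypeError(f"unexpected type {type(value).__name__}")
```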
I do this constantly. undefined: not retrieved yet. null: Error when retrieving. Makes it easy to reason about what the current state of the data is without the need for additional status flags.
Sure, Java can tell the difference. But that doesn’t mean that the guy writing the API cares whether or not he adds a key to the dictionary before yeeting it to the client.
That’s the thing though, isn’t it? The devs on either side are entering into a contract (the API) that addresses this issue, even if by omission. Whoever breaks the contract must rightfully be ejected into the stratosphere.
That’s exactly not the thing, because nobody broke the contract, they simply interpret it differently in details.
Having a null reference is perfectly valid json, as long as it’s not explicitly prohibited. Null just says “nothing in here” and that’s exactly what an omission also communicates.
The difference is just whether you treat implicit and explicit non-existence differently. And neither interpretation is wrong per contract.
Omission means it’s not there and I’m not telling you anything about it.
There is a world of difference between those two statements. It’s the difference between telling someone you’re single or just sitting there and saying nothing.
If there’s a clear definition that there can be something, implicit and explicit omission are equivalent. And that’s exactly the case we’re talking about here.
At the (SQL) database level, if you are using null in any sane way, it means “this value exists but is unknown”. Conflating that with “this value does not exist” is very dangerous. JavaScript, the closest thing there is to a reference implementation for json serialization, drops attributes set to undefined, but preserves null. You seem to be insisting that null only means “explicit omission”, but that isn’t the case. Null means a variety of subtly different things in different contexts. It’s perfectly fine to explicitly define null and missing as equivalent in any given protocol, but assuming it is not.
Is SQL an API contract using JSON? I hardly think so.
Java does not distinguish between null and non-existence within an API contract. Neither does Python. JS is the weird one here for having two different identifiers.
Why are you so hellbent on proving something universal that doesn’t apply for the case specified above? Seriously, you’re the “well, ackshually” meme in person. You are unable or unwilling to distinguish between abstract and concrete. And that makes you pretty bad engineers.
If your SQL model has nulls, and you don’t have some clear way to conserve them throughout the data chain, including to the json schema in your API contract, you have a bug. That way to preserve them doesn’t have to be keeping nulls distinct from missing values in the json schema, but it’s certainly the most straightforward way.
The world has more than three languages, and the way Java and Python do things is not universally correct. I’m not up to date on either of them, but I’m also guessing that they both have multiple libraries for (de)serialization and for API contract validation, so I am not really convinced your claims are universal even within those languages.
I am not the other person you were talking to, I’ve only made one comment on this, so not really “hellbent”, friend.
Yes, I am pretty sure I read the comments, although you’re making me wonder if I’m missing one. What specific comment, what “case specified above” are you referring to? As far as I can see, you are the one trying to say that if a distinction between null and a non-existent attribute is not specified, it should universally be assumed to be meaningless and fine to drop null values. I don’t see any context that changes that. If you can point it out, specifically, I’ll be glad to reassess.
At the (SQL) database level, if you are using null in any sane way, it means “this value exists but is unknown”.
Null at the SQL level means that the value isn’t there, idk where you’re getting that from. SQL doesn’t have anything like JS’s undefined, there’s no other way to represent a missing value in SQL other than null (you could technically decide on certain values for certain types, like an empty string, but that’s not something SQL defines).
I think (at least) the point they’re making is that unless the API contract specifically differentiates between “present and null” and “absent” then there is no difference. (Specifically for field values.)
The point I’m making is kind of the opposite, unless the contract explicitly states that they’re the same they should not be treated as the same, because at a fundamental level they are not the same thing even if Java wants to treat them as such.
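Keeping the two cases apart does not take much code. A Python sketch, where `MISSING` and `describe` are names invented for the example:

```python
import json

MISSING = object()  # sentinel: the key was not in the payload at all

def describe(payload: str, key: str) -> str:
    """Report whether a key is absent, explicitly null, or carries a value."""
    value = json.loads(payload).get(key, MISSING)
    if value is MISSING:
        return "absent"
    if value is None:
        return "present and null"
    return "present with a value"

print(describe('{"name": null}', "name"))  # present and null
print(describe('{}', "name"))              # absent
```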
Kinda. I guess we can all agree it’s more typical to deserialize into a POJO, where there is no such thing as a missing field. Otherwise, why would you choose Java if you don’t use types? This is a great precondition for various stupid hacks to achieve “patching” resources, like blank strings, negative numbers for positive-only fields, or even Optional as a field.
Also, I like how this problem had a really simple solution all along
There really isn’t anything we can do to prevent memory safety vulnerabilities from happening if the programmer doesn’t want to write their code in a robust manner.
Yeah, totally, it’s all those faulty programmers’ fault. They should’ve written good programmes instead of the bad ones, but they just refuse to listen
Right, those devs with 20+ years C experience don’t know shit about the language and are just lazy. They don’t want to catch up with the times and write safe C. It’s me, the dude with 5 years of university experience who will set it straight. Look at my hello world program, not a single line of vulnerable code.
Yeah, for sure. Human error is involved in C and inertia too. New coding practices and libraries aren’t used, tests aren’t written, code quality sucks (variable names in C are notoriously cryptic), there’s little documentation, many things are rewritten (seems like everybody has rewritten memory allocation at least once), one’s casual void * is another’s absolute nono, and so on.
It has nothing to do with knowing the language and everything to do with what’s outside of the language. C hasn’t resembled CPUs for decades and can’t be reasonably retrofitted for safety.
json doesn’t have ints, it has Numbers, which are ieee754 floats. if you want to precisely store the full range of a 64 bit int (anything larger than 2^53 -1) then string is indeed the correct type
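The 2^53 - 1 limit is easy to check in Python, whose `float` is an IEEE 754 binary64 double. This only shows what happens when a parser stores numbers as doubles, not what the JSON format itself requires.

```python
MAX_SAFE = 2**53 - 1  # 9007199254740991

# Every integer up to this point survives a round trip through a double.
safe_ok = int(float(MAX_SAFE)) == MAX_SAFE

# Above it, neighbouring integers collapse onto the same double.
collapsed = float(2**53) == float(2**53 + 1)

# The top of the signed 64-bit range is well past that point.
int64_max = 2**63 - 1
int64_ok = int(float(int64_max)) == int64_max

print(safe_ok, collapsed, int64_ok)  # True True False
```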
json doesn’t have ints, it has Numbers, which are ieee754 floats.
No. numbers in JSON have arbitrary precision. The standard only specifies that implementations may impose restrictions on the allowed values.
This specification allows implementations to set limits on the range and precision of numbers accepted. Since software that implements IEEE 754 binary64 (double precision) numbers [IEEE754] is generally available and widely used, good interoperability can be achieved by implementations that expect no more precision or range than these provide, in the sense that implementations will approximate JSON numbers within the expected precision. A JSON number such as 1E400 or 3.141592653589793238462643383279 may indicate potential interoperability problems, since it suggests that the software that created it expects receiving software to have greater capabilities for numeric magnitude and precision than is widely available.
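The two numbers named in that passage make a handy test case. A Python sketch using the standard `json` module, where the keys `big` and `pi` are made up:

```python
import json
from decimal import Decimal

doc = '{"big": 1E400, "pi": 3.141592653589793238462643383279}'

# Default decoding goes through binary64: overflow and lost digits.
as_float = json.loads(doc)
print(as_float["big"])  # inf
print(as_float["pi"])   # 3.141592653589793

# parse_float hands the raw number text to Decimal instead of float.
as_decimal = json.loads(doc, parse_float=Decimal)
print(as_decimal["pi"])  # 3.141592653589793238462643383279
```

The document is identical in both cases; only the decoder’s choice of number type differs.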
Well, you’re right. I wasn’t getting it, but I’ve also never seen any piece of software that would treat a single leading zero as octal. That’s just a recipe for disaster, and it should use 0o116 to be unambiguous
(I am a software engineer, but was assuming you meant it was hardcoded to parse as octal, not some weird auto-detect)
It’s been a long time, but I’m pretty sure C treats a leading zero as octal in source code. PHP and Node definitely do. Yes, it’s a bad convention. It’s much worse if that’s being done by a runtime function that parses user input, though. I’m pretty sure I’ve seen that somewhere in the past, but no idea where. Doesn’t seem likely to be common.
If the input string, with leading whitespace and possible +/- signs removed, begins with 0x or 0X (a zero, followed by lowercase or uppercase X), radix is assumed to be 16 and the rest of the string is parsed as a hexadecimal number.
If the input string begins with any other value, the radix is 10 (decimal).
You seem to have missed the important phrase “in source code”, as well as the entire second part of my comment discussing that runtime functions that parse user input are different.
I guess this is one of the reasons that some linters now scream if you don’t provide a base when parsing numbers. But then again, good luck finding it if it happens internally. Still, I feel like a ZIP should be treated as a string even if it looks like a number.
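Python’s `int()` happens to illustrate the explicit-base point, reusing the 0116 / 0o116 example from earlier in the thread. This is a sketch of Python’s behaviour only, not of parseInt or any other runtime.

```python
explicit_decimal = int("0116", 10)  # 116: leading zero ignored, nothing guessed
explicit_octal = int("0116", 8)     # 78: the caller asked for octal on purpose
prefixed_octal = int("0o116", 0)    # 78: base 0 auto-detects from the prefix

# Under auto-detection, a bare leading zero is refused instead of guessed at.
try:
    int("0116", 0)
    auto_detect = "accepted"
except ValueError:
    auto_detect = "rejected"

print(explicit_decimal, explicit_octal, prefixed_octal, auto_detect)
```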
Yep. Much like we don’t treat phone numbers like a number. The rule of thumb is that if you don’t do any arithmetic with it, it is not a “number” but numeric.
Well, we don’t, but every piece of electronic spreadsheet software out in the wild, on the other hand…
/j Yes, I know that you can force it to become text by prepending ’ to the phone, choose an appropriate format for the cells, etc., etc. The point is that this often requires meddling after the phone gets displayed as something like 3e10.
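Both failure modes are easy to reproduce in Python. The ZIP and phone values below are invented for the example.

```python
zip_code = "02134"           # a ZIP with a leading zero, kept as text

as_number = int(zip_code)    # 2134: the leading zero is gone
round_trip = str(as_number)  # "2134", no longer a 5-digit ZIP

phone = 30000000000.0        # a phone-like value stored as a float
shown = f"{phone:g}"         # general format switches to scientific notation

print(round_trip, shown)     # 2134 3e+10
```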
Who tf decided that a 0 prefix means base 8 in the first place? If a time machine was invented somehow I’m going to cap that man, after the guy that created JavaScript.
What’s wrong with having software that is a few years old? Does it do what you need? Yes. Then what? I have all I need on Debian. Why should I care about new updates? Security? Yes, we have Debian security for that. Look, y’all had the xz backdoor package in your systems because it was new. As a Debian stable user, I didn’t have to deal with it. Did I lose something by not having the latest software? No. Well, maybe fewer crashes.
Most proprietary software also gets weekly updates. Does it make it better? No. You may prefer that.
Also I don’t get the point about the version numbering of Debian packages. Every team uses the versioning they want.
From my experience software that updates a lot tends to break old features a lot too.
Debian supporting free software projects or other stuff doesn’t look like a relevant argument. I mean, if you prefer using proprietary stuff and that kind of software, do whatever you like with your Google/Facebook/Apple friends.
But don’t come intoxicate the community with this bullshit.
If you’re branching logic due to the existence or non-existence of a field rather than the value of a field (or treating undefined differently from null), I’m going to say you’re the one doing something wrong, not the Java dev.
These two things SHOULD be treated the same by anybody in most cases, with the possible exception of rejecting the latter due to schema mismatch (i.e. when a “name” field should never be defined, regardless of the value).
It gets more fun if we’re talking SQL data via C API: is that 0 a field with 0 value or an actual NULL? Oracle’s Pro*C actually has an entirely different structure or indicator variables just to flag actual NULLs.
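For contrast, a Python sketch with the standard `sqlite3` module, where NULL maps to `None` and no separate indicator variable is needed. The table and column names are made up, and this says nothing about how Pro*C itself behaves.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (label TEXT, qty INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("zero", 0), ("unknown", None)],  # None is stored as SQL NULL
)

# Each row is a (label, qty) pair, so dict() builds a label -> qty mapping.
rows = dict(conn.execute("SELECT label, qty FROM t"))

# COUNT(col) skips NULLs, so SQL itself tells the two rows apart.
counted = conn.execute("SELECT COUNT(qty) FROM t").fetchone()[0]
conn.close()

print(rows)     # {'zero': 0, 'unknown': None}
print(counted)  # 1
```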
Zalando explicitly forbids it in their RESTful API Guidelines, and I would say their argument is a very good one.
Basically, if you want to provide more fine-grained semantics, use dedicated types for that purpose, rather than hoping every API consumer is going to faithfully adhere to the subtle distinctions you’ve created.
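A sketch of what a dedicated type could look like in Python, using the "not retrieved yet" versus "error when retrieving" states from earlier in the thread. All names here are hypothetical and not taken from the Zalando guidelines.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class LoadState(Enum):
    NOT_LOADED = "not_loaded"
    FAILED = "failed"
    LOADED = "loaded"

@dataclass(frozen=True)
class Field:
    state: LoadState
    value: Optional[str] = None

def to_json_obj(field: Field) -> dict:
    """Serialize with the state spelled out, not implied by null or absence."""
    return {"state": field.state.value, "value": field.value}

print(to_json_obj(Field(LoadState.FAILED)))
# {'state': 'failed', 'value': None}
```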
There’s a huge difference between checking whether a field is present and checking whether its value is null.
If you use lazy loading, doing the wrong thing can trigger a whole network request and ruin performance.
Similarly when making a partial change to an object it is often flat out infeasible to return the whole object if you were never provided it in the first place, which will generally happen if you have a performance focused API since you don’t want to be wasting huge amounts of bandwidth on unneeded data.
The semantics of the API contract is distinct from its implementation details (lazy loading).
Treating null and undefined as distinct is never a requirement for general-purpose API design. That is, there is always an alternative design that doesn’t rely on that misfeature.
As for patches, while it might be true that JSON Merge Patch assigns different semantics to null and undefined values, JSON Merge Patch is a worse version of JSON Patch, which doesn’t have that problem, because like I originally described, the semantics are explicit in the data structure itself. This is a transformation that you can always apply.
Tell me how you change the name without knowing the age. You fundamentally cannot, meaning that you either have to shuttle useless information back and forth constantly so that you can always patch the whole object, or you have to create a useless and unscalable number of endpoints, one for every possible field change.
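For reference, this is roughly how JSON Merge Patch (RFC 7396) handles the name/age case. A minimal Python sketch of the algorithm, with sample data invented for the example:

```python
from typing import Any

def merge_patch(target: Any, patch: Any) -> Any:
    """Apply a JSON Merge Patch (RFC 7396) and return the patched value."""
    if not isinstance(patch, dict):
        return patch                   # a non-object patch replaces the target
    result = dict(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)      # explicit null: delete the member
        else:
            result[key] = merge_patch(result.get(key), value)
    return result

person = {"name": "Ann", "age": 41}
renamed = merge_patch(person, {"name": "Anna"})  # age never sent, left untouched
no_age = merge_patch(person, {"age": None})      # explicit null removes age

print(renamed)  # {'name': 'Anna', 'age': 41}
print(no_age)   # {'name': 'Ann'}
```

Here an absent key means "leave it alone" and null means "delete it", which is the distinction being argued about.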
As others have roundly pointed out, it is asinine to generally assume that undefined and null are the same thing, and no, it flat out is not possible to design around that, because at a fundamental level those are different statements.
Good practice in API design is to permissively accept either undefined or null to represent optionality with same semantics (except when using JSON Merge Patch, but JSON Patch linked above should be preferred anyway).
I.e. waste a ton of bandwidth sending a ridiculous amount of useless data in every request, all because your backend engineers don’t know how to program for shit.
It’s about making APIs more flexible, permissive, and harder to misuse by clients. It’s a user-centric approach to API design. It’s not done to make it easier on backend. If anything, it can take extra effort by backend developers.
But you’d clearly prefer vitriol to civil discourse and have no interest in actually learning anything, so I think my time would be better spent elsewhere.
Except that if you use any library for deserializing JSON, there is a chance that it will not distinguish between null and absent, and that will be absolutely standard compliant. This is also an issue with protobuf, which inserts default values for plain types and enums. Those standards just aren’t a good fit for patching.
Bruh, there’s a difference between the one or two serializing packages used in each language, and the thousands and thousands and thousands of developers who miscode contracts after that point.
Only if using JSON merge patch, and that’s the only time it’s acceptable. But JSON patch should be preferred over JSON merge patch anyway.
Servers should accept both null and undefined for normal request bodies, and clients should treat both as the same in responses. API designers should not give each bespoke semantics.
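In Python, the permissive reading falls out of `dict.get`. A tiny sketch, with a made-up `nickname` field:

```python
import json
from typing import Optional

def read_nickname(body: str) -> Optional[str]:
    """Return the nickname, treating a missing key and an explicit null alike."""
    return json.loads(body).get("nickname")

print(read_nickname('{"nickname": null}'))  # None
print(read_nickname('{}'))                  # None
```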
JSON patch is a dangerous thing to use over a network. It will allow you to change things inside array indices without knowing whether the same thing is still at that index by the time the server processes your request. That’s a recipe for race conditions.
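JSON Patch (RFC 6902) does define a `test` operation that can guard an index before changing it. The Python sketch below implements only `test` and `replace` on paths of the form `/key/index`. The function name and data are hypothetical, and it is not a claim that `test` removes every concurrency problem.

```python
import copy

def apply_ops(doc, ops):
    """Apply a tiny subset of JSON Patch (RFC 6902): "test" and "replace" only.

    Works on a deep copy, so a failed test leaves the original untouched.
    """
    work = copy.deepcopy(doc)
    for op in ops:
        key, index = op["path"].lstrip("/").split("/")
        items = work[key]
        i = int(index)
        if op["op"] == "test":
            if items[i] != op["value"]:
                raise ValueError(f"test failed at {op['path']}")
        elif op["op"] == "replace":
            items[i] = op["value"]
        else:
            raise NotImplementedError(op["op"])
    return work

ops = [
    {"op": "test", "path": "/tags/1", "value": "urgent"},
    {"op": "replace", "path": "/tags/1", "value": "done"},
]

patched = apply_ops({"tags": ["draft", "urgent"]}, ops)
print(patched)  # {'tags': ['draft', 'done']}

# If the array was reordered in the meantime, the guard rejects the patch.
try:
    apply_ops({"tags": ["urgent", "draft"]}, ops)
    stale = "applied"
except ValueError:
    stale = "rejected"
print(stale)  # rejected
```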
I am definitely guilty of that, but I find this approach really productive. We use small bug fixes as an opportunity to improve the code quality. Bigger PRs often introduce new features and take a lot of time, you know the other person is tired and needs to move on, so we focus on the bigger picture, requesting changes only if there is a bug or an important structural issue.
I always try to review the code anyway. There’s no guarantee that what they wrote is doing what you want it to do. Sometimes I find the person was told to do something and didn’t realize it actually needs to do Y and not just X, or vice versa.
I like to shoot for the middle ground: skim for key functions and check those, run code locally to see if it does roughly what I think it should do and if it does merge it into dev and see what breaks.
Small PRs get nitpicked to death since they’re almost certainly around more important code
So you’re always behind, patching up small bits of code that don’t comply with your guidelines, while letting big changes with, by deduction, worse code quality through?
I know this is a joke, but if you did that I would reject the PR with the reason of too many things at once. Reopen a separate PR to refactor variable names. I actually constantly get people doing this and it’s dangerous exactly for the reason you’re joking about. Makes it easier for errors to slip in.
This will lead to change fatigue. People will rather not cleanup as they go anymore and just get the work done, with worse and worse code quality as a result.
I know you’re playing the straight man to a joke, but actually you can apply a linter, then tell GitHub to ignore the implied ownership history for the purposes of blame from that relinting PR. All such PRs are massive, and yet by virtue of the replayability of the linter it’s also very easy to ensure errors didn’t slip in when reviewing.
I know the original comment was about renaming all the variables, but that’s obviously deliberately absurd, so I’m using here a completely realistic example instead.
In my first programming job, I would actually do code reviews by pausing my own work, pulling their branch and building it locally, then using debug mode to step through every changed or added line of code looking for bugs, unaccounted for edge cases, and code quality issues.
…I don’t do that anymore, I now go “looks good to me” even on 10 line reviews.