kxyvr 5 years ago

A similar issue has come up for codes that I write. Among other things, I write low-level mathematical optimization codes that need fast linear algebra to run effectively. While there's a lot of emphasis on BLAS/LAPACK, those libraries work on dense linear algebra. In the sparse world, there are fewer good options. For things like sparse QR and Cholesky, the two fastest codes that I know about are out of SuiteSparse and Intel MKL. I've not tried it, but the SuiteSparse routines will probably work fine on ARM chips; however, they're dual-licensed GPL/commercial and the commercial license is incredibly expensive. MKL has faster routines and is completely free, but it won't work on ARM. Note, it works fantastically well on AMD chips. Anyway, it's not that I can't make my codes work on the new Apple chips, but I'd have to explain to my commercial clients that there's another $50-100k upcharge due to the architecture change and licensing costs due to GPL restrictions. That's a lot to stomach.

  • bachmeier 5 years ago

    > I'd have to explain to my commercial clients that there's another $50-100k upcharge due to the architecture change and licensing costs due to GPL restrictions.

    Your complaint is kind of strange. You're blaming "GPL restrictions" but the cost is for a commercial license.

    • coldtea 5 years ago

      Well, if the FOSS license used was e.g. MIT he wouldn't have to buy a commercial license, that's the parent's point. With GPL, he does, because otherwise his clients would have to make their own code/project conformant...

      • bachmeier 5 years ago

        Oh, so it's terrible to pay for software? How awful! Especially ironic because I'm sure the parent isn't working for free.

        • kxyvr 5 years ago

          We all pay for software, but it's the amount that really shapes decisions. Most organizations have a dollar limit below which we can just charge a purchase card and above which we have to seek approval. In this particular case, the software costs are higher than what can likely go onto a p-card, so now it becomes a real pain to acquire. In fact, the software is so expensive that its cost would likely eclipse the cost of the computer itself. So basically, we're looking at a decision where the client can use a more performant library and save $100k as long as they stay off of Apple silicon.

          That's really the point I'm trying to make and not to criticize anyone for using a GPL license. Moving to these new chips, in many cases, will be a much larger cost to an organization than just the cost of the computer.

        • coldtea 5 years ago

          >Oh, so it's terrible to pay for software?

          Compared to not paying for it? Yes.

          >Especially ironic because I'm sure the parent isn't working for free.

          So? Who said that when you get paid yourself it stops being awful to have to pay for things?

      • kxyvr 5 years ago

        Yes, that's correct. I write open source software as well and I don't begrudge anyone for licensing under GPL. And, I'm perfectly willing to obtain a commercial license, but I'm going to pass that cost on to my customers. In this particular case, though, the question for them is whether they want Apple silicon bad enough to pay an additional $50-100k in software licensing costs to keep their code private or to just buy an Intel or AMD chip. I know where I'd spend my money.

        • bachmeier 5 years ago

          You were pretty specific that it was entirely the fault of the GPL:

          > I'd have to explain to my commercial clients that there's another $50-100k upcharge due to the architecture change and licensing costs due to GPL restrictions.

          • epistasis 5 years ago

            What point are you trying to make here? The poster has been very clear on the mechanics, which are quite understandable, but I don't understand what you are trying to say. Is it just that you think it does not put the GPL in a positive enough light? I don't mean to put words in your mouth, but that's my current best guess.

          • chipotle_coyote 5 years ago

            It seems clear enough from context that the "GPL restrictions" are that if they used the GPL-licensed library, the commercial clients might run into legal issues with their use of it, necessitating that they purchase the commercial license. It's not uncommon for businesses to have a prohibition against using GPL software in not only their shipping products but anywhere in their toolchain. (You can argue that's a counterproductive prohibition, but "your legal department just needs to change their mind on this" may not be an argument a vendor can effectively make.)

            • foolmeonce 5 years ago

              I would not make an argument even if I thought a client would accept it. If they are incompetent they will decide to use the GPL code with sloppy oversight, violate the terms of the GPL, then they will hold a little grudge against you for the advice that got them in trouble. Sloppy companies have no internal accountability, so it's your fault.

              I use GPL code all the time at home and I would license many things GPL, but there's no reason to push GPL software at corporations. They should have limited options and spend money, possibly expanding MIT code, possibly just raising the price of engineers by keeping engineers occupied.

          • adrianmonk 5 years ago

            Apple forced them into a situation that gives them fewer options. That isn't a statement about how good or bad each option is. It's a statement about the consequences that Apple's choices have for developers.

            If I'm a travel agent and an affordable hotel near a travel destination closes down, I might have to book my clients in a nicer but more expensive hotel. Their trip will be a bit more expensive. Or maybe they'll travel to a different city. It doesn't mean I dislike the nicer hotel.

          • ajford 5 years ago

            No, he was pretty clear that it was because that solver is the only one that works on ARM right now. The dual licensing was only relevant in that the client would have to pay for the commercial license (due to the GPL restrictions).

            > MKL has faster routines and is completely free, but it won't work on ARM

        • Wowfunhappy 5 years ago

          How do these types of licenses deal with software updates in general? Presumably, at some point they'll need to buy a new license anyway, and the issue will be moot, right?

          And Rosetta will probably be around for a while...

          • lmm 5 years ago

            > How do these types of licenses deal with software updates in general? Presumably, at some point they'll need to buy a new license anyway, and the issue will be moot, right?

            It sounds like Intel produces an implementation of this thing that works on Intel and makes it available for free, whereas ARM don't (although another comment suggests Apple actually do), so you have to buy an expensive third-party implementation instead. That's not a difference that'll go away in the short term, and you can see why a processor company might legitimately choose one or the other approach.

          • kn0where 5 years ago

            Apple released the first Intel Macs to consumers in 2006, and in 2011 removed Rosetta from Mac OS X, so I guess it depends on what you mean by a while.

      • bawolff 5 years ago

        That's still pretty silly. If the thing wasn't open source at all, you would still have to buy a license.

        If your complaint is boo hoo, some people charge for software...well consider me unsympathetic.

    • smnrchrds 5 years ago

      I imagine the conversation with the clients will go like this:

      - Here is a quote for 100k for adding SuiteSparse to the code.

      - 100k‽ But I have found on the internet that SuiteSparse is free! Justify your quote.

      At that point, they will have to explain to the client what GPL is and why they cannot use the free version.

  • coldtea 5 years ago

    >MKL has faster routines and is completely free, but it won't work on ARM.

    It will probably be ported though, if there's a demand...

    • stabbles 5 years ago

      MKL is heavily optimized for Intel microarchs and purposely crippled on AMD (I believe dgemm is fast, sgemm slow). I don't think Intel benefits from optimizing MKL for Apple Silicon, especially considering Apple ditched Intel's hardware.

    • fxtentacle 5 years ago

      No it won't. MKL is an Intel toolkit, so they will surely not support Apple's move to dump Intel processors.

    • pinewurst 5 years ago

      In what world will Intel port MKL - Intel intellectual property - to ARM? The whole purpose of Intel's software tools is as an enabler and differentiator for their architecture and specifically their parts.

      • coldtea 5 years ago

        In a world where Intel already had licensed ARM and built it in the past:

        https://newsroom.intel.com/editorials/accelerating-foundry-i...

        • chipotle_coyote 5 years ago

          That linked article from 2016 is about Intel's Custom Foundry program, which I'm fairly sure is for building chips under contract to other companies. It promotes that they have "access to ARM Artisan IP," but doesn't specifically mention an ARM version of MKL that I see. Intel's own page on MKL lists the compatible processors, and ARM is conspicuously absent:

          https://software.intel.com/content/www/us/en/develop/tools/m...

          And, this question on Intel's own forums from 2016 at least suggests that there wasn't an MKL version for ARM in the time frame of the article you're linking to, either:

          https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Libr...

          So, from what I can tell, while Intel is an ARM licensee and made ARM CPUs in the past, they haven't made their own ARM CPUs for years and there's no sign they ever made MKL for any ARM platform. Never say never, but I think the OP is basically right -- there's not a lot of incentive for Intel to produce one.

          • dfox 5 years ago

            Intel had sold most of the relevant ARM IP and product lines to Marvell in 2006.

      • Fnoord 5 years ago

        I don't know about this proprietary technology specifically, but Intel is a huge company with some FOSS friendliness. USB 4 is based on Thunderbolt 3, so I guess they licensed that one.

    • loosescrews 5 years ago

      Maybe, but note that this is the Intel MKL. A library developed and maintained by Intel. It is not a secret that Intel does this to support their ecosystem and have been caught intentionally crippling support for AMD processors in the past [1]. Intel has recently been adding better support for AMD processors [2], but many suspect that is intended to help x86 as a whole better compete with ARM. If it does get ported, it is highly unlikely to have competitive performance.

      [1] https://news.ycombinator.com/item?id=24307596

      [2] https://news.ycombinator.com/item?id=24332825

      • kxyvr 5 years ago

        Thanks for the links. If anyone is wondering about some of the hoops that need to be jumped through to make it work, here's another guide [1].

        One question in case you or anyone else knows: What's the story behind AMD's apparent lack of math library development? Years ago, AMD had ACML as their high-performance BLAS competitor to MKL. Eventually, it hit end of life and became AOCL [2]. I've not tried it, but I'm sure it's fine. That said, Intel has done steady, consistent work on MKL and added a huge amount of really important functionality, such as its sparse libraries. When it works, AMD has benefited from this work as well, but I've been surprised that they haven't made similar investments.

        Also, in case anyone is wondering, ARM's competing library is called the Arm Performance Libraries. Not sure how well it works, and it's only available under a commercial license. I just went to check, and pricing is not immediately available. All that said, it looks to be dense BLAS/LAPACK along with FFT, with no sparse support.

        [1] https://www.pugetsystems.com/labs/hpc/How-To-Use-MKL-with-AM...

        [2] https://developer.amd.com/amd-aocl/

        • microtonal 5 years ago

          > Eventually, it hit end of life and became AOCL [2]. I've not tried it, but I'm sure it's fine.

          It's ok. I did some experiments with transformer networks using libtorch. The numbers on a Ryzen 3700X were (sentences per second, 4 threads):

          OpenBLAS: 83, BLIS: 69, AMD BLIS: 80, MKL: 119

          On a Xeon Gold 6138:

          OpenBLAS: 88, BLIS: 52, AMD BLIS: 59, MKL: 128

          OpenBLAS was faster than AMD BLIS. But MKL beats everyone else by a wide margin because it has a special batched GEMM operation. Not only do they have very optimized kernels, they actively participate in the various ecosystems (such as PyTorch) and provide specialized implementations.

          AMD is doing well with hardware, but it's surprising how much they drop the ball with ROCm and the CPU software ecosystem. (Of course, they are doing great work with open sourcing GPU drivers, AMDVLK, etc.)
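
          MKL's batched GEMM is the kind of operation that makes the difference here. As a small sketch (numpy's matmul on 3-D arrays performs the same batch-wise product, whichever BLAS is underneath):

```python
import numpy as np

# Illustrates the batched-GEMM point above: many small, independent
# matrix products issued as one call instead of a loop. numpy's
# matmul on 3-D arrays computes the product batch-wise; MKL exposes
# the same idea as a dedicated batched GEMM (cblas_?gemm_batch).
rng = np.random.default_rng(0)
a = rng.standard_normal((32, 64, 64))   # 32 independent 64x64 matrices
b = rng.standard_normal((32, 64, 64))

c_loop = np.stack([a[i] @ b[i] for i in range(32)])  # one GEMM per call
c_batch = a @ b                                      # a single batched call
assert np.allclose(c_loop, c_batch)
```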

          • gnufx 5 years ago

            If you care about small matrices on x86_64, you should look at libxsmm, which is the reason MKL now does well in that regime. (Those numbers aren't representative of large BLAS.)

        • gnufx 5 years ago

          You just run MKL from the oneAPI distribution, and it gives decent performance on EPYC2, but basically only for double precision, and I don't remember if that includes complex.

          ACML was never competitive in my comparisons with Goto/OpenBLAS on a variety of opterons. It's been discarded, and AMD now use a somewhat enhanced version of BLIS.

          BLIS is similar to, sometimes better than, ARMPL on aarch64, like thunderx2.

        • rurban 5 years ago

          > What's the story behind AMD's apparent lack of math library development?

          I don't see a story. AMD supports a proper libm for gcc and llvm, and has its own libm, BLAS, LAPACK, ... at https://developer.amd.com/amd-aocl/

          Just their rdrand intrinsic is broken on most ryzens if you didn't patch it. Fedora firmware doesn't patch it for you.

  • brundolf 5 years ago

    Would their workflow allow just keeping a server on hand to do the number crunching, and still getting to use Apple Silicon on a relatively thin client?

  • semi-extrinsic 5 years ago

    Have you tried PETSc? It does sparse (and dense) LU and Cholesky, plus a wide variety of Krylov methods with preconditioners.

    It can be compiled to use MKL, MUMPS, or SuiteSparse if available, but also has its own implementations. So you could easily use it as a wrapper to give you freedom to write code that you could compile on many targets with varying degree of library support.

    • kxyvr 5 years ago

      I like PETSc, but how do its internal algorithms compare on shared memory architectures? I'd be curious if anyone has updated benchmarks between the libraries. I suppose I ought to run some in my copious amount of free time.

      Sadly, the factorization I personally need the most is a sparse QR factorization, and PETSc doesn't really support that according to their documentation [1]. Or, really, I'd love to hear if anyone knows a good rank-revealing factorization of A A'. I don't really need Q in the QR factorization, but I do need the rank-revealing feature.

      [1] https://www.mcs.anl.gov/petsc/documentation/linearsolvertabl...

      • jedbrown 5 years ago

        PETSc developer here. You're correct that we don't have a sparse QR. I'm curious about the shapes in your problem and how you use the rank-revealed factors.

        If you're a heavy user of SuiteSparse and upset about the license, you might want to check out Catamari (https://gitlab.com/hodge_star/catamari), which is MPLv2 and on-par to faster than CHOLMOD (especially in multithreaded performance).

        As for PETSc's preference for processes over threads, we've found it to be every bit as fast as threads while offering more reliable placement/affinity and less opportunity for confusing user errors. OpenMP fork-join/barriers incur a similar latency cost to messaging, but accidental sharing is a concern and OpenMP applications are rarely written to minimize synchronization overhead as effectively as is common with MPI. PETSc can share memory between processes internally (e.g, MPI_Win_allocate_shared) to bypass the MPI stack within a node.

        • kxyvr 5 years ago

          I'll have a look at Catamari and thanks for the link. Maybe you'll have a better idea, but essentially I need a generalized inverse of AA' where A has more columns than rows (short and fat). Often, A becomes underdetermined enough that AA' no longer has full rank, but I need a generalized inverse nonetheless. If A' were full rank, then the R in the QR factorization of A' would be upper triangular. If A' is not full rank, but we can permute the columns so that the R in the QR factorization of A' has the form [RR S], where RR is upper triangular and S is rectangular, we can still find the generalized inverse. As far as I know, the permutation that ensures this form requires a rank-revealing QR factorization.

          For dense matrices, I believe GEQP3 in LAPACK pivots so that the diagonal elements of R are decreasing, so we can just threshold and figure out when to cut things off. For sparse, the only code I've tried that's done this properly is SPQR with its rank-revealing features.

          In truth, there may be a better way to do this, so I might as well ask: Is there a good way to find the generalized inverse of AA' where A is rank-deficient as well as short and fat?

          As far as where they come from, it's related to finding minimum norm solutions to Ax=b even when A is rank-deficient. In my case, I know the solution exists for a given b, even though the solution may not exist in general.
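
          A dense sketch of the construction described above (the sparse analogue would go through SPQR; the dense routine and the simple threshold here are illustrative): column-pivoted QR of A', a threshold on diag(R) to estimate the numerical rank, and a generalized inverse of AA' assembled from the leading triangular block alone.

```python
import numpy as np
from scipy.linalg import qr

def g_inverse_AAt(A, tol=1e-10):
    # Column-pivoted QR (LAPACK GEQP3 underneath): A'[:, piv] = Q @ R,
    # with |diag(R)| non-increasing, so thresholding reveals the rank.
    Q, R, piv = qr(A.T, mode='economic', pivoting=True)
    r = int(np.sum(np.abs(np.diag(R)) > tol * np.abs(R[0, 0])))
    RR = R[:r, :r]        # leading triangular block of [RR S]
    # With P the row permutation, AA' = P R'R P', and
    # G = P [ (RR'RR)^-1  0 ; 0  0 ] P'  satisfies (AA') G (AA') = AA'.
    X = np.linalg.solve(RR.T @ RR, np.eye(r))
    G = np.zeros((A.shape[0], A.shape[0]))
    G[np.ix_(piv[:r], piv[:r])] = X
    return G

# A rank-2, short-and-fat example: row 3 = row 1 + row 2.
A = np.array([[1., 2., 3., 4., 5.],
              [0., 1., 0., 1., 0.],
              [1., 3., 3., 5., 5.]])
M = A @ A.T
G = g_inverse_AAt(A)
assert np.allclose(M @ G @ M, M)   # the {1}-inverse property
```

          This gives a {1}-inverse rather than the Moore-Penrose pseudoinverse, which is all that's needed for the minimum-norm use described above.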

          • jedbrown 5 years ago

            If you have one (or a small number of) right-hand sides, I would try to make LSQR work. It can find a minimum norm solution even if A is rank-deficient, and you can use preconditioning.

            Also, if your problem is a good fit for a method like this, it could be impetus to add it to PETSc. https://epubs.siam.org/doi/pdf/10.1137/120866580
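
            A minimal sketch of that suggestion with SciPy's LSQR (the small matrix here is a toy stand-in for the real problem): for a consistent system, LSQR started from zero converges to the minimum-norm solution even when A is rank-deficient.

```python
import numpy as np
from scipy.sparse.linalg import lsqr

# Rank-deficient, short-and-fat A: row 3 = row 1 + row 2.
A = np.array([[1., 2., 3., 4., 5.],
              [0., 1., 0., 1., 0.],
              [1., 3., 3., 5., 5.]])
b = A @ np.ones(5)        # consistent right-hand side by construction

# LSQR iterates in the row space of A, so the limit is the
# minimum-norm solution of the consistent system Ax = b.
x = lsqr(A, b, atol=1e-12, btol=1e-12)[0]
assert np.allclose(A @ x, b)
# Its norm can't exceed that of the particular solution ones(5).
assert np.linalg.norm(x) <= np.linalg.norm(np.ones(5)) + 1e-8
```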

            • kxyvr 5 years ago

              Unfortunately, in my case, the generalized inverse of AA' is the preconditioner for the system, which is why I need the factorization of A'. Essentially, I take this factorization and then run it through my own iterative method. When I run tests in MATLAB, SPQR scales fine for matrices of at least a few hundred thousand rows and columns. For larger, it would be nice to essentially have an incomplete Q-less QR factorization, which I don't think exists, but should be an extension of the incomplete Cholesky work.

              But, yes, LSQR or more fitting LSMR solves a similar problem, but they're the iterative solver and I need the preconditioner, which I'm using the factorization for.

  • roseway4 5 years ago

    Apple's own Accelerate Framework offers both BLAS/LAPACK and a set of sparse solvers that include Cholesky and QR.

    https://developer.apple.com/documentation/accelerate/sparse_...

    Accelerate is highly performant on Apple hardware (the current Intel arch). I expect Apple to ensure same for their M-series CPUs, potentially even taking advantage of the tensor and GPGPU capabilities available in the SoC.

    • kxyvr 5 years ago

      Huh, this actually may end up solving many of my issues, so thanks for finding that! Outside of their documentation being terrible, they do claim the correct algorithms, so it's something to at least investigate.

      By the way, if anyone at Apple reads this, thanks for the library, but, you know, calling conventions, algorithm, and options would really help on pages like this:

      https://developer.apple.com/documentation/accelerate/sparsef...

      • stephencanon 5 years ago

        That's the documentation page for an enumeration value, not a factorization routine (hence there are no calling conventions, etc, to document; it's just a constant).

        Start here: https://developer.apple.com/documentation/accelerate/solving... and also watch the WWDC session from 2017 https://developer.apple.com/videos/play/wwdc2017/711/ (the section on sparse begins around 21:00).

        There is also _extensive_ documentation in the Accelerate headers, maintained by the Accelerate team rather than a documentation team, which should always be considered ground truth. Start with Accelerate/vecLib/Sparse/Solve.h (for a normal Xcode install, that's in the file system here):

            /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/System/Library/Frameworks/Accelerate.framework/Frameworks/vecLib.framework/Headers/Sparse/Solve.h
    • m0zg 5 years ago

      Accelerate is also available (and highly performant) on ARM. I was not able to beat it with anything on ARM, including hand-coded assembly, at least for sgemm and simple dot products, which are the bread and butter of deep learning. It actually baffles me that Microsoft is not offering linear algebra and DSP acceleration in Windows out of the box. This creates friction, and most devs don't give a shit, so Windows users end up with worse perf on essentially the same hardware.

    • mailslot 5 years ago

      It worked well on PowerPC too and helped with the Intel transition.

    • mattip 5 years ago

      Numpy and SciPy reject use of Accelerate due to faulty implementations of some routines. https://github.com/scipy/scipy/wiki/Dropping-support-for-Acc... We have never received any feedback from Apple about these bugs.

      • roseway4 5 years ago

        I noticed that SciPy has dropped support. I believe it wasn't only related to bugs, but also a very dated LAPACK implementation (circa 2009). I can't tell from Apple's developer docs whether this has changed.

        My sense is that Apple's focus is less on scientific computing and more so on enabling developers to build computation-heavy multimedia applications.

  • gnufx 5 years ago

    I've made the point that GCC and free linear algebra are infinitely faster on platforms of interest (geometric mean of x86_64, aarch64, ppc64le) while still having similar performance on x86_64. I thought MKL used SuiteSparse, or is that just MATLAB?

    • kxyvr 5 years ago

      As far as I know, MKL has its own implementation. As some evidence of this, here's an article comparing their sparse QR factorization to SPQR, which is part of SuiteSparse [1].

      As far as MATLAB, I believe it uses both. I've a MATLAB license and it definitely contains a copy of MKL along with the other libraries. At the same time, their sparse QR factorization definitely uses SPQR, which is part of SuiteSparse. In fact, there are some undocumented options to tune that algorithm directly from MATLAB, such as spparms('spqrtol', tol).

      As a minor aside, this is actually one of the benefits of a MATLAB license: since MathWorks has purchased the requisite commercial licenses for the SuiteSparse codes, it makes it easier to deal with some commercial clients who need this capability, at a lower price than a direct license itself. This, of course, means using MATLAB and not calling the library directly. It's one of the challenges to using, for example, Julia, which I believe does not bundle the commercial license, but instead relies on the GPL.

      [1] https://software.intel.com/content/www/us/en/develop/article...

      • mturmon 5 years ago

        Just a note in support of Matlab's sparse capabilities. For the last couple of years, I used Matlab successfully on large, sparse multiplication and factorization problems. A friend who was using R simply could not approach the scale I was able to work at, and I assume it's due to weak sparse support.

        I was multiplying and inverting sparse triangular matrices of size 650K x 650K with Matlab, on a laptop. Just amazing.
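
        For anyone wanting to try the same kind of thing outside MATLAB, a small sketch with SciPy's sparse machinery (the size here is kept modest for illustration):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve_triangular

# Build a sparse lower-bidiagonal (hence triangular) matrix in CSR
# form and solve a system against it, the kind of operation described
# above at much larger scale.
n = 10_000
main = np.full(n, 2.0)
sub = np.full(n - 1, -1.0)
L = sp.diags([main, sub], [0, -1], format='csr')

b = L @ np.ones(n)                       # right-hand side with known answer
x = spsolve_triangular(L, b, lower=True)
assert np.allclose(x, 1.0)
```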

        • gnufx 5 years ago

          I'm surprised there doesn't seem to be anything in CRAN using SuiteSparse. It could presumably run at petascale, similarly to the dense support, if someone did similar work.

      • gnufx 5 years ago

        I doubtless mis-remembered about MKL, thanks.

        I'm baffled why there would be a problem with commercial users running a free software program like Julia or GNU Octave+SuiteSparse; that's Freedom 0. (And commercial /= proprietary, of course.)

        • kxyvr 5 years ago

          Most of the time, you're absolutely right especially with how Octave or Julia code is normally distributed. The code is delivered to the client and the client runs the code on their system. No GPL violations have occurred.

          That said, I believe it gets trickier once we start compiling the code. Say I want to develop a piece of software for my client and I don't want them to have the source. Octave doesn't really have a way to do this, but MATLAB does, and since MATLAB has purchased all of the requisite licenses, we're good to go. Julia makes me more uncomfortable. We can make binaries with PackageCompiler.jl, but if we do, we should be subject to the provisions in the GPL. That's no different than any other piece of software, but Julia, Octave, and MATLAB all use these libraries and most people don't know that something like the chol command hooks into SuiteSparse in the backend.

          • eigenspace 5 years ago

            Yeah, the Julia devs are quite interested in removing our last few GPL dependencies and replacing them with something in pure julia. It'll take time though.

  • jjgreen 5 years ago

    SuiteSparse switched from GPL to LGPL about a year ago if that makes a difference (for the couple of components I was looking at anyway).

    • kxyvr 5 years ago

      Very cool and thanks for the heads up. I just went and checked and here's where it's at:

    SLIP_LU: GPL or LGPL
        AMD: BSD3
        BTF: LGPL
        CAMD: BSD3
        CCOLAMD: BSD3
        CHOLMOD Check: LGPL
        CHOLMOD Cholesky: LGPL
        CHOLMOD Core: LGPL
        CHOLMOD Demo: GPL
        CHOLMOD Include: Various (mostly LGPL)
        CHOLMOD MATLAB: GPL
        CHOLMOD MatrixOps: GPL
        CHOLMOD Modify: GPL
        CHOLMOD Partition: LGPL
        CHOLMOD Supernodal: GPL
        CHOLMOD Tcov: GPL
        CHOLMOD Valgrind: GPL
        CHOLMOD COLAMD: BSD3
    CSparse: LGPL
    CXSparse: LGPL
        GPUQREngine: GPL
        KLU: LGPL
        LDL: LGPL
        MATLAB_Tools: BSD3
        SuiteSparseCollection: GPL
        SSMULT: GPL
        RBio: GPL
        SPQR: GPL
        SuiteSparse_GPURuntime: GPL
        UMFPACK: GPL
        CSparse/ssget: BSD3
        CXSparse/ssget: BSD3
        GraphBLAS: Apache2
        Mongoose: GPL
      

      There's probably a bunch of mistakes in there, but that's what I found scraping things moderately quickly. Selfishly, I'd love SPQR to be LGPL, but everyone is free to choose a license as they see fit.

  • neolog 5 years ago

    > optimization codes

    I'm curious do people in numerical specialties say "codes" (instead of "code")? I don't often hear it that way but I'm not in that specialty.

    • xiii1408 5 years ago

      Yes. e.g., "I work on multiphysics codes."

      software => codes

    • samatman 5 years ago

      Yes, this is a Fortran-ism which persists unto the present day.

    • mturmon 5 years ago

      Really common usage in science/numerical computing.

      I was trying to identify when, in normal usage, you'd say "numerical codes" rather than "numerical software" or just "numerical code". It seems a bit slippery!

      Some contexts where it's prevalent: supercomputing, Fortran, national labs, large or multifaceted software. I also associate it with manager-speak ("our team has ported 77% of the simulation codes to HPSS").

willglynn 5 years ago

> The ARM architecture floating point units (VFP, NEON) support RunFast mode, which includes flush-to-zero and default NaN. The latter means that payload of NaN operands is not propagated, all result NaNs have the default payload, so in R, even NA * 1 is NaN. Luckily, RunFast mode can be disabled, and when it is, the NaN payload propagation is friendlier to R NAs than with Intel SSE (NaN + NA is NA). We have therefore updated R to disable RunFast mode on ARM on startup, which resolved all the issues observed.
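
A quick way to see what's at stake, sketched in Python (this assumes R's documented NA encoding, a quiet NaN carrying the payload 1954, and an FPU that propagates NaN payloads, i.e. default-NaN mode off):

```python
import struct

# R encodes NA as a quiet NaN with payload 1954 in its low bits.
# "Default NaN" mode would discard that payload, turning NA into a
# plain NaN; with payload propagation, NA survives arithmetic.
def make_na():
    # quiet NaN: exponent all ones, quiet bit (bit 51) set, payload 1954
    bits = 0x7FF0000000000000 | (1 << 51) | 1954
    return struct.unpack('<d', struct.pack('<Q', bits))[0]

def payload(x):
    bits = struct.unpack('<Q', struct.pack('<d', x))[0]
    return bits & ((1 << 51) - 1)   # low 51 bits, excluding the quiet bit

na = make_na()
print(payload(na))        # 1954
print(payload(na * 1.0))  # stays 1954 when payloads propagate
```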

Hmm. ELF object files for Arm can represent this with build attributes [1]:

    Tag_ABI_FP_denormal, (=20), uleb128
        0  The user built this code knowing that denormal numbers might be flushed to (+) zero
        1  The user permitted this code to depend on IEEE 754 denormal numbers
        2  The user permitted this code to depend on the sign of a flushed-to-zero number being
           preserved in the sign of 0

    Tag_ABI_FP_number_model, (=23), uleb128
        0  The user intended that this code should not use floating point numbers
        1  The user permitted this code to use IEEE 754 format normal numbers only
        2  The user permitted numbers, infinities, and one quiet NaN (see [RTABI32_])
        3  The user permitted this code to use all the IEEE 754-defined FP encodings

Seems like their code should be tagged Tag_ABI_FP_denormal = 1, Tag_ABI_FP_number_model = 3 if it were an ELF .o, .so, or executable, in which case <waves hands> some other part of the toolchain or system would automatically configure the floating point unit to provide the required behavior.

Does Mach-O have a similar mechanism?

[1] https://github.com/ARM-software/abi-aa/blob/master/addenda32...

  • johncolanduoni 5 years ago

    I wonder what happens if you `dlopen` a shared object that wants stricter behavior than the current executable and loaded shared objects. Does it somehow coordinate changing the state for all existing threads?

    • loeg 5 years ago

      From GP's link:

      > Procedure call-related attributes describe compatibility with the ABI. They summarize the features and facilities that must be agreed in an interface contract between functions defined in this relocatable file and elsewhere.

      Seems like it might be reasonable to reject mismatched combinations.

  • Dylan16807 5 years ago

    Does that second setting imply that those NaNs need to be propagated? If not, then those settings aren't great. Sure, there are lots of chips where denormal behavior and NaN preservation are the same setting, but those could and probably should be split up in the future.

  • tieze 5 years ago

    Note that build attributes are only supported on AArch32, not AArch64.

BooneJS 5 years ago

FORTRAN can be compiled on ARM Macs, but only commercially for now. https://www.nag.com/news/first-fortran-compiler-apple-silico...

  • sgt 5 years ago

    Can the R project use that compiler to build binaries and then upload the release to the public?

  • pdpi 5 years ago

    GCC/gfortran should soon follow, no?

    • mistrial9 5 years ago

      It is possible that Apple Inc will forcefully eliminate GCC from their platforms, replaced with Clang/LLVM and other non-GPL tools only.

      • pdpi 5 years ago

        They're certainly well on their way to remove all GPL code from macOS itself if they haven't yet, but there's not much they can do to prevent you from installing GPL software yourself (nor much of a motivation to do so for that matter).

      • fedorareis 5 years ago

        macOS already doesn't include GCC. The gcc command on macOS is an alias for Clang unless you install GCC separately (barring some hidden GCC install I'm unaware of).

    • jabl 5 years ago

      Soon and soon. Problem is that macOS/ARM64 has a new ABI [1], and nobody has implemented that in GCC. A couple of people are working on it apparently on their own time, but it's a fairly significant undertaking. Might be ready for GCC 11 which if history is a guide should be released in spring 2021. Or then it might not be ready.

      [1] Why not use the standard ARM64 ABI as published by ARM? Well shits and giggles.

      • saagarjha 5 years ago

        It’s the same as the iOS ABI.

  • wodenokoto 5 years ago

    I am probably the last person to talk about the differences between Fortran versions, but isn't the linked compiler for Fortran 2003 and 2008, whereas R needs a Fortran 90 compiler?

    • pjmlp 5 years ago

      Which means they support everything from Fortran 90 and then some.

    • na85 5 years ago

      Fortran versions are additive, i.e. F03 is a strict superset of F90, and thus an F08 compiler can do F03 and F90.

      • gnufx 5 years ago

        Only roughly additive. Some obsolete features have been dropped, though that doesn't mean compilers have dropped them.

CoolGuySteve 5 years ago

R doesn't even work that well on Intel, at least in Ubuntu. Recompiling the package with AVX support often leads to a 30% performance increase on modern CPUs.

IMO the R base package should dynlink different shared libraries for different processors since vector extensions are mostly tailored to the kind of floating point numerical work that R does.

  • physicsguy 5 years ago

    That's deliberate, though: when you distribute software you generally choose the lowest common denominator, and for 64-bit x86 that's SSE2.

    • johncolanduoni 5 years ago

      Many linear algebra libraries will handle this at runtime for you (OpenBLAS & MKL do this for example). You generally only need to use specialized builds of these if you don't want to have to ship extra code paths you won't use.
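      A toy sketch of what that runtime handling looks like (nothing here is actual OpenBLAS/MKL code, and the feature probe is Linux-only): probe the CPU once at startup, then bind the best kernel.

```python
# Toy sketch of runtime kernel dispatch, the pattern OpenBLAS/MKL use:
# probe CPU features once at load time, then bind the best implementation.

def cpu_flags():
    """Best-effort CPU feature probe; reads Linux /proc/cpuinfo only."""
    try:
        with open("/proc/cpuinfo") as f:
            for line in f:
                if line.lower().startswith(("flags", "features")):
                    return set(line.split(":", 1)[1].split())
    except OSError:
        pass
    return set()

def dot_generic(a, b):
    # portable SSE2-level fallback path
    return sum(x * y for x, y in zip(a, b))

def dot_avx2(a, b):
    # stand-in for a hand-tuned AVX2 kernel; same result, faster in a real library
    return sum(x * y for x, y in zip(a, b))

# dispatch decided once, at load time, so the hot path pays no branch cost
dot = dot_avx2 if "avx2" in cpu_flags() else dot_generic

print(dot([1.0, 2.0], [3.0, 4.0]))  # 11.0
```

      The key point is that the shipped binary contains both code paths, and the lowest-common-denominator baseline is only the fallback.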

    • _ea1k 5 years ago

      See, this is why Gentoo is the right way to manage an OS. /s

      • gnufx 5 years ago

        The SIMD makes little difference to the bulk of the system, but you want dynamic dispatch where it does, at least on a typical HPC system, which isn't heterogeneous five years down the line, and in things like libc generally.

    • bobbylarrybobby 5 years ago

      When you install Python's numpy, I'm pretty sure it chooses a pre-built package based on your hardware, and if it doesn't have one I think it's pretty easy to get it to build the best one from scratch.

      • physicsguy 5 years ago

        It generally installs MKL if you install a wheel (i.e. 'pip install numpy') these days, which dynamic dispatches based on the processor. It's been criticised a bit though because MKL doesn't perform as well on AMD hardware without setting some environment variables in the older versions, although it looks like they've added kernels that target AMD hardware recently.

  • curiousgal 5 years ago

    Same can be said for Python, tensorflow jumps to mind.

  • gnufx 5 years ago

    The thing that makes a significant difference is BLAS, and it's easy to substitute. There are some old numbers at https://loveshack.fedorapeople.org/blas-subversion.html#_add... Most of it is unlikely to benefit much from -mavx and vectorization, but I have no numbers. -fno-semantic-interposition is probably a better candidate, which I've not got round to trying.

analog31 5 years ago

Ask HN: Does the FORTRAN issue also affect Numpy/Scipy for Python?

melling 5 years ago

How popular is R in general?

I started learning it because I want to make an attempt to do some projects on Kaggle. Most people use Pandas, Seaborn, etc, which I will also use.

However, to me R appears like a little better Swiss Army Knife to do initial analysis. ggplot2, tidyverse, ...

Any help leveling up would be appreciated.

  • kickout 5 years ago

    Probably the most popular non-software engineer language for working with data.

    Millions and millions of users that have no idea what this blog post is technically about (but is interesting nonetheless)

    • NegativeLatency 5 years ago

      Second after Excel?

      • kickout 5 years ago

        Fair, although I would say Excel and R are serving two separate purposes. But yes, Excel is of course #1

        • wtetzner 5 years ago

          > Fair, although I would say Excel and R are serving two separate purposes.

          It's true that they target different use cases overall (obviously with some overlap), but Excel tends to be used for lots of things that would be better handled with a different tool, because it's what people know.

    • ISL 5 years ago

      Matlab has a substantial userbase, too.

      • kickout 5 years ago

        Agreed, but IMO R >> MatLab and SAS over the past ~10 years. Both Matlab (physics??) and SAS (pharma/financial) seem to have further sunk into deep niches.

        • mjn 5 years ago

          Matlab is still big in engineering as well. Matlab+Simulink in particular seems to have a fairly entrenched niche.

    • ImaCake 5 years ago

      I think this nails it. Given both python and R, these people will pick R because the motivation for using R is very clear, it does data analysis and statistics. Whereas python kinda does everything and that makes it a bit more tricky to understand.

  • J253 5 years ago

    In my experience, I've seen R used in more exploratory/ad hoc analysis and algorithm development by "non-developers" (statisticians, scientists, etc.), usually without performance considerations, and that code is then turned into production code by the dev team using Python or C or something more performant or maintainable.

    • pyromine 5 years ago

      R is a nasty, nasty language for productionalizing things; honestly, it's just too flexible and lets you do the craziest things.

      But being so flexible makes it really expressive for doing ad-hoc analysis where you really don't know what you're looking for yet.

      • laichzeit0 5 years ago

        It’s not just the language/runtime though. It’s the entire ecosystem around it that’s required to productionize in any modern sense of the word. It’s hard to get right even in Python. I mean, Flask (even with restx) still doesn’t generate Swagger 3 documentation.

  • jhfdbkofdcho 5 years ago

    Extremely popular and widely used. Pandas etc are Python implementations of R constructs

  • ekianjo 5 years ago

    Very popular in academia, moderately popular in industry when it comes to data science/analysis. In any case, it's very powerful, though Python certainly has numerous advantages over it.

  • vmchale 5 years ago

    Lots of scientists like it, psychologists and statisticians and such.

  • FranzFerdiNaN 5 years ago

    I work with people who mostly have a background in the social sciences or humanities and who work in R pretty much every day. They don't see themselves as programmers, and Python is complete gibberish to them, while R just makes sense. When I meet people from other companies in roughly the same space (I work in healthcare doing data analysis), it's mostly the same. I actually meet more people who use SAS/SPSS than Python.

    For data analysis, R is in my opinion better than Python. It's when you have to integrate it in existing workflows that Python quickly becomes a better choice.

    • lottin 5 years ago

      It's not so much that Python is gibberish, but that it is written, as far as I know, predominantly by engineers, who aren't really experts in statistics, or experts in science for that matter. A scientist will tend to trust code written by another scientist more than code written by an engineer. At least, I would.

      • milesvp 5 years ago

        I find this statement interesting. Historically scientists have a reputation for writing relatively poor code. Code that runs really slowly due to things like unintended nested loops, or striding values (x,y vs y,x). And code that doesn't handle non-happy path cases very well.

        Are you saying that you trust the code more because the domain knowledge makes it more likely to get the right answer? Has general knowledge increased such that scientists' code isn't as painful as it was 20 years ago?

        • lottin 5 years ago

          Yeah, exactly, even though it might be terribly inefficient, I still trust scientific code more when it's written by scientists than when it's written by non-scientists, in terms of getting the right answer.

    • bigger_cheese 5 years ago

      Similar story for me. I am an Engineer (the non software type). I work at an industrial plant. We use SAS pretty extensively for data analysis, time series analysis, multivariate regressions etc. As well as for BI type stuff (reports, graphing, adhoc queries).

      For a while R was being pushed pretty heavily as a SAS alternative. My org paid for R training courses etc. I found R and SAS pretty comparable, at least for the R packages we looked at (dplyr, ggplot2, etc.).

      I know about Python, the programming language; I used PyGTK back in the day to build GUI apps. But it would not be my first thought for doing data analysis work. Does Python even offer something like RStudio / SAS Enterprise Guide, and does it have a trending package?

  • minimaxir 5 years ago

    As a data scientist who is proficient in both Python and R ecosystems, in my opinion R/tidyverse is substantially better for ad hoc EDA and data visualization.

    However, Python is better for nearly everything else in the field (namely, working with nontabular data, external APIs, deep learning, and productionization).

    It's about knowing which tool to use.

    • omarhaneef 5 years ago

      Ditto (not that I am proficient, but my experience matches).

      However, because the rest is easier in python, and my mental gears grind when I switch from one to the other, I end up using Python for the adhoc EDA and viz, and with Spyder, it is a pretty decent experience.

    • disgruntledphd2 5 years ago

      > (namely, working with nontabular data, external APIs, deep learning, and productionization).

      I agree with all of that except for productionisation. I would have agreed before I had to deal with the issues around getting consistent versions of Python + libraries running.

      The issues I see with Python are as follows:

      - pip doesn't actually check to make sure your dependencies are compatible, which causes real problems with numpy et al

      - conda isn't available by default, and running it on remote boxes is non-trivial (I spent a whole week figuring out how to get it running in a remote non-login shell).

      - This makes it really, really difficult to actually get a standard set of libraries to depend upon, which is really important for production.

      R, on the other hand, actually resolves dependencies in its package manager, and the R CMD BUILD process for packages, while super annoying, helps you produce (more) portable code (did you know that conda doesn't provide cross-platform yml files unless invoked specifically?).

      In terms of handing it over to engineering/non data science people though, Python is much much much better.

      tl;dr Python's an ace language with a terrible production story.

      • ekianjo 5 years ago

        > R, on the other hand, actually resolves dependencies in its package manager

        And you have packages like renv which also help isolate specific versions of packages to make portable environments even more reliable.

        • klmr 5 years ago

          ‘renv’ (only) does more or less what pyenv does. Contrary to what the parent comment says, R doesn’t actually do any dependency resolution at all, and the official package repository (CRAN) doesn’t even archive many old versions (though MRAN does).

          I strongly prefer R for data science, but its dependency management story is poor, even compared to Python’s (which, in turn, is poor compared to Rust/Ruby/…).

          • disgruntledphd2 5 years ago

            That's simply not true. R doesn't store old versions, which is actually brilliant because your code breaks when your dependencies rot.

            Python will silently upgrade numpy as a transitive dependency and break everything, which is much worse. MRAN also has daily snapshots, which is normally how I handle stuff that will never be updated.

            I also specified building an R package which does handle dependencies versus the python equivalent which does not.

            I'm not saying R is good, I'm just saying Python is way worse.

          • guitarbill 5 years ago

            > even compared to Python’s (which, in turn, is poor compared to Rust/Ruby/…)

            Compared to Rust? Sure. Compared to Ruby? Maybe in the way that a lockfile isn't automatically generated when using pip.

            Hating on Python's dependency management is a meme at this point. You could do a lot worse than the current pip + venv, and upgrading to something like poetry or pipenv is pretty painless. I'm pretty sure 99% of problems occur because people don't pin stuff.

      • hobofromabroad 5 years ago

        I agree and disagree. We deploy our models via Cloud Foundry which has support for Anaconda.

        Model building is done in AWS with access to Anaconda.

        Usually we have an environment.yml for the REST API and one for model building.

        This makes modeling -> deployment cycle fairly easy, if not perfect.

        You can also use pip and venv, but you have to make sure that all important dependencies are specified sufficiently specifically. But that's also the case for Anaconda. (For instance, we had a problem in the API with an x.x.y release of greenlet or gevent since we only specified x.x.)

        For R, we use packrat. R IMHO has the problem of many different algorithms with different APIs. Yes, there are tools like caret, but 'you' will run into problems with the underlying implementations eventually. sklearn makes things easier here, at least most of the time.

        I would also prefer R for EDA. But I don't like splitting EDA and modeling that way, since there can be subtle differences in how data is read, which can lead to hard-to-find problems later on. (Yes, you could use something like feather.)

        I also think that tooling for Python is much nicer; pytest, black, and the VSCode Python integration just seem more mature.

      • laichzeit0 5 years ago

        And what libraries do you use in R to generate a REST api that has Swagger 3 documentation? Authentication with JWT tokens? Monitoring, e.g. ApplicationInsights?

        • disgruntledphd2 5 years ago

          For that, R would not be the right choice. I stand by my comments on how difficult it is to productionise python ML applications.

          I think all of those things you mentioned are JVM stuff, right? There's a version of R called renjin that could be used in that scenario.

          Don't get me wrong, I'd love if this was better in python but right now it is far more difficult than it needs to be.

          • laichzeit0 5 years ago

            Nope, none of it is JVM stuff. It's pretty standard stuff if you want to ship an API into a production environment and expect other developers/services to interact with your model. How do you know your model is failing/slow to serve requests? You need monitoring/logging. How do you add security? I'm talking API security, like JWT tokens with scopes and claims.

            Maybe we mean different things by "productionising ML applications" but building a docker container with an R runtime and the correct package versions is not all, or even half, of what's required for production.

            • pbowyer 5 years ago

              Why would you have any of this tightly coupled to your model?

              Set up a separate API gateway, which covers all your points (REST endpoints, monitoring, security) - there's plenty of off-the-shelf options. Route authenticated requests to the backend that runs your model.

            • disgruntledphd2 5 years ago

              Depends on your model. Mine score users daily, so I don't need to worry about building an API.

              Logging is pretty available in both (though better in Python to be fair).

              I don't really see how building my model in Python would make it easier to add this API functionality either, so it's a bit irrelevant. Like my docker container (which appears to be almost essential in Python but nice in R) can call predict in any language, and then pass through to the API using the tools noted above.

    • curiousgal 5 years ago

      You forgot time series analysis where Python is years behind R. Robust regression methods too. But most of all, Shiny! Python's Dash for creating interactive data web apps is absolutely horrible compared to Shiny.

      Basically if you were to go down any unbeaten path when it comes to statistical models, you're better off using R. But if your main goal is pushing to something prod, then you're better off with Python. The only exception being Shiny, they've put a lot of effort into making it production-ready.

      • jerjerjer 5 years ago

        > You forgot time series analysis where Python is years behind R

        What does R offer?

        In Python there's SARIMAX and Prophet, interested in what R has to offer.

        Also interested in a decent Grid Search for time series.

  • ryanar 5 years ago

    I love using R for exploratory work. Hadley Wickham's TidyVerse of packages make everything so ergonomic to use.

  • currymj 5 years ago

    think of it like shell scripting for statistics, although not nearly as limited as bash is compared to other programming languages.

    it works best if it's used semi-interactively, as a glue language between statistical packages which may be written in other languages. or to write simple "batch" scripts that basically just run a bunch of procedures in a row.

    RStudio makes the whole experience much nicer in terms of plotting, and RMarkdown is great for preparing documents.

    of course like shell scripting you can write fairly complicated programs in it, and sometimes people do, but due to backwards compatibility and weird design choices meant to make interactive use easier, programming "in the large" can get weird.

    the analogy works for Python too -- it is definitely reasonable to use Python for shell scripting, but using Python interactively to pipe things from one program to another is slightly more frustrating than doing it in the shell, although might be preferred due to its other advantages.

  • totalperspectiv 5 years ago

    More popular than I wish it was. It is the bash of the data science world. Totally ubiquitous and kind of a dumpster fire.

    • dcolkitt 5 years ago

      I disagree, the language is extremely powerful for interactive data exploration. A terse one-liner is all it takes to compute something like "what's the correlation between number of children and home size for people over 45 who live in counties with income variance at the 90th percentile weighted by population".

      Not that pandas/scipy/numpy don't do an admirable job. You can do something like this, but it's nowhere near as ergonomic as it is in R. At the end of the day, R is fundamentally a language for data exploration, whereas with Python those facilities are bolted on top of a general-purpose environment.

    • tharne 5 years ago

      This is the best description of R that I've ever come across, and I say that as someone who learned R as their first programming language.

      Big mistake, btw. It took me years to unlearn all of the terrible habits I picked up from the R world. Do yourself a favor and start with python, if only to learn proper programming practices and techniques before diving into R.

      • uomopertica 5 years ago

        While I agree that Python is better than R for programming etiquette, I would argue that proper programming practices and techniques are better learned in languages with static typing and proper variable scoping. Do yourself another favor and also look into C#, Swift or even Java.

        • coward8675309 5 years ago

          If R is a reasonable tool for a given problem, C# or Swift or Java almost certainly will not be. The realistic alternatives to R are other numerical analysis packages, Julia, and Python. “The” answer for any given person or project is likely to be a function of your colleagues and peer group, your problem domain, your library needs.

          One of course is allowed to learn more than one thing. Maybe play with a bondage and discipline language to expose yourself to the concepts the parent comment is advocating for.

          • FridgeSeal 5 years ago

            They're not saying use Swift/C# for those problems, they're saying learn good programming practices from those languages and tools and then go do things in R/Python with that expertise under your belt.

            • coward8675309 5 years ago

              A lot of people don’t have the luxury of doing both those things. They’re confronted with a problem and need to solve it, and solving it requires choosing and learning how to use a tool. If you have plenty of free time, choosing C#, Swift, and Java seem like odd choices for a pedagogic programming language. For learning about type safety, spending a couple weeks playing with SML or Haskell would be a good idea, though they’re both functional.

              As a student I constantly complained that we were being taught these useless languages. As a grownup I realize that while some of the Comp Sci faculty may’ve been out of touch, their goal was not teaching us commercially viable skills. They were endeavoring to teach us how to think. Once you know how to think you can express those thoughts in nearly any language, no matter how hostile to those thoughts it may be.

              But maybe you just want to get things done, and if that’s so, the answer for data problems is basically one or more of R, Python, Julia, etc.

    • hyperbovine 5 years ago

      R is a bit of a horror show under the hood, I agree, but if you're just an end user doing data analysis, consider:

          flights.iloc[0:10, flights.columns.get_indexer(['year', 'month', 'day'])]
      

      versus

          flights %>% select("year", "month", "day") %>% head(10)
      

      I could go on...

      • totalperspectiv 5 years ago

        I totally agree that it is a very efficient and powerful tool for ad-hoc data analysis. It's just not what I would view as a responsible choice for production / publication code.

      • kgwgk 5 years ago

        You could have done

            flights[['year', 'month', 'day']].head(10)
        

        which is not so different from standard R

            head(flights[c("year", "month", "day")], 10)
        

        but it's true that the following may be nicer

            flights[1:10, c("year", "month", "day")]
        

        (by the way using head(10) is not the same as indexing 1:10 if there are less than 10 rows)
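        For reference, a runnable version of the pandas side with a toy stand-in for the flights data (the column names are the only thing carried over from the examples above; the real dataset isn't needed to see the ergonomics):

```python
import pandas as pd

# toy stand-in for the flights dataset used in the examples above
flights = pd.DataFrame({
    "year": [2013] * 12,
    "month": list(range(1, 13)),
    "day": [1] * 12,
    "dep_delay": list(range(12)),
})

# the pandas analogue of: flights %>% select(year, month, day) %>% head(10)
subset = flights[["year", "month", "day"]].head(10)
print(subset.shape)  # (10, 3)
```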

      • superbatfish 5 years ago

            flights[["year", "month", "day"]].head(10)
      • _Wintermute 5 years ago

        That's a very dishonest example. Yes, terribly written pandas code looks terrible.

      • nojito 5 years ago

        flights[1:10, .(year, month, day)]

        for the data.table fans

        Which is arguably the superior way to handle tabular data in 2020.

      • empthought 5 years ago

        It’s pretty sad that both are worse than

             SELECT year, month, day
               FROM flights
              LIMIT 10
        • Tarq0n 5 years ago

          SELECT before FROM isn't really a good thing.

    • acomjean 5 years ago

      As a programming language I don't love R. I didn't get it till I took a biostatistics class. (We could use Stata/Excel or R for the class.) It really shines analyzing data. It's loved by statisticians and has some nice programmable attributes too.

      Biologists like it for single-cell analysis. They use Seurat and save the data as an object and load it up / pass it around for analysis. It's actually kinda neat.

      R's ggplot2 library is top tier in making graphs.

      RStudio makes it very accessible.

  • goatinaboat 5 years ago

    > However, to me R appears like a little better Swiss Army Knife to do initial analysis. ggplot2, tidyverse, ...

    R is far superior for interactive exploration/analysis and report writing. However Python is far superior if you are writing a program that does other things too.

    My rule of thumb is that if a Python program is 70% or more Numpy/Pandas/Matplotlib etc., it should be R. Whereas if an R program does comparatively little analysis and a lot of logic and integration, it should be Python. No one size fits all.

  • coldtea 5 years ago

    >How popular is R in general?

    Very popular. To the point of even having quite a lot of Microsoft support, lots of books, etc.

  • williamstein 5 years ago

    For what it is worth (not at all clear), TIOBE ranked R as the 9th most popular programming language in the world this month: https://www.tiobe.com/tiobe-index/. For comparison, Python is ranked number 2.

  • sbassi 5 years ago

    In finance and fintech it's pretty standard.

  • bayeslaw 5 years ago

    No self-respecting person who calls themselves a data scientist, AI researcher, or ML engineer would touch R. It's a toy for making pretty plots and fitting traditional stats models to small data. It is not a proper programming language but a horrible old scripting whatever that was unfortunately saved by Hadley and his persistence in creating an ecosystem around it.

    • data_ders 5 years ago

      clicked on link expecting R shitposting. was not disappointed.

    • disgruntledphd2 5 years ago

      So I guess the authors of the Elements of Statistical Learning aren't "real" researchers then?

      For reference, the authors of that book (the best book about ML in general) were all involved in the development of S and R.

jp0d 5 years ago

I reckon they'd finally have to get R working natively on the new chip. I don't foresee Apple offering the fat binary support in the long term. It's probably only an intermediate solution for the transitional period. Also, does it mean the native version of R will finally work on the iPad? I know Apple doesn't allow compilers but there are a few examples like Pythonista and Apple's own Swift playground. It'd be cool to get R Studio on the iPad.

  • Wowfunhappy 5 years ago

    Just to be clear, PPC-Intel fat/universal binaries are still supported even on Big Sur, the PPC portion is just ignored. I don't expect Intel-Arm binaries to go away any time soon.

    I believe what you're really thinking of is Rosetta though. That, indeed, is sadly unlikely to be around forever. We have history as an indication of that.

    • jp0d 5 years ago

      Yes, you're right. I meant Rosetta2. It's good to have native binaries nevertheless.

  • RandallBrown 5 years ago

    When Apple transitioned from PowerPC to Intel the fat binary support (Rosetta) lasted 3 OS updates or about 3 years. Definitely won't be a super long term thing, but there's plenty of time I guess.

FullyFunctional 5 years ago

FWIW, RISC-V explicitly doesn't support NaN payload propagation so R will have a problem there as well.

Will_Do 5 years ago

Question: Would using R inside Docker on one of these Macs work somewhat well?

Previous benchmarks[0] show that the overhead on Intel Macbooks for the Docker Linux VM is quite low for scientific computing.

Would the x86 emulation hurt performance substantially or is there some other issue with this approach?

[0]: https://lemire.me/blog/2020/06/19/computational-overhead-due...

  • ryukafalz 5 years ago

    I would imagine that Docker for Mac will likely get native support for ARM macOS soon enough, in which case there'd be no x86 emulation involved and you could run the ARM Linux version of R in a container just fine.

    My understanding is Rosetta 2 does not support x86 virtualization.

  • mr_toad 5 years ago

    Why would you run R on a Mac in Docker? Docker isn’t an emulator. You’re still going to need ARM code.

    • dehrmann 5 years ago

      Not that this wasn't a known caveat of Docker, but I think a lot of people are going to realize this in the next year or two.

stevefan1999 5 years ago

Similarly, Matlab is not initially available natively for Apple Silicon either; they are preparing an update to let Matlab run under Rosetta 2 instead, until development of the native version completes.

istvan60 5 years ago

Hi everyone, did any of you try out R or SPSS on a new M1 MacBook? Do either of these work fine under Rosetta 2? I suppose neither has a native ARM version yet.

In addition, did anyone try CorelDraw as well?

I am asking these questions because I think a lot of us working in data science have second thoughts about moving to ARM, at least for the next year or so...

gnufx 5 years ago

Apple is obviously a contrast with the usual state of affairs where one of the first signs of a new CPU is in the GNU toolchain.

  • kzrdude 5 years ago

    Is this a new cpu in that sense? I thought this was ARM64?

    • gnufx 5 years ago

      Yes, though I don't know what version. Maybe I should have said new system, but it's a new micro-architecture, as I understand it, with an unsupported ABI.

gok 5 years ago

Is there a reason R can't use the BLAS/LAPACK implementation that comes with macOS in the Accelerate framework?

olliej 5 years ago

It should run fine under rosetta, if anyone encounters issues please submit bug reports.

ineedasername 5 years ago

An inability to use R Studio would be a deal breaker for me.

motorbreath 5 years ago

There's always R Cloud, accessible from a browser.

superbatfish 5 years ago

It sounds like R's design decision to use a non-standard NaN value to represent NA is an obscenely bad one. Wasn't it obvious that this would become a problem someday?

  • oddthink 5 years ago

    It's not a "non-standard NaN". It's just a particular one, out of many possible quiet NaN values. If the Apple silicon isn't propagating the payload of the input NaN value to output, that's a violation of IEEE 754.

    (IIUC, that is. It may be something like a "should" not a "must".)

    • superbatfish 5 years ago

      The article contradicts your assertion. Did you read it?

      • oddthink 5 years ago

        Wow, a little hostile here?

        My assertion, that R's NaN is not "non-standard", seems upheld by the article. It's a quiet NaN with a payload, which is well-defined by the IEEE 754 standard.

        As other posters pointed out, it's relying on a "should" behavior from the spec, which is risky but common. It sounds like disabling the "RunFast" mode cleared up their issues, which seems quite far from it being an "obscenely bad" design decision.

        It's not terribly unusual to require IEEE 754 compliance in numerical code, like the usual options for avoiding -ffast-math-style stuff.

        • superbatfish 5 years ago

          Fair enough. My snide question was uncalled for. Sorry. Thanks for the additional info.

    • brandmeyer 5 years ago

      It is a "should", not a "shall".

      Quoth the standard (emphasis mine):

      > For an operation with quiet NaN inputs, other than maximum and minimum operations, if a floating-point result is to be delivered the result shall be a quiet NaN which should be one of the input NaNs.
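      To make the "should" concrete, here is a Python sketch (an illustration by bit-twiddling, not R's actual C code) that builds R's NA_real_ pattern, a quiet NaN carrying payload 1954, and reads the payload back. Whether the last line still prints 1954 depends on whether the hardware propagates payloads through arithmetic.

```python
import math
import struct

QNAN_BITS = 0x7FF8000000000000     # exponent all ones, quiet bit set
PAYLOAD_MASK = 0x0007FFFFFFFFFFFF  # payload bits below the quiet bit

def nan_with_payload(payload: int) -> float:
    """Build a quiet NaN double carrying the given payload bits."""
    return struct.unpack("<d", struct.pack("<Q", QNAN_BITS | payload))[0]

def payload_of(x: float) -> int:
    """Read the payload bits back out of a double."""
    return struct.unpack("<Q", struct.pack("<d", x))[0] & PAYLOAD_MASK

na = nan_with_payload(1954)  # R's NA_real_ uses payload 1954
print(math.isnan(na))        # True
print(payload_of(na))        # 1954

# The "should": IEEE 754 recommends, but does not require, that this stays 1954.
print(payload_of(na + 1.0))
```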

sbassi 5 years ago

Does R work on Amazon ARM chip (graviton)?

  • gnufx 5 years ago

    Yes, if it's available for the OS you run, like EL or Debian. It's aarch64.

racl101 5 years ago

Installing R on a MacBook, even the older 2019 ones, was nothing short of a fucking nightmare.

Ended up installing on a Vagrant machine instead.

  • ImaCake 5 years ago

    I've had few problems with R on my 2020 13" MacBook, but several of my coworkers have struggled with R on theirs. Some of them are very new to programming and likely get stumped by what I would consider "simple" bugs.

    • psychometry 5 years ago

      Installing even common packages like data.table requires mucking around with R's Makevars. There's no common set of variables that "just works", since different packages need different compilers to install.

      • _2d30 5 years ago

        This is only true if you insist on installing with OpenMP support.

        I can think of maybe 3-5 packages, most relatively low use, that have intricacies required to install.

  • _2d30 5 years ago

    I think maybe once in the past 6yrs I’ve had an issue with `brew install R` and I’m a power R user (upgrade regularly).

    How were you attempting to install? Build from source?

  • kristjansson 5 years ago

    Installing R through conda is a PITA esp. for packages that aren’t in conda-forge yet, I’ll give you that.

    Installing through homebrew or using the R project builds is very smooth in my experience

kevin_b_er 5 years ago

Will Apple even allow compilers that aren't Apple Compilers? They're not allowed on any other Apple Silicon.

  • andromeduck 5 years ago

    As long as they support LLVM, shouldn't it be mostly painless?

  • why_only_15 5 years ago

    Are you thinking of JITs? And yes those are allowed.

Thaxll 5 years ago

The question is why in 2020 R still uses a Fortran compiler?

  • _Wintermute 5 years ago

    Because it's R, it doesn't even have 64 bit integers yet.

    • rodonn 5 years ago

      It does with the bit64 package, but agree that I wish it was directly supported.

  • anbende 5 years ago

    This is answered in the article. Chunks of R are written in Fortran 90, which can’t be converted to C easily right now.

    You might ask why it’s written in Fortran at all. Probably has something to do with its history coming out of the S language at Bell labs in the 70s and 80s.

  • mmrezaie 5 years ago

    Fortran is dominant in HPC (maybe not strictly dominant, but a lot of HPC software is written in Fortran). R uses some performance-oriented libraries which are most likely implemented in Fortran.

  • uberman 5 years ago

    Fortran is berserkly fast.

  • bregma 5 years ago

    I'm sure someone will come along with a Rust rewrite any day now.

  • gh02t 5 years ago

    Tons of software in the scientific/numerical world still use Fortran. NumPy and SciPy, for example, make extremely heavy use of Fortran, as do many things that rely on BLAS/LAPACK.

    It's not just legacy code, either... Fortran is still very active in its own little niche in the numerical world.

    • danpalmer 5 years ago

      Yep, I think the issue with R is that they're using a customised version of BLAS/LAPACK – Python has been running these things on Raspberry Pis for ages now, I suspect using a more standard implementation.

      • em500 5 years ago

        No, the issues are different; please read the article. R also runs fine on ARM64 Linux. But macOS is not Linux: as mentioned in the article, it has a different ABI, and no free Fortran 90 compiler is available yet.

        The other issue is that R distinguishes between NA values and NaN values (NumPy doesn't), which are propagated differently on ARM64.

      • gh02t 5 years ago

        Based on the article I don't think those are the problem. I think the new Apple silicon is distinct enough that it needs a bit of porting effort to get a Fortran compiler running, along with the issue of quirks in handling NaN payloads and some other (seemingly rather minor) differences.

  • ryukafalz 5 years ago

    If it works well, why change it?

  • geofft 5 years ago

    Fortran is the standard implementation language for scientific computing in 2020. Try compiling NumPy and SciPy from source sometime.

  • baron_harkonnen 5 years ago

    Fortran is still a foundation of many important libraries for numeric computing. A fairly large number of implementations of BLAS are written in Fortran (including the reference implementation), LAPACK is written in Fortran.

    One of the first problems tackled by programming was efficient implementation of common linear algebra computations. Fortran was the original language of choice for many of those projects. When you care about absolutely optimal performance for these computations, you're not going to mess with finely tuned code that has been slowly tweaked and improved for over 50 years.

  • pdpi 5 years ago

    If I recall correctly, C and C++ allow some types of pointer aliasing that Fortran forbids. If you're reading from one buffer and writing to another, those buffers can overlap in C or C++, but can't in Fortran, so a Fortran compiler is allowed more leeway with the way instructions are ordered (and maybe elided? Not sure). In compute-intensive workloads, every little bit helps.

    • rlkf 5 years ago

      Correct. Back in the day, you could run Fortran 77 code through f2c and then compile with gcc using the assume-no-aliasing option, and you would get roughly the same performance as if you compiled with f77.

    • johncolanduoni 5 years ago

      It's also worth noting that although modern C/C++ has the `restrict` keyword, compilers for those languages are generally worse at actually using that information. For example, there's been a long running series of LLVM bugs that has (several times) required Rust to stop emitting that metadata because it would actually miscompile fairly simple vector operations that used `restrict`[1]. I'm hopeful that Flang (the Fortran LLVM compiler) will shake most of those out, since there's a large body of Fortran code that relies on good aliasing optimizations.

      [1]: https://github.com/rust-lang/rust/issues/54878

      • saagarjha 5 years ago

        Just C. The various “restrict equivalents” in C++ are non-standard.

  • ForHackernews 5 years ago

    Fortran is crazy fast and the standard language in many scientific domains where performance matters.

  • gspr 5 years ago

    > The question is why in 2020 R still uses a Fortran compiler?

    You will find that almost any scientific codebase of any size includes or relies on at least some Fortran code.