-
Notifications
You must be signed in to change notification settings - Fork 0
Numeric array as atom take 2 #9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Changes from all commits
6b65a80
607a53a
36ee614
2814dd5
9cb4da1
c2b0ace
088f68a
77b8555
21e7b31
9ab56ff
85b627d
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -698,25 +698,24 @@ def eval(self, expr, evaluation: Evaluation, expression: Expression): | |
| if not hasattr(expr, "items"): | ||
| evaluation.message("First", "normal", Integer1, expression) | ||
| return | ||
| expr_len = len(expr.items) | ||
|
Owner
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Generalized this to work with any Atom that defines .items. Suggestion: to simplify some of this code, add to Expression a method that returns .elements, .items, or None depending on which are defined. Maybe call it get_parts() as I think it is precisely what the Parts builtin operates on. Note that I did not make this generalization in Last, as it is awaiting DRYing. Suggestion: generalize eval_Part in a similar way (operate on .items or .elements by calling get_parts as I suggested above) and DRY both First and Last to just call eval_Part(..., 0) and eval_Part(..., -1) respectively.
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'd like to mull this over. My personal trepidation using the name "part" is that Part in Mathematica more closely corresponds to a Python or Golang slice as best I can tell. But is that what's meant here? FYI the common DRY'd part of First and Last would probably live under mathics.eval, not mathics.builtin. The current philosopy is to only do signature matching and argument checking in mathics.builtin. while mathics.core has the function part. By doing this, internal routines can call internal routines without having to go through the tedious and slow lookup process. Also, an instruction interpreter doesn't do OO, it is purely a function call kind of thing.
Owner
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. For List, NumericArray, and ByteArray, I think they are essentially slices, and I think they are exactly what Part with a single index will return. Maybe there are other kinds of things that have .elements or .items where this is not true? Maybe we can discuss more on Tuesday after you've had a chance to mull it over.
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Given how general Mathematica Part is compared to Python's and GoLang's slices, maybe "slice" is a better name since it better matches the more limited here? Personally, I like "slice" for this concept over "Part". There is something about Mathematica where the terms used while not wrong, can tend not to be the conventional terms. But if something strongly matches Mathematica, I have been advocating using the Mathematica term.
Owner
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. They also aren't slices in the full generality of what you can do with slices in Python* either. While it's true that Part can return more kinds of things, it still seems natural to me to call them parts because they are a kind of thing that can be returned by Part. But ultimately from my perspective it's your call. * Python slices for lists have start,stop,step. And for numpy arrays you can do slicing along any and all axes, e.g. for a three dimensional array you can do a[0, 3:5, [1,2,7]] which picks out index 0 on axis 0, indexes 3,4,5 on axis 1, and indexes 1,2,7 on axis 2. This works because Python just passes whatever is between the [] to __getitem__. It gets even funkier - you aren't limited to passing lists like [1,2,7] in, you can pass numpy arrays containing indexes; I think this inserts extra dimensions at that spot in the shape, iirc.
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks for the information. You've convinced me. (Golang's slices are very restrictive; but that's golang.) |
||
| parts = expr.items | ||
| else: | ||
| expr_len = len(expr.elements) | ||
| parts = expr.elements | ||
|
|
||
| expr_len = len(parts) | ||
| if expr_len == 0: | ||
| evaluation.message("First", "nofirst", expr) | ||
| return | ||
|
|
||
| if isinstance(expr, ByteArray): | ||
| return expr.items[0] | ||
|
|
||
| if expr_len > 2 and expr.head is SymbolSequence: | ||
| if expr_len > 2 and expr.get_head() is SymbolSequence: | ||
| evaluation.message( | ||
| "First", "argt", SymbolFirst, Integer(expr_len), Integer1, Integer2 | ||
| ) | ||
| return | ||
|
|
||
| first_elem = expr.elements[0] | ||
| first_elem = parts[0] | ||
|
|
||
| if expr.head == SymbolSequence or ( | ||
| if expr.get_head() == SymbolSequence or ( | ||
| not isinstance(expr, ListExpression) | ||
| and len == 2 | ||
| and isinstance(first_elem, Atom) | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,63 @@ | ||
| # -*- coding: utf-8 -*- | ||
| """Rules for working with NumericArray atoms.""" | ||
|
|
||
| from typing import Optional, Tuple | ||
|
|
||
| try: # pragma: no cover - numpy is optional at runtime | ||
| import numpy | ||
| except ImportError: # pragma: no cover - handled via requires attribute | ||
| numpy = None | ||
|
|
||
| from mathics.core.atoms import NumericArray, NUMERIC_ARRAY_TYPE_MAP, String | ||
| from mathics.core.builtin import Builtin | ||
| from mathics.core.convert.python import from_python | ||
| from mathics.core.symbols import Symbol, strip_context | ||
| from mathics.core.systemsymbols import SymbolAutomatic, SymbolFailed, SymbolNumericArray | ||
|
|
||
|
|
||
| # class name modeled on Complex_ to avoid collision with NumericArray atom | ||
| class NumericArray_(Builtin): | ||
|
|
||
| summary_text = "construct NumericArray" | ||
| name = "NumericArray" | ||
| rules = { | ||
| "NumericArray[list_List]": "NumericArray[list, Automatic]" | ||
| } | ||
| messages = { | ||
| "type": "The type specification `1` is not supported in NumericArray.", | ||
| } | ||
|
|
||
| # rule to convert NumericArray[...nested list...] expression to NumericArray atom | ||
| def eval_list(self, data, typespec, evaluation): | ||
| "System`NumericArray[data_List, typespec_]" | ||
|
|
||
| # get a string key from the typespec | ||
| if isinstance(typespec, Symbol): | ||
| key = strip_context(typespec.get_name()) | ||
| elif isinstance(typespec, String): | ||
| key = typespec.value | ||
| else: | ||
| evaluation.message("NumericArray", "type", typespec) | ||
| return SymbolFailed | ||
|
|
||
| # compute numpy dtype from key | ||
| if key == "Automatic": | ||
| dtype = None | ||
| else: | ||
| dtype = NUMERIC_ARRAY_TYPE_MAP.get(key, None) | ||
| if not dtype: | ||
| evaluation.message("NumericArray", "type", typespec) | ||
| return SymbolFailed | ||
|
|
||
| # compute array from data and dtype and wrap it in a NumericArray atom | ||
| python_value = data.to_python() | ||
| array = numpy.array(python_value, dtype=dtype) | ||
| atom = NumericArray(array, dtype) | ||
|
|
||
| return atom | ||
|
|
||
| # this doesn't work | ||
| # instead see Normal builtin in mathics/builtin/list/constructing.py | ||
| #def eval_normal(self, array, evaluation): | ||
| # "System`Normal[array_NumericArray]" | ||
| # return from_python(array.value.tolist()) |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -10,10 +10,16 @@ | |
| import sympy | ||
| from sympy.core import numbers as sympy_numbers | ||
|
|
||
| try: # pragma: no cover - optional dependency handled at runtime | ||
| import numpy | ||
| except ImportError: # pragma: no cover - numpy is optional at import time | ||
| numpy = None | ||
|
|
||
| from mathics.core.element import BoxElementMixin, ImmutableValueMixin | ||
| from mathics.core.keycomparable import ( | ||
| BASIC_ATOM_BYTEARRAY_ELT_ORDER, | ||
| BASIC_ATOM_NUMBER_ELT_ORDER, | ||
| BASIC_ATOM_NUMERICARRAY_ELT_ORDER, | ||
| BASIC_ATOM_STRING_ELT_ORDER, | ||
| ) | ||
| from mathics.core.number import ( | ||
|
|
@@ -1098,6 +1104,125 @@ def is_zero(self) -> bool: | |
| } | ||
|
|
||
|
|
||
| # | ||
|
Owner
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I propose to split this off (together with a couple minor changes in other files, but not the NumericArray builtin, Part, First, Normal, etc.) into a separate PR - what do you think?
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Deferring Part, First, Normal, etc. is fine. As you've seen, it's okay to defer implementation when it's made explicit what's needs to be done (and that in of itself will be long and involved). I had in the back of my mind to split out the mathics.atom into mathics.atom.numeric, mathics.atom.array, mathics.atom.symbol, and mathics.atom.string (or something like that). But since that will be another revision here, I suppose I can defer until after NumericArray part 1.
Owner
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. OK. So here's what I think should go in Part 1 from this PR: How would you like to proceed?
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Great!
Perfectly fine.
Up to you.
Up to you.
Will probably do in any event. |
||
| # NumericArray | ||
| # | ||
|
|
||
| if numpy is not None: | ||
| NUMERIC_ARRAY_TYPE_MAP = { | ||
| "UnsignedInteger8": numpy.dtype("uint8"), | ||
| "UnsignedInteger16": numpy.dtype("uint16"), | ||
| "UnsignedInteger32": numpy.dtype("uint32"), | ||
| "UnsignedInteger64": numpy.dtype("uint64"), | ||
| "Integer8": numpy.dtype("int8"), | ||
| "Integer16": numpy.dtype("int16"), | ||
| "Integer32": numpy.dtype("int32"), | ||
| "Integer64": numpy.dtype("int64"), | ||
| "Real32": numpy.dtype("float32"), | ||
| "Real64": numpy.dtype("float64"), | ||
| "ComplexReal32": numpy.dtype("complex64"), | ||
| "ComplexReal64": numpy.dtype("complex128"), | ||
| } | ||
| NUMERIC_ARRAY_DTYPE_TO_NAME = { | ||
| dtype: name for name, dtype in NUMERIC_ARRAY_TYPE_MAP.items() | ||
| } | ||
| else: # pragma: no cover - executed only when numpy is absent | ||
| NUMERIC_ARRAY_TYPE_MAP = {} | ||
| NUMERIC_ARRAY_DTYPE_TO_NAME = {} | ||
|
|
||
|
|
||
| # TODO: would it be useful to follow the example of Complex and parameterize by type? | ||
| # would that be array.dtype or the MMA type from the map above? | ||
| class NumericArray(Atom, ImmutableValueMixin): | ||
| """ | ||
| NumericArray provides compact storage and efficient access for machine-precision numeric arrays, | ||
| backed by NumPy arrays. | ||
| """ | ||
|
|
||
| class_head_name = "NumericArray" | ||
|
|
||
| def __init__(self, value, dtype=None): | ||
| if numpy is None: | ||
| raise ImportError("numpy is required for NumericArray") | ||
|
|
||
| # compute value | ||
| if not isinstance(value, numpy.ndarray): | ||
| value = numpy.asarray(value, dtype=dtype) | ||
| elif dtype is not None: | ||
| value = value.astype(dtype) | ||
| self.value = value | ||
|
|
||
| # check type | ||
| self._type_name = NUMERIC_ARRAY_DTYPE_TO_NAME.get(self.value.dtype, None) | ||
| if not self._type_name: | ||
| allowed = ", ".join(str(dtype) for dtype in NUMERIC_ARRAY_TYPE_MAP.values()) | ||
| message = f"Argument 'value' must be one of {allowed}; is {str(self.value.dtype)}." | ||
| raise ValueError(message) | ||
|
|
||
| # summary and hash | ||
| shape_string = "×".join(str(dim) for dim in self.value.shape) or "0" | ||
| self._summary_string = f"{self._type_name}, {shape_string}" | ||
| self._hash = None | ||
|
|
||
| # TODO: this is potentially expensive - what if we left it unimplemented? is hashing a numpy array reasonable? | ||
| # TODO: to make it less expensive only look at first 100 bytes - ok? needed? | ||
| def __hash__(self): | ||
| if not self._hash: | ||
| self._hash = hash(("NumericArray", self.value.shape, self.value.tobytes()[:100])) | ||
| return self._hash | ||
|
|
||
| def __str__(self) -> str: | ||
| return f"NumericArray[{self._summary_string}]" | ||
|
|
||
| def atom_to_boxes(self, f, evaluation): | ||
| return String(f"<{self._summary_string}>") | ||
|
|
||
| def do_copy(self) -> "NumericArray": | ||
| return NumericArray(self.value.copy()) | ||
|
|
||
| def default_format(self, evaluation, form) -> str: | ||
| return f"NumericArray[<{self._summary_string}>]" | ||
|
|
||
| @property | ||
| def items(self) -> Tuple: | ||
| from mathics.core.convert.python import from_python | ||
| if len(self.value.shape) == 1: | ||
| return tuple(from_python(item.item()) for item in self.value) | ||
| else: | ||
| return tuple(NumericArray(array) for array in self.value) | ||
|
|
||
| @property | ||
| def element_order(self) -> tuple: | ||
| return ( | ||
| BASIC_ATOM_NUMERICARRAY_ELT_ORDER, | ||
| self.value.shape, | ||
| self.value.dtype, | ||
| self.value.tobytes() | ||
| ) | ||
|
|
||
| @property | ||
| def pattern_precedence(self) -> tuple: | ||
| return super().pattern_precedence | ||
|
|
||
| def sameQ(self, rhs) -> bool: | ||
| return isinstance(rhs, NumericArray) and numpy.array_equal(self.value, rhs.value) | ||
|
|
||
| def to_sympy(self, **kwargs): | ||
| return None | ||
|
|
||
| # TODO: note that this returns a simple python list (of lists), | ||
| # not the numpy array - ok? | ||
| def to_python(self, *args, **kwargs): | ||
| return self.value.tolist() | ||
|
|
||
| # TODO: what is this? is it right? | ||
| def user_hash(self, update): | ||
| update(self._summary[2]) | ||
|
|
||
| def __getnewargs__(self): | ||
| return (self.value, self.value.dtype) | ||
|
|
||
|
|
||
| class String(Atom, BoxElementMixin): | ||
| value: str | ||
| class_head_name = "System`String" | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -5,7 +5,12 @@ | |
|
|
||
| from typing import Any | ||
|
|
||
| from mathics.core.atoms import Complex, Integer, Rational, Real, String | ||
| try: # pragma: no cover - numpy is optional | ||
| import numpy | ||
| except ImportError: # pragma: no cover - optional dependency missing | ||
| numpy = None | ||
|
|
||
| from mathics.core.atoms import Complex, Integer, NumericArray, Rational, Real, String | ||
| from mathics.core.number import get_type | ||
| from mathics.core.symbols import ( | ||
| BaseElement, | ||
|
|
@@ -113,5 +118,7 @@ def from_python(arg: Any) -> BaseElement: | |
| from mathics.builtin.binary.bytearray import ByteArray | ||
|
|
||
| return Expression(SymbolByteArray, ByteArray(arg)) | ||
|
Owner
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @rocky Shouldn't this just return ByteArray(arg)?
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Good catch, Yes, it should. |
||
| elif numpy is not None and isinstance(arg, numpy.ndarray): | ||
| return NumericArray(arg) | ||
| else: | ||
| raise NotImplementedError | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,74 @@ | ||
| from mathics.core.atoms import Integer, NumericArray, String | ||
| from mathics.core.convert.python import from_python | ||
| from test.helper import check_evaluation, evaluate | ||
|
|
||
| import numpy as np | ||
| import pytest | ||
|
|
||
| # | ||
| # Python API tests | ||
| # | ||
|
|
||
| def test_numericarray_atom_preserves_array_reference(): | ||
| array = np.array([1, 2, 3], dtype=np.int64) | ||
| atom = NumericArray(array) | ||
| assert atom.value is array | ||
|
|
||
| def test_numericarray_atom_preserves_equality(): | ||
| array = np.array([1, 2, 3], dtype=np.int64) | ||
| atom = NumericArray(array, dtype=np.float64) | ||
| np.testing.assert_array_equal(atom.value, array) | ||
|
|
||
|
|
||
| def test_numericarray_expression_from_python_array(): | ||
| array = np.array([[1.0, 2.0], [3.0, 4.0]], dtype=np.float32) | ||
| atom = from_python(array) | ||
| assert isinstance(atom, NumericArray) | ||
| assert atom.value is array | ||
|
|
||
| def test_numericarray_hash(): | ||
| a = [[1, 2], [3, 4]] | ||
| array1a = np.array(a, dtype=np.float32) | ||
| atom1a = from_python(array1a) | ||
| array1b = np.array(a, dtype=np.float32) | ||
| atom1b = from_python(array1a) | ||
| array2 = np.array(a, dtype=np.float64) | ||
| atom2 = from_python(array2) | ||
| assert hash(atom1a) == hash(atom1b), "hashes of different arrays with same value should be same" | ||
| assert hash(atom1a) != hash(atom2), "hashes of arrays with different values should be different" | ||
|
|
||
|
|
||
| # | ||
| # WL tests | ||
| # | ||
|
|
||
| @pytest.mark.parametrize( | ||
| ("str_expr", "str_expected"), | ||
| [ | ||
| # TODO: remove this - was temp to see if Normal still worked on ByteArray after changes | ||
| #("Normal[ByteArray[{1,2}]]", "{1, 2}"), | ||
| ("NumericArray[{{1,2},{3,4}}]", "<Integer64, 2×2>"), | ||
| ("ToString[NumericArray[{{1,2},{3,4}}]]", "<Integer64, 2×2>"), | ||
| ("Head[NumericArray[{1,2}]]", "NumericArray"), | ||
| ("AtomQ[NumericArray[{1,2}]]", "True"), | ||
| ("First[NumericArray[{1,2,3}]]", "1"), | ||
| ("First[NumericArray[{{1,2}, {3,4}}]]", "<Integer64, 2>"), | ||
| # TODO: these do not work yet | ||
| # change was applied to First to make that work, | ||
| # but did not want to change Last because it is awaiting DRYing | ||
| #("Last[NumericArray[{1,2,3}]]", "3"), | ||
| #("Last[NumericArray[{{1,2}, {3,4}}]]", "<Integer64, 2>"), | ||
| ("Normal[NumericArray[{{1,2}, {3,4}}]]", "{{1, 2}, {3, 4}}"), | ||
| ] | ||
| ) | ||
| def test_basics(str_expr, str_expected): | ||
| check_evaluation(str_expr, str_expected, hold_expected=True) | ||
|
|
||
| def test_type_conversion(): | ||
| expr = evaluate("NumericArray[{1,2}]") | ||
| assert isinstance(expr, NumericArray) | ||
| assert expr.value.dtype == np.int64 | ||
| expr = evaluate('NumericArray[{1,2}, "ComplexReal32"]') | ||
| assert expr.value.dtype == np.complex64 | ||
|
|
||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generalized this to work with any atom that defines .items