Google Dataflow: global name is not defined - apache beam
Locally I have this:
from shapely.geometry import Point
<...>
class GeoDataIngestion:
    def parse_method(self, string_input):
        Place = Point(float(values[2]), float(values[3]))
        <...>
I run this with Python 2.7 and everything goes well.
After that, I try to test it with the Dataflow runner, but while running I get this error:
NameError: global name 'Point' is not defined
The pipeline:
geo_data = (raw_data
| 'Geo data transform' >> beam.Map(lambda s: geo_ingestion.parse_method(s))
I have read other posts and I think this should work, but I'm not sure whether there is something special about Google Dataflow here.
I also tried:
import shapely.geometry
<...>
Place = shapely.geometry.Point(float(values[2]), float(values[3]))
With the same result
NameError: global name 'shapely' is not defined
Any idea?
In Google Cloud, if I try it in my virtual environment, I can do it without any problem:
(env) ...@cloudshell:~ ()$ python
Python 2.7.13 (default, Sep 26 2018, 18:42:22)
[GCC 6.3.0 20170516] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from shapely.geometry import Point
>>> Var = Point(-5.020751953125, 39.92237576385941)
EXTRA:
Error using requirements.txt
Collecting Shapely==1.6.4.post1 (from -r req.txt (line 2))
Using cached https://files.pythonhosted.org/packages/7d/3c/0f09841db07aabf9cc387662be646f181d07ed196e6f60ce8be5f4a8f0bd/Shapely-1.6.4.post1.tar.gz
Saved c:\<...>\shapely-1.6.4.post1.tar.gz
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "c:<...>temppip-download-kpg5caShapelysetup.py", line 80, in <module>
from shapely._buildcfg import geos_version_string, geos_version, \
File "shapely_buildcfg.py", line 200, in <module>
lgeos = CDLL("geos_c.dll")
File "C:Python27Libctypes__init__.py", line 366, in __init__
self._handle = _dlopen(self._name, mode)
WindowsError: [Error 126] The specified module could not be found
Error using setup.py
Changing setup.py like this:
CUSTOM_COMMANDS = [
    ['apt-get', 'update'],
    ['apt-get', '--assume-yes', 'install', 'libgeos-dev'],
    ['pip', 'install', 'Shapely'],
    ['echo', 'Custom command worked!']
]
The result is as if no package had been installed, because I get the same error as at the beginning:
NameError: global name 'Point' is not defined
setup.py file:
from __future__ import absolute_import
from __future__ import print_function
import subprocess
from distutils.command.build import build as _build

import setuptools


class build(_build):  # pylint: disable=invalid-name
    sub_commands = _build.sub_commands + [('CustomCommands', None)]


CUSTOM_COMMANDS = [
    ['apt-get', 'update'],
    ['apt-get', '--assume-yes', 'install', 'libgeos-dev'],
    ['pip', 'install', 'Shapely']]


class CustomCommands(setuptools.Command):
    def initialize_options(self):
        pass

    def finalize_options(self):
        pass

    def RunCustomCommand(self, command_list):
        print('Running command: %s' % command_list)
        p = subprocess.Popen(
            command_list,
            stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
        # Can use communicate(input='y\n'.encode()) if the command run requires
        # some confirmation.
        stdout_data, _ = p.communicate()
        print('Command output: %s' % stdout_data)
        if p.returncode != 0:
            raise RuntimeError(
                'Command %s failed: exit code: %s' % (command_list, p.returncode))

    def run(self):
        for command in CUSTOM_COMMANDS:
            self.RunCustomCommand(command)


REQUIRED_PACKAGES = ['Shapely']

setuptools.setup(
    name='dataflow',
    version='0.0.1',
    description='Dataflow set workflow package.',
    install_requires=REQUIRED_PACKAGES,
    packages=setuptools.find_packages(),
    cmdclass={
        'build': build,
        'CustomCommands': CustomCommands,
    }
)
pipeline options:
pipeline_options = PipelineOptions()
pipeline_options.view_as(StandardOptions).streaming = True
pipeline_options.view_as(SetupOptions).save_main_session = True
pipeline_options.view_as(SetupOptions).setup_file = 'C:\<...>\setup.py'
with beam.Pipeline(options=pipeline_options) as p:
The call:
python -m dataflow --project XXX --temp_location gs://YYY --runner DataflowRunner --region europe-west1 --setup_file C:\<...>\setup.py
The beginning of the log (before Dataflow starts waiting for the data):
INFO:root:Defaulting to the temp_location as staging_location: gs://iotbucketdetector/test/prueba
C:\Users\<...>~1\Desktop\PROYEC~2\env\lib\site-packages\apache_beam\runners\dataflow\dataflow_runner.py:816: DeprecationWarning: options is deprecated since First stable release.. References to <pipeline>.options will
not be supported
transform_node.inputs[0].pipeline.options.view_as(StandardOptions))
INFO:root:Starting GCS upload to gs://<...>-1120074505-586000.1542699905.588000/pipeline.pb...
INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
INFO:oauth2client.client:Refreshing access_token
INFO:root:Completed GCS upload to gs://<...>-1120074505-586000.1542699905.588000/pipeline.pb
INFO:root:Executing command: ['C:\Users\<...>~1\Desktop\PROYEC~2\env\Scripts\python.exe', 'setup.py', 'sdist', '--dist-dir', 'c:\users\<...>~1\appdata\local\temp\tmpakq8bs']
running sdist
running egg_info
writing requirements to dataflow.egg-info\requires.txt
writing dataflow.egg-info\PKG-INFO
writing top-level names to dataflow.egg-info\top_level.txt
writing dependency_links to dataflow.egg-info\dependency_links.txt
reading manifest file 'dataflow.egg-info\SOURCES.txt'
writing manifest file 'dataflow.egg-info\SOURCES.txt'
warning: sdist: standard file not found: should have one of README, README.rst, README.txt, README.md
running check
warning: check: missing required meta-data: url
warning: check: missing meta-data: either (author and author_email) or (maintainer and maintainer_email) must be supplied
creating dataflow-0.0.1
creating dataflow-0.0.1\dataflow.egg-info
copying files to dataflow-0.0.1...
copying setup.py -> dataflow-0.0.1
copying dataflow.egg-info\PKG-INFO -> dataflow-0.0.1\dataflow.egg-info
copying dataflow.egg-info\SOURCES.txt -> dataflow-0.0.1\dataflow.egg-info
copying dataflow.egg-info\dependency_links.txt -> dataflow-0.0.1\dataflow.egg-info
copying dataflow.egg-info\requires.txt -> dataflow-0.0.1\dataflow.egg-info
copying dataflow.egg-info\top_level.txt -> dataflow-0.0.1\dataflow.egg-info
Writing dataflow-0.0.1\setup.cfg
Creating tar archive
removing 'dataflow-0.0.1' (and everything under it)
INFO:root:Starting GCS upload to gs://<...>-1120074505-586000.1542699905.588000/workflow.tar.gz...
INFO:root:Completed GCS upload to gs://<...>-1120074505-586000.1542699905.588000/workflow.tar.gz
INFO:root:Starting GCS upload to gs://<...>-1120074505-586000.1542699905.588000/pickled_main_session...
INFO:root:Completed GCS upload to gs://<...>-1120074505-586000.1542699905.588000/pickled_main_session
INFO:root:Downloading source distribtution of the SDK from PyPi
INFO:root:Executing command: ['C:\Users\<...>~1\Desktop\PROYEC~2\env\Scripts\python.exe', '-m', 'pip', 'download', '--dest', 'c:\users\<...>~1\appdata\local\temp\tmpakq8bs', 'apache-beam==2.5.0', '--no-d
eps', '--no-binary', ':all:']
Collecting apache-beam==2.5.0
Using cached https://files.pythonhosted.org/packages/c6/96/56469c57cb043f36bfdd3786c463fbaeade1e8fcf0593ec7bc7f99e56d38/apache-beam-2.5.0.zip
Saved c:\users\<...>~1\appdata\local\temp\tmpakq8bs\apache-beam-2.5.0.zip
Successfully downloaded apache-beam
INFO:root:Staging SDK sources from PyPI to gs://<...>-1120074505-586000.1542699905.588000/dataflow_python_sdk.tar
INFO:root:Starting GCS upload to gs://<...>-1120074505-586000.1542699905.588000/dataflow_python_sdk.tar...
INFO:root:Completed GCS upload to gs://<...>-1120074505-586000.1542699905.588000/dataflow_python_sdk.tar
INFO:root:Downloading binary distribtution of the SDK from PyPi
INFO:root:Executing command: ['C:\Users\<...>~1\Desktop\PROYEC~2\env\Scripts\python.exe', '-m', 'pip', 'download', '--dest', 'c:\users\<...>~1\appdata\local\temp\tmpakq8bs', 'apache-beam==2.5.0', '--no-d
eps', '--only-binary', ':all:', '--python-version', '27', '--implementation', 'cp', '--abi', 'cp27mu', '--platform', 'manylinux1_x86_64']
Collecting apache-beam==2.5.0
Using cached https://files.pythonhosted.org/packages/ff/10/a59ba412f71fb65412ec7a322de6331e19ec8e75ca45eba7a0708daae31a/apache_beam-2.5.0-cp27-cp27mu-manylinux1_x86_64.whl
Saved c:\users\<...>~1\appdata\local\temp\tmpakq8bs\apache_beam-2.5.0-cp27-cp27mu-manylinux1_x86_64.whl
Successfully downloaded apache-beam
INFO:root:Staging binary distribution of the SDK from PyPI to gs://<...>-1120074505-586000.1542699905.588000/apache_beam-2.5.0-cp27-cp27mu-manylinux1_x86_64.whl
INFO:root:Starting GCS upload to gs://<...>-1120074505-586000.1542699905.588000/apache_beam-2.5.0-cp27-cp27mu-manylinux1_x86_64.whl...
INFO:root:Completed GCS upload to gs://<...>-1120074505-586000.1542699905.588000/apache_beam-2.5.0-cp27-cp27mu-manylinux1_x86_64.whl
INFO:root:Create job: <Job
createTime: u'2018-11-20T07:45:28.050865Z'
currentStateTime: u'1970-01-01T00:00:00Z'
id: u'2018-11-19_23_45_27-14221834310382472741'
location: u'europe-west1'
name: u'beamapp-<...>-1120074505-586000'
projectId: u'poc-cloud-209212'
stageStates:
steps:
tempFiles:
type: TypeValueValuesEnum(JOB_TYPE_STREAMING, 2)>
Tags: python, google-cloud-dataflow, apache-beam, shapely.geometry
asked Nov 16 '18 at 12:21 by IoT user, edited Nov 20 '18 at 8:03
Are you using save_main_session = True for global context as in this example? – Guillem Xercavins, Nov 16 '18 at 16:39

Yes: pipeline_options.view_as(SetupOptions).save_main_session = True and pipeline_options.view_as(StandardOptions).streaming = True – IoT user, Nov 19 '18 at 7:57
2 Answers
This is because you need to tell Dataflow to install the packages you want.
Brief documentation is here.
Simply put, for a PyPI package like shapely, you can do the following to ensure all dependencies are installed (see the sketch after this list):
- Run pip freeze > requirements.txt
- Remove all unrelated packages from requirements.txt
- Run your pipeline with --requirements_file requirements.txt
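A rough sketch of those steps (the pinned version, script name, project and bucket below are placeholders, not values from the question):

# requirements.txt, trimmed to the extra packages the pipeline actually imports
Shapely==1.6.4.post1

# launch the pipeline so the file is staged and installed on each worker
python my_pipeline.py \
    --runner DataflowRunner \
    --project YOUR_PROJECT \
    --temp_location gs://YOUR_BUCKET/temp \
    --requirements_file requirements.txt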
Or, if you want to do something more, like installing a Linux package with apt-get or using your own Python module, take a look at this official example. You need to set up a setup.py for this and change your pipeline command to use --setup_file setup.py.
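For instance, the launch command might then look like this (a sketch; the script name, project and bucket are placeholders):

python my_pipeline.py --runner DataflowRunner --project YOUR_PROJECT --temp_location gs://YOUR_BUCKET/temp --setup_file ./setup.py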
For PyPI modules, use REQUIRED_PACKAGES as in the example:

REQUIRED_PACKAGES = [
    'numpy', 'shapely'
]
If you are using pipeline options, then add setup.py like this:

pipeline_options = {
    'project': PROJECT,
    'staging_location': 'gs://' + BUCKET + '/staging',
    'runner': 'DataflowRunner',
    'job_name': 'test',
    'temp_location': 'gs://' + BUCKET + '/temp',
    'save_main_session': True,
    'setup_file': './setup.py'
}

options = PipelineOptions.from_dictionary(pipeline_options)
p = beam.Pipeline(options=options)
Thanks! I tried with requirements.txt, but I had a problem (added to the main post). So according to this, I think I have to use setup.py to use apt-get, don't I? What change should I do in the example? Edit CUSTOM_COMMANDS = [['echo', 'Custom command worked!']]? – IoT user, Nov 19 '18 at 10:18
That's strange; though I use REQUIRED_PACKAGES (as edited), I manage to install the required pip package. Have you added setup.py into the pipeline options? – MatrixTai, Nov 20 '18 at 3:30
Quite strange, because even if I add that PipelineOptions with setup_file, it is as if what I wrote in setup.py had no effect. I attach more information to show what could be wrong in what I'm doing. – IoT user, Nov 20 '18 at 8:05
@IoTuser, I will make a test case for shapely. – MatrixTai, Nov 21 '18 at 1:28
I will wait for your result and hope you can make it! Thanks! – IoT user, Nov 21 '18 at 7:39
Import inside the function + setup.py:

class GeoDataIngestion:
    def parse_method(self, string_input):
        from shapely.geometry import Point
        place = Point(float(values[2]), float(values[3]))

setup.py with:

REQUIRED_PACKAGES = ['shapely']
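A minimal sketch of the worker-side module, assuming the setup.py from the question is reused with REQUIRED_PACKAGES = ['shapely'] and staged via --setup_file (the module name and the values parsing are illustrative, not from the answer):

# geo_ingestion.py (hypothetical module name)
class GeoDataIngestion(object):
    def parse_method(self, string_input):
        # Import here so the name is resolved on the Dataflow worker,
        # where setup.py has already installed shapely.
        from shapely.geometry import Point
        values = string_input.split(',')
        return Point(float(values[2]), float(values[3]))

Importing inside the function avoids relying on save_main_session to pickle module-level imports from the launching environment.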