Skip to content

SPI MPFS add second CS for polarfire SoC to use Fabric chip select#8

Open
RossWilliamson wants to merge 1 commit into
linux4microchip:linux-6.12-mchpfrom
RossWilliamson:linux-6.12-mchp
Open

SPI MPFS add second CS for polarfire SoC to use Fabric chip select#8
RossWilliamson wants to merge 1 commit into
linux4microchip:linux-6.12-mchpfrom
RossWilliamson:linux-6.12-mchp

Conversation

@RossWilliamson

Copy link
Copy Markdown

The current hard core SPI driver only allows a single CS line. When you want to pass the CS line through the fabric then that line is actually CS 1 not CS0. This change allows you to use the hard core SPI driver with a fabric CS line with the required change to the device tree to use the second spi CS line.

Tested on polarfire SoC only

@RossWilliamson RossWilliamson changed the title Updated to add second CS for polarfire SoC to use Fabric chip select SPI MPFS add second CS for polarfire SoC to use Fabric chip select Feb 24, 2026
@ConchuOD

ConchuOD commented Mar 3, 2026

Copy link
Copy Markdown
Member

How is your MSS configurator set up? Do you have "Data I/Os..." set to an MSS I/O Bank, and "Target Select 1" set to "Fabric I/O"?

@ConchuOD ConchuOD self-assigned this Mar 3, 2026
@RossWilliamson

RossWilliamson commented Mar 3, 2026

Copy link
Copy Markdown
Author

How is your MSS configurator set up? Do you have "Data I/Os..." set to an MSS I/O Bank, and "Target Select 1" set to "Fabric I/O"?

It's setup as below with all the lines directed through the fabric
image

Also note that in the smartdesign block only SS1 is passed out of the MSS, there is no SS0. Maybe this is a MSS block bug but it would be great to have 2 chip selects availble

image

@ConchuOD

ConchuOD commented Mar 3, 2026

Copy link
Copy Markdown
Member

When you say "tested on polarfire soc", what did you actually test with? Do you have a pseudo devicetree that you can share with me (you can of course just hide compatibles or anything like that if you cannot share that info, I just need to see the nodes and their reg properties)?

@ConchuOD

ConchuOD commented Mar 3, 2026

Copy link
Copy Markdown
Member

How these chip selects came up internally about two years ago, so I went to find the bug report that I got.

Unfortunately, the submitter never really followed up with the ticket, so I could never confirm that my findings solved their problem, but the conclusion pretty much was that there is actually only a single chip select (so register bit 0) and that when you set the first dropdown to Fabric I/Os, it gives you the impression that you're routing chip select 0 to the fabric, but actually it means that what it calls "Target Select 0" gets routed nowhere. What it calls "Target Select 1" is routed to a Fabric I/O, but is controlled by register bit 0, so you need to use reg = <0> in your devicetree despite what your screenshot would imply.

I think putting Fabric I/O in the first dropdown and Unused in the third produces no viable chip select - this you can probably confirm fairly easily from the smartdesign.

I do not recall if I tested having an MSS I/O in the first dropdown and Fabric I/O in the third one, but I was told that this does not actually produce two independant chip selects. This is why the number of chip selects is limited to 1. I originally interpreted your comment about "Tested on polarfire SoC only" as you hooked up two devices, with MSS I/O in the first dropdown and Fabric I/O in the third, with a dts like:

&spi0 {
  dev1@0 {
    ret = <0>;
  };
  dev2@1 {
    ret = <1>;
  };
};

and were able to access both devices. This could work, even though I was told that it doesn't (and the MSS Configurator warns if you try this mixture, although not about the chip select just about splitting I/O banks). As your screenshot shows, there's only one port marked "SS" per controller in the smartdesign, and you had both dropdowns set to Fabric I/O so this couldn't have been what you did.

it would be great to have 2 chip selects availble

You're using Fabric I/O, so you should be able to route a GPIO to that pin and use GPIO CS?

@RossWilliamson

RossWilliamson commented Mar 4, 2026

Copy link
Copy Markdown
Author

So for my use-case the I needed both the options set to fabric. I can double check but I think you are right in that If you don't set the target select 1 to fabric you don't actually get a SS out of the MSS block. I was not trying to talk to two devices with two CS just one device with one CS.

Logbook

  • The SPI was connected up to actual hardware on a board (ADC).
  • I checked the testpin that the fabric SS1 was constrained to and there was no activity.
  • I then noticed the MSS block stated SS1 (not SS0) so changed the device tree (see below) to reg <1> not reg <0> which then gave me the error regarding only one CS allowed (due to the kernel driver).
  • I hacked the driver to allow 2 chip selects and then with reg<1> I saw the correct activity on the CS line

Device Tree snippit

&spi1 {
	status = "okay";
	spidev1: spidev@1 {
		status = "okay";
		compatible = "linux,spidev", "rohm,bh2228fv";
		spi-max-frequency = <500000>;
		reg = <1>;
     };
};

Oh and yeah I forgot you can use GPIOs as the CS so thanks for reminder me

@ConchuOD

ConchuOD commented Mar 4, 2026

Copy link
Copy Markdown
Member

I hacked the driver to allow 2 chip selects and then with reg<1> I saw the correct activity on the CS line

Right, I'll have to reexamine this then. I must have made a mistake in my testing. This makes me think that doing the split bank thing I mentioned gives you two different chip selects.

FWIW, I think your driver hack isn't needed, you can just set num-cs to 2 in your dts for the same result. It'll produce a dtbs_check warning though. Dunno if that makes any difference to you though.

I'm not sure what the "correct" fix for this is, that still tries to produce warnings if people do things that are invalid.

@ConchuOD

Copy link
Copy Markdown
Member

@RossWilliamson I've done some more digging into this and what you're seeing still doesn't make sense to me.

I tried the configuration that you showed in your screenshot above (although I noticed that you showed a screenshot of spi0 but a dts snippet from spi1), and confirmed that bit 0 in the SLAVESELECT register controlled the chip select, just as it does when spi0 is routed to mssio and I use the chip select available on an mssio.

I did also try the "mixed" configuration, with the setting:
2026-03-23T112142Z
With this configuration, bit 0 in the register controlled the mssio chip select and bit 1 the one available at the fabric.
Something like this could explain what you are seeing, but the iomux configuration required to do this routes sclk/miso/mosi to mssios instead of the fabric. Your screenshot doesn't show this kind of configuration, and I think you would be able to notice if your fabric !chip-select spi signal stopped working.

Do you by any chance have spi1 configured in this mixed way?
Could you send me the .cfg output file from the mss configurator for me to take a look

@RossWilliamson

Copy link
Copy Markdown
Author

Let me go back and check to confirm. Note both my SPI are routed out to the fabric and both options are set to fabric for the data I/O and target select 0 as well as target select 1.

lranders pushed a commit to lranders/linux4microchip-linux that referenced this pull request Apr 27, 2026
commit 1ac22c8 upstream.

This leak will cause a hang when tearing down the SCSI host. For example,
iscsid hangs with the following call trace:

[130120.652718] scsi_alloc_sdev: Allocation failure during SCSI scanning, some SCSI devices might not be configured

PID: 2528     TASK: ffff9d0408974e00  CPU: 3    COMMAND: "iscsid"
 #0 [ffffb5b9c134b9e0] __schedule at ffffffff860657d4
 linux4microchip#1 [ffffb5b9c134ba28] schedule at ffffffff86065c6f
 linux4microchip#2 [ffffb5b9c134ba40] schedule_timeout at ffffffff86069fb0
 linux4microchip#3 [ffffb5b9c134bab0] __wait_for_common at ffffffff8606674f
 linux4microchip#4 [ffffb5b9c134bb10] scsi_remove_host at ffffffff85bfe84b
 linux4microchip#5 [ffffb5b9c134bb30] iscsi_sw_tcp_session_destroy at ffffffffc03031c4 [iscsi_tcp]
 linux4microchip#6 [ffffb5b9c134bb48] iscsi_if_recv_msg at ffffffffc0292692 [scsi_transport_iscsi]
 linux4microchip#7 [ffffb5b9c134bb98] iscsi_if_rx at ffffffffc02929c2 [scsi_transport_iscsi]
 linux4microchip#8 [ffffb5b9c134bbf0] netlink_unicast at ffffffff85e551d6
 #9 [ffffb5b9c134bc38] netlink_sendmsg at ffffffff85e554ef

Fixes: 8fe4ce5 ("scsi: core: Fix a use-after-free")
Cc: stable@vger.kernel.org
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Mike Christie <michael.christie@oracle.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://patch.msgid.link/20260223232728.93350-1-junxiao.bi@oracle.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
cristibirsan pushed a commit that referenced this pull request Jun 12, 2026
[ Upstream commit 54ef024 ]

ice_reset_all_vfs() ignores the return value of ice_vf_rebuild_vsi().
When the VSI rebuild fails (e.g. during NVM firmware update via
nvmupdate64e), ice_vsi_rebuild() tears down the VSI on its error path,
leaving txq_map and rxq_map as NULL. The subsequent unconditional call
to ice_vf_post_vsi_rebuild() leads to a NULL pointer dereference in
ice_ena_vf_q_mappings() when it accesses vsi->txq_map[0].

The single-VF reset path in ice_reset_vf() already handles this
correctly by checking the return value of ice_vf_reconfig_vsi() and
skipping ice_vf_post_vsi_rebuild() on failure.

Apply the same pattern to ice_reset_all_vfs(): check the return value
of ice_vf_rebuild_vsi() and skip ice_vf_post_vsi_rebuild() and
ice_eswitch_attach_vf() on failure. The VF is left safely disabled
(ICE_VF_STATE_INIT not set, VFGEN_RSTAT not set to VFACTIVE) and can
be recovered via a VFLR triggered by a PCI reset of the VF
(sysfs reset or driver rebind).

Note that this patch does not prevent the VF VSI rebuild from failing
during NVM update — the underlying cause is firmware being in a
transitional state while the EMP reset is processed, which can cause
Admin Queue commands (ice_add_vsi, ice_cfg_vsi_lan) to fail. This
patch only prevents the subsequent NULL pointer dereference that
crashes the kernel when the rebuild does fail.

 crash> bt
     PID: 50795    TASK: ff34c9ee708dc680  CPU: 1    COMMAND: "kworker/u512:5"
      #0 [ff72159bcfe5bb50] machine_kexec at ffffffffaa8850ee
      #1 [ff72159bcfe5bba8] __crash_kexec at ffffffffaaa15fba
      #2 [ff72159bcfe5bc68] crash_kexec at ffffffffaaa16540
      #3 [ff72159bcfe5bc70] oops_end at ffffffffaa837eda
      #4 [ff72159bcfe5bc90] page_fault_oops at ffffffffaa893997
      #5 [ff72159bcfe5bce8] exc_page_fault at ffffffffab528595
      #6 [ff72159bcfe5bd10] asm_exc_page_fault at ffffffffab600bb2
         [exception RIP: ice_ena_vf_q_mappings+0x79]
         RIP: ffffffffc0a85b29  RSP: ff72159bcfe5bdc8  RFLAGS: 00010206
         RAX: 00000000000f0000  RBX: ff34c9efc9c00000  RCX: 0000000000000000
         RDX: 0000000000000000  RSI: 0000000000000010  RDI: ff34c9efc9c00000
         RBP: ff34c9efc27d4828   R8: 0000000000000093   R9: 0000000000000040
         R10: ff34c9efc27d4828  R11: 0000000000000040  R12: 0000000000100000
         R13: 0000000000000010  R14:   R15:
         ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
      #7 [ff72159bcfe5bdf8] ice_sriov_post_vsi_rebuild at ffffffffc0a85e2e [ice]
      #8 [ff72159bcfe5be08] ice_reset_all_vfs at ffffffffc0a920b4 [ice]
      #9 [ff72159bcfe5be48] ice_service_task at ffffffffc0a31519 [ice]
     #10 [ff72159bcfe5be88] process_one_work at ffffffffaa93dca4
     #11 [ff72159bcfe5bec8] worker_thread at ffffffffaa93e9de
     #12 [ff72159bcfe5bf18] kthread at ffffffffaa946663
     #13 [ff72159bcfe5bf50] ret_from_fork at ffffffffaa8086b9

 The panic occurs attempting to dereference the NULL pointer in RDX at
 ice_sriov.c:294, which loads vsi->txq_map (offset 0x4b8 in ice_vsi).

 The faulting VSI is an allocated slab object but not fully initialized
 after a failed ice_vsi_rebuild():

  crash> struct ice_vsi 0xff34c9efc27d4828
    netdev = 0x0,
    rx_rings = 0x0,
    tx_rings = 0x0,
    q_vectors = 0x0,
    txq_map = 0x0,
    rxq_map = 0x0,
    alloc_txq = 0x10,
    num_txq = 0x10,
    alloc_rxq = 0x10,
    num_rxq = 0x10,

 The nvmupdate64e process was performing NVM firmware update:

  crash> bt 0xff34c9edd1a30000
  PID: 49858    TASK: ff34c9edd1a30000  CPU: 1    COMMAND: "nvmupdate64e"
   #0 [ff72159bcd617618] __schedule at ffffffffab5333f8
   #4 [ff72159bcd617750] ice_sq_send_cmd at ffffffffc0a35347 [ice]
   #5 [ff72159bcd6177a8] ice_sq_send_cmd_retry at ffffffffc0a35b47 [ice]
   #6 [ff72159bcd617810] ice_aq_send_cmd at ffffffffc0a38018 [ice]
   #7 [ff72159bcd617848] ice_aq_read_nvm at ffffffffc0a40254 [ice]
   #8 [ff72159bcd6178b8] ice_read_flat_nvm at ffffffffc0a4034c [ice]
   #9 [ff72159bcd617918] ice_devlink_nvm_snapshot at ffffffffc0a6ffa5 [ice]

 dmesg:
  ice 0000:13:00.0: firmware recommends not updating fw.mgmt, as it
    may result in a downgrade. continuing anyways
  ice 0000:13:00.1: ice_init_nvm failed -5
  ice 0000:13:00.1: Rebuild failed, unload and reload driver

Fixes: 12bb018 ("ice: Refactor VF reset")
Signed-off-by: Petr Oros <poros@redhat.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260427-jk-iwl-net-petr-oros-fixes-v1-5-cdcb48303fd8@intel.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

2 participants